About
French ML engineer. Personal notes on ML, agents, and experiments driven by evidence.
I'm a French machine learning engineer. I love penguins 🐧 way more than is reasonable and I'm fine with that.
Before 2013 I mostly hacked stuff together from books and random snippets online without really knowing what ran when I pressed Run. Pretty much proto vibe coding. Since 2013 I've taken coding seriously: I want to actually understand what I'm doing (most of the time).
How I learned to build things
I started with Visual Basic, moved on to C and C++, then picked up HTML and CSS. I never went deep on JavaScript beyond a bit of jQuery. Oddly enough, PHP is where object-oriented programming clicked for me: nobody pitches PHP as the textbook OOP language (I would have bet on Java first), but that's still where I had my revelation.
Around then I started caring about security: locking down the stuff I ran, figuring out how other systems behaved, sometimes reverse engineering, or reading exploitation threads on automated targets (physical or digital). Whenever I crossed a line, I went the responsible-disclosure route so people could fix things.
I've met folks who tear appliances apart, read binary and assembly, and exploit memory bugs (buffer overflows, alignment tricks) with the casual fluency I bring to plugging USB-A in on the first try. That level still feels miles above me. What I did stays technical but ordinary compared to that. I'm not trying to sound special.
Later I played with C#, Unity, and mobile-ish apps even though I barely game. Java was rough around 2016. Then Python stuck: it's still my default, though I poke at Rust now and then. I wrote toy regex-heavy chatbots before modern stacks existed, which probably explains why I still like regex.
Machine learning practice
Since around 2016 I've trained and shipped ML work with PyTorch, TensorFlow, scikit-learn, and the usual tooling. I prefer grounded engineering talk over hype.
Selected work themes
- Autonomous model-car competitions: lots of simulator rollouts and evaluation at scale on a shared cluster.
- Research on toxicity in human comments: multi-class and multi-label setups on noisy social text.
- Pierre Guillaume, Corentin Duchêne, Réda Dehak (2022). Hate Speech and Toxic Comment Detection using Transformers.
- Corentin Duchêne, Henri Jamet, Pierre Guillaume, Réda Dehak (2023). Benchmark pour la classification de commentaires toxiques sur le jeu de données Civil Comments. (English: toxic comment classification benchmark on the Civil Comments dataset.) Revue des Nouvelles Technologies de l'Information, Extraction et Gestion des Connaissances, RNTI-E-39, 19-30.
- Battleship-style ML: a research-heavy build around sink-the-fleet play (classic naval guessing game).
- Ubisoft: shipped homograph disambiguation for synthetic voices and parenthetical prediction in game script lines for intonation and downstream tooling.
- J. Zaïdi, C. Duchêne, H. Seuté, M.-A. Carbonneau (2023). The La Forge Speech Synthesis System for Blizzard Challenge 2023. Proceedings of the 18th Blizzard Challenge Workshop, 75-80, doi:10.21437/Blizzard.2023-10.
- Building and operating compute clusters: Ansible-managed fleets, shared storage, scheduler experiments, Ray clusters for centralized Python work (including pushing autonomous-car agents through thousands of simulated laps).
- Research thread on adversarial ML for network-flow classification: group work on UGR'16 NetFlow captures.
- Preprocessing: distributed pipeline on a 9-node cluster (Sun Grid Engine, NFS scratch) to tally and stratified-sample billions of labeled flows dominated by benign background; normalized and encoded features for supervised training.
- Baselines: benchmarked tree-heavy scikit-learn detectors, including XGBoost, random forests, and decision trees.
- Attacks: stressed those detectors with gradient-free, black-box methods from IBM's Adversarial Robustness Toolbox (HopSkipJump, ZooAttack, tree-specific attacks) to flip predictions toward benign labels.
- Research scientist at an HR-tech company that designs, trains, and ships ML models end to end for parsing CVs and job artifacts and scoring how well candidates fit roles.
- Today: agentic stacks, LLMs, reliability tooling, evaluation harnesses.
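The stratified-sampling step in the NetFlow preprocessing above can be sketched in plain Python. Everything here is illustrative: the label names and keep rates are made up, and the real pipeline ran distributed across a cluster rather than in one process.

```python
import random
from collections import defaultdict

def stratified_sample(rows, label_of, rates, seed=0):
    """Keep each row with a per-label probability, so rare attack
    classes survive while the dominant benign background is thinned."""
    rng = random.Random(seed)  # fixed seed: the sample is reproducible
    kept = []
    counts = defaultdict(int)
    for row in rows:
        label = label_of(row)
        counts[label] += 1
        if rng.random() < rates.get(label, 1.0):
            kept.append(row)
    return kept, dict(counts)

# Toy flows: mostly benign background, a few attacks (hypothetical labels).
flows = [("benign", i) for i in range(10_000)] + [("dos", i) for i in range(50)]
sample, totals = stratified_sample(flows, label_of=lambda r: r[0],
                                   rates={"benign": 0.01, "dos": 1.0})
```

Keeping all of the rare class while downsampling the background preserves the signal you actually want to train on.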
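The gradient-free attacks mentioned above query the model's decisions rather than its gradients. Here is a minimal sketch of that idea, not the real HopSkipJump algorithm: binary-search along the line between the original point and a point the detector already calls benign, keeping the closest point that still flips the label. The toy detector is hypothetical.

```python
def boundary_bisect(model, x_adv_start, x_orig, target="benign", steps=30):
    """Query-only binary search between a point the model already calls
    `target` and the original malicious point: the initialization idea
    behind decision-based attacks, reduced to one dimension of search."""
    lo, hi = 0.0, 1.0  # hi = fully at x_adv_start (classified as target)
    for _ in range(steps):
        mid = (lo + hi) / 2
        x = [(1 - mid) * o + mid * a for o, a in zip(x_orig, x_adv_start)]
        if model(x) == target:
            hi = mid  # still flips the label: move closer to the original
        else:
            lo = mid
    return [(1 - hi) * o + hi * a for o, a in zip(x_orig, x_adv_start)]

# Toy detector (hypothetical rule): flows with feature sum > 5 are malicious.
model = lambda x: "benign" if sum(x) <= 5.0 else "malicious"
adv = boundary_bisect(model, x_adv_start=[0.0, 0.0], x_orig=[8.0, 4.0])
```

The result sits just on the benign side of the decision boundary, found with label queries alone, which is exactly why tree ensembles with no usable gradients are still attackable.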
How I decide what "better" means
Benchmarks beat vibes. When I can, I stick to something like the scientific method: ask the question, pick controls, run benchmarks or reproducible demos, then say clearly what wins in which situation. I want conclusions you can argue about with evidence, not hand-waving.
That's basically the idea behind this blog: write so future-me (and you) can see the setup, the numbers, and the limits, and honest reruns stay possible.
Reproducibility sits next to strong claims. When I publish an experiment I lay out methodology and limits as far as makes sense so someone else can retrace steps or flag gaps.
Replication matters too: rerun other people's setups when calibration counts, and rerun mine months later when drift might sneak in. Landing on the same numbers tells you something; landing on different ones does too. Matching someone else's curve is still a check that your lab behaves like theirs.
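A minimal sketch of that habit in Python, with made-up variants: pin the seed so reruns see the same inputs, take the median of repeats to damp scheduler noise, and record enough of the setup that a rerun months later is comparable.

```python
import json, platform, random, statistics, time

def run(label, fn, inputs, repeats=5):
    # Median of repeats is less noisy than a single timing.
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        for x in inputs:
            fn(x)
        times.append(time.perf_counter() - start)
    return {"variant": label, "median_s": statistics.median(times)}

rng = random.Random(42)  # pinned seed: every rerun sees the same data
inputs = [rng.random() for _ in range(50_000)]

report = {
    "setup": {"python": platform.python_version(), "seed": 42, "n": len(inputs)},
    "results": [
        run("square_then_sqrt", lambda x: (x * x) ** 0.5, inputs),
        run("identity", lambda x: x, inputs),
    ],
}
print(json.dumps(report, indent=2))
```

The report carries the setup next to the numbers, so a later rerun can tell a real regression apart from a changed environment.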
Outside the keyboard
Roughly 15 years of sculpture, a broad appetite for arts, gym, cooking, photography, video editing, Lego and construction kits, hard puzzles, stretches of graphic design (logos, palettes, Illustrator posters, animated decks), trains, the sea, listening to piano and symphony orchestra music even though I never studied music, plus electronics and robots.
Why this blog exists
This site is mostly a public lab notebook. The longer rationale, disclaimers, and house rules are in Hello world. Read that post first if you want the intent spelled out.
Contact
Email [email protected] (hosted on Proton). For encrypted mail you can import my PGP public key (UID [email protected]).
Email expectations
My inbox is open for real mail. Please skip generic outreach or spam. Write something that clearly concerns me, not a blast template.
When it's quiet I usually reply within about a day. That's not a formal SLA; genuine threads almost always get a timely reply, though messier topics can take longer.
Very few messages bother me; the exceptions are spam, sketchy ads, and threats. I like discussion: I'm happy to disagree politely, I enjoy explaining trade-offs, and I'm generally more comfortable writing than speaking on stage.
How I triage: low volume means faster first replies; bursts of cold outreach slow everyone down; if I'm not interested I'll say so; if you hear nothing and you're not spam, I probably forgot, so one polite bump after about a week is fine (almost never needed).