TechBio Spotlight - Molecule.one

Oct 15, 2024

A few years ago Abe Heifets and Izzy Wallach (respectively the CEO and CTO of Atomwise) pointed me towards a small Polish startup called Molecule.one that was doing interesting work in chemical retrosynthesis, which, succinctly put, is the science of determining the sequence of chemical reactions and conditions needed to synthesize a desired molecule. I deeply respect Abe and Izzy for their scientific and technical integrity and their contributions to the TechBio space, so I was happy to apply the transitive property of credibility and check these guys out. I’ve been paying attention to them ever since, and for roughly the past year I’ve been advising and mentoring some of their leadership team.

To be clear, I’m no chemist. But I loved learning about chemical technologies while at Recursion, and the idea of being able to readily synthesize any desired compound always felt like a bit of a holy grail, albeit one that would be very challenging to reach. I still think it is, and will remain so for some time, but I fully believe we’ll get there some day. Beyond the physical challenges of running multi-step synthesis at micro/nano scale with appropriate purification, you have to start by knowing which reactions and conditions you need to execute, using which building blocks. That problem seems to me likely to become very tractable with data at sufficient scale. And while research groups throughout the world are working on this retrosynthesis problem, virtually all small molecule libraries are designed in a forward manner, by first enumerating the possible small molecules given available building blocks and chemical reactions. Working in reverse, from a desired molecule back to its building blocks, is typically custom work: a smart chemist manually figuring out how to synthesize the compound.

Molecule.one has been focused on leveraging large-scale chemical reaction data with deep learning to tackle this retrosynthesis problem. Their first product, RetroScore, was designed to help prioritize compounds early on by predicting a synthesis difficulty score, using models trained on publicly available data as well as CAS chemical reaction databases, to which they have exclusive access. Their generative models identify multiple potential synthesis paths, choose the most promising one, and evaluate its difficulty based on factors such as the number of synthesis steps, the availability and cost of starting materials, reaction feasibility, and the order of reactions required.
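To make that kind of route scoring a little more concrete, here is a toy sketch in Python. It is not Molecule.one’s method: the hypothetical `Route`/`Step` classes, the weights, and the choice to treat each step’s predicted feasibility as an independent success probability are all illustrative assumptions, and the sketch ignores reaction order entirely. It simply scores each candidate route on length, feasibility, starting-material cost, and availability, then keeps the least difficult one.

```python
from dataclasses import dataclass
from math import log

# Hypothetical data model: every name and weight here is illustrative,
# not Molecule.one's actual RetroScore implementation.

@dataclass
class Step:
    feasibility: float  # assumed predicted probability the reaction succeeds, in (0, 1]

@dataclass
class Route:
    steps: list[Step]
    building_block_cost_usd: float  # assumed total cost of the starting materials
    all_blocks_in_stock: bool

def route_difficulty(route: Route,
                     step_weight: float = 1.0,
                     cost_weight: float = 0.5,
                     stock_penalty: float = 3.0) -> float:
    """Toy difficulty score: higher means harder to synthesize."""
    # Longer routes are harder: each step adds a fixed penalty.
    length_term = step_weight * len(route.steps)
    # Less feasible reactions are harder: the sum of -log(p) equals -log of the
    # probability that every step succeeds, assuming the steps are independent.
    feasibility_term = sum(-log(s.feasibility) for s in route.steps)
    # Pricier starting materials add a log-scaled penalty.
    cost_term = cost_weight * log(1.0 + route.building_block_cost_usd)
    # Building blocks that are not in stock add a flat penalty.
    stock_term = 0.0 if route.all_blocks_in_stock else stock_penalty
    return length_term + feasibility_term + cost_term + stock_term

def score_compound(candidate_routes: list[Route]) -> tuple[float, Route]:
    """Pick the most promising (least difficult) route and return its score."""
    best = min(candidate_routes, key=route_difficulty)
    return route_difficulty(best), best
```

With these made-up weights, a two-step route with in-stock building blocks will usually beat a four-step route of similar cost, because every extra step contributes both a length penalty and another -log(p) term.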

While I thought this was all very interesting, what got me really excited over the past year was their commitment to generating their own high-throughput chemical reaction data in order to improve their models. I haven’t seen this in any other company or research group working in this space (though there may well be others), and it ties into one of my core tenets of a good TechBio platform: it must involve scaled experimental data generation. Over the past year, they’ve run tens of thousands of chemical reactions across various couplings to improve the performance and reliability of their models (their grant calls for generating 800,000 chemical reactions across 8 reaction classes). This isn’t just to improve their RetroScore product, but ultimately to drive and expand what they call SpaceM1 (their riff on Enamine’s REAL Space), a “large and unique space of molecules that have fully accurate recipes for rapid synthesis.” Built on much larger chemical reaction datasets, and accounting for multistep synthesis across an ever-growing number of reaction classes, this catalog of recipes could be extremely valuable for accelerating and/or driving down the cost of chemical synthesis for drug discovery. Not only are they creating this catalog of recipes, they are also putting it to use by actually selling compounds that aren’t available from the big vendors (e.g. Enamine, WuXi) or marketplaces (e.g. Molport, mcule), at a lower cost than one would pay a typical CRO handling custom orders.

As these models continue to improve, and the number of supported (and discovered) reaction classes grows, not only does the space of synthetically accessible small molecules grow, but so do the ways in which it can be accessed and integrated. Molecule.one has begun tackling the ambiguity that often arises in this space by creating Maria, an LLM-based agent that accelerates medicinal chemists in their workflows. One can envision a future in which Molecule.one’s clients access SpaceM1 through simple, human-like conversations, without the latency of working with actual humans and without needing to integrate with its APIs themselves. Eventually, they could evolve Maria into an autonomous chemist, with both the mental (access to digital modeling APIs) and physical (access to laboratory control APIs) capabilities of a human chemist. Whether that’s achieved quickly is debatable, but even a chemistry co-pilot supporting all of the synthesis work strikes me as extremely valuable, both as an internal capability as they sell compounds and as an external capability as they interact with customers.

Looking a bit further ahead, as hardware systems advance alongside the world’s experience with complex, multistep synthesis workflows, one could envision a future in which Molecule.one’s “recipe book” and its autonomous chemistry agent, Maria, are coupled with advanced physical microsynthesis technologies to unlock a level of chemical synthesis not seen before. I’m not sure exactly where this goes (whether they become an Enamine-scale player in the space, or integrate their technologies with pharma companies via partnerships), but a company that focuses on generating its own data at scale to train models for not just a useful but a critical step of the drug discovery process - that’s something that catches my eye. Hopefully it’s one you’ll watch too.

Igneous Bio
