An extract of Yuanfang Guan’s winning code for odor prediction
A paper in this week’s edition of Science claims that computer models can predict the smell of a molecule. The paper describes the organization and outcome of an IBM Dream Challenge in which multiple laboratories competed to see whose model best predicts sensory characteristics from chemical parameters.
This crowd-sourced effort began with an olfactory dataset collected and published in 2016 by Andreas Keller and Leslie Vosshall. (Full disclosure: I previously collaborated with Keller and Vosshall on a different smell study.) They had 49 test subjects sniff and rate 476 “structurally and perceptually diverse molecules” using 19 semantic descriptors plus ratings of odor intensity and pleasantness.
In setting up the Dream Challenge, the organizers also “supplied 4884 physicochemical features of each of the molecules smelled by the subjects, including atom types, functional groups, and topological and geometrical properties that were computed using Dragon chemoinformatic software.”
There are several positive aspects to the challenge design. First, instead of recycling the decades-old Dravnieks dataset like so many other attempts at chemometric-based odor prediction, the sponsors supplied a fresh psychophysical dataset. Second, the study included a boatload of odorants, not the handful of smells found in most sensory studies. Third, the odor ratings were gathered from a relatively large number of sensory panelists. Forty-nine is not a super-robust sample size, but it’s enough to encompass a lot of the person-to-person variability found in odor perception.
Here’s how the competition worked. Each team was given the molecular and sensory data for 338 molecules. They used these data to build computer models that predicted the sensory ratings from the chemical data. Another 69 molecules (supplied without their sensory data) were used by the organizers to construct a “leaderboard” to rank each team’s performance during the competition. The leaderboard sensory data were revealed to contestants late in the game to let them fine-tune their models. Finally, another 69 molecules were reserved by the organizers and used to evaluate the performance of the finalized models.
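The three-way partition just described can be sketched in a few lines of Python. The 338/69/69 counts come from the paragraph above; the seeded random shuffle, the molecule labels, and the function name split_molecules are hypothetical stand-ins, since the organizers’ actual assignment procedure isn’t described here.

```python
import random


def split_molecules(molecule_ids, n_train=338, n_leaderboard=69, n_test=69, seed=0):
    """Partition molecule IDs into training, leaderboard, and held-out test sets.

    Hypothetical illustration: a seeded random shuffle stands in for however
    the organizers actually assigned molecules to each set.
    """
    if n_train + n_leaderboard + n_test != len(molecule_ids):
        raise ValueError("split sizes must add up to the number of molecules")
    ids = list(molecule_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]                                # chemistry + ratings given to teams
    leaderboard = ids[n_train:n_train + n_leaderboard]   # ratings revealed late in the game
    test = ids[n_train + n_leaderboard:]                 # ratings kept for final scoring
    return train, leaderboard, test


# Hypothetical labels for the 476 molecules in the Keller and Vosshall set.
molecules = [f"mol_{i:03d}" for i in range(476)]
train, leaderboard, test = split_molecules(molecules)
print(len(train), len(leaderboard), len(test))  # 338 69 69
```

Because 338 + 69 + 69 = 476, every molecule lands in exactly one set, and the final test molecules are never seen by a model until it is scored.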
The models were judged on how well their predictions matched the actual sensory data using a bunch of wonky statistical procedures that look reasonable on my cursory inspection. (About the algorithmic structure of the competing models I have nothing useful to say, as “random-forest models” and the like are beyond my ken.) For the sake of argument I will assume that the statistical scorekeeping was appropriate to the task. My concern here is with the sensory methodology, the underlying assumptions, and the claims made for the predictability of odor perception.
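For readers who want a feel for what “how well their predictions matched the actual sensory data” can mean, here is a minimal sketch of one common yardstick: the Pearson correlation between predicted and observed ratings, computed per attribute and then averaged. This is an assumption for illustration only, not the challenge’s actual scoring procedure; the function names and the toy numbers are hypothetical.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


def mean_attribute_score(predictions, ratings):
    """Average Pearson r across attributes (hypothetical scoring scheme).

    Both arguments map an attribute name (e.g. "garlic") to a list of
    values, one per molecule, in the same molecule order.
    """
    scores = [pearson_r(predictions[a], ratings[a]) for a in ratings]
    return sum(scores) / len(scores)


# Toy numbers invented for illustration, not data from the study.
ratings = {"garlic": [1, 2, 3, 4], "sweet": [4, 3, 2, 1]}
predictions = {"garlic": [2, 4, 6, 8], "sweet": [1, 2, 3, 4]}
print(mean_attribute_score(predictions, ratings))
```

In the toy case the “garlic” predictions track the ratings perfectly (r = 1) while the “sweet” predictions run exactly backwards (r = −1), so the average comes out at zero. Note that any score of this kind is only defined for whichever attributes were rated in the first place.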
Let’s begin with semantic descriptors. The widely used U.C. Davis Wine Aroma Wheel uses 86 terms to describe wine. The World Coffee Research Sensory Lexicon uses 85 terms to describe coffee. The Science paper uses 19 terms to describe a large set of “perceptually diverse” odorants, which strikes me as a relatively paltry number. (The descriptors were: garlic, sweet, fruit, spices, bakery, grass, flower, sour, fish, musky, wood, warm, cold, acid, decayed, urinous, sweaty, burnt, and chemical.) Well, you might ask, can’t they just add more descriptors to cover qualities like “minty” and “fecal” and “skunky”? It’s not that easy, as I discuss below.
The internal logic of the descriptors presents another issue. Some are quite specific (garlic), others very broad (spices), and still others ambiguous (chemical). What are we to make of “bakery” as a smell? Is it yeasty like baking bread? Is it the smell of fresh cinnamon buns? (Or would that be “sweet”? Or “spices”?) The problem here is that words that are useful in an olfactory lexicon occur at different levels of cognitive categorization. This is reflected in the wine and coffee examples.
The Wine Aroma Wheel has twelve categories, each with one to six subcategories. For example, the Fruity category includes Citrus, which consists of Lemon and Grapefruit. The higher-level categories provide overall conceptual structure and are themselves useful as descriptors (e.g. a scent might be citrus-like while not smelling exactly of lemon or grapefruit).
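The tiered structure of such a lexicon (category, subcategory, specific term) maps naturally onto a nested lookup table, which makes the fallback from a specific term to a broader one explicit. In the sketch below only the Fruity, Citrus, Lemon, and Grapefruit entries come from the text; the rest of the wheel is deliberately left out, and the names AROMA_TREE, path_to, and broader_terms are hypothetical.

```python
# Partial, hypothetical encoding of one branch of the Wine Aroma Wheel.
# Only the Fruity > Citrus > Lemon/Grapefruit branch is filled in here.
AROMA_TREE = {
    "Fruity": {
        "Citrus": ["Lemon", "Grapefruit"],
    },
}


def path_to(term, tree=AROMA_TREE):
    """Return the path from top-level category down to term, or None if absent."""
    for category, subcategories in tree.items():
        if term == category:
            return [category]
        for subcategory, leaves in subcategories.items():
            if term == subcategory:
                return [category, subcategory]
            if term in leaves:
                return [category, subcategory, term]
    return None


def broader_terms(term, tree=AROMA_TREE):
    """Return the more general descriptors above term, nearest first."""
    path = path_to(term, tree)
    if path is None:
        return []
    return list(reversed(path[:-1]))


print(path_to("Lemon"))             # ['Fruity', 'Citrus', 'Lemon']
print(broader_terms("Grapefruit"))  # ['Citrus', 'Fruity']
```

A flat list like the study’s 19 descriptors has no such structure: there is no built-in relation saying, for instance, whether “bakery” sits above, below, or beside “sweet” and “spices.”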
Sensory specialists (including tea tasters, beer brewers, and perfumers) spend a lot of effort setting up lexicons that are concise and hierarchical, and which cover the relevant odor perception space. How were the 19 terms in the Science study arrived at? We do not know. How well do they cover the relevant perception space? We do not know. In fact, the authors state that “the size and dimensionality of olfactory perceptual space is unknown.”
These 19 terms are the basis on which the competing computer models were ranked. Thus a model's success at prediction is locked in to this specific set of terms (plus intensity and pleasantness). In other words, this is not a general solution to smell prediction: it is specific to these odors and these adjectives. The authors openly admit this:
While the current models can only be used to predict the 21 attributes, the same approach could be applied to a psychophysical dataset that measured any desired sensory attribute (e.g. “rose”, “sandalwood”, or “citrus”).
So if one wants to predict what molecules might smell of sandalwood or citrus, one would have to retest all 476 molecules on another 49 sensory panelists using the new list of descriptors, then re-run the computer models on the new dataset. Easy peasy, right? Alternatively, one could assemble a sensory panel and have the members sniff the molecules of interest and rate them on the new attributes of interest. Every fragrance and flavor house has such a panel. That’s how they currently evaluate the aroma of new molecules: they sniff them.
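To put the retesting burden in numbers, here is a back-of-the-envelope count. It assumes, as a simplification, that each panelist rates each molecule exactly once on each attribute, and that the new descriptor list keeps the same total of 21 attributes; the real protocol may well require more ratings than this. The function name ratings_needed is hypothetical.

```python
def ratings_needed(n_molecules, n_panelists, n_attributes):
    """Count individual ratings for a full retest.

    Simplifying assumption: one rating per panelist, per molecule,
    per attribute (no replicates or extra conditions).
    """
    return n_molecules * n_panelists * n_attributes


sessions = 476 * 49  # molecule-panelist pairings
print(sessions)                      # 23324
print(ratings_needed(476, 49, 21))   # 489804
```

Even under this generous assumption, a new set of descriptors means tens of thousands of sniffs and nearly half a million individual ratings before any model can be retrained.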
Thus the Dream challenge seems to be tilting at a windmill that the fragrance and flavor industry doesn’t see. The search for new molecules does not proceed by screening random molecular permutations. It is driven by specific market needs, say for a less expensive sandalwood smell or for a strong-smelling but environmentally safe musk. The parameters are cost, safety, and patentability, along with stability, compatibility in formulations, and (for perfumers) novelty.
Who knows, the smell prediction algorithms of the Dream challenge may turn out to be the first step in automating the exploration of chemosensory space. However, I’d be surprised if this approach turns out to be generalizable and amazed if it proves useful in applied settings.
Don’t get me wrong. I like the idea of using Big Data to understand olfaction—have a look at my papers based on the National Geographic Smell Survey. I urged Keller and Vosshall to go big in terms of odorants and the number of sensory panelists for what became our co-authored paper in BMC Neuroscience. At the same time I respect the complexity of odor perception and the effort required to map its natural history. And I think the perceptual side of the equation got short shrift in this study.
The studies discussed here are “Predicting human olfactory perception from chemical features of odor molecules,” by Andreas Keller, et al., published online February 20, 2017 in Science, and “Olfactory perception of chemically diverse molecules,” by Andreas Keller and Leslie B. Vosshall, BMC Neuroscience 17:55, 2016.