Reaction Searching with WebReactions

A major activity for synthetic chemists is finding literature precedents for a reaction query: either a particular transformation of a specified compound, or a more general search for matching analogs at various levels of similarity. Since reaction databases became available on computers, the traditional search mode has been via the structures of reactant and product.

However, reactions are very complex if described in terms of the full structures of reactant and product, and so only a few thousand entries can be searched in a reasonable time. A structure-based search is therefore too cumbersome for a database of 1-2 million reactions. It also does not easily find analogs of the query reaction that are defined in somewhat different or more general terms.

WebReactions introduces a new concept for the retrieval of reactions from a large database in which reactions are indexed instead by the bond changes which occur. When a synthetic chemist thinks of a reaction, he envisions first the making and breaking of bonds at the reaction center as the defining nature of the reaction. Subsequently he considers the effects of surrounding groups, i.e., on rate, hindrance, or resistance to change under the reaction conditions. The WebReactions program mirrors this approach for indexing reaction entries in any database.

The solution to the problem of retrieval speed is to generalize and abstract the reaction entries first with a rough but rigorous description of the reaction change itself. This then affords a basis for an index of all the entries ordered by the groupings of this generalized format. Hence when a query is also formulated in the same terms, the first search instantly brings up all matched entries since they are all grouped together in the index.

In this way a relatively tiny subset containing all relevant examples is captured very quickly, irrespective of what structures are attached, unchanged, around the reaction center. This subset in turn provides a much smaller field for further searching, so that subsequent retrieval in more detail is much faster.
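As a rough illustration of why such an index makes the first lookup immediate, the sketch below groups entries under a key built only from their bond changes. The tuple layout (element, element, bond order before, bond order after), the function names, and the sample entries are assumptions made for this example; they are not the WebReactions data format.

```python
from collections import defaultdict

def bond_change_key(changes):
    """Build an order-independent key from a list of bond changes.

    Each change is (element_a, element_b, order_before, order_after).
    This layout is a hypothetical stand-in for the generalized description.
    """
    normalized = []
    for a, b, before, after in changes:
        a, b = sorted((a, b))  # treat C-O and O-C as the same bond
        normalized.append((a, b, before, after))
    return tuple(sorted(normalized))

def build_index(entries):
    """Group entry ids by their bond-change key."""
    index = defaultdict(list)
    for entry_id, changes in entries.items():
        index[bond_change_key(changes)].append(entry_id)
    return index

# Hypothetical database entries; surrounding structure is ignored entirely.
entries = {
    "rxn-1": [("C", "Br", 1, 0), ("C", "O", 0, 1)],  # C-Br broken, C-O made
    "rxn-2": [("O", "C", 0, 1), ("Br", "C", 1, 0)],  # same change, listed differently
    "rxn-3": [("C", "C", 2, 1), ("C", "H", 0, 1)],   # a different bond change
}
index = build_index(entries)

# A query expressed in the same terms is answered by a single lookup.
query = [("C", "O", 0, 1), ("C", "Br", 1, 0)]
hits = index.get(bond_change_key(query), [])
print(hits)
```

Because every entry with the same bond changes sits under one key, the cost of the first search does not depend on how large or varied the attached structures are.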

In WebReactions the database entries are taxonomically indexed with these successively nested subheadings:

  • rigorous digital generalization of the reaction class and type

  • the nature of substitution surrounding the reaction center

  • the nature of entering and/or leaving groups

  • features in the reactant which remain unchanged in the reaction

A query reaction is entered by the user in as much detail as he wants. The only obligatory detail is to specify all atoms which change their bonding, i.e., the reaction center atoms.
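One way to picture the reaction center is as the set of atoms whose bonds differ between reactant and product. The sketch below assumes that atoms are numbered consistently on both sides (an atom mapping) and that bonds are stored as a dictionary from atom pairs to bond orders. Both are illustrative assumptions, not a description of how WebReactions works internally.

```python
def reaction_center(reactant_bonds, product_bonds):
    """Return the atoms taking part in any bond whose order changes.

    Bonds are dicts mapping frozenset({atom_i, atom_j}) to a bond order;
    a bond missing on one side counts as order 0 there.
    """
    center = set()
    for bond in reactant_bonds.keys() | product_bonds.keys():
        if reactant_bonds.get(bond, 0) != product_bonds.get(bond, 0):
            center.update(bond)
    return center

# Hypothetical example: bromoethane to ethanol.
# Atom numbers: 1 = CH3 carbon, 2 = CH2 carbon, 3 = Br, 4 = O
reactant = {frozenset({1, 2}): 1, frozenset({2, 3}): 1}
product = {frozenset({1, 2}): 1, frozenset({2, 4}): 1}

center = reaction_center(reactant, product)
print(sorted(center))
```

Atom 1 keeps all of its bonds and so stays outside the center; only the carbon that exchanges bromine for oxygen, and the two heteroatoms, are reported.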

The program then formulates the query reaction in the same terms as those above for the database entries, and so instantly locates all entries with the same features as the query. It then directly shows the number of hits on the screen.

In fact WebReactions initially moves down the nested matching criteria only as far as necessary to provide about 10-20 hits, enough for easy examination by the user. The user can subsequently fine-tune for himself the extent of similarity desired between the query and the entries found, to afford either more or less closely refined pruning of the matches. Thus tuning to a manageable number of hits is fast and facile.
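The descent through the nested criteria can be sketched as a loop that applies one more level of matching only while the hit list is still too long. The four-level key tuple, the `refine` function, the threshold of 20, and the generated sample entries are all assumptions chosen for illustration.

```python
def refine(entries, query_keys, max_hits=20):
    """Match level by level, stopping once at most max_hits entries remain.

    Each entry carries a "keys" tuple ordered from most general (reaction
    class) to most specific (unchanged features). Returns (hits, depth used).
    """
    # The reaction class is always matched.
    hits = [e for e in entries if e["keys"][0] == query_keys[0]]
    depth = 1
    # Deeper levels are applied only while there are too many hits.
    while len(hits) > max_hits and depth < len(query_keys):
        hits = [e for e in hits if e["keys"][depth] == query_keys[depth]]
        depth += 1
    return hits, depth

# 60 hypothetical entries of one reaction class with varying detail.
entries = [
    {
        "id": i,
        "keys": (
            "C-Br>C-O",                               # reaction class and type
            "primary" if i % 2 == 0 else "secondary",  # substitution at the center
            "OH" if i % 3 == 0 else "OR",              # entering/leaving groups
            "ester" if i % 5 == 0 else "none",         # unchanged features
        ),
    }
    for i in range(60)
]

query_keys = ("C-Br>C-O", "primary", "OH", "ester")
hits, depth = refine(entries, query_keys)
print(len(hits), depth)
```

Here the search stops after the third level, since ten hits are already few enough to inspect. Calling `refine` with a smaller `max_hits` pushes the match one level deeper, which corresponds to the user tightening the required similarity.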


Practical Hints for Using WebReactions

WebReactions does not run a reaction substructure search as conventional reaction database systems do. Instead it performs a customizable reaction similarity search that focuses on the reaction center. As a consequence, you don't have to think of a substructure that is small enough to retrieve a decent number of hits and yet specific enough not to retrieve half of the database. In WebReactions you simply draw a complete reaction, i.e. the reaction as you might draw it in a lab journal. WebReactions will detect any reaction centers for you automatically and then retrieve about a dozen of the most similar reactions from the database.

Query reactions should be stoichiometric, i.e. every atom on the reactant side should be present on the product side as well and vice versa. This is particularly important for carbon atoms that are part of a reaction center. Refunctionalization reactions are an exception to this rule: incoming or leaving groups attached via a heteroatom may be omitted from one side of the reaction (e.g. bromine as reactant in a bromination, or HBr as product of an elimination).
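Before submitting a query, the balance rule can be checked by counting elements on each side. The following is a minimal sketch that assumes each side is given as a flat list of heavy-atom element symbols (hydrogens left out); the function names and this representation are hypothetical and not part of WebReactions.

```python
from collections import Counter

def element_imbalance(reactant_atoms, product_atoms):
    """Return {element: product count - reactant count} for unbalanced elements."""
    r, p = Counter(reactant_atoms), Counter(product_atoms)
    return {el: p[el] - r[el] for el in r.keys() | p.keys() if r[el] != p[el]}

def carbons_balanced(reactant_atoms, product_atoms):
    """True if every carbon appears on both sides.

    Heteroatom imbalances are tolerated, mirroring the exception for
    omitted incoming or leaving groups.
    """
    return "C" not in element_imbalance(reactant_atoms, product_atoms)

# Bromination drawn without the bromine reagent: acceptable.
bromination = (["C", "C"], ["C", "C", "Br", "Br"])
print(element_imbalance(*bromination), carbons_balanced(*bromination))

# A carbon lost between the two sides: the query should be redrawn.
lost_carbon = (["C", "C", "C", "O"], ["C", "C", "O"])
print(element_imbalance(*lost_carbon), carbons_balanced(*lost_carbon))
```

The check treats a missing heteroatom as a permitted omission but flags any missing carbon, since the rule above is strictest for carbon atoms.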
