Evolution and Novel datasets | In Silico Molecular Discovery

Creating new virtual molecules and “growing” them into new virtual chemical structures, using a variety of approaches to evolutionary and novel dataset creation:

Evolutionary Molecular Design (EMD) – A machine-learning-assisted method for generating new chemical entities through iterative structural modifications.

De Novo Molecular Design – An approach using evolutionary strategies to generate entirely new molecular structures optimized for specific biological or chemical properties.

Computational Mutagenesis – A strategy where molecular fragments undergo systematic changes (mutation, recombination) to explore novel chemical spaces.
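The mutation and recombination steps described above can be sketched as a small evolutionary loop. This is a minimal illustration under stated assumptions, not our production workflow: each virtual molecule is reduced to a tuple of fragment labels, and the fragment pool, its scores, and the `fitness` function are hypothetical stand-ins for a real property model.

```python
import random

# Hypothetical fragment pool: each score is a made-up stand-in for a real
# property prediction (for example a docking score or a QSAR model output).
FRAGMENTS = {"A": 0.1, "B": 0.4, "C": 0.7, "D": 0.2, "E": 0.9, "F": 0.5}
GENOME_LENGTH = 4  # number of fragment slots per virtual molecule


def fitness(molecule):
    """Toy objective: sum of fragment scores, penalising repeated fragments."""
    repeats = len(molecule) - len(set(molecule))
    return sum(FRAGMENTS[f] for f in molecule) - 1.0 * repeats


def mutate(molecule, rng):
    """Mutation: replace one randomly chosen slot with a random fragment."""
    position = rng.randrange(len(molecule))
    child = list(molecule)
    child[position] = rng.choice(sorted(FRAGMENTS))
    return tuple(child)


def recombine(parent_a, parent_b, rng):
    """Recombination: one-point crossover between two parents."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def evolve(generations=30, population_size=20, seed=0):
    rng = random.Random(seed)
    names = sorted(FRAGMENTS)
    population = [
        tuple(rng.choice(names) for _ in range(GENOME_LENGTH))
        for _ in range(population_size)
    ]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(recombine(a, b, rng), rng))
        population = parents + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best, history = evolve()
print(best, round(fitness(best), 2))
```

Because the fitter half of each generation is carried over unchanged, the best score never decreases from one generation to the next. In practice the fragment tuples would be replaced by real structures and the toy objective by a predictive model.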

Small and Retired datasets

Discovering new and valuable leads that others may overlook, enhancing drug and molecular discovery through precision-driven analysis rather than sheer data volume.

[Figure: a molecular dataset generated by virtual evolution]

Virtual library creation

Processing and generating large collections of computationally designed molecules for use in, for example, drug discovery or the discovery of new nanomaterials. This is achieved using cheminformatics tools, combinatorial chemistry, and, often, AI-driven methods.

As part of this work, we build boutique curated datasets and clean existing virtual libraries. Key steps include:

Scaffold Design – Selecting core structures based on known bioactive molecules.

Molecular Enumeration – Generating diverse analogs by modifying functional groups.

Filtering & Prioritization – Applying physicochemical and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) filters to select drug-like candidates.

Database Storage & Screening – Organizing molecules for virtual screening against biological targets.
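The four steps above can be sketched end to end in a few lines. This is a simplified illustration, assuming that properties are additive over a scaffold and its substituents; the scaffold, the R-groups, and all property values are placeholders rather than measured data. Only a strict Rule of 5 filter is shown; a real pipeline would add ADMET predictions from dedicated models.

```python
import sqlite3
from itertools import product

# Placeholder property contributions; none of these numbers are measured values.
SCAFFOLD = {"name": "core", "mw": 300.0, "logp": 2.0, "hbd": 1, "hba": 3}
R1_GROUPS = [
    {"name": "H", "mw": 1.0, "logp": 0.0, "hbd": 0, "hba": 0},
    {"name": "OH", "mw": 17.0, "logp": -0.7, "hbd": 1, "hba": 1},
    {"name": "phenyl", "mw": 77.0, "logp": 1.9, "hbd": 0, "hba": 0},
]
R2_GROUPS = [
    {"name": "methyl", "mw": 15.0, "logp": 0.5, "hbd": 0, "hba": 0},
    {"name": "naphthyl", "mw": 127.0, "logp": 3.2, "hbd": 0, "hba": 0},
    {"name": "carboxamide", "mw": 44.0, "logp": -1.5, "hbd": 1, "hba": 1},
]
PROPERTIES = ("mw", "logp", "hbd", "hba")


def enumerate_library(scaffold, *substituent_lists):
    """Molecular enumeration: every combination of one group per position."""
    library = []
    for groups in product(*substituent_lists):
        parts = (scaffold,) + groups
        entry = {"name": "-".join(p["name"] for p in parts)}
        for prop in PROPERTIES:
            entry[prop] = sum(p[prop] for p in parts)  # additive assumption
        library.append(entry)
    return library


def passes_rule_of_five(entry):
    """Physicochemical filter: all four Rule of 5 thresholds must hold."""
    return (entry["mw"] <= 500 and entry["logp"] <= 5
            and entry["hbd"] <= 5 and entry["hba"] <= 10)


library = enumerate_library(SCAFFOLD, R1_GROUPS, R2_GROUPS)
drug_like = [entry for entry in library if passes_rule_of_five(entry)]

# Database storage: an in-memory SQLite table that can be queried for screening.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE library (name TEXT PRIMARY KEY, mw REAL, logp REAL, "
    "hbd INTEGER, hba INTEGER)"
)
conn.executemany(
    "INSERT INTO library VALUES (:name, :mw, :logp, :hbd, :hba)", drug_like
)
rows = conn.execute(
    "SELECT name, mw, logp FROM library WHERE logp < 3 ORDER BY mw"
).fetchall()
for row in rows:
    print(row)
```

Keeping the filtered library in a database table means later screening queries (here, a simple logP cut-off) do not require re-enumerating the library.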

Pharmacophore scaffold hopping and Target hopping

Pharmacophore scaffold hopping involves modifying a molecule’s core structure (scaffold) while retaining key functional groups essential for biological activity. This strategy helps discover novel compounds with improved properties, such as enhanced potency, selectivity, or pharmacokinetics.

Target hopping refers to designing compounds that interact with different biological targets while maintaining a similar pharmacophore. This approach aids in discovering new therapeutic applications (drug repurposing) and reducing drug resistance.

Benefits of both strategies include expanding chemical diversity, overcoming patent barriers, improving drug-likeness, and identifying alternative treatments for diseases, making them essential in drug discovery and development.
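As a rough sketch of how both searches can be expressed, the example below treats a pharmacophore as a plain set of feature labels. That is a deliberate simplification, since real pharmacophores also encode the 3D arrangement of features. The compound names, scaffold labels, feature sets, and target names are all hypothetical.

```python
# Hypothetical annotations: scaffold class, pharmacophore features, known target.
COMPOUNDS = [
    {"name": "cpd_1", "scaffold": "indole",
     "features": {"HBD", "HBA", "aromatic"}, "target": "kinase_X"},
    {"name": "cpd_2", "scaffold": "benzimidazole",
     "features": {"HBD", "HBA", "aromatic", "hydrophobe"}, "target": "kinase_X"},
    {"name": "cpd_3", "scaffold": "indole",
     "features": {"HBA", "aromatic"}, "target": "kinase_X"},
    {"name": "cpd_4", "scaffold": "quinoline",
     "features": {"HBD", "HBA", "aromatic"}, "target": "protease_Y"},
]


def retains_pharmacophore(reference, candidate):
    """True if the candidate keeps every feature of the reference."""
    return reference["features"] <= candidate["features"]


def scaffold_hops(reference, compounds):
    """Candidates with a different core that retain the reference pharmacophore."""
    return [
        c["name"] for c in compounds
        if c["scaffold"] != reference["scaffold"]
        and retains_pharmacophore(reference, c)
    ]


def target_hops(reference, compounds):
    """Candidates with the reference pharmacophore annotated to another target."""
    return [
        c["name"] for c in compounds
        if c["target"] != reference["target"]
        and retains_pharmacophore(reference, c)
    ]


reference = COMPOUNDS[0]
print("scaffold hops:", scaffold_hops(reference, COMPOUNDS))
print("target hops:", target_hops(reference, COMPOUNDS))
```

The same pharmacophore test drives both searches; only the attribute that is required to differ (scaffold or target) changes.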

Activity cliffs analysis

Activity cliff analysis is a powerful technique for finding outliers in a dataset: pairs of very similar compounds that differ sharply in a measured property. Examples include identifying unusually potent actives within a set of similar compounds or, in an economic setting, finding large price differences between very similar compounds offered by chemical vendors or food ingredient companies. We also undertake related work, including 3D activity cliffs and pharmacophore activity cliffs. Once activity cliffs have been identified, we can probe why they occur and use the answers to develop new insights and make new discoveries.
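A minimal sketch of the idea, assuming each compound is described by a fingerprint (here a set of "on" bit indices) and a single activity value: a pair is flagged as a cliff when its Tanimoto similarity is high and its activity difference is large. The compounds, fingerprints, activities, and both thresholds are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical compounds: fingerprints are sets of "on" bit indices, and the
# activities are illustrative pIC50 values rather than measured data.
COMPOUNDS = {
    "cpd_1": ({1, 2, 3, 4, 5, 6, 7, 8}, 5.1),
    "cpd_2": ({1, 2, 3, 4, 5, 6, 7, 9}, 8.0),
    "cpd_3": ({1, 2, 3, 4, 5, 6, 7, 8, 10}, 5.3),
    "cpd_4": ({20, 21, 22, 23}, 9.0),
}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_activity_cliffs(compounds, min_similarity=0.75, min_difference=2.0):
    """Return similar pairs whose activities differ by at least min_difference."""
    cliffs = []
    for (name_a, (fp_a, act_a)), (name_b, (fp_b, act_b)) in combinations(
        compounds.items(), 2
    ):
        similarity = tanimoto(fp_a, fp_b)
        difference = abs(act_a - act_b)
        if similarity >= min_similarity and difference >= min_difference:
            cliffs.append(
                (name_a, name_b, round(similarity, 2), round(difference, 2))
            )
    # Largest activity jumps first.
    return sorted(cliffs, key=lambda cliff: cliff[3], reverse=True)


cliffs = find_activity_cliffs(COMPOUNDS)
for cliff in cliffs:
    print(cliff)
```

The same pairwise test applies to the pricing example: replace the activity value with a price and the cliff becomes a pair of near-identical compounds with very different costs.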

Pattern recognition studies

Pattern recognition studies in drug discovery involve identifying trends and relationships within chemical and biological datasets to uncover new drug leads. They can also be used to analyse chemical ecological interactions and to aid the study of “chemical systematics”.

Techniques like t-distributed Stochastic Neighbor Embedding (t-SNE) and Self-Organizing Maps (SOMs) help visualize complex, high-dimensional data by placing compounds with similar structural or activity-based features close together. Clustering algorithms (e.g., k-means and hierarchical clustering) classify molecules into meaningful groups, aiding scaffold hopping and lead optimization.
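To make the clustering step concrete, here is a small k-means sketch in plain Python. It assumes each compound is already reduced to a short numeric descriptor vector; the two descriptors and their values are illustrative, and the farthest-first initialisation is a choice made here to keep the example deterministic.

```python
import math

# Illustrative 2D descriptor vectors (for example scaled logP and scaled
# molecular weight); the values are placeholders, not measured data.
POINTS = [
    (1.0, 1.8), (1.2, 2.0), (0.8, 1.9),   # small, polar compounds
    (4.5, 4.2), (4.8, 4.5), (4.6, 4.0),   # larger, lipophilic compounds
]


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(POINTS, k=2)
print(labels)
```

Descriptors on very different scales should be normalised first, otherwise the largest-valued descriptor dominates the distance calculation.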

By analyzing both existing and novel datasets, these methods reveal hidden patterns, predict bioactivity, and accelerate hit identification, ultimately guiding rational drug design and reducing experimental costs.

Similarity searches

Similarity searches involve screening public, internal, or proprietary databases to identify compounds structurally or functionally similar to a reference molecule.

This process integrates various filters, such as restricting results to a specific logP range (to control lipophilicity), looking beyond the Rule of 5 (Ro5) for non-traditional drug-like compounds, and applying multi-component descriptors (e.g., molecular weight, hydrogen bond donors/acceptors, and topological indices) for precise selection.
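A similarity search combined with a logP filter might look like the sketch below. It assumes fingerprints are stored as sets of "on" bit indices and ranks hits by Tanimoto similarity; the database entries, the query fingerprint, and the thresholds are hypothetical.

```python
# Hypothetical database entries: fingerprint bit sets plus a logP value.
DATABASE = [
    {"name": "analog_a", "fp": {1, 2, 3, 4, 5, 7}, "logp": 2.1},
    {"name": "analog_b", "fp": {1, 2, 3, 4, 5, 6, 8}, "logp": 6.3},
    {"name": "analog_c", "fp": {1, 2, 3, 4, 5, 6}, "logp": 3.0},
    {"name": "unrelated_d", "fp": {10, 11, 12}, "logp": 2.5},
]
QUERY_FP = {1, 2, 3, 4, 5, 6}  # fingerprint of the reference molecule


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(query_fp, database, min_similarity=0.6,
                      logp_range=(1.0, 5.0)):
    """Return (name, similarity) hits inside the logP window, best first."""
    low, high = logp_range
    hits = []
    for entry in database:
        if not low <= entry["logp"] <= high:
            continue  # property filter applied before scoring
        score = tanimoto(query_fp, entry["fp"])
        if score >= min_similarity:
            hits.append((entry["name"], round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


hits = similarity_search(QUERY_FP, DATABASE)
for name, score in hits:
    print(name, score)
```

Widening `logp_range` or dropping the filter is one simple way to look beyond conventional drug-like space; here it would bring the lipophilic `analog_b` back into the results.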

These searches help identify novel scaffolds, optimize lead compounds, and support scaffold and target hopping strategies, accelerating the drug discovery process while ensuring desirable physicochemical and pharmacokinetic properties.