Benchmark Tasks

Designing Organic Photovoltaics

Our first set of benchmark objectives consists of six individual tasks that are inspired by the design of organic photovoltaics (OPVs). The development of organic solar cells (OSCs) is of wide interest because of their potential to replace the currently predominating inorganic devices and to expand upon their application domain.

The code accepts a proposed structure as SMILES string, generates initial Cartesian coordinates with Open Babel, and performs conformer search and geometry optimization with crest and xtb, respectively. Finally, a single point calculation at the GFN2-xTB level of theory provides the properties of interest, in particular the HOMO and LUMO energies, the HOMO-LUMO gap and the molecular dipole moment. The power conversion efficiency (PCE) is computed from these simulated properties based on the Scharber model.

_images/opv.png — Designing organic photovoltaics.

Example

import pandas as pd
data = pd.read_csv('./datasets/hce.csv')   # or ./dataset/unbiased_hce.csv
smiles = data['smiles'].tolist()
smi = smiles[0]

## use full xtb calculation in hce module
from tartarus import pce
pce_pcbm_sas, pce_pcdtbt_sas = pce.get_properties(smi)

Designing Organic Emitters

The next set of benchmarks is inspired by the design of purely organic emissive materials for organic lightemitting diodes (OLEDs), which received significant attention in recent years after the discovery of thermally activated delayed fluorescence (TADF) in the field.

The code accepts a proposed structure as SMILES string and starts with conformer search for the molecule. Subsequent geometry optimization via xtb provides the structure used in the final TD-DFT single point calculation with pyscf. From the output, the singlet-triplet gap, oscillator strength and vertical excitation energy is extracted and returned.

_images/emitters.png — Designing organic emitters

Example

import pandas as pd
data = pd.read_csv('./datasets/gdb13.csv')
smiles = data['smiles'].tolist()
smi = smiles[0]

## use full xtb calculation in hce module
from tartarus import tadf
st, osc, combined = tadf.get_properties(smi)

Design of Drug Molecule

Next, we also wanted to include molecular design objectives that are relevant for medicinal chemistry in TARTARUS. In recent years, deep generative models have experienced a strong increase in popularity and adoption for drug design as they promise to accelerate discovery campaigns leading to significant cost reductions, and several success stories were reported on in the literature.

The code accepts a proposed structure as SMILES string, generates initial Cartesian coordinates with Open Babel and then samples docking poses in the predefined binding sites using smina. The lowest docking score of all the sampled poses is returned as target property.

Example

import pandas as pd
data = pd.read_csv('./datasets/docking.csv')
smiles = data['smiles'].tolist()
smi = smiles[0]

## Design of Protein Ligands
from tartarus import docking
score_1syh = docking.perform_calc_single(smi, '1syh', docking_program='qvina')

Design of Chemical Reaction Substrates

Developing new chemical reactions and finding new catalysts for existing ones are important goals to drive innovations in drug and material discovery, and move towards more sustainable chemical production. Whereas, classically, the optimization of reaction parameters was largely dominated by experimental work, in recent years, the significant increase in computing power and the continuous improvement of computer algorithms enabled molecular simulations to play an increasingly important part.

The code accepts a proposed structure as SMILES string and starts with optimizations and conformer searches for the reactants and products. The SEAM optimization is used to generate a guess transition state, which is then subjected to constrained conformational sampling. From the final structures, the reaction energy and approximated SEAM activation energy are extracted and returned.

_images/reactivity.png — Design of Chemical Reaction Substrates

Example

import pandas as pd
data = pd.read_csv('./datasets/reactivity.csv')
smiles = data['smiles'].tolist()
smi = smiles[0]

## calculating binding affinity for each protein
from tartarus import reactivity
Ea, Er, sum_Ea_Er, diff_Ea_Er = reactivity.get_properties(smi)

Datasets

See below for a summary of the contents of the datasets included in TARTARUS. The datasets are available in the datasets folder.

Task	Dataset name	# of smiles	Columns in file
Designing OPV	hce.csv	24,953	Dipole moment (↑)	HOMO-LUMO gap (↑)	LUMO (↓)	Combined objective (↑)	PCE<sub>PCBM</sub> -SAS (↑)	PCE<sub>PCDTBT</sub> -SAS (↑)
Designing emitters	gdb13.csv	403,947	Singlet-triplet gap (↓)	Oscillator strength (↑)	Multi-objective (↑)
Designing drugs	docking.csv	152,296	1SYH (↓)	6Y2F (↓)	4LDE (↓)
Designing chemical reaction substrates	reactivity.csv	60,828	Activation energy ΔE<sup>‡</sup> (↓)	Reaction energy ΔE<sub>r</sub> (↓)	ΔE<sup>‡</sup> + ΔE<sub>r</sub> (↓)	ΔE<sup>‡</sup> + ΔE<sub>r</sub> (↓)