Fingerprinter API
ToxPrint fingerprinter: compute binary chemotype fingerprints for molecules using ToxPrint v2.0 (729 bits) or TxP_PFAS v1.0 (129 bits) definitions.
Usage:
from pyCSRML.fingerprinter import Fingerprinter, TOXPRINT_PATH, TXPPFAS_PATH
from rdkit import Chem
fp = Fingerprinter(TOXPRINT_PATH) # loads the bundled ToxPrint v2.0 definition
mol = Chem.MolFromSmiles("c1ccccc1") # benzene
arr, names = fp.fingerprint(mol) # numpy bool array + list of bit names
fp_pfas = Fingerprinter(TXPPFAS_PATH) # loads the bundled TxP_PFAS definition
arr_pfas, names_pfas = fp_pfas.fingerprint(mol)
Pattern matching strategy
- Each chemotype is defined by:
  - A primary SMARTS pattern (substructureMatch molecule)
  - Zero or more exception SMARTS patterns (substructureException molecules)
- A fingerprint bit is set to 1 if:
  - The molecule contains a substructure match for the primary pattern, AND
  - The molecule does NOT contain a substructure match for any exception pattern (exception patterns are only applied when the exception molecule contains matchingQueryAtom cross-references to the main pattern; otherwise the exception acts as a global exclusion)
Note: The exception logic is a reasonable approximation; the original ChemoTyper tool may produce slightly different results for edge cases.
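The bit-setting rule above can be sketched as follows. This is a minimal illustration, not the library's implementation: chemotype_bit, rdkit_has_match, and toy_has_match are hypothetical names, and the sketch covers only the global-exclusion case (it does not model matchingQueryAtom cross-references). The matcher is passed in as a callable so the logic can be exercised without RDKit; the RDKit-backed matcher is shown for reference and assumes an RDKit Mol as input.

```python
from typing import Callable, Sequence


def chemotype_bit(
    mol,
    primary: str,
    exceptions: Sequence[str],
    has_match: Callable[[object, str], bool],
) -> bool:
    """Return True when the primary pattern matches and no exception pattern does."""
    if not has_match(mol, primary):
        return False
    # Global exclusion: any matching exception pattern clears the bit.
    return not any(has_match(mol, exc) for exc in exceptions)


def rdkit_has_match(mol, smarts: str) -> bool:
    """Matcher backed by RDKit (hypothetical helper, imported lazily)."""
    from rdkit import Chem

    query = Chem.MolFromSmarts(smarts)  # returns None if the SMARTS does not compile
    return query is not None and mol.HasSubstructMatch(query)


def toy_has_match(mol: str, pattern: str) -> bool:
    """Stand-in matcher (plain substring test) used only to exercise the logic."""
    return pattern in mol


bit_plain = chemotype_bit("CCO", "CO", [], toy_has_match)         # primary matches, no exceptions
bit_excluded = chemotype_bit("CCOF", "CO", ["F"], toy_has_match)  # an exception also matches
bit_absent = chemotype_bit("CCN", "CO", ["F"], toy_has_match)     # primary does not match
```

With a real molecule, the same function would be called as chemotype_bit(mol, primary_smarts, exception_smarts, rdkit_has_match).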
- pyCSRML.fingerprinter.TOXPRINT_PATH: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/pycsrml/checkouts/latest/pyCSRML/data/toxprint_V2.0_r711.json')
Path to the bundled ToxPrint v2.0 JSON fingerprint definition. Pass this to Fingerprinter to load ToxPrint instantly: fp = Fingerprinter(TOXPRINT_PATH)
- pyCSRML.fingerprinter.TXPPFAS_PATH: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/pycsrml/checkouts/latest/pyCSRML/data/TxP_PFAS_v1.0.4.json')
Path to the bundled TxP_PFAS v1.0.4 JSON fingerprint definition. Pass this to Fingerprinter to load TxP_PFAS instantly: fp = Fingerprinter(TXPPFAS_PATH)
- class pyCSRML.fingerprinter.Fingerprinter(source, json_cache=None, verbose=False)
Bases: object
Compute binary chemotype fingerprints from a CSRML fingerprint definition.
The definition file can be in any of these formats:
  - XML (.xml): a CSRML XML file (ToxPrint v2 or TxP_PFAS). The parser converts the subgraph patterns to SMARTS on the fly. An optional JSON cache speeds up subsequent loads.
  - JSON (.json): a pre-built spec file (see "Custom fingerprints: JSON and YAML format" for the schema).
  - YAML (.yaml/.yml): same schema as JSON but in YAML syntax. Requires pyyaml.
- Parameters:
  - source (Union[str, Path]) – Path to the fingerprint definition file (.xml, .json, .yaml, or .yml).
  - json_cache (Union[str, Path, None]) – Path to a JSON cache file. Only used when source is an XML file. If the cache is newer than the XML, it is loaded directly (faster).
  - verbose (bool) – If True, emit a warning for every pattern that fails to compile.
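The json_cache rule ("if the cache is newer than the XML, it is loaded directly") amounts to a modification-time comparison. The sketch below shows one way such a check could look; cache_is_fresh is a hypothetical helper, not part of pyCSRML, and the library may decide freshness differently.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Union


def cache_is_fresh(
    xml_path: Union[str, Path],
    cache_path: Optional[Union[str, Path]],
) -> bool:
    """Return True when a cache file exists and was modified after the XML file."""
    if cache_path is None:
        return False
    cache = Path(cache_path)
    if not cache.is_file():
        return False
    return cache.stat().st_mtime > Path(xml_path).stat().st_mtime


# Demonstration with temporary files and explicitly set modification times.
with tempfile.TemporaryDirectory() as tmp:
    xml_file = Path(tmp) / "definition.xml"
    cache_file = Path(tmp) / "definition.json"
    xml_file.write_text("<fingerprint/>")
    cache_file.write_text("{}")

    os.utime(xml_file, (1000, 1000))
    os.utime(cache_file, (2000, 2000))       # cache newer than XML
    fresh = cache_is_fresh(xml_file, cache_file)

    os.utime(cache_file, (500, 500))         # cache older than XML
    stale = cache_is_fresh(xml_file, cache_file)

    missing = cache_is_fresh(xml_file, Path(tmp) / "absent.json")
```

A loader following this rule would read the JSON cache when the check passes and otherwise parse the XML again.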
- fingerprint_smiles(smiles)
Compute a fingerprint from a SMILES string.
Returns an all-zeros array if the SMILES is invalid.
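The invalid-input fallback can be sketched as below. This is an illustration under stated assumptions, not the library's code: fingerprint_or_zeros, toy_parse, and toy_compute are hypothetical, and plain Python lists stand in for the numpy array. In real use, the parse step would be RDKit's Chem.MolFromSmiles, which returns None for a SMILES it cannot parse.

```python
from typing import Callable, List, Optional, Sequence


def fingerprint_or_zeros(
    smiles: str,
    parse: Callable[[str], Optional[object]],
    compute: Callable[[object], Sequence[bool]],
    n_bits: int,
) -> List[bool]:
    """Return the computed bits, or n_bits zeros when the SMILES cannot be parsed."""
    mol = parse(smiles)
    if mol is None:
        # Invalid input: fall back to an all-zeros fingerprint.
        return [False] * n_bits
    return list(compute(mol))


def toy_parse(smiles: str) -> Optional[str]:
    """Stand-in parser: accepts purely alphabetic strings, rejects everything else."""
    return smiles if smiles.isalpha() else None


def toy_compute(mol: str) -> List[bool]:
    """Stand-in fingerprint with three bits: contains C, contains N, contains O."""
    return ["C" in mol, "N" in mol, "O" in mol]


valid_fp = fingerprint_or_zeros("CCO", toy_parse, toy_compute, 3)
invalid_fp = fingerprint_or_zeros("C1(", toy_parse, toy_compute, 3)
```

Note that an all-zeros result is also what a valid molecule matching no chemotype would produce, so callers who need to tell the two cases apart may want to validate the SMILES themselves before fingerprinting.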