CSRML parser API
Parse CSRML (Chemical Subgraph Representation Markup Language) XML files (ToxPrint v2 and TxP_PFAS v1) and convert subgraph patterns to SMARTS strings.
Key simplifications:
- substructureException patterns are approximated or skipped, which can produce minor false positives.
- matchingQueryAtom cross-references between exception and main molecules are ignored.
- Complex atom descriptors (atomDescriptorValue/Range, combineAtomFeatures, elementGroup) evaluate to the wildcard *.
- Query bonds (a CSRML bondList OR-condition) map to ~ (any bond in SMARTS).
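The last two simplifications can be shown with a small sketch. It is not pyCSRML's implementation. The function names bond_to_smarts and atom_to_smarts and the bond-type labels ("single", "double", ...) are hypothetical and chosen only for illustration. The sketch applies the two documented rules: an OR-list of bond types becomes ~, and an atom that uses any of the complex descriptors becomes *.

```python
from typing import List, Optional, Set

# Hypothetical bond-type labels; CSRML's real vocabulary may differ.
SMARTS_BOND = {"single": "-", "double": "=", "triple": "#", "aromatic": ":"}

# Atom features that the docs say collapse to the SMARTS wildcard.
WILDCARD_FEATURES = {"atomDescriptorValue", "atomDescriptorRange",
                     "combineAtomFeatures", "elementGroup"}


def bond_to_smarts(bond_types: List[str]) -> str:
    """Map one known bond type to its SMARTS symbol.

    Anything else, including a bondList OR-condition with several
    types, maps to '~' (any bond).
    """
    if len(bond_types) == 1 and bond_types[0] in SMARTS_BOND:
        return SMARTS_BOND[bond_types[0]]
    return "~"


def atom_to_smarts(element: Optional[str], features: Set[str]) -> str:
    """Map a plain element to a bracket atom.

    A missing element or any complex descriptor collapses to '*'.
    """
    if element is None or features & WILDCARD_FEATURES:
        return "*"
    return f"[{element}]"


# Example: carbon, then a single-OR-double query bond, then oxygen.
pattern = (atom_to_smarts("C", set())
           + bond_to_smarts(["single", "double"])
           + atom_to_smarts("O", set()))
print(pattern)  # [C]~[O]
```

Widening to ~ and * loses precision but never loses a match. That is consistent with the note above that the approximations lean toward false positives rather than missed hits.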
pyCSRML._csrml.parse_csrml_xml(xml_path)
Parse a CSRML XML file (ToxPrint or TxP_PFAS) and return a dict with this structure:

    {
        'id': str,
        'version': str,
        'title': str,
        'description': str,
        'hierarchy': list,          # nested class hierarchy
        'subgraphs': list[dict],    # ordered list of parsed subgraphs
        'subgraph_index': dict,     # id → subgraph dict
    }
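Here is a minimal sketch of code that consumes the returned dict. It relies only on the documented top-level keys. CSRMLDocument and get_subgraph are hypothetical names. The fields inside each subgraph dict are not documented here, so the sketch treats them as opaque values. The stand-in dict at the bottom replaces a real parse result.

```python
from typing import Any, Dict, List, TypedDict


class CSRMLDocument(TypedDict):
    """Documented shape of parse_csrml_xml's result (hypothetical type name)."""
    id: str
    version: str
    title: str
    description: str
    hierarchy: List[Any]                       # nested class hierarchy
    subgraphs: List[Dict[str, Any]]            # ordered list of parsed subgraphs
    subgraph_index: Dict[str, Dict[str, Any]]  # id -> subgraph dict


def get_subgraph(doc: CSRMLDocument, subgraph_id: str) -> Dict[str, Any]:
    """Look up one subgraph by id via subgraph_index.

    Raises KeyError with a message that names the file's id and version.
    """
    try:
        return doc["subgraph_index"][subgraph_id]
    except KeyError:
        raise KeyError(
            f"no subgraph {subgraph_id!r} in {doc['id']} v{doc['version']}"
        ) from None


# Real use (needs pyCSRML and a local CSRML file; the path is a placeholder):
#   from pyCSRML._csrml import parse_csrml_xml
#   doc = parse_csrml_xml("path/to/toxprint.xml")
# Stand-in dict with made-up contents, used here instead:
doc: CSRMLDocument = {
    "id": "example", "version": "0", "title": "demo", "description": "",
    "hierarchy": [], "subgraphs": [{"note": "opaque"}],
    "subgraph_index": {"sg1": {"note": "opaque"}},
}
print(get_subgraph(doc, "sg1"))  # {'note': 'opaque'}
```

Going through subgraph_index gives direct lookup by id. The subgraphs list keeps file order for code that has to walk every pattern in sequence.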