CSRML parser API

Parse CSRML (Chemical Subgraph Representation Markup Language) XML files (ToxPrint v2 and TxP_PFAS v1) and convert subgraph patterns to SMARTS strings.

Key simplifications:
  • substructureException patterns are approximated or skipped, which can cause minor false positives
  • matchingQueryAtom cross-references between exception and main molecules are ignored
  • Complex atom descriptors (atomDescriptorValue/Range, combineAtomFeatures, elementGroup) are replaced by the SMARTS wildcard *
  • Query bonds (a CSRML bondList OR-condition) map to ~, the SMARTS any-bond symbol

pyCSRML._csrml.parse_csrml_xml(xml_path)

Parse a CSRML XML file (ToxPrint or TxP_PFAS) and return a dict:

{
    'id': str,
    'version': str,
    'title': str,
    'description': str,
    'hierarchy': list,          # nested class hierarchy
    'subgraphs': list[dict],    # ordered list of parsed subgraphs
    'subgraph_index': dict,     # id → subgraph dict
}

Return type:

dict

Parameters:

xml_path (str)
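
A minimal sketch of how the returned dict might be consumed. It relies only on the top-level keys documented above. The helper summarize_parsed, the hand-built example dict, and the file path in the comment are hypothetical, and the assumption that each subgraph dict carries an 'id' key is inferred from the subgraph_index description rather than confirmed.

```python
from typing import Any


def summarize_parsed(parsed: dict[str, Any]) -> str:
    """Build a one-line summary from the documented top-level keys."""
    n_subgraphs = len(parsed["subgraphs"])
    n_indexed = len(parsed["subgraph_index"])
    return (
        f"{parsed['id']} v{parsed['version']}: {parsed['title']} "
        f"({n_subgraphs} subgraphs, {n_indexed} indexed)"
    )


# Hand-built stand-in for a real result. With the library installed, the
# dict would instead come from something like:
#   from pyCSRML._csrml import parse_csrml_xml
#   parsed = parse_csrml_xml("path/to/fingerprint.xml")  # hypothetical path
subgraph = {"id": "sg1"}  # assumption: subgraph dicts carry an "id" key
example = {
    "id": "demo",
    "version": "1",
    "title": "Demo set",
    "description": "",
    "hierarchy": [],
    "subgraphs": [subgraph],
    "subgraph_index": {"sg1": subgraph},
}

summary = summarize_parsed(example)
print(summary)
```

Comparing the length of 'subgraphs' with the size of 'subgraph_index' is a cheap sanity check: a mismatch would suggest duplicate subgraph IDs in the source file.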

pyCSRML._csrml.ordered_bit_list(parsed)

Return the ordered list of subgraph IDs (fingerprint bit order) following the class hierarchy.

Falls back to the subgraphs list order if no hierarchy is present.

Return type:

list[str]

Parameters:

parsed (dict)
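
A minimal sketch of one way the ordered ID list could be used: turning a set of matched subgraph IDs into a bit vector whose positions follow the fingerprint bit order. The helper bits_from_matches and the sample IDs are hypothetical. In real use, bit_order would be the list returned by ordered_bit_list(parsed).

```python
def bits_from_matches(bit_order: list[str], matched_ids: set[str]) -> list[int]:
    """Return one bit per subgraph ID, in fingerprint bit order."""
    return [1 if subgraph_id in matched_ids else 0 for subgraph_id in bit_order]


# Stand-in for ordered_bit_list(parsed); these IDs are made up for illustration.
bit_order = ["sg_a", "sg_b", "sg_c"]

# IDs of the subgraphs that matched some molecule (also made up).
matched = {"sg_c", "sg_a"}

fingerprint = bits_from_matches(bit_order, matched)
print(fingerprint)  # [1, 0, 1]
```

Keeping the order in a separate list means the same bit positions can be reused across molecules, so the resulting vectors are directly comparable.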