Timing Benchmarks ================= pyCSRML fingerprinting speed is measured on five molecule-size-stratified benchmark sets extracted from the CLinventory. Each set contains 500 molecules; timing is the median of 5 repetitions of :meth:`~pyCSRML.Fingerprinter.fingerprint_batch`. Benchmark sets -------------- Sets are generated from the CLinventory by ``scripts/create_size_benchmarks.py`` and stored in ``tests/test_data/size_benchmarks/``. .. list-table:: :header-rows: 1 :widths: 20 25 15 * - Set - Heavy-atom range - Molecules * - ``bench_tiny`` - 1 – 10 - 500 * - ``bench_small`` - 11 – 20 - 500 * - ``bench_medium`` - 21 – 35 - 500 * - ``bench_large`` - 36 – 60 - 500 * - ``bench_xlarge`` - 61 + - 500 pyCSRML timing results ---------------------- Measured on **Snapdragon X Elite X1E78100** (ARM64, 12 cores, ~32 GB RAM), Python 3.14.2, RDKit 2025.09.3, NumPy 2.3.5. .. list-table:: :header-rows: 1 :widths: 20 20 25 20 * - Set - Heavy atoms - ToxPrint v2 (ms/mol) - TxP_PFAS v1 (ms/mol) * - ``bench_tiny`` - 1 – 10 - 3.76 - 0.73 * - ``bench_small`` - 11 – 20 - 5.47 - 1.01 * - ``bench_medium`` - 21 – 35 - 8.23 - 1.53 * - ``bench_large`` - 36 – 60 - 12.32 - 2.19 * - ``bench_xlarge`` - 61 + - 23.20 - 4.46 The TxP_PFAS v1 fingerprinter (129 bits) is roughly 5× faster than ToxPrint v2 (729 bits) across all size bins. Both fingerprinters scale approximately linearly with heavy-atom count: ToxPrint v2 ranges from 3.76 ms/mol (tiny) to 23.2 ms/mol (xlarge), a ~6× increase over the full size range. Baseline file: ``tests/test_data/size_benchmarks/pycsrml_timing_baseline.json``. Reproducing the benchmarks -------------------------- .. code-block:: bash # Create benchmark sets (one-time; requires CLinventory CSV) python scripts/create_size_benchmarks.py # Time pyCSRML (saves pycsrml_timing_baseline.json) python scripts/benchmark_pycsrml_timing.py # 5 reps (default) python scripts/benchmark_pycsrml_timing.py --reps 3 # fewer reps, faster # Run regression tests (require zips from ChemoTyper) pytest tests/test_benchmark_regression.py -v -m slow Timing regression tests (``tests/test_benchmark_regression.py``) fail if any set runs more than 30 % slower than the saved baseline. They skip gracefully until ``pycsrml_timing_baseline.json`` exists. System information ------------------ Full details are recorded in ``tests/test_data/size_benchmarks/SYSTEM_INFO.md``. .. list-table:: :header-rows: 1 :widths: 30 45 * - Property - Value * - Host - ZenbookA14 * - OS - Windows 11 * - CPU - Snapdragon X Elite X1E78100 — Qualcomm Oryon * - Architecture - ARM64 * - Physical cores - 12 * - RAM - ~32 GB * - Python - 3.14.2 * - RDKit - 2025.09.3 * - NumPy - 2.3.5 * - pyCSRML - 0.1.0 (editable install)