matchms.Fingerprints module
- class matchms.Fingerprints.Fingerprints(fingerprint_generator, *, ignore_stereochemistry: bool = False, count: bool = False, folded: bool = True, return_csr: bool = False, invalid_policy: str = 'raise', **config_kwargs)
Bases: object
Compute and store an InChIKey-to-fingerprint mapping for a collection of spectra.
This class is a container for molecular fingerprints keyed by InChIKey. Fingerprints are computed for unique compounds only and stored either as a dense NumPy array or as a SciPy CSR sparse matrix.
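To make the two storage options concrete, the pure-Python sketch below builds the three arrays that make up a CSR matrix (`data`, `indices`, `indptr`) from a small dense binary fingerprint matrix. It uses neither matchms nor SciPy. The helper name `dense_to_csr_arrays` is hypothetical; it only illustrates the layout used by `scipy.sparse.csr_matrix`, where row `i` corresponds to the `i`-th stored InChIKey.

```python
from typing import List, Tuple


def dense_to_csr_arrays(rows: List[List[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: convert dense fingerprint rows to CSR arrays.

    The non-zero values of row i are data[indptr[i]:indptr[i + 1]], and their
    column positions are indices[indptr[i]:indptr[i + 1]].
    """
    data: List[int] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                # Keep only set bits (or non-zero counts) of the fingerprint.
                data.append(value)
                indices.append(col)
        # Each row ends where the next one begins.
        indptr.append(len(data))
    return data, indices, indptr


# Two toy 8-bit fingerprints, one row per unique InChIKey.
dense = [
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 1],
]
data, indices, indptr = dense_to_csr_arrays(dense)
print(data)     # [1, 1, 1, 1]
print(indices)  # [1, 4, 0, 7]
print(indptr)   # [0, 2, 4]
```

For fingerprints that set only a few bits, the CSR arrays grow with the number of non-zero entries rather than with rows × fingerprint length, which is what makes sparse storage attractive for large collections.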
Compared to the older implementation, this refactor is designed for larger-scale use cases and delegates fingerprint computation to chemap.
Example
    import numpy as np
    from rdkit.Chem import rdFingerprintGenerator
    from matchms import Fingerprints, Spectrum

    # Two spectra of different compounds: ethane (CC) and carbon monoxide.
    spectrum_1 = Spectrum(
        mz=np.array([100, 150, 200.]),
        intensities=np.array([0.7, 0.2, 0.1]),
        metadata={
            "inchikey": "OTMSDBZUPAUEDD-UHFFFAOYSA-N",
            "smiles": "CC",
            "precursor_mz": 150.0,
        },
    )
    spectrum_2 = Spectrum(
        mz=np.array([100, 150, 200.]),
        intensities=np.array([0.7, 0.2, 0.1]),
        metadata={
            "inchikey": "UGFAIRIUMAVXCW-UHFFFAOYSA-N",
            "smiles": "[C-]#[O+]",
            "precursor_mz": 150.0,
        },
    )
    spectra = [spectrum_1, spectrum_2]

    # Binary, folded 256-bit Morgan fingerprints stored as a dense NumPy array.
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=256)
    fpgen = Fingerprints(
        fingerprint_generator=generator,
        count=False,
        folded=True,
        return_csr=False,
    )
    fpgen.compute_fingerprints(spectra)

    print(fpgen.fingerprint_count)
    print(type(fpgen.get_fingerprint_by_inchikey("OTMSDBZUPAUEDD-UHFFFAOYSA-N")))
Should output:

    2
    <class 'numpy.ndarray'>
- fingerprints
The computed fingerprints as either a NumPy array or SciPy CSR matrix.
- inchikeys
Ordered list of unique InChIKeys corresponding to fingerprint rows.
- fingerprint_count
Number of unique fingerprints currently stored.
- config
Dictionary with configuration used for fingerprint computation.
- to_dataframe
DataFrame containing InChIKeys and fingerprints.
- __init__(fingerprint_generator, *, ignore_stereochemistry: bool = False, count: bool = False, folded: bool = True, return_csr: bool = False, invalid_policy: str = 'raise', **config_kwargs)
- Parameters:
fingerprint_generator – A chemap-compatible fingerprint generator, for example an RDKit fingerprint generator or a scikit-fingerprints object.
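ignore_stereochemistry – If True, only the first 14 characters of the InChIKey (the connectivity block) are used as the key, so stereoisomers share a single fingerprint.
count – If True, count fingerprints are computed instead of binary fingerprints.
folded – If True, fingerprints are folded to a fixed length.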
return_csr – If True, fingerprints are stored as a SciPy CSR matrix. Otherwise they are stored as a dense NumPy array.
invalid_policy – Policy passed to chemap for invalid molecular inputs.
**config_kwargs – Additional keyword arguments passed to FingerprintConfig.
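The effect of `ignore_stereochemistry` on how compounds are keyed can be sketched in a few lines. The helper `fingerprint_key` is hypothetical and not part of matchms; it assumes only what the parameter description states, namely that the first 14 characters of the InChIKey are used when the flag is set. Those characters form the InChIKey's first block, which encodes the molecular skeleton, so stereoisomers of the same compound map to the same key.

```python
def fingerprint_key(inchikey: str, ignore_stereochemistry: bool = False) -> str:
    """Hypothetical helper: the key under which a compound would be stored."""
    if ignore_stereochemistry:
        # The first 14 characters are the InChIKey's connectivity (skeleton) block.
        return inchikey[:14]
    return inchikey


l_alanine = "QNAYBMKLOCPYGJ-REOHCLBHSA-N"  # L-alanine, stereocentre defined
alanine = "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"    # alanine without stereo information

print(fingerprint_key(l_alanine) == fingerprint_key(alanine))              # False
print(fingerprint_key(l_alanine, True) == fingerprint_key(alanine, True))  # True
print(fingerprint_key(l_alanine, True))                                    # QNAYBMKLOCPYGJ
```

With the flag set, both spectra would therefore be served by one stored fingerprint.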
- compute_fingerprint(spectrum: Spectrum)
Compute one fingerprint for a given spectrum.
This does not add the fingerprint to the internal storage. It only computes and returns the fingerprint.
- Parameters:
spectrum – A spectrum for which a fingerprint is to be calculated.
- Returns:
Fingerprint row, or None if the fingerprint could not be computed.
- Return type:
Optional[np.ndarray | scipy.sparse.csr_matrix]
- compute_fingerprints(spectra: list[Spectrum])
Compute fingerprints for a list of spectra.
Fingerprints are computed only for unique compounds, keyed by InChIKey. Existing stored fingerprints are replaced.
- Parameters:
spectra – List of spectra.
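The deduplication described for `compute_fingerprints` can be illustrated with a small stand-in. `FingerprintStore` and `toy_fingerprint` are hypothetical and replace matchms and chemap entirely; spectra are represented as plain metadata dictionaries. The sketch mirrors only the documented behavior: one fingerprint per unique InChIKey, rows aligned with `inchikeys`, and earlier results replaced on every call. Keeping keys in order of first appearance is an assumption of this sketch.

```python
from typing import Callable, Dict, List


class FingerprintStore:
    """Hypothetical stand-in that mimics the documented storage behavior."""

    def __init__(self, fingerprint_fn: Callable[[str], List[int]]):
        self.fingerprint_fn = fingerprint_fn
        self.inchikeys: List[str] = []
        self.rows: List[List[int]] = []

    def compute_fingerprints(self, spectra: List[Dict[str, str]]) -> None:
        # Keep one SMILES per unique InChIKey (first-appearance order is assumed).
        unique: Dict[str, str] = {}
        for metadata in spectra:
            inchikey = metadata.get("inchikey")
            if inchikey and inchikey not in unique:
                unique[inchikey] = metadata["smiles"]
        # Replace, rather than extend, whatever was stored before.
        self.inchikeys = list(unique)
        self.rows = [self.fingerprint_fn(smiles) for smiles in unique.values()]

    @property
    def fingerprint_count(self) -> int:
        return len(self.inchikeys)


calls: List[str] = []


def toy_fingerprint(smiles: str) -> List[int]:
    # Toy "fingerprint" that records each call, so deduplication is visible.
    calls.append(smiles)
    return [len(smiles)]


spectra = [
    {"inchikey": "OTMSDBZUPAUEDD-UHFFFAOYSA-N", "smiles": "CC"},
    {"inchikey": "OTMSDBZUPAUEDD-UHFFFAOYSA-N", "smiles": "CC"},
    {"inchikey": "UGFAIRIUMAVXCW-UHFFFAOYSA-N", "smiles": "[C-]#[O+]"},
]
store = FingerprintStore(toy_fingerprint)
store.compute_fingerprints(spectra)
print(store.fingerprint_count, len(calls))  # 2 2

# A second call replaces the stored fingerprints instead of adding to them.
store.compute_fingerprints(spectra[:1])
print(store.fingerprint_count)  # 1
```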
- property fingerprints: ndarray | csr_matrix | None
Return the stored fingerprint matrix.
- get_fingerprint_by_inchikey(inchikey: str)
Get fingerprint by InChIKey.
- Parameters:
inchikey – InChIKey of a compound.
- Returns:
The corresponding fingerprint row, or None if not present.
- Return type:
Optional[np.ndarray | scipy.sparse.csr_matrix]
- get_fingerprint_by_spectrum(spectrum: Spectrum)
Get fingerprint by spectrum.
- Parameters:
spectrum – Spectrum with an InChIKey.
- Returns:
The corresponding fingerprint row, or None if not present.
- Return type:
Optional[np.ndarray | scipy.sparse.csr_matrix]
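Both lookup methods amount to finding the row position of an InChIKey in the ordered `inchikeys` list. The dependency-free sketch below is a hypothetical illustration, not the matchms code: `lookup_by_inchikey` and `lookup_by_spectrum` are made-up names, spectra are plain metadata dictionaries, and a missing key (or a spectrum without an `inchikey` entry) yields `None`, matching the documented return value.

```python
from typing import Dict, List, Optional

# Stored state: rows follow the order of inchikeys (toy 4-bit fingerprints).
inchikeys = ["OTMSDBZUPAUEDD-UHFFFAOYSA-N", "UGFAIRIUMAVXCW-UHFFFAOYSA-N"]
rows = [[0, 1, 1, 0], [1, 0, 0, 1]]
# Build the InChIKey -> row-position mapping once instead of scanning per query.
positions: Dict[str, int] = {key: i for i, key in enumerate(inchikeys)}


def lookup_by_inchikey(inchikey: str) -> Optional[List[int]]:
    """Hypothetical lookup: the stored row for an InChIKey, or None if absent."""
    index = positions.get(inchikey)
    return None if index is None else rows[index]


def lookup_by_spectrum(metadata: Dict[str, str]) -> Optional[List[int]]:
    """Hypothetical lookup by spectrum: read the InChIKey from metadata, then delegate."""
    inchikey = metadata.get("inchikey")
    return None if inchikey is None else lookup_by_inchikey(inchikey)


print(lookup_by_inchikey("UGFAIRIUMAVXCW-UHFFFAOYSA-N"))  # [1, 0, 0, 1]
print(lookup_by_inchikey("QNAYBMKLOCPYGJ-UHFFFAOYSA-N"))  # None
print(lookup_by_spectrum({"smiles": "CC"}))               # None
```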
- property to_dataframe: DataFrame
Return fingerprints as a pandas DataFrame indexed by InChIKey.