matchms.Fingerprints module

class matchms.Fingerprints.Fingerprints(fingerprint_algorithm: str = 'daylight', fingerprint_method: str = 'bit', nbits: int = 2048, ignore_stereochemistry: bool = False, **kwargs)[source]

Bases: object

Computes and stores an inchikey-to-fingerprint mapping for a list of spectra.

For example:

from matchms import Fingerprints
from matchms import Spectrum
import numpy as np

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={"inchikey": "OTMSDBZUPAUEDD-UHFFFAOYSA-N",
                                "smiles":"CC",
                                "precursor_mz": 150.0})
spectrum_2 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={"inchikey": "UGFAIRIUMAVXCW-UHFFFAOYSA-N",
                                "smiles": "[C-]#[O+]",
                                "precursor_mz": 150.0})
spectra = [spectrum_1, spectrum_2]

fpgen = Fingerprints()
fpgen.compute_fingerprints(spectra)

print(fpgen.fingerprint_count)
print(type(fpgen.get_fingerprint_by_inchikey('OTMSDBZUPAUEDD-UHFFFAOYSA-N')))

Should output:

2
<class 'numpy.ndarray'>

config

The configuration for the fingerprints e.g., used algorithm, nbits, …

fingerprints

The computed fingerprints. Use after compute_fingerprints().

fingerprint_count

The number of fingerprints computed.

to_dataframe

A DataFrame containing the inchikeys and their fingerprints.

__init__(fingerprint_algorithm: str = 'daylight', fingerprint_method: str = 'bit', nbits: int = 2048, ignore_stereochemistry: bool = False, **kwargs)[source]
Parameters:
  • fingerprint_algorithm – The fingerprint algorithm to use. Available options: daylight, morgan1, morgan2, morgan3.

  • fingerprint_method – The fingerprint method to use. Available options: bit, sparse_bit, count, sparse_count.

  • nbits – The number of bits or fingerprint size. Defaults to 2048.

  • ignore_stereochemistry – Determines which inchikey version is used as the mapping key. If True, only the first 14 characters of the inchikey are used. Defaults to False.
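As a rough illustration of the ignore_stereochemistry option, the sketch below derives a mapping key from a full inchikey. The helper name mapping_key is hypothetical and not part of matchms. It only mirrors the rule stated above: keep the first 14 characters when stereochemistry is ignored.

```python
def mapping_key(inchikey: str, ignore_stereochemistry: bool = False) -> str:
    """Return the key a fingerprint would be stored under (hypothetical helper)."""
    # The first 14-character block of an InChIKey encodes the connectivity
    # (molecular skeleton); the later blocks carry stereochemistry and other layers.
    return inchikey[:14] if ignore_stereochemistry else inchikey


full_key = "OTMSDBZUPAUEDD-UHFFFAOYSA-N"
print(mapping_key(full_key))                               # OTMSDBZUPAUEDD-UHFFFAOYSA-N
print(mapping_key(full_key, ignore_stereochemistry=True))  # OTMSDBZUPAUEDD
```

With the shortened key, stereoisomers that share the same connectivity block would collapse onto a single entry.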

compute_fingerprint(spectrum: Spectrum) → ndarray | None[source]

Computes a single fingerprint for a given spectrum.

Parameters:
  • spectrum – A spectrum for which a fingerprint is to be calculated.

Returns:
  Optional[np.ndarray] – The corresponding fingerprint.

compute_fingerprints(spectra: list[Spectrum])[source]

Computes fingerprints for a list of spectra.

This first creates a dict of unique spectra and then computes fingerprints for all of their molecules. Only valid fingerprints are added to the mapping. Query specific fingerprints with get_fingerprint_by_spectrum() or get_fingerprint_by_inchikey().

Parameters:
  • spectra – List of Spectrum objects.
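The two-step procedure described above (deduplicate, then keep only valid fingerprints) can be sketched in plain Python. This is not the matchms implementation: build_fingerprint_mapping and toy_fingerprint are hypothetical names, plain dicts stand in for Spectrum metadata, and deduplicating by inchikey is an assumption based on the class description.

```python
from typing import Callable, Optional


def build_fingerprint_mapping(
    records: list[dict],
    fingerprint_fn: Callable[[dict], Optional[list[int]]],
) -> dict[str, list[int]]:
    """Map each unique inchikey to a fingerprint, skipping invalid results."""
    # Step 1: keep one record per inchikey (the first one seen).
    unique: dict[str, dict] = {}
    for record in records:
        inchikey = record.get("inchikey")
        if inchikey and inchikey not in unique:
            unique[inchikey] = record

    # Step 2: compute one fingerprint per unique record; drop failures (None).
    mapping: dict[str, list[int]] = {}
    for inchikey, record in unique.items():
        fingerprint = fingerprint_fn(record)
        if fingerprint is not None:
            mapping[inchikey] = fingerprint
    return mapping


def toy_fingerprint(record: dict) -> Optional[list[int]]:
    """Stand-in for a real fingerprint: fails when no SMILES is available."""
    smiles = record.get("smiles")
    if not smiles:
        return None
    return [1 if symbol in smiles else 0 for symbol in "CNO#"]


records = [
    {"inchikey": "OTMSDBZUPAUEDD-UHFFFAOYSA-N", "smiles": "CC"},
    {"inchikey": "OTMSDBZUPAUEDD-UHFFFAOYSA-N", "smiles": "CC"},  # duplicate
    {"inchikey": "UGFAIRIUMAVXCW-UHFFFAOYSA-N", "smiles": "[C-]#[O+]"},
    {"inchikey": "PLACEHOLDER-KEY", "smiles": ""},  # made-up entry without structure
]
mapping = build_fingerprint_mapping(records, toy_fingerprint)
print(len(mapping))  # 2
```

Four records go in, but only two entries come out: the duplicate is merged and the record without a usable structure is skipped.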

get_fingerprint_by_inchikey(inchikey: str) → ndarray | None[source]

Get fingerprint by inchikey.

Parameters:
  • inchikey – Inchikey of a spectrum.

Returns:
  Optional[np.ndarray] – The corresponding fingerprint.

get_fingerprint_by_spectrum(spectrum: Spectrum) → ndarray | None[source]

Get fingerprint by spectrum.

Parameters:
  • spectrum – Spectrum with an inchikey.

Returns:
  Optional[np.ndarray] – The corresponding fingerprint.
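Both lookup methods are typed as returning Optional[np.ndarray], so callers should be prepared for None. The sketch below shows that handling pattern with a plain dict. The names fingerprints and lookup_by_metadata are hypothetical, and resolving a spectrum through its inchikey metadata is an assumption about how such a lookup could work, not a description of the library's internals.

```python
from typing import Optional

# Hypothetical stand-in for the stored inchikey -> fingerprint mapping.
fingerprints = {"OTMSDBZUPAUEDD-UHFFFAOYSA-N": [1, 0, 0, 0]}


def lookup_by_metadata(metadata: dict) -> Optional[list[int]]:
    """Return the stored fingerprint for a spectrum's metadata, or None."""
    inchikey = metadata.get("inchikey")
    if inchikey is None:
        return None  # spectrum carries no inchikey
    return fingerprints.get(inchikey)  # None if nothing was stored for this key


known = lookup_by_metadata({"inchikey": "OTMSDBZUPAUEDD-UHFFFAOYSA-N"})
unknown = lookup_by_metadata({"inchikey": "UGFAIRIUMAVXCW-UHFFFAOYSA-N"})
missing = lookup_by_metadata({})

for result in (known, unknown, missing):
    print("no fingerprint" if result is None else result)
```

Checking the result with "is None" rather than truthiness matters for real fingerprints, since a NumPy array does not support a plain boolean test.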