matchms.similarity.FingerprintSimilarity module

class matchms.similarity.FingerprintSimilarity.FingerprintSimilarity(fingerprint_generator, similarity_measure: str = 'tanimoto', set_empty_scores: float | int | str = 'nan', ignore_stereochemistry: bool = False, count: bool = False, folded: bool = True, return_csr: bool = False, invalid_policy: str = 'raise', **fingerprint_config_kwargs)

Bases: BaseSimilarity

Calculate similarity between molecules based on molecular fingerprints.

Fingerprints can either be provided explicitly as Fingerprints objects or computed internally from input spectra.

This class no longer expects fingerprints to be stored directly in spectrum metadata. Instead, it works with a Fingerprints container, either supplied by the caller or built from the input spectra.

Currently supported similarity measures are:

  • "cosine"

  • "tanimoto"
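For binary (0/1) fingerprints stored as dense arrays, the two measures can be illustrated with a small NumPy sketch. The function names are hypothetical and this is not the matchms implementation; returning NaN where a score is undefined (zero norm, empty union) is an assumption made for the sketch, not documented behaviour.

```python
import numpy as np


def cosine_matrix(fps_1: np.ndarray, fps_2: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of two fingerprint arrays."""
    fps_1 = np.asarray(fps_1, dtype=np.float64)
    fps_2 = np.asarray(fps_2, dtype=np.float64)
    dot = fps_1 @ fps_2.T
    norms = np.outer(np.linalg.norm(fps_1, axis=1), np.linalg.norm(fps_2, axis=1))
    # An all-zero fingerprint has norm 0; report NaN instead of dividing by zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(norms > 0, dot / norms, np.nan)


def tanimoto_matrix_binary(fps_1: np.ndarray, fps_2: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto similarity |a AND b| / |a OR b| for 0/1 fingerprints."""
    fps_1 = np.asarray(fps_1, dtype=np.float64)
    fps_2 = np.asarray(fps_2, dtype=np.float64)
    intersection = fps_1 @ fps_2.T  # number of shared on-bits per pair
    # |a OR b| = |a| + |b| - |a AND b|
    union = fps_1.sum(axis=1)[:, None] + fps_2.sum(axis=1)[None, :] - intersection
    # The union is 0 only when both fingerprints are empty; report NaN there.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, intersection / union, np.nan)


fps = np.array([[1, 1, 0, 0],
                [1, 0, 1, 0]])
print(cosine_matrix(fps, fps))           # off-diagonal entries ≈ 0.5
print(tanimoto_matrix_binary(fps, fps))  # off-diagonal entries ≈ 0.333
```

Both functions compare every row of the first array with every row of the second, which matches the all-against-all shape of the matrix() output.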

Notes

  • Tanimoto is used in its generalized form and therefore also works for count/weighted fingerprints.

  • Fingerprints may be stored densely (NumPy) or sparsely (CSR).
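The note does not say which generalization of Tanimoto is used. One common choice for non-negative count vectors is the Ruzicka form, sum(min(a, b)) / sum(max(a, b)), which reduces to ordinary Tanimoto on 0/1 vectors (another option is the continuous form a·b / (|a|² + |b|² − a·b)). The sketch below assumes the Ruzicka form and shows it on a dense NumPy array and on SciPy CSR rows; the function names are hypothetical.

```python
import numpy as np
from scipy import sparse


def generalized_tanimoto(a: np.ndarray, b: np.ndarray) -> float:
    """Ruzicka form: sum(min(a, b)) / sum(max(a, b)) for non-negative counts."""
    denominator = np.maximum(a, b).sum()
    if denominator == 0:
        return float("nan")  # both fingerprints are empty
    return float(np.minimum(a, b).sum() / denominator)


def generalized_tanimoto_csr(a: sparse.csr_matrix, b: sparse.csr_matrix) -> float:
    """Same measure for two 1 x n CSR rows, computed without densifying."""
    denominator = a.maximum(b).sum()
    if denominator == 0:
        return float("nan")
    return float(a.minimum(b).sum() / denominator)


counts_a = np.array([3, 1, 0, 2])
counts_b = np.array([1, 1, 4, 0])
print(generalized_tanimoto(counts_a, counts_b))  # 2 / 10 = 0.2
print(generalized_tanimoto_csr(sparse.csr_matrix(counts_a),
                               sparse.csr_matrix(counts_b)))  # same value
```

With binary inputs, min acts as AND and max as OR, so the same function gives the classic Tanimoto score.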

__init__(fingerprint_generator, similarity_measure: str = 'tanimoto', set_empty_scores: float | int | str = 'nan', ignore_stereochemistry: bool = False, count: bool = False, folded: bool = True, return_csr: bool = False, invalid_policy: str = 'raise', **fingerprint_config_kwargs)
Parameters:
  • fingerprint_generator – A chemap-compatible fingerprint generator.

  • similarity_measure – Choose similarity measure from "cosine" or "tanimoto". The default is "tanimoto".

  • set_empty_scores – Value returned in place of a similarity score when fingerprints are missing. The default, "nan", returns np.nan for such pairs.

  • ignore_stereochemistry – Passed to internally created Fingerprints objects.

  • count – Passed to internally created Fingerprints objects.

  • folded – Passed to internally created Fingerprints objects.

  • return_csr – Passed to internally created Fingerprints objects.

  • invalid_policy – Passed to internally created Fingerprints objects.

  • **fingerprint_config_kwargs – Additional keyword arguments passed to internally created Fingerprints objects.
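The effect of set_empty_scores can be pictured as overwriting every row or column whose fingerprint is missing. This is a hypothetical sketch: apply_empty_scores and the boolean "has fingerprint" masks are illustrative, not part of matchms, and converting the value with float() is an assumption that happens to map the default "nan" to np.nan.

```python
import numpy as np
from typing import Sequence, Union


def apply_empty_scores(
    scores: np.ndarray,
    has_fp_1: Sequence[bool],
    has_fp_2: Sequence[bool],
    set_empty_scores: Union[float, int, str] = "nan",
) -> np.ndarray:
    """Overwrite rows/columns whose fingerprint is missing with the empty value."""
    # float("nan") turns the default string into NaN; numbers pass through.
    empty = float(set_empty_scores)
    out = np.array(scores, dtype=np.float64, copy=True)  # leave the input untouched
    out[~np.asarray(has_fp_1, dtype=bool), :] = empty
    out[:, ~np.asarray(has_fp_2, dtype=bool)] = empty
    return out


raw = np.full((2, 3), 0.7)
print(apply_empty_scores(raw, [True, False], [True, True, False]))
print(apply_empty_scores(raw, [True, False], [True, True, True], set_empty_scores=0))
```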

property is_structured_score: bool

Return True if this similarity uses a structured score dtype.
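A structured NumPy dtype is one with named fields, whereas a plain float64 has none. The check below is a sketch of that distinction, not the property's actual implementation; since this class's score_datatype is an alias of float64, one would expect it to report False. The field names in the structured example are hypothetical.

```python
import numpy as np


def is_structured_dtype(dtype) -> bool:
    """True if dtype has named fields (a NumPy structured dtype)."""
    return np.dtype(dtype).names is not None


print(is_structured_dtype(np.float64))  # False
print(is_structured_dtype([("score", np.float64), ("matches", np.int64)]))  # True
```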

matrix(spectra_1: Sequence[Spectrum] | None = None, spectra_2: Sequence[Spectrum] | None = None, fingerprints_1: Fingerprints | None = None, fingerprints_2: Fingerprints | None = None, score_fields: Sequence[str] | None = None, progress_bar: bool = True) → Scores

Calculate matrix of fingerprint-based similarity scores.

Parameters:
  • spectra_1 – First collection of spectra. Used only if fingerprints_1 is not given.

  • spectra_2 – Second collection of spectra. Used only if fingerprints_2 is not given. If None and fingerprints_2 is None, compare the first input against itself.

  • fingerprints_1 – Optional precomputed Fingerprints object for the first input.

  • fingerprints_2 – Optional precomputed Fingerprints object for the second input. If None and spectra_2 is also None, compare the first input against itself.

  • score_fields – Requested score fields. Only ("score",) is supported.

  • progress_bar – Included for API compatibility. Not used here.

Returns:

Dense score matrix as a Scores object.

Return type:

Scores
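The precedence rules in the parameter list above can be summarized in a short, hypothetical sketch. resolve_matrix_inputs and compute_fingerprints are illustrative names, not matchms functions, and raising ValueError for an unsupported score_fields value or a missing first input is an assumption; the documentation only states that ("score",) is the sole supported value.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def resolve_matrix_inputs(
    compute_fingerprints: Callable[[Sequence[Any]], Any],
    spectra_1: Optional[Sequence[Any]] = None,
    spectra_2: Optional[Sequence[Any]] = None,
    fingerprints_1: Any = None,
    fingerprints_2: Any = None,
    score_fields: Optional[Sequence[str]] = None,
) -> Tuple[Any, Any]:
    """Decide which fingerprints matrix() would compare (hypothetical sketch)."""
    if score_fields is not None and tuple(score_fields) != ("score",):
        raise ValueError("Only ('score',) is supported.")  # assumed error type
    # Precomputed fingerprints take precedence; spectra are only a fallback.
    if fingerprints_1 is None:
        if spectra_1 is None:
            raise ValueError("Provide spectra_1 or fingerprints_1.")  # assumed
        fingerprints_1 = compute_fingerprints(spectra_1)
    if fingerprints_2 is None:
        if spectra_2 is None:
            # No second input at all: compare the first input against itself.
            return fingerprints_1, fingerprints_1
        fingerprints_2 = compute_fingerprints(spectra_2)
    return fingerprints_1, fingerprints_2


def fake_compute(spectra):  # hypothetical stand-in for fingerprint generation
    return [f"fp({s})" for s in spectra]


fps_1, fps_2 = resolve_matrix_inputs(fake_compute, spectra_1=["s1", "s2"])
print(fps_1 is fps_2)  # True: no second input, so the first is compared with itself
```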

pair(spectrum_1: Spectrum, spectrum_2: Spectrum)

Pairwise fingerprint similarity is not supported in this API.

FingerprintSimilarity operates on whole collections: matrix() either uses precomputed Fingerprints containers or computes fingerprints internally from collections of spectra.

Use matrix(…) instead.

score_datatype

alias of float64

sparse_matrix(spectra_1, spectra_2=None, idx_row=None, idx_col=None, score_fields=None, score_filter=None, progress_bar: bool = True)

Sparse score computation is not available for this similarity.

to_dict() → dict

Return a dictionary representation of the similarity function.