matchms.similarity.FingerprintSimilarity module
- class matchms.similarity.FingerprintSimilarity.FingerprintSimilarity(fingerprint_generator, similarity_measure: str = 'tanimoto', set_empty_scores: float | int | str = 'nan', ignore_stereochemistry: bool = False, count: bool = False, folded: bool = True, return_csr: bool = False, invalid_policy: str = 'raise', **fingerprint_config_kwargs)
Bases: BaseSimilarity
Calculate similarity between molecules based on molecular fingerprints.
Fingerprints can either be provided explicitly as Fingerprints objects or computed internally from input spectra.
This class no longer expects fingerprints to be stored directly in spectrum metadata. Instead, it uses a Fingerprints container.
Currently supported similarity measures are:
"cosine"
"tanimoto"
Notes
Tanimoto is used in its generalized form and therefore also works for count/weighted fingerprints.
Fingerprints may be stored densely (NumPy) or sparsely (CSR).
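The notes above do not spell out the generalized formula. A minimal sketch, assuming the common min/max form of generalized Tanimoto (sum of element-wise minima divided by sum of element-wise maxima), is given here in plain Python alongside cosine. The function names are illustrative and are not part of matchms; whether the library uses exactly this form is an assumption.

```python
import math
from typing import Sequence


def generalized_tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed min/max form: sum(min(a_i, b_i)) / sum(max(a_i, b_i))."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    # Two all-zero fingerprints have no defined similarity.
    return numerator / denominator if denominator > 0 else math.nan


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two fingerprint vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else math.nan


# Binary fingerprints: the generalized form reduces to |A and B| / |A or B|.
binary_1 = [1, 1, 0, 1]
binary_2 = [1, 0, 0, 1]
binary_score = generalized_tanimoto(binary_1, binary_2)

# Count fingerprints: the same function applies without modification.
count_1 = [3, 1, 0, 2]
count_2 = [1, 0, 0, 2]
count_score = generalized_tanimoto(count_1, count_2)
count_cosine = cosine(count_1, count_2)
```

For binary input this form gives the same value as classic Tanimoto (shared bits over the union of set bits), which is why a single implementation can cover both binary and count fingerprints.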
- __init__(fingerprint_generator, similarity_measure: str = 'tanimoto', set_empty_scores: float | int | str = 'nan', ignore_stereochemistry: bool = False, count: bool = False, folded: bool = True, return_csr: bool = False, invalid_policy: str = 'raise', **fingerprint_config_kwargs)
- Parameters:
fingerprint_generator – A chemap-compatible fingerprint generator.
similarity_measure – Choose similarity measure from "cosine" or "tanimoto". The default is "tanimoto".
set_empty_scores – Define what should be returned instead of a similarity score in cases where fingerprints are missing. The default is "nan", which will return np.nan in such cases.
ignore_stereochemistry – Passed to internally created Fingerprints objects.
count – Passed to internally created Fingerprints objects.
folded – Passed to internally created Fingerprints objects.
return_csr – Passed to internally created Fingerprints objects.
invalid_policy – Passed to internally created Fingerprints objects.
**fingerprint_config_kwargs – Additional keyword arguments passed to internally created Fingerprints objects.
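The effect of set_empty_scores can be illustrated with a small sketch. The helpers resolve_empty_score and score_or_fallback below are hypothetical and not part of matchms; they only model the documented behaviour that "nan" maps to NaN and that a numeric value is returned as-is when a fingerprint is missing. Rejecting other strings is an assumption.

```python
import math
from typing import Callable, Optional, Sequence, Union


def resolve_empty_score(set_empty_scores: Union[float, int, str] = "nan") -> float:
    """Hypothetical helper: map the option to the value reported for missing fingerprints."""
    if isinstance(set_empty_scores, str):
        if set_empty_scores.lower() == "nan":
            return math.nan
        # Assumption: strings other than "nan" are rejected.
        raise ValueError(f"unsupported set_empty_scores value: {set_empty_scores!r}")
    return float(set_empty_scores)


def score_or_fallback(
    fingerprint_1: Optional[Sequence[float]],
    fingerprint_2: Optional[Sequence[float]],
    similarity: Callable[[Sequence[float], Sequence[float]], float],
    fallback: float,
) -> float:
    """Return the fallback whenever either fingerprint is missing (None)."""
    if fingerprint_1 is None or fingerprint_2 is None:
        return fallback
    return similarity(fingerprint_1, fingerprint_2)


def overlap(a: Sequence[float], b: Sequence[float]) -> float:
    """Toy similarity used only for this illustration: shared set bits."""
    return float(sum(1 for x, y in zip(a, b) if x and y))


nan_fallback = resolve_empty_score("nan")
zero_fallback = resolve_empty_score(0)

missing_with_nan = score_or_fallback(None, [1, 0, 1], overlap, nan_fallback)
missing_with_zero = score_or_fallback(None, [1, 0, 1], overlap, zero_fallback)
both_present = score_or_fallback([1, 1, 1], [1, 0, 1], overlap, nan_fallback)
```

Choosing NaN as the default keeps missing fingerprints distinguishable from a genuine score of zero; a numeric fallback is convenient when downstream code cannot handle NaN.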
- matrix(spectra_1: Sequence[Spectrum] | None = None, spectra_2: Sequence[Spectrum] | None = None, fingerprints_1: Fingerprints | None = None, fingerprints_2: Fingerprints | None = None, score_fields: Sequence[str] | None = None, progress_bar: bool = True) → Scores
Calculate matrix of fingerprint-based similarity scores.
- Parameters:
spectra_1 – First collection of spectra. Used only if fingerprints_1 is not given.
spectra_2 – Second collection of spectra. Used only if fingerprints_2 is not given. If None and fingerprints_2 is None, compare the first input against itself.
fingerprints_1 – Optional precomputed Fingerprints object for the first input.
fingerprints_2 – Optional precomputed Fingerprints object for the second input. If None, compare the first input against itself.
score_fields – Requested score fields. Only ("score",) is supported.
progress_bar – Included for API compatibility. Not used here.
- Returns:
Dense score matrix as a Scores object.
- Return type:
Scores
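The input rules described for matrix() can be sketched in plain Python: precomputed fingerprints take precedence over spectra, and when no second input is given the first input is compared against itself. The helpers resolve_inputs and score_matrix are hypothetical stand-ins, not matchms functions, and plain lists are used here in place of Fingerprints and Scores objects.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def resolve_inputs(
    spectra_1: Optional[Sequence[object]],
    spectra_2: Optional[Sequence[object]],
    fingerprints_1: Optional[List[Vector]],
    fingerprints_2: Optional[List[Vector]],
    compute: Callable[[Sequence[object]], List[Vector]],
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical helper mirroring the documented precedence rules."""
    if fingerprints_1 is None:
        if spectra_1 is None:
            raise ValueError("provide spectra_1 or fingerprints_1")
        fingerprints_1 = compute(spectra_1)
    if fingerprints_2 is None:
        # No second input at all: compare the first input against itself.
        fingerprints_2 = compute(spectra_2) if spectra_2 is not None else fingerprints_1
    return fingerprints_1, fingerprints_2


def score_matrix(
    fps_1: List[Vector], fps_2: List[Vector], similarity: Callable[[Vector, Vector], float]
) -> List[List[float]]:
    """Dense all-vs-all score matrix with shape (len(fps_1), len(fps_2))."""
    return [[similarity(a, b) for b in fps_2] for a in fps_1]


def tanimoto(a: Vector, b: Vector) -> float:
    """Assumed min/max form of generalized Tanimoto, for illustration only."""
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    return numerator / denominator if denominator else float("nan")


toy_fingerprints = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]

# Only the first set of fingerprints is given, so it is compared with itself.
fps_1, fps_2 = resolve_inputs(None, None, toy_fingerprints, None, compute=lambda s: list(s))
scores = score_matrix(fps_1, fps_2, tanimoto)
```

In the self-comparison case the resulting matrix is square and symmetric with ones on the diagonal, which is a quick sanity check when calling matrix() with a single input.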
- pair(spectrum_1: Spectrum, spectrum_2: Spectrum)
Pairwise fingerprint similarity is not supported in this API.
FingerprintSimilarity works on precomputed Fingerprints containers or computes fingerprints internally for collections of spectra in matrix().
Use matrix(…) instead.
- score_datatype
alias of float64