matchms.similarity.FingerprintSimilarity module
- class matchms.similarity.FingerprintSimilarity.FingerprintSimilarity(similarity_measure: str = 'jaccard', set_empty_scores: Union[float, int, str] = 'nan')
Bases:
matchms.similarity.BaseSimilarity.BaseSimilarity
Calculate similarity between molecules based on their fingerprints.
For this similarity measure to work, each spectrum must already carry a fingerprint, which is derived by running add_fingerprint().
Code example:
import numpy as np
from matchms import calculate_scores
from matchms import Spectrum
from matchms.filtering import add_fingerprint
from matchms.similarity import FingerprintSimilarity

spectrum_1 = Spectrum(mz=np.array([], dtype="float"),
                      intensities=np.array([], dtype="float"),
                      metadata={"smiles": "CCC(C)C(C(=O)O)NC(=O)CCl"})
spectrum_2 = Spectrum(mz=np.array([], dtype="float"),
                      intensities=np.array([], dtype="float"),
                      metadata={"smiles": "CC(C)C(C(=O)O)NC(=O)CCl"})
spectrum_3 = Spectrum(mz=np.array([], dtype="float"),
                      intensities=np.array([], dtype="float"),
                      metadata={"smiles": "C(C(=O)O)(NC(=O)O)S"})
spectrums = [spectrum_1, spectrum_2, spectrum_3]

# Add fingerprints
spectrums = [add_fingerprint(x, nbits=256) for x in spectrums]

# Specify type and calculate similarities
similarity_measure = FingerprintSimilarity("jaccard")
scores = calculate_scores(spectrums, spectrums, similarity_measure)
print(np.round(scores.scores, 3))
Should output:
[[1.    0.878 0.415]
 [0.878 1.    0.444]
 [0.415 0.444 1.   ]]
- __init__(similarity_measure: str = 'jaccard', set_empty_scores: Union[float, int, str] = 'nan')
- Parameters
similarity_measure – Choose the similarity measure from “cosine”, “dice”, “jaccard”. The default is “jaccard”.
set_empty_scores – Value to return instead of a similarity score when fingerprints are missing. The default is “nan”, which returns numpy.nan in such cases.
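The three measure names correspond to standard overlap formulas on binary fingerprints. The following is a minimal NumPy sketch of those formulas, not the library's own implementation; the helper names `jaccard`, `dice`, `cosine` and `_overlap` are hypothetical, and returning 0.0 for two all-zero vectors is an assumption made here for illustration.

```python
import numpy as np


def _overlap(a, b):
    """Return (shared bits, bits set in a, bits set in b) for two binary vectors."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return int(np.logical_and(a, b).sum()), int(a.sum()), int(b.sum())


def jaccard(a, b):
    """Shared bits divided by bits set in either vector."""
    shared, n_a, n_b = _overlap(a, b)
    union = n_a + n_b - shared
    return shared / union if union else 0.0  # assumption: 0.0 when both are empty


def dice(a, b):
    """Twice the shared bits divided by the total number of set bits."""
    shared, n_a, n_b = _overlap(a, b)
    total = n_a + n_b
    return 2 * shared / total if total else 0.0


def cosine(a, b):
    """Shared bits divided by the geometric mean of the two bit counts."""
    shared, n_a, n_b = _overlap(a, b)
    norm = np.sqrt(n_a * n_b)
    return float(shared / norm) if norm else 0.0


# Two toy 8-bit fingerprints: 4 bits set in each, 3 bits in common.
fp_1 = np.array([1, 1, 0, 1, 0, 0, 1, 0])
fp_2 = np.array([1, 0, 0, 1, 0, 1, 1, 0])

print(jaccard(fp_1, fp_2))  # 3 / 5 = 0.6
print(dice(fp_1, fp_2))     # 6 / 8 = 0.75
print(cosine(fp_1, fp_2))   # 3 / 4 = 0.75
```

For the same pair of fingerprints, Jaccard is never larger than Dice, so scores from different measures should not be compared against a shared threshold.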
- matrix(references: List[Spectrum], queries: List[Spectrum], is_symmetric: bool = False) → numpy.array
Calculate matrix of fingerprint based similarity scores.
- Parameters
references – List of reference spectrums.
queries – List of query spectrums.
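The `is_symmetric` flag is not described in the parameter list. The sketch below assumes it signals that references and queries are the same collection, so that each score only needs to be computed once and can be mirrored. It works on plain binary vectors rather than Spectrum objects, and `jaccard_matrix` is a hypothetical helper, not part of matchms.

```python
import numpy as np


def jaccard_matrix(references, queries, is_symmetric=False):
    """All-vs-all Jaccard scores for two lists of equal-length binary vectors."""
    refs = np.asarray(references, dtype=bool)
    qrys = np.asarray(queries, dtype=bool)
    scores = np.zeros((len(refs), len(qrys)), dtype=np.float64)
    for i, ref in enumerate(refs):
        # In the symmetric case only the upper triangle is computed.
        start = i if is_symmetric else 0
        for j in range(start, len(qrys)):
            union = np.logical_or(ref, qrys[j]).sum()
            shared = np.logical_and(ref, qrys[j]).sum()
            score = shared / union if union else 0.0
            scores[i, j] = score
            if is_symmetric:
                scores[j, i] = score  # mirror into the lower triangle
    return scores


fingerprints = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 1, 1]]
scores = jaccard_matrix(fingerprints, fingerprints, is_symmetric=True)
print(np.round(scores, 3))
```

With identical inputs, the symmetric and non-symmetric paths give the same matrix; the symmetric path simply skips roughly half of the pairwise computations.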
- pair(reference: Spectrum, query: Spectrum) → float
Calculate fingerprint based similarity score between two spectra.
- Parameters
reference – Single reference spectrum.
query – Single query spectrum.
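A pairwise score also has to deal with spectra that carry no fingerprint, which is where `set_empty_scores` comes in. The sketch below illustrates that fallback on plain binary vectors, with `None` standing in for a missing fingerprint. The function `pair_score` and the way the string "nan" is mapped to `numpy.nan` are assumptions made for illustration, not the library's code.

```python
import math

import numpy as np


def pair_score(fp_reference, fp_query, set_empty_scores="nan"):
    """Jaccard score for two binary vectors, with a fallback for missing ones."""
    if fp_reference is None or fp_query is None:
        # Assumption: the string "nan" means numpy.nan; any other value
        # is handed back unchanged.
        if isinstance(set_empty_scores, str) and set_empty_scores == "nan":
            return np.nan
        return set_empty_scores
    ref = np.asarray(fp_reference, dtype=bool)
    qry = np.asarray(fp_query, dtype=bool)
    union = np.logical_or(ref, qry).sum()
    shared = np.logical_and(ref, qry).sum()
    return np.float64(shared / union) if union else np.float64(0.0)


print(pair_score([1, 1, 0], [1, 0, 0]))                  # 1 shared bit / 2 set bits = 0.5
print(pair_score(None, [1, 0, 0]))                       # nan
print(pair_score(None, [1, 0, 0], set_empty_scores=0))   # 0
```

Choosing a numeric fallback such as 0 keeps downstream sorting and thresholding simple, while the default of nan makes missing fingerprints visible instead of presenting them as dissimilar molecules.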
- score_datatype
alias of
numpy.float64
- sort(scores: numpy.ndarray)
Return array of indexes for sorted list of scores. This method can be adapted for different styles of scores.
- Parameters
scores – 1D Array of scores.
- Returns
Indexes of sorted scores.
- Return type
idx_sorted
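The description does not state the sort direction. The sketch below assumes that higher scores rank first, which is the natural order for a similarity, and shows how an index array of this kind can be produced and used with NumPy. It is an illustration, not the method's source.

```python
import numpy as np

# Scores of one query against three references.
scores = np.array([0.415, 1.0, 0.878])

# argsort gives ascending order; reversing it puts the best match first.
# Assumption: descending order is what "sorted" means for similarity scores.
idx_sorted = scores.argsort()[::-1]

print(idx_sorted)  # [1 2 0]

# The index array reorders the scores (or any parallel list) by rank.
ranked_scores = scores[idx_sorted]
```

Returning indexes rather than sorted values lets the caller reorder the matching spectra with the same array.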