matchms.similarity.FingerprintSimilarity module
- class matchms.similarity.FingerprintSimilarity.FingerprintSimilarity(similarity_measure: str = 'jaccard', set_empty_scores: float | int | str = 'nan')[source]
Bases: BaseSimilarity
Calculate similarity between molecules based on their fingerprints.
For this similarity measure to work, fingerprints are expected to be derived by running add_fingerprint().
Code example:
import numpy as np
from matchms import calculate_scores
from matchms import Spectrum
from matchms.filtering import add_fingerprint
from matchms.similarity import FingerprintSimilarity

spectrum_1 = Spectrum(
    mz=np.array([], dtype="float"),
    intensities=np.array([], dtype="float"),
    metadata={"smiles": "CCC(C)C(C(=O)O)NC(=O)CCl", "precursor_mz": 200.2}
)
spectrum_2 = Spectrum(
    mz=np.array([], dtype="float"),
    intensities=np.array([], dtype="float"),
    metadata={"smiles": "CC(C)C(C(=O)O)NC(=O)CCl", "precursor_mz": 200.2}
)
spectrum_3 = Spectrum(
    mz=np.array([], dtype="float"),
    intensities=np.array([], dtype="float"),
    metadata={"smiles": "C(C(=O)O)(NC(=O)O)S", "precursor_mz": 200.2}
)
spectra = [spectrum_1, spectrum_2, spectrum_3]

# Add fingerprints
spectra = [add_fingerprint(x, nbits=256) for x in spectra]

# Specify type and calculate similarities
similarity_measure = FingerprintSimilarity("jaccard")
scores = calculate_scores(spectra, spectra, similarity_measure)
print(np.round(scores.scores.to_array(), 3).tolist())

Should output:
[[1.0, 0.878, 0.415], [0.878, 1.0, 0.444], [0.415, 0.444, 1.0]]
- __init__(similarity_measure: str = 'jaccard', set_empty_scores: float | int | str = 'nan')[source]
- Parameters:
similarity_measure – Choose the similarity measure from “cosine”, “dice”, “jaccard”. The default is “jaccard”.
set_empty_scores – Define what should be returned instead of a similarity score in cases where fingerprints are missing. The default is “nan”, which will return np.nan in such cases.
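To make the three measures and the role of set_empty_scores concrete, here is a minimal pure-Python sketch on binary fingerprints. The function fingerprint_similarity, its empty_score argument, and the choice to return 0.0 for all-zero fingerprints are assumptions made for illustration; they are not matchms's implementation, which may differ in details.

```python
import math


def fingerprint_similarity(fp_a, fp_b, measure="jaccard", empty_score=float("nan")):
    """Illustrative similarity between two binary fingerprints (sequences of 0/1).

    Hypothetical helper, not part of matchms.
    """
    # Mimic set_empty_scores: return a placeholder when a fingerprint is missing.
    if fp_a is None or fp_b is None:
        return empty_score

    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)  # bits set in both
    n_a = sum(1 for a in fp_a if a)                       # bits set in fp_a
    n_b = sum(1 for b in fp_b if b)                       # bits set in fp_b

    if measure == "jaccard":
        union = n_a + n_b - both
        return both / union if union else 0.0  # assumption: 0.0 for empty bit sets
    if measure == "dice":
        total = n_a + n_b
        return 2 * both / total if total else 0.0
    if measure == "cosine":
        norm = math.sqrt(n_a * n_b)
        return both / norm if norm else 0.0
    raise ValueError(f"Unknown similarity measure: {measure!r}")


fp_1 = [1, 1, 0, 1, 0, 0, 1, 0]
fp_2 = [1, 0, 0, 1, 1, 0, 1, 0]

for name in ("jaccard", "dice", "cosine"):
    print(name, fingerprint_similarity(fp_1, fp_2, name))
```

Both toy fingerprints have four bits set and share three of them, so the Jaccard score is 3/5 while Dice and cosine both give 3/4.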
- keep_score(score)
In the .matrix method, scores are collected in a sparse way. Override this method if values other than False or 0 should also be excluded from the final collection.
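The idea of overriding keep_score can be sketched as follows. ToyCollector, ThresholdCollector, and the collect method are hypothetical stand-ins written for this illustration; they are not matchms classes, and the library's internal collection logic may look different.

```python
class ToyCollector:
    """Hypothetical stand-in for a similarity class; not part of matchms."""

    def keep_score(self, score):
        # Default behaviour in this sketch: drop only False or 0.
        return score != 0

    def collect(self, dense):
        """Store only the entries accepted by keep_score, keyed by (row, col)."""
        kept = {}
        for i, row in enumerate(dense):
            for j, score in enumerate(row):
                if self.keep_score(score):
                    kept[(i, j)] = score
        return kept


class ThresholdCollector(ToyCollector):
    """Override keep_score so that low scores are not stored either."""

    def keep_score(self, score):
        return score >= 0.5


# Made-up scores, for illustration only.
dense_scores = [
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.7],
    [0.0, 0.7, 1.0],
]

default_kept = ToyCollector().collect(dense_scores)
strict_kept = ThresholdCollector().collect(dense_scores)
print(len(default_kept), len(strict_kept))
```

With the default predicate only the two zero entries are dropped; the overriding subclass additionally discards the two 0.2 entries.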
- matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False) → array[source]
Calculate matrix of fingerprint based similarity scores.
- Parameters:
references – List of reference spectra.
queries – List of query spectra.
array_type – Specify the output array type. Can be “numpy” or “sparse”. The default, “numpy”, returns a numpy array; “sparse” returns a COO-sparse array.
- pair(reference: Spectrum, query: Spectrum) → float[source]
Calculate fingerprint based similarity score between two spectra.
- Parameters:
reference – Single reference spectrum.
query – Single query spectrum.
- score_datatype
alias of float64
- sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False, progress_bar: bool = True)
Optional: Provide an optimized method to calculate a sparse matrix of similarity scores.
Compute similarity scores for pairs of reference and query spectra as given by the indices idx_row (references) and idx_col (queries). If no method is added here, a naive implementation (i.e. a for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
progress_bar – When True a progress bar is shown. Default is True.
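The index-pair loop and the symmetry shortcut described above can be sketched as follows. This is a hedged illustration and not the library's code: sparse_scores, counting_pair, and the toy set-based "spectra" are hypothetical, and caching mirrored pairs is only one possible way to exploit score[i,j] = score[j,i].

```python
def sparse_scores(references, queries, idx_row, idx_col, pair, is_symmetric=False):
    """Score only the (row, col) pairs requested; hypothetical helper, not matchms."""
    if len(idx_row) != len(idx_col):
        raise ValueError("idx_row and idx_col must have the same length")
    if is_symmetric and len(references) != len(queries):
        raise ValueError("is_symmetric requires references and queries of equal length")

    cache = {}
    scores = []
    for row, col in zip(idx_row, idx_col):
        # For symmetric input, (i, j) and (j, i) share one cache entry.
        key = (min(row, col), max(row, col)) if is_symmetric else (row, col)
        if key not in cache:
            cache[key] = pair(references[key[0]], queries[key[1]])
        scores.append(cache[key])
    return scores


def jaccard(a, b):
    """Jaccard index of two sets of 'on' bit positions."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


calls = []  # records how often the pair function is actually evaluated


def counting_pair(reference, query):
    calls.append(1)
    return jaccard(reference, query)


# Toy fingerprints given as sets of 'on' bit positions (made up for illustration).
items = [{1, 2, 3}, {2, 3, 4}, {5}]
scores = sparse_scores(items, items, [0, 1, 0, 2], [1, 0, 2, 0],
                       counting_pair, is_symmetric=True)
print(scores, len(calls))
```

Four index pairs are requested, but because each one has a mirrored partner the pair function runs only twice, which is where the roughly 2x saving comes from.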