matchms.similarity package¶
Functions for computing spectra similarities¶
Matchms provides a number of frequently used similarity scores to compare mass spectra. This includes
scores based on comparing peak positions and intensities (
CosineGreedy
orModifiedCosine
)simple scores that only assess precursor m/z or parent mass matches (
PrecursorMzMatch
or:ParentMassMatch
)scores assessing molecular similarity if structures (SMILES, InchiKey) are given as metadata (
FingerprintSimilarity
)score for assessing matches in user-defined metadata fields which can be used to find equal entries (e.g. instrument_type) or numerical values within a specified tolerance (for instance: retention_time, collision energy…) (
MetadataMatch
)
It is also easily possible to add own custom similarity measures or import external ones (such as Spec2Vec).
- class matchms.similarity.CosineGreedy(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)[source]¶
Bases:
BaseSimilarity
Calculate ‘cosine similarity score’ between two spectra.
The cosine score aims at quantifying the similarity between two mass spectra. The score is calculated by finding best possible matches between peaks of two spectra. Two peaks are considered a potential match if their m/z ratios lie within the given ‘tolerance’. The underlying peak assignment problem is here solved in a ‘greedy’ way. This can perform notably faster, but does occasionally deviate slightly from a fully correct solution (as with the Hungarian algorithm, see
CosineHungarian
). In practice this will rarely affect similarity scores notably, in particular for smaller tolerances.For example
import numpy as np from matchms import Spectrum from matchms.similarity import CosineGreedy reference = Spectrum(mz=np.array([100, 150, 200.]), intensities=np.array([0.7, 0.2, 0.1])) query = Spectrum(mz=np.array([100, 140, 190.]), intensities=np.array([0.4, 0.2, 0.1])) # Use factory to construct a similarity function cosine_greedy = CosineGreedy(tolerance=0.2) score = cosine_greedy.pair(reference, query) print(f"Cosine score is {score['score']:.2f} with {score['matches']} matched peaks")
Should output
Cosine score is 0.83 with 1 matched peaks
- __init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)[source]¶
- Parameters:
tolerance – Peaks will be considered a match when <= tolerance apart. Default is 0.1.
mz_power – The power to raise m/z to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.
intensity_power – The power to raise intensity to in the cosine function. The default is 1.
- keep_score(score)¶
In the .matrix method scores will be collected in a sparse way. Overwrite this method here if values other than False or 0 should not be stored in the final collection.
- matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False) ndarray ¶
Optional: Provide optimized method to calculate an np.array of similarity scores for given reference and query spectrums. If no method is added here, the following naive implementation (i.e. a double for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
array_type – Specify the output array type. Can be “numpy” or “sparse”. Default is “numpy” and will return a numpy array. “sparse” will return a COO-sparse array.
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- pair(reference: Spectrum, query: Spectrum) Tuple[float, int] [source]¶
Calculate cosine score between two spectra.
- Parameters:
reference – Single reference spectrum.
query – Single query spectrum.
- Returns:
Tuple with cosine score and number of matched peaks.
- Return type:
Score
- sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False)¶
Optional: Provide optimized method to calculate an sparse matrix of similarity scores.
Compute similarity scores for pairs of reference and query spectrums as given by the indices idx_row (references) and idx_col (queries). If no method is added here, the following naive implementation (i.e. a for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- class matchms.similarity.CosineHungarian(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)[source]¶
Bases:
BaseSimilarity
Calculate ‘cosine similarity score’ between two spectra (using Hungarian algorithm).
The cosine score aims at quantifying the similarity between two mass spectra. The score is calculated by finding best possible matches between peaks of two spectra. Two peaks are considered a potential match if their m/z ratios lie within the given ‘tolerance’. The underlying peak assignment problem is here solved using the Hungarian algorithm. This can perform notably slower than the ‘greedy’ implementation in
CosineGreedy
, but does represent a mathematically proper solution to the problem.- __init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)[source]¶
- Parameters:
tolerance – Peaks will be considered a match when <= tolerance apart. Default is 0.1.
mz_power – The power to raise m/z to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.
intensity_power – The power to raise intensity to in the cosine function. The default is 1.
- keep_score(score)¶
In the .matrix method scores will be collected in a sparse way. Overwrite this method here if values other than False or 0 should not be stored in the final collection.
- matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False) ndarray ¶
Optional: Provide optimized method to calculate an np.array of similarity scores for given reference and query spectrums. If no method is added here, the following naive implementation (i.e. a double for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
array_type – Specify the output array type. Can be “numpy” or “sparse”. Default is “numpy” and will return a numpy array. “sparse” will return a COO-sparse array.
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- pair(reference: Spectrum, query: Spectrum) Tuple[float, int] [source]¶
Calculate cosine score between two spectra.
- Parameters:
reference – Single reference spectrum.
query – Single query spectrum.
Returns –
-------- –
peaks. (Tuple with cosine score and number of matched) –
- sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False)¶
Optional: Provide optimized method to calculate an sparse matrix of similarity scores.
Compute similarity scores for pairs of reference and query spectrums as given by the indices idx_row (references) and idx_col (queries). If no method is added here, the following naive implementation (i.e. a for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- class matchms.similarity.FingerprintSimilarity(similarity_measure: str = 'jaccard', set_empty_scores: float | int | str = 'nan')[source]¶
Bases:
BaseSimilarity
Calculate similarity between molecules based on their fingerprints.
For this similarity measure to work, fingerprints are expected to be derived by running
add_fingerprint()
.Code example:
import numpy as np from matchms import calculate_scores from matchms import Spectrum from matchms.filtering import add_fingerprint from matchms.similarity import FingerprintSimilarity spectrum_1 = Spectrum(mz=np.array([], dtype="float"), intensities=np.array([], dtype="float"), metadata={"smiles": "CCC(C)C(C(=O)O)NC(=O)CCl"}) spectrum_2 = Spectrum(mz=np.array([], dtype="float"), intensities=np.array([], dtype="float"), metadata={"smiles": "CC(C)C(C(=O)O)NC(=O)CCl"}) spectrum_3 = Spectrum(mz=np.array([], dtype="float"), intensities=np.array([], dtype="float"), metadata={"smiles": "C(C(=O)O)(NC(=O)O)S"}) spectrums = [spectrum_1, spectrum_2, spectrum_3] # Add fingerprints spectrums = [add_fingerprint(x, nbits=256) for x in spectrums] # Specify type and calculate similarities similarity_measure = FingerprintSimilarity("jaccard") scores = calculate_scores(spectrums, spectrums, similarity_measure) print(np.round(scores.scores.to_array(), 3))
Should output
[[1. 0.878 0.415] [0.878 1. 0.444] [0.415 0.444 1. ]]
- __init__(similarity_measure: str = 'jaccard', set_empty_scores: float | int | str = 'nan')[source]¶
- Parameters:
similarity_measure – Chose similarity measure form “cosine”, “dice”, “jaccard”. The default is “jaccard”.
set_empty_scores – Define what should be given instead of a similarity score in cases where fingprints are missing. The default is “nan”, which will return np.nan’s in such cases.
- keep_score(score)¶
In the .matrix method scores will be collected in a sparse way. Overwrite this method here if values other than False or 0 should not be stored in the final collection.
- matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False) array [source]¶
Calculate matrix of fingerprint based similarity scores.
- Parameters:
references – List of reference spectrums.
queries – List of query spectrums.
array_type – Specify the output array type. Can be “numpy” or “sparse”. Default is “numpy” and will return a numpy array. “sparse” will return a COO-sparse array
- pair(reference: Spectrum, query: Spectrum) float [source]¶
Calculate fingerprint based similarity score between two spectra.
- Parameters:
reference – Single reference spectrum.
query – Single query spectrum.
- score_datatype¶
alias of
float64
- sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False)¶
Optional: Provide optimized method to calculate an sparse matrix of similarity scores.
Compute similarity scores for pairs of reference and query spectrums as given by the indices idx_row (references) and idx_col (queries). If no method is added here, the following naive implementation (i.e. a for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- class matchms.similarity.IntersectMz(scaling: float = 1.0)[source]¶
Bases:
BaseSimilarity
Example score for illustrating how to build custom spectra similarity score.
IntersectMz will count all exact matches of peaks and divide it by all unique peaks found in both spectrums.
Example of how matchms similarity functions can be used:
import numpy as np from matchms import Spectrum from matchms.similarity import IntersectMz spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]), intensities=np.array([0.7, 0.2, 0.1])) spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]), intensities=np.array([0.4, 0.2, 0.1])) # Construct a similarity function similarity_measure = IntersectMz(scaling=1.0) score = similarity_measure.pair(spectrum_1, spectrum_2) print(f"IntersectMz score is {score:.2f}")
Should output
IntersectMz score is 0.20
- __init__(scaling: float = 1.0)[source]¶
Constructor. Here, function parameters are defined.
- Parameters:
scaling – Scale scores to maximum possible score being ‘scaling’.
- keep_score(score)¶
In the .matrix method scores will be collected in a sparse way. Overwrite this method here if values other than False or 0 should not be stored in the final collection.
- matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False) ndarray ¶
Optional: Provide optimized method to calculate an np.array of similarity scores for given reference and query spectrums. If no method is added here, the following naive implementation (i.e. a double for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
array_type – Specify the output array type. Can be “numpy” or “sparse”. Default is “numpy” and will return a numpy array. “sparse” will return a COO-sparse array.
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- pair(reference: Spectrum, query: Spectrum) float [source]¶
This will calculate the similarity score between two spectra.
- score_datatype¶
alias of
float64
- sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False)¶
Optional: Provide optimized method to calculate an sparse matrix of similarity scores.
Compute similarity scores for pairs of reference and query spectrums as given by the indices idx_row (references) and idx_col (queries). If no method is added here, the following naive implementation (i.e. a for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- class matchms.similarity.MetadataMatch(field: str, matching_type: str = 'equal_match', tolerance: float = 0.1)[source]¶
Bases:
BaseSimilarity
Return True if metadata entries of a specified field match between two spectra.
This is supposed to be used to compare a wide range of possible metadata entries and use this to later select related or similar spectra.
Example to calculate scores between 2 pairs of spectrums and iterate over the scores
import numpy as np from matchms import calculate_scores from matchms import Spectrum from matchms.similarity import MetadataMatch spectrum_1 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"instrument_type": "orbitrap", "id": 1}) spectrum_2 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"instrument_type": "qtof", "id": 2}) spectrum_3 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"instrument_type": "qtof", "id": 3}) spectrum_4 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"instrument_type": "orbitrap", "id": 4}) references = [spectrum_1, spectrum_2] queries = [spectrum_3, spectrum_4] similarity_score = MetadataMatch(field="instrument_type") scores = calculate_scores(references, queries, similarity_score) for (reference, query, score) in scores: print(f"Metadata match between {reference.get('id')} and {query.get('id')}" + f" is {score}")
Should output
Metadata match between 1 and 4 is [True] Metadata match between 2 and 3 is [True]
- __init__(field: str, matching_type: str = 'equal_match', tolerance: float = 0.1)[source]¶
- Parameters:
field – Specify field name for metadata that should be compared.
matching_type – Specify how field entries should be matched. Can be one of [“equal_match”, “difference”].
tolerance – Specify tolerance below which two values are counted as match. This only applied to numerical values.
- keep_score(score)¶
In the .matrix method scores will be collected in a sparse way. Overwrite this method here if values other than False or 0 should not be stored in the final collection.
- matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False) ndarray [source]¶
Compare parent masses between all references and queries.
- Parameters:
references – List/array of reference spectrums.
queries – List/array of Single query spectrums.
array_type – Specify the output array type. Can be “numpy” or “sparse”. Default is “numpy” and will return a numpy array. “sparse” will return a COO-sparse array.
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- pair(reference: Spectrum, query: Spectrum) float [source]¶
Compare precursor m/z between reference and query spectrum.
- Parameters:
reference – Single reference spectrum.
query – Single query spectrum.
- sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False)¶
Optional: Provide optimized method to calculate an sparse matrix of similarity scores.
Compute similarity scores for pairs of reference and query spectrums as given by the indices idx_row (references) and idx_col (queries). If no method is added here, the following naive implementation (i.e. a for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- class matchms.similarity.ModifiedCosine(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)[source]¶
Bases:
BaseSimilarity
Calculate ‘modified cosine score’ between mass spectra.
The modified cosine score aims at quantifying the similarity between two mass spectra. The score is calculated by finding best possible matches between peaks of two spectra. Two peaks are considered a potential match if their m/z ratios lie within the given ‘tolerance’, or if their m/z ratios lie within the tolerance once a mass-shift is applied. The mass shift is simply the difference in precursor-m/z between the two spectra. See Watrous et al. [PNAS, 2012, https://www.pnas.org/content/109/26/E1743] for further details.
For example
import numpy as np from matchms import Spectrum from matchms.similarity import ModifiedCosine spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]), intensities=np.array([0.7, 0.2, 0.1]), metadata={"precursor_mz": 100.0}) spectrum_2 = Spectrum(mz=np.array([104.9, 140, 190.]), intensities=np.array([0.4, 0.2, 0.1]), metadata={"precursor_mz": 105.0}) # Use factory to construct a similarity function modified_cosine = ModifiedCosine(tolerance=0.2) score = modified_cosine.pair(spectrum_1, spectrum_2) print(f"Modified cosine score is {score['score']:.2f} with {score['matches']} matched peaks")
Should output
Modified cosine score is 0.83 with 1 matched peaks
- __init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0)[source]¶
- Parameters:
tolerance – Peaks will be considered a match when <= tolerance apart. Default is 0.1.
mz_power – The power to raise mz to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.
intensity_power – The power to raise intensity to in the cosine function. The default is 1.
- keep_score(score)¶
In the .matrix method scores will be collected in a sparse way. Overwrite this method here if values other than False or 0 should not be stored in the final collection.
- matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False) ndarray ¶
Optional: Provide optimized method to calculate an np.array of similarity scores for given reference and query spectrums. If no method is added here, the following naive implementation (i.e. a double for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
array_type – Specify the output array type. Can be “numpy” or “sparse”. Default is “numpy” and will return a numpy array. “sparse” will return a COO-sparse array.
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- pair(reference: Spectrum, query: Spectrum) Tuple[float, int] [source]¶
Calculate modified cosine score between two spectra.
- Parameters:
reference – Single reference spectrum.
query – Single query spectrum.
- Return type:
Tuple with cosine score and number of matched peaks.
- sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False)¶
Optional: Provide optimized method to calculate an sparse matrix of similarity scores.
Compute similarity scores for pairs of reference and query spectrums as given by the indices idx_row (references) and idx_col (queries). If no method is added here, the following naive implementation (i.e. a for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- class matchms.similarity.NeutralLossesCosine(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0, ignore_peaks_above_precursor: bool = True)[source]¶
Bases:
BaseSimilarity
Calculate ‘neutral losses cosine score’ between mass spectra.
The neutral losses cosine score aims at quantifying the similarity between two mass spectra. The score is calculated by finding best possible matches between peaks of two spectra. Two peaks are considered a potential match if their m/z ratios lie within the given ‘tolerance’ once a mass-shift is applied. The mass shift is the difference in precursor-m/z between the two spectra. In general, ModifiedCosine is recommended over NeutralLossesCosine because it will on average deliver more reliable results.
- __init__(tolerance: float = 0.1, mz_power: float = 0.0, intensity_power: float = 1.0, ignore_peaks_above_precursor: bool = True)[source]¶
- Parameters:
tolerance – Peaks will be considered a match when <= tolerance apart. Default is 0.1.
mz_power – The power to raise mz to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z ratios.
intensity_power – The power to raise intensity to in the cosine function. The default is 1.
ignore_peaks_above_precursor – By default this is set to True, meaning that peaks with m/z values larger than the precursor-m/z will be ignored (since those would correspond to negative “neutral losses”).
- keep_score(score)¶
In the .matrix method scores will be collected in a sparse way. Overwrite this method here if values other than False or 0 should not be stored in the final collection.
- matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False) ndarray ¶
Optional: Provide optimized method to calculate an np.array of similarity scores for given reference and query spectrums. If no method is added here, the following naive implementation (i.e. a double for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
array_type – Specify the output array type. Can be “numpy” or “sparse”. Default is “numpy” and will return a numpy array. “sparse” will return a COO-sparse array.
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- pair(reference: Spectrum, query: Spectrum) Tuple[float, int] [source]¶
Calculate neutral losses cosine score between two spectra.
- Parameters:
reference – Single reference spectrum.
query – Single query spectrum.
- Return type:
Tuple with cosine score and number of matched peaks.
- sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False)¶
Optional: Provide optimized method to calculate an sparse matrix of similarity scores.
Compute similarity scores for pairs of reference and query spectrums as given by the indices idx_row (references) and idx_col (queries). If no method is added here, the following naive implementation (i.e. a for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- class matchms.similarity.ParentMassMatch(tolerance: float = 0.1)[source]¶
Bases:
BaseSimilarity
Return True if spectrums match in parent mass (within tolerance), and False otherwise.
Example to calculate scores between 2 spectrums and iterate over the scores
import numpy as np from matchms import calculate_scores from matchms import Spectrum from matchms.similarity import ParentMassMatch spectrum_1 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"id": "1", "parent_mass": 100}) spectrum_2 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"id": "2", "parent_mass": 110}) spectrum_3 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"id": "3", "parent_mass": 103}) spectrum_4 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"id": "4", "parent_mass": 111}) references = [spectrum_1, spectrum_2] queries = [spectrum_3, spectrum_4] similarity_score = ParentMassMatch(tolerance=5.0) scores = calculate_scores(references, queries, similarity_score) for (reference, query, score) in scores: print(f"Parentmass match between {reference.get('id')} and {query.get('id')}" + f" is {score}")
Should output
Parentmass match between 1 and 3 is [1.0] Parentmass match between 2 and 4 is [1.0]
- __init__(tolerance: float = 0.1)[source]¶
- Parameters:
tolerance – Specify tolerance below which two masses are counted as match.
- keep_score(score)¶
In the .matrix method scores will be collected in a sparse way. Overwrite this method here if values other than False or 0 should not be stored in the final collection.
- matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False) ndarray [source]¶
Compare parent masses between all references and queries.
- Parameters:
references – List/array of reference spectrums.
queries – List/array of Single query spectrums.
array_type – Specify the output array type. Can be “numpy” or “sparse”. Default is “numpy” and will return a numpy array. “sparse” will return a COO-sparse array.
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- pair(reference: Spectrum, query: Spectrum) float [source]¶
Compare parent masses between reference and query spectrum.
- Parameters:
reference – Single reference spectrum.
query – Single query spectrum.
- sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False)¶
Optional: Provide optimized method to calculate an sparse matrix of similarity scores.
Compute similarity scores for pairs of reference and query spectrums as given by the indices idx_row (references) and idx_col (queries). If no method is added here, the following naive implementation (i.e. a for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- class matchms.similarity.PrecursorMzMatch(tolerance: float = 0.1, tolerance_type: str = 'Dalton')[source]¶
Bases:
BaseSimilarity
Return True if spectrums match in precursor m/z (within tolerance), and False otherwise. The match within tolerance can be calculated based on an absolute m/z difference (tolerance_type=”Dalton”) or based on a relative difference in ppm (tolerance_type=”ppm”).
Example to calculate scores between 2 pairs of spectrums and iterate over the scores
import numpy as np from matchms import calculate_scores from matchms import Spectrum from matchms.similarity import PrecursorMzMatch spectrum_1 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"id": "1", "precursor_mz": 100}) spectrum_2 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"id": "2", "precursor_mz": 110}) spectrum_3 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"id": "3", "precursor_mz": 103}) spectrum_4 = Spectrum(mz=np.array([]), intensities=np.array([]), metadata={"id": "4", "precursor_mz": 111}) references = [spectrum_1, spectrum_2] queries = [spectrum_3, spectrum_4] similarity_score = PrecursorMzMatch(tolerance=5.0, tolerance_type="Dalton") scores = calculate_scores(references, queries, similarity_score) for (reference, query, score) in scores: print(f"Precursor m/z match between {reference.get('id')} and {query.get('id')}" + f" is {score}")
Should output
Precursor m/z match between 1 and 3 is [1.0] Precursor m/z match between 2 and 4 is [1.0]
- __init__(tolerance: float = 0.1, tolerance_type: str = 'Dalton')[source]¶
- Parameters:
tolerance – Specify tolerance below which two m/z are counted as match.
tolerance_type – Chose between fixed tolerance in Dalton (=”Dalton”) or a relative difference in ppm (=”ppm”).
- keep_score(score)¶
In the .matrix method scores will be collected in a sparse way. Overwrite this method here if values other than False or 0 should not be stored in the final collection.
- matrix(references: List[Spectrum], queries: List[Spectrum], array_type: str = 'numpy', is_symmetric: bool = False) ndarray [source]¶
Compare parent masses between all references and queries.
- Parameters:
references – List/array of reference spectrums.
queries – List/array of Single query spectrums.
array_type – Specify the output array type. Can be “numpy” or “sparse”. Default is “numpy” and will return a numpy array. “sparse” will return a COO-sparse array.
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
- pair(reference: Spectrum, query: Spectrum) float [source]¶
Compare precursor m/z between reference and query spectrum.
- Parameters:
reference – Single reference spectrum.
query – Single query spectrum.
- sparse_array(references: List[Spectrum], queries: List[Spectrum], idx_row, idx_col, is_symmetric: bool = False)¶
Optional: Provide optimized method to calculate an sparse matrix of similarity scores.
Compute similarity scores for pairs of reference and query spectrums as given by the indices idx_row (references) and idx_col (queries). If no method is added here, the following naive implementation (i.e. a for-loop) is used.
- Parameters:
references – List of reference objects
queries – List of query objects
idx_row – List/array of row indices
idx_col – List/array of column indices
is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster.
Submodules¶
- matchms.similarity.BaseSimilarity module
- matchms.similarity.CosineGreedy module
- matchms.similarity.CosineHungarian module
- matchms.similarity.FingerprintSimilarity module
- matchms.similarity.IntersectMz module
- matchms.similarity.MetadataMatch module
- matchms.similarity.ModifiedCosine module
- matchms.similarity.NeutralLossesCosine module
- matchms.similarity.ParentMassMatch module
- matchms.similarity.PrecursorMzMatch module
- matchms.similarity.spectrum_similarity_functions module
- matchms.similarity.vector_similarity_functions module