matchms package
- class matchms.Fragments(mz=None, intensities=None)
Bases: object
Stores arrays of m/z values and intensities, with some checks on their internal consistency.
For example:
import numpy as np
from matchms import Fragments

mz = np.array([10, 20, 30], dtype="float")
intensities = np.array([100, 20, 300], dtype="float")
peaks = Fragments(mz=mz, intensities=intensities)
print(peaks[2])
Should output:
[ 30. 300.]
- mz
NumPy array of m/z values.
- intensities
NumPy array of peak intensity values.
- property intensities
Getter for the private intensities array.
- property mz
Getter for the private m/z array.
- property to_numpy
Getter that returns the peak m/z values and intensities stacked into a single NumPy array.
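The page does not spell out which consistency checks are applied. The following minimal pure-Python sketch assumes two plausible ones (both arrays have the same length, and m/z values are in ascending order). MiniFragments is a hypothetical class, not part of matchms, and it uses plain lists instead of NumPy arrays.

```python
class MiniFragments:
    """Hypothetical stand-in for matchms.Fragments (illustration only)."""

    def __init__(self, mz, intensities):
        # Assumed check 1: both arrays describe the same number of peaks.
        if len(mz) != len(intensities):
            raise ValueError("mz and intensities must have the same length")
        # Assumed check 2: peaks are stored in ascending m/z order.
        if any(a > b for a, b in zip(mz, mz[1:])):
            raise ValueError("mz values must be sorted in ascending order")
        self._mz = list(mz)
        self._intensities = list(intensities)

    @property
    def mz(self):
        # Getter for the private m/z list.
        return self._mz

    @property
    def intensities(self):
        # Getter for the private intensity list.
        return self._intensities

    def __getitem__(self, index):
        # Return the [m/z, intensity] pair of a single peak.
        return [self._mz[index], self._intensities[index]]

    @property
    def to_numpy(self):
        # Plain-list analogue of stacking both arrays: one [mz, intensity] row per peak.
        return [[m, i] for m, i in zip(self._mz, self._intensities)]


peaks = MiniFragments(mz=[10.0, 20.0, 30.0], intensities=[100.0, 20.0, 300.0])
print(peaks[2])  # [30.0, 300.0]
```

Rejecting inconsistent input in the constructor means every later accessor can rely on the two arrays lining up peak by peak.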
- class matchms.Metadata(metadata: Optional[dict] = None, matchms_key_style: bool = True)
Bases: object
Class to handle spectrum metadata in matchms.
Metadata entries are stored as a PickyDict dictionary in metadata.data. Unlike normal Python dictionaries, not all key names are accepted: key names are forced to lower case to avoid confusion between keys such as “Precursor_MZ” and “precursor_mz”.
To avoid the default harmonization of the metadata dictionary, use the option matchms_key_style=False.
Code example:
from matchms import Metadata

metadata = Metadata({"Precursor_MZ": 201.5, "Compound Name": "SuperStuff"})
print(metadata["precursor_mz"])  # => 201.5
print(metadata["compound_name"])  # => SuperStuff
Or, if the default matchms metadata harmonization should not take place:
metadata = Metadata({"Precursor_MZ": 201.5, "Compound Name": "SuperStuff"},
                    matchms_key_style=False)
print(metadata["precursor_mz"])  # => 201.5
print(metadata["compound_name"])  # => None (now you need to use "compound name")
- __init__(metadata: Optional[dict] = None, matchms_key_style: bool = True)
- Parameters
metadata – Spectrum metadata as a dictionary.
matchms_key_style – Set to False if metadata harmonization to default keys is not desired. The default is True.
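The key handling described above can be sketched as follows. harmonize_key is a hypothetical helper; it assumes that matchms-style harmonization lowercases keys and replaces whitespace with underscores, while matchms_key_style=False only lowercases. The library may additionally map known aliases to its default keys, which this sketch does not attempt.

```python
import re


def harmonize_key(key, matchms_key_style=True):
    """Hypothetical helper mimicking the key handling described above."""
    # Keys are always forced to lower case.
    key = key.lower()
    if matchms_key_style:
        # Assumption: matchms style replaces runs of whitespace with an underscore.
        key = re.sub(r"\s+", "_", key.strip())
    return key


raw = {"Precursor_MZ": 201.5, "Compound Name": "SuperStuff"}
print({harmonize_key(k): v for k, v in raw.items()})
# {'precursor_mz': 201.5, 'compound_name': 'SuperStuff'}
print({harmonize_key(k, matchms_key_style=False): v for k, v in raw.items()})
# {'precursor_mz': 201.5, 'compound name': 'SuperStuff'}
```

This mirrors why metadata["compound_name"] is found in the first example but not in the second.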
- class matchms.Scores(references: Union[List[object], Tuple[object], ndarray], queries: Union[List[object], Tuple[object], ndarray], similarity_function: BaseSimilarity, is_symmetric: bool = False)
Bases: object
Contains reference and query spectra and the scores between them.
The scores can be retrieved as a matrix with the Scores.scores attribute. The (reference spectrum, query spectrum, score) triplets can also be iterated over. Example that calculates the scores between two reference and two query spectra and iterates over them:
import numpy as np
from matchms import calculate_scores
from matchms import Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2]
queries = [spectrum_3, spectrum_4]

similarity_measure = CosineGreedy()
scores = calculate_scores(references, queries, similarity_measure)

for (reference, query, score) in scores:
    print(f"Cosine score between {reference.get('id')} and {query.get('id')}"
          + f" is {score['score']:.2f} with {score['matches']} matched peaks")
Should output:
Cosine score between spectrum1 and spectrum3 is 0.00 with 0 matched peaks
Cosine score between spectrum1 and spectrum4 is 0.80 with 3 matched peaks
Cosine score between spectrum2 and spectrum3 is 0.14 with 1 matched peaks
Cosine score between spectrum2 and spectrum4 is 0.61 with 1 matched peaks
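To see where numbers such as 0.80 and 0.14 come from, here is a minimal pure-Python sketch of a greedy cosine score. It assumes an m/z tolerance of 0.1 and raw intensities without m/z or intensity weighting as the settings behind the example; cosine_greedy is a hypothetical function and not the library's optimized implementation.

```python
import math


def cosine_greedy(spec1, spec2, tolerance=0.1):
    """Greedy cosine sketch; spectra are lists of (mz, intensity) pairs."""
    # Collect every peak pair whose m/z values lie within the tolerance.
    candidates = []
    for i, (mz1, int1) in enumerate(spec1):
        for j, (mz2, int2) in enumerate(spec2):
            if abs(mz1 - mz2) <= tolerance:
                candidates.append((int1 * int2, i, j))
    # Greedily accept the largest intensity products first,
    # using every peak at most once.
    candidates.sort(reverse=True)
    used1, used2 = set(), set()
    score = 0.0
    for product, i, j in candidates:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            score += product
    # Normalize by the Euclidean norms of both full intensity vectors.
    norm1 = math.sqrt(sum(intensity ** 2 for _, intensity in spec1))
    norm2 = math.sqrt(sum(intensity ** 2 for _, intensity in spec2))
    return score / (norm1 * norm2), len(used1)


spectrum_1 = [(100.0, 0.7), (150.0, 0.2), (200.0, 0.1)]
spectrum_4 = [(100.0, 0.6), (150.0, 0.1), (200.0, 0.6)]
score, matches = cosine_greedy(spectrum_1, spectrum_4)
print(f"{score:.2f} with {matches} matched peaks")  # 0.80 with 3 matched peaks
```

For spectrum1 and spectrum4 all three peaks match, giving 0.50 / (sqrt(0.54) * sqrt(0.73)), which rounds to 0.80.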
- __init__(references: Union[List[object], Tuple[object], ndarray], queries: Union[List[object], Tuple[object], ndarray], similarity_function: BaseSimilarity, is_symmetric: bool = False)
- Parameters
references – List of reference objects
queries – List of query objects
similarity_function – Expected input is an object based on BaseSimilarity. It is expected to provide .pair() and .matrix() methods for computing similarity scores between references and queries.
is_symmetric – Set to True when references and queries are identical (as, for instance, in an all-vs-all comparison). Because score[i,j] = score[j,i], the calculation will then be about 2x faster. Default is False.
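A minimal pure-Python sketch of the duck-typed interface described above, and of why is_symmetric saves roughly half the work. It assumes a pair(reference, query) method that scores one pair and a matrix(references, queries, is_symmetric) method that fills the full matrix; CountingSimilarity and its toy score are hypothetical, and the real BaseSimilarity signatures may differ.

```python
class CountingSimilarity:
    """Hypothetical similarity object exposing pair() and matrix()."""

    def __init__(self):
        self.pair_calls = 0  # counts how often pair() is evaluated

    def pair(self, reference, query):
        # Toy symmetric score on numbers (stand-in for a spectral similarity).
        self.pair_calls += 1
        return 1.0 / (1.0 + abs(reference - query))

    def matrix(self, references, queries, is_symmetric=False):
        n_refs, n_queries = len(references), len(queries)
        if is_symmetric and n_refs != n_queries:
            raise ValueError("is_symmetric requires identical references and queries")
        scores = [[0.0] * n_queries for _ in range(n_refs)]
        for i in range(n_refs):
            # With symmetry only the upper triangle (j >= i) is computed.
            start = i if is_symmetric else 0
            for j in range(start, n_queries):
                scores[i][j] = self.pair(references[i], queries[j])
                if is_symmetric:
                    # Mirror the value, using score[i,j] = score[j,i].
                    scores[j][i] = scores[i][j]
        return scores


items = [1.0, 2.0, 4.0, 8.0]
full, sym = CountingSimilarity(), CountingSimilarity()
print(full.matrix(items, items) == sym.matrix(items, items, is_symmetric=True))  # True
print(full.pair_calls, sym.pair_calls)  # 16 10
```

For n items the symmetric path evaluates n(n+1)/2 pairs instead of n², which approaches the "about 2x" saving as n grows.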
- calculate() → Scores
Calculate the similarity between all reference objects versus all query objects using the most suitable available implementation of the given similarity_function. The recommended way to calculate similarity scores is calculate_scores().
Deprecated since version 0.6.0: Calculate scores via the calculate_scores() function.
- property scores: ndarray
Scores as a NumPy array.
For example:
import numpy as np
from matchms import calculate_scores, Spectrum
from matchms.similarity import IntersectMz

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]))
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]))
spectrums = [spectrum_1, spectrum_2]

scores = calculate_scores(spectrums, spectrums, IntersectMz()).scores

print(scores[0].dtype)
print(scores.shape)
print(scores)
Should output:
float64
(2, 2)
[[1.  0.2]
 [0.2 1. ]]
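The 0.2 entries can be reproduced by hand. This sketch assumes that IntersectMz scores a pair as the number of shared, exactly equal m/z values divided by the number of distinct m/z values in both spectra. That reading is consistent with the output (one shared value, 100, out of five distinct values), but intersect_mz is a hypothetical function, not the library code.

```python
def intersect_mz(mz_a, mz_b):
    """Sketch: shared m/z values divided by all distinct m/z values."""
    set_a, set_b = set(mz_a), set(mz_b)
    union = set_a | set_b
    if not union:
        # Two empty spectra: define the score as 0.0 to avoid division by zero.
        return 0.0
    return len(set_a & set_b) / len(union)


spectra_mz = [[100.0, 150.0, 200.0], [100.0, 140.0, 190.0]]
matrix = [[intersect_mz(a, b) for b in spectra_mz] for a in spectra_mz]
print(matrix)  # [[1.0, 0.2], [0.2, 1.0]]
```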
- scores_by_query(query: Union[List[object], Tuple[object], ndarray], sort: bool = False) → ndarray
Return all scores for the given query spectrum.
For example:
import numpy as np
from matchms import calculate_scores, Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2, spectrum_3]
queries = [spectrum_2, spectrum_3, spectrum_4]

scores = calculate_scores(references, queries, CosineGreedy())
selected_scores = scores.scores_by_query(spectrum_4, sort=True)
print([x[1]["score"].round(3) for x in selected_scores])
Should output:
[0.796, 0.613, 0.0]
- Parameters
query – Single query Spectrum.
sort – Set to True to obtain the scores in sorted order (relying on the sort() function of the given similarity_function).
- scores_by_reference(reference: Union[List[object], Tuple[object], ndarray], sort: bool = False) → ndarray
Return all scores for the given reference spectrum.
- Parameters
reference – Single reference Spectrum.
sort – Set to True to obtain the scores in sorted order (relying on the sort() function of the given similarity_function).
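Conceptually, scores_by_query selects one column of the score matrix and scores_by_reference selects one row. The following pure-Python sketch assumes that sorting means highest score first, as in the example output; column_scores and row_scores are hypothetical helpers, and the matrix holds illustrative CosineGreedy values for the example spectra, rounded to three decimals.

```python
def column_scores(score_matrix, references, query_index, sort=False):
    """Hypothetical helper: (reference, score) pairs for one query column."""
    pairs = [(ref, row[query_index]) for ref, row in zip(references, score_matrix)]
    if sort:
        # Assumption: best (highest) score first.
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs


def row_scores(score_matrix, queries, reference_index, sort=False):
    """Hypothetical helper: (query, score) pairs for one reference row."""
    pairs = list(zip(queries, score_matrix[reference_index]))
    if sort:
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs


references = ["spectrum1", "spectrum2", "spectrum3"]
queries = ["spectrum2", "spectrum3", "spectrum4"]
# Rows are references, columns are queries.
matrix = [
    [0.831, 0.0, 0.796],
    [1.0, 0.136, 0.613],
    [0.136, 1.0, 0.0],
]
print([s for _, s in column_scores(matrix, references, 2, sort=True)])
# [0.796, 0.613, 0.0]
```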
- class matchms.Spectrum(mz: array, intensities: array, metadata: Optional[dict] = None, metadata_harmonization: bool = True)
Bases: object
Container for a collection of peaks, losses and metadata.
Spectrum peaks are stored as a Fragments object, which can be accessed via spectrum.peaks and contains the m/z values and the respective peak intensities.
Spectrum metadata is stored as a Metadata object, which can be accessed via spectrum.metadata.
Code example:
import numpy as np
from matchms import Spectrum

spectrum = Spectrum(mz=np.array([100, 150, 200.]),
                    intensities=np.array([0.7, 0.2, 0.1]),
                    metadata={'id': 'spectrum1',
                              "peak_comments": {200.: "the peak at 200 m/z"}})

print(spectrum.peaks.mz[0])
print(spectrum.peaks.intensities[0])
print(spectrum.get('id'))
print(spectrum.peak_comments.get(200))
Should output:
100.0
0.7
spectrum1
the peak at 200 m/z
- losses
Losses of the spectrum, i.e. the differences between the precursor m/z and the m/z values of all peaks.
Can be filled with:
import numpy as np
from matchms import Fragments

spectrum.losses = Fragments(mz=np.array([50.]), intensities=np.array([0.1]))
- Type
Fragments or None
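The loss definition above can be illustrated with a short sketch. compute_losses is a hypothetical function and the precursor m/z of 250.0 is an assumed example value. Because subtracting ascending peak m/z values from the precursor yields descending losses, both lists are reversed so the losses stay in ascending order. The add_losses filter listed under matchms.filtering is where the library's actual behavior (for example, any restriction on the loss range) would be defined.

```python
def compute_losses(precursor_mz, mz, intensities):
    """Sketch: losses as precursor m/z minus each peak m/z, in ascending order."""
    # Walk the peaks from highest to lowest m/z so the losses come out ascending.
    losses_mz = [precursor_mz - value for value in reversed(mz)]
    # Each loss keeps the intensity of the peak it was derived from.
    losses_intensities = list(reversed(intensities))
    return losses_mz, losses_intensities


losses_mz, losses_intensities = compute_losses(250.0, [100.0, 150.0, 200.0], [0.7, 0.2, 0.1])
print(losses_mz)           # [50.0, 100.0, 150.0]
print(losses_intensities)  # [0.1, 0.2, 0.7]
```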
- __init__(mz: array, intensities: array, metadata: Optional[dict] = None, metadata_harmonization: bool = True)
- Parameters
mz – Array of m/z for the peaks
intensities – Array of intensities for the peaks
metadata – Dictionary with, for example, the scan number or precursor m/z.
metadata_harmonization (bool, optional) – Set to False if default metadata filters should not be applied. The default is True.
- get(key: str, default=None)
Retrieve a value from the metadata dict. Shorthand for val = self.metadata[key].
- metadata_hash()
Return a (truncated) sha256-based hash which is generated from the spectrum metadata. Spectra with the same metadata result in the same metadata_hash.
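One way to build such a hash is sketched below. It assumes the metadata is serialized to JSON with sorted keys before hashing, so that insertion order does not matter. The function name and the truncation length of 20 characters are hypothetical choices, not taken from matchms.

```python
import hashlib
import json


def metadata_hash(metadata, hash_length=20):
    """Sketch: truncated sha256 of a canonical JSON dump of the metadata."""
    # sort_keys=True makes the hash independent of dictionary insertion order.
    canonical = json.dumps(metadata, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:hash_length]


a = metadata_hash({"id": "spectrum1", "precursor_mz": 201.5})
b = metadata_hash({"precursor_mz": 201.5, "id": "spectrum1"})
print(a == b)  # True
```

Truncation keeps the identifier short at the cost of a slightly higher collision chance, which is usually acceptable for comparing spectra.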
- plot(figsize=(8, 6), dpi=200, **kwargs)
Plot a spectrum for visual inspection. Run
spectrum.plot()
[Figure: Example of a spectrum plotted using spectrum.plot()]
- matchms.calculate_scores(references: Union[List[object], Tuple[object], ndarray], queries: Union[List[object], Tuple[object], ndarray], similarity_function: BaseSimilarity, is_symmetric: bool = False) → Scores
Calculate the similarity between all reference objects versus all query objects.
Example that calculates the scores between two spectra and iterates over them:
import numpy as np
from matchms import calculate_scores, Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrums = [spectrum_1, spectrum_2]

scores = calculate_scores(spectrums, spectrums, CosineGreedy())

for (reference, query, score) in scores:
    print(f"Cosine score between {reference.get('id')} and {query.get('id')}"
          + f" is {score['score']:.2f} with {score['matches']} matched peaks")
Should output:
Cosine score between spectrum1 and spectrum1 is 1.00 with 3 matched peaks
Cosine score between spectrum1 and spectrum2 is 0.83 with 1 matched peaks
Cosine score between spectrum2 and spectrum1 is 0.83 with 1 matched peaks
Cosine score between spectrum2 and spectrum2 is 1.00 with 3 matched peaks
- Parameters
references – List of reference objects
queries – List of query objects
similarity_function – Function which accepts a reference + query object and returns a score or tuple of scores
is_symmetric – Set to True when references and queries are identical (as, for instance, in an all-vs-all comparison). Because score[i,j] = score[j,i], the calculation will then be about 2x faster. Default is False.
- Return type
Scores
- matchms.set_matchms_logger_level(loglevel: str, logger_name='matchms')
Update the logging level to the given loglevel.
- Parameters
loglevel – Can be ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’.
logger_name – Default is “matchms”. Change if logger name should be different.
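We assume, without having checked the implementation, that this function sets the level of the standard-library logger with the given name. The hypothetical set_logger_level below shows that idea using only Python's logging module. In practice you would call matchms.set_matchms_logger_level("WARNING") directly.

```python
import logging


def set_logger_level(loglevel, logger_name="matchms"):
    """Hypothetical stdlib-only equivalent of the function described above."""
    # Logger.setLevel accepts level names such as "DEBUG" or "WARNING".
    logger = logging.getLogger(logger_name)
    logger.setLevel(loglevel)
    return logger


logger = set_logger_level("WARNING")
print(logger.getEffectiveLevel() == logging.WARNING)  # True
```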
Subpackages
- matchms.exporting package
- matchms.filtering package
- Functions for processing mass spectra
- Submodules
- matchms.filtering.SpeciesString module
- matchms.filtering.add_compound_name module
- matchms.filtering.add_fingerprint module
- matchms.filtering.add_losses module
- matchms.filtering.add_parent_mass module
- matchms.filtering.add_precursor_mz module
- matchms.filtering.add_retention module
- matchms.filtering.clean_compound_name module
- matchms.filtering.correct_charge module
- matchms.filtering.default_filters module
- matchms.filtering.derive_adduct_from_name module
- matchms.filtering.derive_formula_from_name module
- matchms.filtering.derive_inchi_from_smiles module
- matchms.filtering.derive_inchikey_from_inchi module
- matchms.filtering.derive_ionmode module
- matchms.filtering.derive_smiles_from_inchi module
- matchms.filtering.harmonize_undefined_inchi module
- matchms.filtering.harmonize_undefined_inchikey module
- matchms.filtering.harmonize_undefined_smiles module
- matchms.filtering.interpret_pepmass module
- matchms.filtering.load_adducts module
- matchms.filtering.make_charge_int module
- matchms.filtering.make_charge_scalar module
- matchms.filtering.make_ionmode_lowercase module
- matchms.filtering.normalize_intensities module
- matchms.filtering.reduce_to_number_of_peaks module
- matchms.filtering.remove_peaks_around_precursor_mz module
- matchms.filtering.remove_peaks_outside_top_k module
- matchms.filtering.repair_inchi_inchikey_smiles module
- matchms.filtering.require_minimum_number_of_peaks module
- matchms.filtering.require_minimum_of_high_peaks module
- matchms.filtering.require_precursor_below_mz module
- matchms.filtering.require_precursor_mz module
- matchms.filtering.select_by_intensity module
- matchms.filtering.select_by_mz module
- matchms.filtering.select_by_relative_intensity module
- matchms.filtering.set_ionmode_na_when_missing module
- matchms.importing package
- matchms.networking package
- matchms.plotting package
- matchms.similarity package
- Functions for computing spectra similarities
- Submodules
- matchms.similarity.BaseSimilarity module
- matchms.similarity.CosineGreedy module
- matchms.similarity.CosineHungarian module
- matchms.similarity.FingerprintSimilarity module
- matchms.similarity.IntersectMz module
- matchms.similarity.MetadataMatch module
- matchms.similarity.ModifiedCosine module
- matchms.similarity.NeutralLossesCosine module
- matchms.similarity.ParentMassMatch module
- matchms.similarity.PrecursorMzMatch module
- matchms.similarity.spectrum_similarity_functions module
- matchms.similarity.vector_similarity_functions module
Submodules
- matchms.Fragments module
- matchms.Metadata module
- matchms.Scores module
- matchms.Spectrum module
- matchms.Spikes module
- matchms.calculate_scores module
- matchms.constants module
- matchms.hashing module
- matchms.logging_functions module
- matchms.metadata_utils module
- matchms.typing module
- matchms.utils module