matchms package

class matchms.Fragments(mz=None, intensities=None)[source]

Bases: object

Stores arrays of intensities and m/z values, with some checks on their internal consistency.

For example

import numpy as np
from matchms import Fragments

mz = np.array([10, 20, 30], dtype="float")
intensities = np.array([100, 20, 300], dtype="float")

peaks = Fragments(mz=mz, intensities=intensities)
print(peaks[2])

Should output

[ 30. 300.]
mz

Numpy array of m/z values.

intensities

Numpy array of peak intensity values.

__init__(mz=None, intensities=None)[source]
property intensities

Getter for the private intensities variable.

property mz

Getter for the private mz variable.

property to_numpy

Getter that returns a stacked numpy array of both peak m/z values and intensities.
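The consistency checks are not spelled out above. As a rough illustration only, the following sketch validates paired peak data under two assumed rules (equal lengths, ascending m/z order). The helper name check_peaks is hypothetical and is not part of matchms; plain lists stand in for numpy arrays.

```python
def check_peaks(mz, intensities):
    """Hypothetical helper: validate paired m/z and intensity sequences."""
    # Assumed rule 1: every m/z value needs exactly one intensity.
    if len(mz) != len(intensities):
        raise ValueError("mz and intensities must have the same length")
    # Assumed rule 2: m/z values are expected in ascending order.
    if any(a > b for a, b in zip(mz, mz[1:])):
        raise ValueError("mz values must be sorted in ascending order")
    return list(zip(mz, intensities))


pairs = check_peaks([10.0, 20.0, 30.0], [100.0, 20.0, 300.0])
print(pairs[2])  # (30.0, 300.0)
```

Indexing the validated pairs mirrors peaks[2] in the example above, which returns the m/z value together with its intensity.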

class matchms.Metadata(metadata: Optional[dict] = None, matchms_key_style: bool = True)[source]

Bases: object

Class to handle spectrum metadata in matchms.

Metadata entries are stored as a PickyDict dictionary in metadata.data. Unlike normal Python dictionaries, not all key names are accepted. Key names are forced to lower-case to avoid confusion between keys such as “Precursor_MZ” and “precursor_mz”.

To avoid the default harmonization of the metadata dictionary use the option matchms_key_style=False.

Code example:

metadata = Metadata({"Precursor_MZ": 201.5, "Compound Name": "SuperStuff"})
print(metadata["precursor_mz"])  # => 201.5
print(metadata["compound_name"])  # => SuperStuff

Or if the matchms default metadata harmonization should not take place:

metadata = Metadata({"Precursor_MZ": 201.5, "Compound Name": "SuperStuff"},
                    matchms_key_style=False)
print(metadata["precursor_mz"])  # => 201.5
print(metadata["compound_name"])  # => None (now you need to use "compound name")
__init__(metadata: Optional[dict] = None, matchms_key_style: bool = True)[source]
Parameters
  • metadata – Spectrum metadata as a dictionary.

  • matchms_key_style – Set to False if metadata harmonization to default keys is not desired. The default is True.

get(key: str, default=None)[source]

Retrieve value from metadata dict.

harmonize_metadata()[source]

Runs default harmonization of metadata.

This method harmonizes metadata field names, which includes setting them to lower-case and running a series of regex replacements followed by default field name replacements (such as precursor_mass -> precursor_mz).
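The three steps described above (lower-casing, regex replacements, default field name replacements) can be sketched as follows. This is a simplified stand-in, not the matchms implementation: the single regex rule (whitespace to underscore) and the one-entry replacement table are assumptions chosen to match the examples in this section, and the function names are hypothetical.

```python
import re

# Assumed replacement table; only precursor_mass -> precursor_mz is
# taken from the description above.
DEFAULT_KEY_REPLACEMENTS = {"precursor_mass": "precursor_mz"}


def harmonize_key(key):
    """Hypothetical helper: bring a single field name into a common style."""
    key = key.strip().lower()                  # step 1: lower-case
    key = re.sub(r"\s+", "_", key)             # step 2: assumed regex rule
    return DEFAULT_KEY_REPLACEMENTS.get(key, key)  # step 3: default names


def harmonize(metadata):
    """Apply harmonize_key to every key of a metadata dictionary."""
    return {harmonize_key(key): value for key, value in metadata.items()}


harmonized = harmonize({"Precursor_Mass": 201.5, "Compound Name": "SuperStuff"})
print(harmonized)
# {'precursor_mz': 201.5, 'compound_name': 'SuperStuff'}
```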

items()[source]

Retrieve all items (key, value pairs) of metadata dict.

keys()[source]

Retrieve all keys of metadata dict.

set(key: str, value)[source]

Set value in metadata dict.

values()[source]

Retrieve all values of metadata dict.

class matchms.Scores(references: Union[List[object], Tuple[object], ndarray], queries: Union[List[object], Tuple[object], ndarray], similarity_function: BaseSimilarity, is_symmetric: bool = False)[source]

Bases: object

Contains reference and query spectrums and the scores between them.

The scores can be retrieved as a matrix with the Scores.scores attribute. The (reference spectrum, query spectrum, score) tuples can also be iterated over in query then reference order.

Example to calculate scores between reference and query spectrums and iterate over the scores

import numpy as np
from matchms import calculate_scores
from matchms import Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2]
queries = [spectrum_3, spectrum_4]

similarity_measure = CosineGreedy()
scores = calculate_scores(references, queries, similarity_measure)

for (reference, query, score) in scores:
    print(f"Cosine score between {reference.get('id')} and {query.get('id')}" +
          f" is {score['score']:.2f} with {score['matches']} matched peaks")

Should output

Cosine score between spectrum1 and spectrum3 is 0.00 with 0 matched peaks
Cosine score between spectrum1 and spectrum4 is 0.80 with 3 matched peaks
Cosine score between spectrum2 and spectrum3 is 0.14 with 1 matched peaks
Cosine score between spectrum2 and spectrum4 is 0.61 with 1 matched peaks
__init__(references: Union[List[object], Tuple[object], ndarray], queries: Union[List[object], Tuple[object], ndarray], similarity_function: BaseSimilarity, is_symmetric: bool = False)[source]
Parameters
  • references – List of reference objects

  • queries – List of query objects

  • similarity_function – Expected input is an object based on BaseSimilarity. It is expected to provide a .pair() and .matrix() method for computing similarity scores between references and queries.

  • is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster. Default is False.
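To illustrate why is_symmetric can roughly halve the work, the sketch below fills a score matrix by evaluating only the upper triangle and mirroring each value. It is a plain-Python illustration under the assumption that the similarity function is symmetric; symmetric_score_matrix and the toy overlap measure are hypothetical names, not matchms functions.

```python
def symmetric_score_matrix(items, score):
    """Fill an n x n matrix using score(a, b) == score(b, a)."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    calls = 0
    for i in range(n):
        for j in range(i, n):  # upper triangle including the diagonal
            value = score(items[i], items[j])
            calls += 1
            matrix[i][j] = value
            matrix[j][i] = value  # mirror instead of recomputing
    return matrix, calls


def overlap(a, b):
    """Toy symmetric similarity: shared values divided by all values."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


peak_lists = [[100, 150, 200], [100, 140, 190], [110, 140, 195]]
matrix, calls = symmetric_score_matrix(peak_lists, overlap)
print(calls)  # 6 evaluations instead of 9
```

For n items this needs n * (n + 1) / 2 evaluations rather than n * n, which approaches a factor of two as n grows.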

calculate() Scores[source]

Calculate the similarity between all reference objects versus all query objects using the most suitable available implementation of the given similarity_function. The recommended way to calculate similarity scores is calculate_scores().

Deprecated since version 0.6.0: Calculate scores via calculate_scores() function.

property scores: ndarray

Scores as numpy array

For example

import numpy as np
from matchms import calculate_scores, Scores, Spectrum
from matchms.similarity import IntersectMz

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]))
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]))
spectrums = [spectrum_1, spectrum_2]

scores = calculate_scores(spectrums, spectrums, IntersectMz()).scores

print(scores[0].dtype)
print(scores.shape)
print(scores)

Should output

float64
(2, 2)
[[1.  0.2]
 [0.2 1. ]]
scores_by_query(query: Union[List[object], Tuple[object], ndarray], sort: bool = False) ndarray[source]

Return all scores for the given query spectrum.

For example

import numpy as np
from matchms import calculate_scores, Scores, Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2, spectrum_3]
queries = [spectrum_2, spectrum_3, spectrum_4]

scores = calculate_scores(references, queries, CosineGreedy())
selected_scores = scores.scores_by_query(spectrum_4, sort=True)
print([x[1]["score"].round(3) for x in selected_scores])

Should output

[0.796, 0.613, 0.0]
Parameters
  • query – Single query Spectrum.

  • sort – Set to True to obtain the scores in a sorted way (relying on the sort() function from the given similarity_function).
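Conceptually, selecting the scores for one query amounts to picking one column of the score matrix and optionally sorting it. The sketch below shows that idea with plain lists. It assumes rows correspond to references, columns to queries, and that sorting is descending; the function name and the score values are made up for illustration and do not come from matchms.

```python
def scores_for_query(references, queries, matrix, query, sort=False):
    """Hypothetical helper: return (reference, score) pairs for one query."""
    column = queries.index(query)
    selected = [(ref, matrix[row][column]) for row, ref in enumerate(references)]
    if sort:
        # Assumed ordering: highest score first.
        selected.sort(key=lambda pair: pair[1], reverse=True)
    return selected


references = ["spectrum1", "spectrum2", "spectrum3"]
queries = ["spectrum2", "spectrum3", "spectrum4"]
# Illustrative values only: matrix[i][j] is reference i versus query j.
matrix = [[0.2, 0.0, 0.6],
          [1.0, 0.1, 0.8],
          [0.1, 1.0, 0.0]]

selected = scores_for_query(references, queries, matrix, "spectrum4", sort=True)
print(selected)
# [('spectrum2', 0.8), ('spectrum1', 0.6), ('spectrum3', 0.0)]
```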

scores_by_reference(reference: Union[List[object], Tuple[object], ndarray], sort: bool = False) ndarray[source]

Return all scores for the given reference spectrum.

Parameters
  • reference – Single reference Spectrum.

  • sort – Set to True to obtain the scores in a sorted way (relying on the sort() function from the given similarity_function).

class matchms.Spectrum(mz: array, intensities: array, metadata: Optional[dict] = None, metadata_harmonization: bool = True)[source]

Bases: object

Container for a collection of peaks, losses and metadata.

Spectrum peaks are stored as a Fragments object, accessible via spectrum.peaks, which contains the m/z values and the respective peak intensities.

Spectrum metadata is stored as a Metadata object, accessible via spectrum.metadata.

Code example

import numpy as np
from matchms import Spectrum

spectrum = Spectrum(mz=np.array([100, 150, 200.]),
                    intensities=np.array([0.7, 0.2, 0.1]),
                    metadata={'id': 'spectrum1',
                              "peak_comments": {200.: "the peak at 200 m/z"}})

print(spectrum.peaks.mz[0])
print(spectrum.peaks.intensities[0])
print(spectrum.get('id'))
print(spectrum.peak_comments.get(200))

Should output

100.0
0.7
spectrum1
the peak at 200 m/z
peaks

Peaks of spectrum

Type

Fragments

losses

Losses of spectrum, the difference between the precursor and all peaks.

Can be filled with

import numpy as np
from matchms import Fragments
spectrum.losses = Fragments(mz=np.array([50.]), intensities=np.array([0.1]))
Type

Fragments or None
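As a small worked example of "the difference between the precursor and all peaks", the sketch below derives loss values from a precursor m/z and a peak list. It uses plain lists, and the helper name compute_losses is hypothetical. Skipping peaks at or above the precursor m/z and sorting by loss m/z are assumptions, not documented behavior.

```python
def compute_losses(precursor_mz, mz, intensities):
    """Hypothetical helper: loss m/z = precursor m/z - peak m/z."""
    losses = [
        (precursor_mz - peak_mz, intensity)
        for peak_mz, intensity in zip(mz, intensities)
        if peak_mz < precursor_mz  # assumption: ignore peaks above the precursor
    ]
    losses.sort()  # assumption: keep losses in ascending m/z order
    return losses


losses = compute_losses(250.0, [100.0, 150.0, 200.0], [0.7, 0.2, 0.1])
print(losses)
# [(50.0, 0.1), (100.0, 0.2), (150.0, 0.7)]
```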

metadata

Dict of metadata, for example the scan number or the precursor m/z.

Type

dict

__init__(mz: array, intensities: array, metadata: Optional[dict] = None, metadata_harmonization: bool = True)[source]
Parameters
  • mz – Array of m/z for the peaks

  • intensities – Array of intensities for the peaks

  • metadata – Dictionary with, for example, the scan number or the precursor m/z.

  • metadata_harmonization (bool, optional) – Set to False if default metadata filters should not be applied. The default is True.

clone()[source]

Return a deepcopy of the spectrum instance.

get(key: str, default=None)[source]

Retrieve value from metadata dict. Shorthand for

val = self.metadata[key]
metadata_hash()[source]

Return a (truncated) sha256-based hash which is generated based on the spectrum metadata. Spectra with the same metadata result in the same metadata_hash.
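A truncated sha256 hash over metadata can be sketched with the standard library as shown below. The JSON serialization with sorted keys and the truncation length of 20 characters are assumptions made for this illustration; the actual serialization and length used by matchms are not stated here, and metadata_hash_sketch is a hypothetical name.

```python
import hashlib
import json


def metadata_hash_sketch(metadata, length=20):
    """Hypothetical helper: truncated sha256 over serialized metadata."""
    # Sorting the keys makes the hash independent of insertion order.
    serialized = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()[:length]


hash_a = metadata_hash_sketch({"id": "spectrum1", "precursor_mz": 201.5})
hash_b = metadata_hash_sketch({"precursor_mz": 201.5, "id": "spectrum1"})
print(hash_a == hash_b)  # True
```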

plot(figsize=(8, 6), dpi=200, **kwargs)[source]

To visually inspect a spectrum, run spectrum.plot().

[Figure: example of a spectrum plotted using spectrum.plot()]

plot_against(other_spectrum, figsize=(8, 6), dpi=200, **spectrum_kws)[source]

Compare two spectra visually in a mirror plot.

To visually compare the peaks of two spectra, run spectrum.plot_against(other_spectrum).

[Figure: example of a mirror plot comparing two spectra, created with spectrum.plot_against()]

set(key: str, value)[source]

Set value in metadata dict. Shorthand for

self.metadata[key] = val
spectrum_hash()[source]

Return a (truncated) sha256-based hash which is generated based on the spectrum peaks (mz:intensity pairs). Spectra with the same peaks will result in the same spectrum_hash.
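The idea of hashing mz:intensity pairs can be illustrated as follows. The pair formatting, the ";" separator, and the truncation length of 20 characters are assumptions for this sketch, so the resulting values will not match hashes produced by matchms; peaks_hash_sketch is a hypothetical name.

```python
import hashlib


def peaks_hash_sketch(mz, intensities, length=20):
    """Hypothetical helper: truncated sha256 over 'mz:intensity' pairs."""
    pairs = ";".join(f"{m}:{i}" for m, i in zip(mz, intensities))
    return hashlib.sha256(pairs.encode("utf-8")).hexdigest()[:length]


first = peaks_hash_sketch([100.0, 150.0, 200.0], [0.7, 0.2, 0.1])
same_peaks = peaks_hash_sketch([100.0, 150.0, 200.0], [0.7, 0.2, 0.1])
other_peaks = peaks_hash_sketch([100.0, 150.0, 200.0], [0.6, 0.1, 0.6])
print(first == same_peaks)   # True
print(first == other_peaks)  # False
```

Because only the peaks enter the hash, two spectra with identical peaks but different metadata would share the same value, which makes such a hash useful for spotting duplicate peak lists.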

matchms.calculate_scores(references: Union[List[object], Tuple[object], ndarray], queries: Union[List[object], Tuple[object], ndarray], similarity_function: BaseSimilarity, is_symmetric: bool = False) Scores[source]

Calculate the similarity between all reference objects versus all query objects.

Example to calculate scores between 2 spectrums and iterate over the scores

import numpy as np
from matchms import calculate_scores, Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrums = [spectrum_1, spectrum_2]

scores = calculate_scores(spectrums, spectrums, CosineGreedy())

for (reference, query, score) in scores:
    print(f"Cosine score between {reference.get('id')} and {query.get('id')}" +
          f" is {score['score']:.2f} with {score['matches']} matched peaks")

Should output

Cosine score between spectrum1 and spectrum1 is 1.00 with 3 matched peaks
Cosine score between spectrum1 and spectrum2 is 0.83 with 1 matched peaks
Cosine score between spectrum2 and spectrum1 is 0.83 with 1 matched peaks
Cosine score between spectrum2 and spectrum2 is 1.00 with 3 matched peaks
Parameters
  • references – List of reference objects

  • queries – List of query objects

  • similarity_function – Function which accepts a reference + query object and returns a score or tuple of scores

  • is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster. Default is False.

Return type

Scores

matchms.set_matchms_logger_level(loglevel: str, logger_name='matchms')[source]

Update logging level to given loglevel.

Parameters
  • loglevel – Can be ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’.

  • logger_name – Default is “matchms”. Change if logger name should be different.
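For readers unfamiliar with Python logging, the effect of changing a named logger's level can be sketched with the standard library alone. The function below is a stand-in written for this illustration, not the matchms implementation; it only assumes that the logger is named "matchms" by default, as stated above.

```python
import logging

ALLOWED_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def set_logger_level_sketch(loglevel, logger_name="matchms"):
    """Stand-in sketch: set the level of a named stdlib logger."""
    if loglevel not in ALLOWED_LEVELS:
        raise ValueError(f"loglevel must be one of {ALLOWED_LEVELS}")
    logger = logging.getLogger(logger_name)
    logger.setLevel(getattr(logging, loglevel))
    return logger


logger = set_logger_level_sketch("WARNING")
print(logging.getLevelName(logger.level))  # WARNING
print(logger.isEnabledFor(logging.INFO))   # False
```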

Subpackages

Submodules