matchms package

class matchms.Scores(references: Union[List[object], Tuple[object], numpy.ndarray], queries: Union[List[object], Tuple[object], numpy.ndarray], similarity_function: matchms.similarity.BaseSimilarity.BaseSimilarity, is_symmetric: bool = False)

Bases: object

Contains reference and query spectrums and the scores between them.

The scores can be retrieved as a matrix with the Scores.scores attribute. The (reference spectrum, query spectrum, score) tuples can also be iterated over, reference by reference, with each reference paired with every query in turn.

Example to calculate scores between two reference and two query spectrums and iterate over the scores

import numpy as np
from matchms import calculate_scores
from matchms import Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2]
queries = [spectrum_3, spectrum_4]

similarity_measure = CosineGreedy()
scores = calculate_scores(references, queries, similarity_measure)

for (reference, query, score) in scores:
    print(f"Cosine score between {reference.get('id')} and {query.get('id')}" +
          f" is {score['score']:.2f} with {score['matches']} matched peaks")

Should output

Cosine score between spectrum1 and spectrum3 is 0.00 with 0 matched peaks
Cosine score between spectrum1 and spectrum4 is 0.80 with 3 matched peaks
Cosine score between spectrum2 and spectrum3 is 0.14 with 1 matched peaks
Cosine score between spectrum2 and spectrum4 is 0.61 with 1 matched peaks
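
The order of the output lines above can be pictured as a nested loop with references on the outside and queries on the inside. The following is a minimal sketch of that ordering only; it uses plain strings as hypothetical stand-ins for the spectra and does not call matchms.

```python
from itertools import product

# Hypothetical stand-ins for the spectra in the example above.
references = ["spectrum1", "spectrum2"]
queries = ["spectrum3", "spectrum4"]

# product() varies its last argument fastest, so each reference is paired
# with every query before the next reference is visited.
pairs = list(product(references, queries))
for reference, query in pairs:
    print(f"{reference} vs {query}")
```
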
__init__(references: Union[List[object], Tuple[object], numpy.ndarray], queries: Union[List[object], Tuple[object], numpy.ndarray], similarity_function: matchms.similarity.BaseSimilarity.BaseSimilarity, is_symmetric: bool = False)
Parameters
  • references – List of reference objects

  • queries – List of query objects

  • similarity_function – Expected input is an object based on BaseSimilarity. It is expected to provide a .pair() and .matrix() method for computing similarity scores between references and queries.

  • is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster. Default is False.
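
The speed-up described for is_symmetric can be sketched as follows. This is an illustration of the general idea (compute the upper triangle, mirror it), not the matchms implementation; all_vs_all and toy_similarity are hypothetical names introduced here.

```python
def all_vs_all(items, similarity, is_symmetric=False):
    """Fill an n x n score table; similarity is any function of two items."""
    n = len(items)
    table = [[0.0] * n for _ in range(n)]
    calls = 0
    for i in range(n):
        # With symmetry, start at the diagonal and mirror each result.
        start = i if is_symmetric else 0
        for j in range(start, n):
            table[i][j] = similarity(items[i], items[j])
            calls += 1
            if is_symmetric:
                table[j][i] = table[i][j]
    return table, calls


def toy_similarity(a, b):
    # Hypothetical symmetric measure, used only for illustration.
    return 1.0 / (1.0 + abs(a - b))


values = [1.0, 2.0, 4.0, 8.0]
full, full_calls = all_vs_all(values, toy_similarity)
half, half_calls = all_vs_all(values, toy_similarity, is_symmetric=True)
print(full_calls, half_calls)
```

For n items the symmetric path needs n(n+1)/2 similarity calls instead of n², which approaches the factor of 2 mentioned above as n grows.
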

calculate() → matchms.Scores.Scores

Calculate the similarity between all reference objects versus all query objects using the most suitable available implementation of the given similarity_function. The advised way to calculate similarity scores is calculate_scores().

Deprecated since version 0.6.0: Calculate scores via calculate_scores() function.

property scores

Scores as numpy array

For example

import numpy as np
from matchms import calculate_scores, Spectrum
from matchms.similarity import IntersectMz

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]))
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]))
spectrums = [spectrum_1, spectrum_2]

scores = calculate_scores(spectrums, spectrums, IntersectMz()).scores

print(scores[0].dtype)
print(scores.shape)
print(scores)

Should output

float64
(2, 2)
[[1.  0.2]
 [0.2 1. ]]
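
The off-diagonal value 0.2 in the output above is consistent with a ratio of shared to total distinct m/z values. The sketch below reproduces it under that assumption; jaccard_mz is a hypothetical helper, and IntersectMz itself may treat m/z values differently (for instance by rounding them first).

```python
def jaccard_mz(mz_a, mz_b):
    """Shared m/z values divided by all distinct m/z values (assumed measure)."""
    set_a, set_b = set(mz_a), set(mz_b)
    return len(set_a & set_b) / len(set_a | set_b)


# Same m/z values as spectrum_1 and spectrum_2 above: one shared value (100)
# out of five distinct values.
score = jaccard_mz([100.0, 150.0, 200.0], [100.0, 140.0, 190.0])
self_score = jaccard_mz([100.0, 150.0, 200.0], [100.0, 150.0, 200.0])
print(score, self_score)
```
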
scores_by_query(query: Union[List[object], Tuple[object], numpy.ndarray], sort: bool = False) → numpy.ndarray

Return all scores for the given query spectrum.

For example

import numpy as np
from matchms import calculate_scores, Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrum_3 = Spectrum(mz=np.array([110, 140, 195.]),
                      intensities=np.array([0.6, 0.2, 0.1]),
                      metadata={'id': 'spectrum3'})
spectrum_4 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.6, 0.1, 0.6]),
                      metadata={'id': 'spectrum4'})
references = [spectrum_1, spectrum_2, spectrum_3]
queries = [spectrum_2, spectrum_3, spectrum_4]

scores = calculate_scores(references, queries, CosineGreedy())
selected_scores = scores.scores_by_query(spectrum_4, sort=True)
print([x[1]["score"].round(3) for x in selected_scores])

Should output

[0.796, 0.613, 0.0]
Parameters
  • query – Single query Spectrum.

  • sort – Set to True to obtain the scores in a sorted way (relying on the sort() function from the given similarity_function).

scores_by_reference(reference: Union[List[object], Tuple[object], numpy.ndarray], sort: bool = False) → numpy.ndarray

Return all scores for the given reference spectrum.

Parameters
  • reference – Single reference Spectrum.

  • sort – Set to True to obtain the scores in a sorted way (relying on the sort() function from the given similarity_function).
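
Conceptually, scores_by_query() and scores_by_reference() select one column or one row of the score matrix. A minimal sketch with plain lists follows; by_query and by_reference are hypothetical helpers, the matrix holds rounded values taken from the examples in this section, and descending order for sort=True is assumed from the example output above.

```python
references = ["spectrum1", "spectrum2", "spectrum3"]
queries = ["spectrum2", "spectrum3", "spectrum4"]
# Rows follow references, columns follow queries (rounded example values).
matrix = [[0.83, 0.00, 0.80],
          [1.00, 0.14, 0.61],
          [0.14, 1.00, 0.00]]


def by_query(query, sort=False):
    """Pair every reference with its score against one query (a column)."""
    col = queries.index(query)
    pairs = [(ref, row[col]) for ref, row in zip(references, matrix)]
    return sorted(pairs, key=lambda p: p[1], reverse=True) if sort else pairs


def by_reference(reference, sort=False):
    """Pair every query with its score against one reference (a row)."""
    row = matrix[references.index(reference)]
    pairs = list(zip(queries, row))
    return sorted(pairs, key=lambda p: p[1], reverse=True) if sort else pairs


print(by_query("spectrum4", sort=True))
print(by_reference("spectrum1"))
```
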

class matchms.Spectrum(mz: numpy.array, intensities: numpy.array, metadata: Optional[dict] = None)

Bases: object

Container for a collection of peaks, losses and metadata

For example

import numpy as np
from matchms import Spectrum

spectrum = Spectrum(mz=np.array([100, 150, 200.]),
                    intensities=np.array([0.7, 0.2, 0.1]),
                    metadata={'id': 'spectrum1',
                              "peak_comments": {200.: "the peak at 200 m/z"}})

print(spectrum.peaks.mz[0])
print(spectrum.peaks.intensities[0])
print(spectrum.get('id'))
print(spectrum.peak_comments.get(200))

Should output

100.0
0.7
spectrum1
the peak at 200 m/z
peaks

Peaks of spectrum

Type

Spikes

losses

Losses of spectrum, the difference between the precursor and all peaks.

Can be filled with

from matchms import Spikes
spectrum.losses = Spikes(mz=np.array([50.]), intensities=np.array([0.1]))
Type

Spikes or None
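
The definition of losses as the difference between the precursor and all peaks can be worked through by hand. The sketch below uses a hypothetical precursor m/z of 250 and plain lists; it illustrates the arithmetic only and does not show how matchms derives losses.

```python
precursor_mz = 250.0               # hypothetical value, for illustration
peak_mz = [100.0, 150.0, 200.0]    # m/z values of the peaks

# Each loss is the precursor m/z minus a peak m/z; sort ascending by m/z.
losses_mz = sorted(precursor_mz - mz for mz in peak_mz)
print(losses_mz)
```
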

metadata

Dict of metadata with, for example, the scan number or precursor m/z.

Type

dict

__init__(mz: numpy.array, intensities: numpy.array, metadata: Optional[dict] = None)
Parameters
  • mz – Array of m/z for the peaks

  • intensities – Array of intensities for the peaks

  • metadata – Dictionary with, for example, the scan number or precursor m/z.

clone()

Return a deepcopy of the spectrum instance.

get(key: str, default=None)

Retrieve a value from the metadata dict, falling back to default when the key is absent. For a key that is present, shorthand for

val = self.metadata[key]
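
The default argument in the signature suggests dictionary-style lookup with a fallback. A minimal sketch on a plain dict follows, assuming get() behaves like dict.get; get_metadata is a hypothetical helper, not part of matchms.

```python
def get_metadata(metadata, key, default=None):
    """Return metadata[key] if present, otherwise the given default."""
    return metadata.get(key, default)


metadata = {"id": "spectrum1"}
found = get_metadata(metadata, "id")
missing = get_metadata(metadata, "precursor_mz")
fallback = get_metadata(metadata, "precursor_mz", 0.0)
print(found, missing, fallback)
```
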
metadata_hash()

Return a (truncated) sha256-based hash generated from the spectrum metadata. Spectra with the same metadata result in the same metadata_hash.

plot(figsize=(8, 6), dpi=200, **kwargs)

To visually inspect a spectrum, run spectrum.plot().

[Figure: example of a spectrum plotted using spectrum.plot()]

plot_against(other_spectrum, figsize=(8, 6), dpi=200, **spectrum_kws)

Compare two spectra visually in a mirror plot.

To visually compare the peaks of two spectra, run spectrum.plot_against(other_spectrum).

[Figure: example of a mirror plot comparing two spectra using spectrum.plot_against()]

set(key: str, value)

Set value in metadata dict. Shorthand for

self.metadata[key] = value
spectrum_hash()

Return a (truncated) sha256-based hash generated from the spectrum peaks (mz:intensity pairs). Spectra with the same peaks result in the same spectrum_hash.
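
A truncated sha256 hash over mz:intensity pairs could look roughly like the sketch below. The serialization format, the separator, and the truncation length of 20 characters are assumptions made for illustration; peaks_hash is a hypothetical helper and will not reproduce the values returned by spectrum_hash().

```python
import hashlib


def peaks_hash(mz, intensities, length=20):
    """Hash 'mz:intensity' pairs with sha256 and keep the first characters."""
    # Assumed serialization: pairs joined by ';' and encoded as UTF-8.
    encoded = ";".join(f"{m}:{i}" for m, i in zip(mz, intensities)).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:length]


hash_a = peaks_hash([100.0, 150.0], [0.7, 0.2])
hash_b = peaks_hash([100.0, 150.0], [0.7, 0.2])
hash_c = peaks_hash([100.0, 150.0], [0.7, 0.3])
print(hash_a == hash_b, hash_a == hash_c)
```
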

class matchms.Spikes(mz=None, intensities=None)

Bases: object

Stores arrays of intensities and m/z values, with some checks on their internal consistency.

For example

import numpy as np
from matchms import Spikes

mz = np.array([10, 20, 30], dtype="float")
intensities = np.array([100, 20, 300], dtype="float")

peaks = Spikes(mz=mz, intensities=intensities)
print(peaks[2])

Should output

[ 30. 300.]
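
The internal consistency checks mentioned for Spikes are not listed here. The sketch below shows what such checks might plausibly cover (equal lengths, m/z in non-decreasing order), using plain lists; check_peaks is a hypothetical helper and the exact checks performed by Spikes may differ.

```python
def check_peaks(mz, intensities):
    """Validate two parallel sequences and return (mz, intensity) pairs."""
    if len(mz) != len(intensities):
        raise ValueError("mz and intensities must have the same length")
    # Compare each m/z value with its successor to detect disorder.
    if any(a > b for a, b in zip(mz, mz[1:])):
        raise ValueError("mz values must be in non-decreasing order")
    return list(zip(mz, intensities))


peaks = check_peaks([10.0, 20.0, 30.0], [100.0, 20.0, 300.0])
print(peaks[2])  # (30.0, 300.0)
```
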
mz

Numpy array of m/z values.

intensities

Numpy array of peak intensity values.

__init__(mz=None, intensities=None)

Initialize self. See help(type(self)) for accurate signature.

property intensities

getter method for intensities private variable

property mz

getter method for mz private variable

property to_numpy

getter method to return stacked numpy array of both peak mz and intensities

matchms.calculate_scores(references: Union[List[object], Tuple[object], numpy.ndarray], queries: Union[List[object], Tuple[object], numpy.ndarray], similarity_function: matchms.similarity.BaseSimilarity.BaseSimilarity, is_symmetric: bool = False) → matchms.Scores.Scores

Calculate the similarity between all reference objects versus all query objects.

Example to calculate scores between 2 spectrums and iterate over the scores

import numpy as np
from matchms import calculate_scores, Spectrum
from matchms.similarity import CosineGreedy

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={'id': 'spectrum1'})
spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={'id': 'spectrum2'})
spectrums = [spectrum_1, spectrum_2]

scores = calculate_scores(spectrums, spectrums, CosineGreedy())

for (reference, query, score) in scores:
    print(f"Cosine score between {reference.get('id')} and {query.get('id')}" +
          f" is {score['score']:.2f} with {score['matches']} matched peaks")

Should output

Cosine score between spectrum1 and spectrum1 is 1.00 with 3 matched peaks
Cosine score between spectrum1 and spectrum2 is 0.83 with 1 matched peaks
Cosine score between spectrum2 and spectrum1 is 0.83 with 1 matched peaks
Cosine score between spectrum2 and spectrum2 is 1.00 with 3 matched peaks
Parameters
  • references – List of reference objects

  • queries – List of query objects

  • similarity_function – Object based on BaseSimilarity that takes a reference and a query object and returns a score or tuple of scores

  • is_symmetric – Set to True when references and queries are identical (as for instance for an all-vs-all comparison). By using the fact that score[i,j] = score[j,i] the calculation will be about 2x faster. Default is False.

Return type

Scores
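
The score of 0.83 with 1 matched peak in the example output above can be reproduced by hand. The sketch below assumes an m/z tolerance of 0.1, no m/z weighting, and normalization by the full intensity norms of both spectra; it is a worked calculation for these two toy spectra, not the CosineGreedy implementation.

```python
import math

mz_1, int_1 = [100.0, 150.0, 200.0], [0.7, 0.2, 0.1]
mz_2, int_2 = [100.0, 140.0, 190.0], [0.4, 0.2, 0.1]
tolerance = 0.1  # assumed tolerance in m/z units

# In these toy spectra each peak has at most one partner within tolerance,
# so a plain scan is enough; real greedy matching must resolve conflicts.
matched = [(a, b)
           for m1, a in zip(mz_1, int_1)
           for m2, b in zip(mz_2, int_2)
           if abs(m1 - m2) <= tolerance]

# Sum of intensity products over matched peaks, divided by both norms.
numerator = sum(a * b for a, b in matched)
norm = math.sqrt(sum(a * a for a in int_1)) * math.sqrt(sum(b * b for b in int_2))
score = numerator / norm
print(f"{score:.2f} with {len(matched)} matched peaks")
```

Only the peaks at 100 m/z match, giving 0.7 × 0.4 = 0.28 in the numerator and √0.54 × √0.21 ≈ 0.337 in the denominator.
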

matchms.set_matchms_logger_level(loglevel: str, logger_name='matchms')

Update logging level to given loglevel.

Parameters
  • loglevel – Can be ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’.

  • logger_name – Default is “matchms”. Change if logger name should be different.
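
The level names listed above are those of Python's standard logging module. The sketch below shows the corresponding standard-library calls, assuming the matchms logger is an ordinary logging.Logger named "matchms"; it does not call set_matchms_logger_level() itself, which may do more than this (for example adjust handlers).

```python
import logging

# Look up the logger by name and set its level from a level-name string.
logger = logging.getLogger("matchms")
logger.setLevel("WARNING")

# getLevelName() maps the numeric level back to its name.
print(logging.getLevelName(logger.level))
```
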

Subpackages