matchms.networking package

Functions for creating and analysing spectral networks

class matchms.networking.SimilarityNetwork(identifier_key: str = 'spectrum_id', top_n: int = 20, max_links: int = 10, score_cutoff: float = 0.7, link_method: str = 'single', keep_unconnected_nodes: bool = True)[source]

Bases: object

Create a spectral network from spectrum similarities.

For example:

import numpy as np
from matchms import Spectrum, calculate_scores
from matchms.similarity import ModifiedCosineGreedy
from matchms.networking import SimilarityNetwork

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={"precursor_mz": 100.0,
                                "test_id": "one"})
spectrum_2 = Spectrum(mz=np.array([104.9, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={"precursor_mz": 105.0,
                                "test_id": "two"})

# Construct the similarity function
modified_cosine = ModifiedCosineGreedy(tolerance=0.2)
spectra = [spectrum_1, spectrum_2]
scores = calculate_scores(spectra, spectra, modified_cosine)
ms_network = SimilarityNetwork(identifier_key="test_id")
ms_network.create_network(scores, score_name="ModifiedCosineGreedy_score")

nodes = list(ms_network.graph.nodes())
nodes.sort()
print(nodes)

Should output:

['one', 'two']
__init__(identifier_key: str = 'spectrum_id', top_n: int = 20, max_links: int = 10, score_cutoff: float = 0.7, link_method: str = 'single', keep_unconnected_nodes: bool = True)[source]
Parameters:
  • identifier_key – Metadata key for the unique identifier of each spectrum in scores. Will also be used for naming the network nodes. Default is ‘spectrum_id’.

  • top_n – Consider an edge between spectrumA and spectrumB if the score falls into the top_n for spectrumA or spectrumB (link_method=“single”), or into the top_n for both spectrumA and spectrumB (link_method=“mutual”). From those potential links, only max_links will be kept, so top_n must be >= max_links.

  • max_links – Maximum number of links to add per node. Default = 10. Due to incoming links, the total number of links per node can be higher. The links are populated by looping over the query spectra. Important side note: the max_links restriction is strict. If several scores around the max_links position are equal, still only max_links links will be added, which can result in some random variation (sorting spectra with equal scores yields a random order of those elements).

  • score_cutoff – Threshold for given similarities. Edges/Links will only be made for similarities > score_cutoff. Default = 0.7.

  • link_method – Choose between ‘single’ and ‘mutual’. ‘single’ will add all links based on individual nodes. ‘mutual’ will only add a link if it appears in the top-n list of both nodes.

  • keep_unconnected_nodes – If set to True (default), all spectra will be included as nodes even if they have no connections/edges to other spectra. If set to False, all nodes without connections will be removed.
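The interplay of top_n, max_links, score_cutoff and link_method described above can be illustrated with a small pure-Python sketch. It is not the matchms implementation: the function name select_links and the plain list-of-lists score matrix are assumptions made for illustration, and it follows the wording above (scores must be strictly greater than score_cutoff).

```python
def select_links(scores, top_n=20, max_links=10, score_cutoff=0.7,
                 link_method="single"):
    """Illustrative link selection on a square score matrix (hypothetical helper).

    Returns a set of undirected edges (i, j) with i < j.
    """
    if top_n < max_links:
        raise ValueError("top_n must be >= max_links")
    if link_method not in ("single", "mutual"):
        raise ValueError("link_method must be 'single' or 'mutual'")

    n = len(scores)
    # Top-n partners per node, best first, excluding the node itself.
    # Python's sort is stable, so ties keep index order in this sketch.
    top_hits = []
    for i in range(n):
        partners = [j for j in range(n) if j != i]
        partners.sort(key=lambda j: scores[i][j], reverse=True)
        top_hits.append(partners[:top_n])

    edges = set()
    for i in range(n):
        # Apply the cutoff first, then keep at most max_links outgoing links.
        candidates = [j for j in top_hits[i] if scores[i][j] > score_cutoff]
        for j in candidates[:max_links]:
            # 'mutual' additionally requires i to be in j's top-n list.
            if link_method == "mutual" and i not in top_hits[j]:
                continue
            edges.add((min(i, j), max(i, j)))
    return edges


example = [
    [1.0, 0.9, 0.8, 0.2],
    [0.9, 1.0, 0.75, 0.3],
    [0.8, 0.75, 1.0, 0.1],
    [0.2, 0.3, 0.1, 1.0],
]
print(sorted(select_links(example, top_n=1, max_links=1)))
# [(0, 1), (0, 2)]
print(sorted(select_links(example, top_n=1, max_links=1, link_method="mutual")))
# [(0, 1)]
```

In the ‘single’ result, node 0 ends up with two links although max_links is 1, because the second link is an incoming one from node 2. With ‘mutual’, that link is dropped since node 2 is not in the top-1 list of node 0.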

create_network(scores: Scores, score_name: str = None)[source]

Function to create network from given top-n similarity values. Expects that similarities given in scores are from an all-vs-all comparison including all possible pairs.

Parameters:

scores – Matchms Scores object containing all spectra and pair similarities for generating a network.

export_to_file(filename: str, graph_format: str = 'graphml')[source]

Save the network to a file with chosen format.

Parameters:
  • filename – Path to file to write to.

  • graph_format – Format, in which to store the network. Supported formats are: “cyjs”, “gexf”, “gml”, “graphml”, “json”. Default is “graphml”.

export_to_graphml(filename: str)[source]

Save the network as .graphml file.

Parameters:

filename – Specify filename for exporting the graph.

graph: Graph | None

NetworkX graph. Set after calling create_network()

matchms.networking.get_top_hits(scores: Scores, identifier_key: str = 'spectrum_id', top_n: int = 25, search_by: str = 'queries', score_name: str = None, ignore_diagonal: bool = False) → Tuple[dict, dict][source]

Get top_n highest scores (and indices) for every entry.

Parameters:
  • scores – Matchms Scores object containing all similarities.

  • identifier_key – Metadata key for the unique identifier of each spectrum in scores. Will also be used for naming the network nodes. Default is ‘spectrum_id’.

  • top_n – Return the indexes and scores for the top_n highest scores. Scores between a spectrum with itself (diagonal of scores.scores) will not be taken into account.

  • search_by – Choose between ‘queries’ and ‘references’. This decides whether the top_n matches are collected and returned for every spectrum in scores.queries or for every spectrum in scores.references.

  • score_name – Name of the score that should be used (if scores contains multiple different scores).

  • ignore_diagonal – Set to True if scores.scores is symmetric (i.e. if references and queries were the same) and if scores between spectra with themselves should be excluded.
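As a rough illustration of what collecting the top_n hits per entry means, the following pure-Python sketch works on a plain nested list instead of a Scores object. The helper name top_hits, the integer keys of the returned dictionaries, and the matrix layout (rows are references, columns are queries) are assumptions for this sketch and are not taken from matchms.

```python
def top_hits(matrix, top_n=25, search_by="queries", ignore_diagonal=False):
    """Return (indices, scores) dictionaries keyed by entry position.

    Assumed layout: matrix[r][q] is the score of reference r against query q.
    """
    if search_by == "queries":
        # One score vector per query: the corresponding column.
        vectors = [[row[q] for row in matrix] for q in range(len(matrix[0]))]
    elif search_by == "references":
        # One score vector per reference: the corresponding row.
        vectors = [list(row) for row in matrix]
    else:
        raise ValueError("search_by must be 'queries' or 'references'")

    hit_indices, hit_scores = {}, {}
    for i, vector in enumerate(vectors):
        # Positions sorted by descending score.
        order = sorted(range(len(vector)), key=lambda j: vector[j], reverse=True)
        if ignore_diagonal:
            # Only meaningful when queries and references are the same spectra.
            order = [j for j in order if j != i]
        order = order[:top_n]
        hit_indices[i] = order
        hit_scores[i] = [vector[j] for j in order]
    return hit_indices, hit_scores


symmetric = [
    [1.0, 0.4, 0.7],
    [0.4, 1.0, 0.9],
    [0.7, 0.9, 1.0],
]
indices, values = top_hits(symmetric, top_n=2, ignore_diagonal=True)
print(indices)  # {0: [2, 1], 1: [2, 0], 2: [1, 0]}
print(values)   # {0: [0.7, 0.4], 1: [0.9, 0.4], 2: [0.9, 0.7]}
```

With ignore_diagonal=False, each spectrum in this symmetric example would list itself first, since the self-score of 1.0 is the highest entry in its row and column.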

Submodules