matchms.networking package

Functions for creating and analysing spectral networks

class matchms.networking.SimilarityNetwork(identifier_key: str = 'spectrum_id', top_n: int = 20, max_links: int = 10, score_cutoff: float = 0.7, link_method: str = 'single', keep_unconnected_nodes: bool = True)[source]

Bases: object

Create a spectral network from spectrum similarities.

For example:

import numpy as np
from matchms import Spectrum, calculate_scores
from matchms.similarity import ModifiedCosineGreedy
from matchms.networking import SimilarityNetwork

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={"precursor_mz": 100.0,
                                "test_id": "one"})
spectrum_2 = Spectrum(mz=np.array([104.9, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={"precursor_mz": 105.0,
                                "test_id": "two"})

# Construct the similarity function
modified_cosine = ModifiedCosineGreedy(tolerance=0.2)
spectra = [spectrum_1, spectrum_2]
scores = calculate_scores(spectra, spectra, modified_cosine)
ms_network = SimilarityNetwork(identifier_key="test_id")
ms_network.create_network(scores, score_name="ModifiedCosineGreedy_score")

nodes = list(ms_network.graph.nodes())
nodes.sort()
print(nodes)

Should output:

['one', 'two']

__init__(identifier_key: str = 'spectrum_id', top_n: int = 20, max_links: int = 10, score_cutoff: float = 0.7, link_method: str = 'single', keep_unconnected_nodes: bool = True)[source]

Parameters:

identifier_key – Metadata key for the unique identifier of each spectrum in scores. Also used to name the network nodes. Default is ‘spectrum_id’.
top_n – Consider an edge between spectrumA and spectrumB if the score falls into the top_n for spectrumA or spectrumB (link_method=”single”), or into the top_n for both spectrumA and spectrumB (link_method=”mutual”). Of those potential links, only max_links will be kept, so top_n must be >= max_links.
max_links – Maximum number of links to add per node. Default = 10. Due to incoming links, the total number of links per node can be higher. The links are populated by looping over the query spectra. Important side note: the max_links restriction is strict. If several scores around the max_links position are equal, still only max_links links will be added. This can result in some random variation, because sorting spectra with equal scores yields an arbitrary order among them.
score_cutoff – Threshold for given similarities. Edges/Links will only be made for similarities > score_cutoff. Default = 0.7.
link_method – Choose between ‘single’ and ‘mutual’. ‘single’ will add all links based on individual nodes. ‘mutual’ will only add a link if it appears in the top-n list of both nodes.
keep_unconnected_nodes – If set to True (default), all spectra will be included as nodes even if they have no connections/edges to other spectra. If set to False, all nodes without connections will be removed.
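
The interplay of top_n, max_links, score_cutoff and link_method described above can be illustrated with a small pure-Python sketch. This is not the matchms implementation: the helper names top_candidates and select_links, the nested-dict score layout, and the name-based tie-breaking are all assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

ScoreTable = Dict[str, Dict[str, float]]


def top_candidates(scores: ScoreTable, node: str, top_n: int,
                   score_cutoff: float) -> List[str]:
    """Hypothetical helper: best partners of `node`, highest score first."""
    partners = [(other, score) for other, score in scores[node].items()
                if other != node and score > score_cutoff]
    # Assumption: ties are broken by name so the sketch is deterministic.
    partners.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in partners[:top_n]]


def select_links(scores: ScoreTable, top_n: int = 20, max_links: int = 10,
                 score_cutoff: float = 0.7,
                 link_method: str = "single") -> Set[Tuple[str, str]]:
    """Hypothetical helper: undirected edges chosen under the given settings."""
    candidates = {node: top_candidates(scores, node, top_n, score_cutoff)
                  for node in scores}
    edges = set()
    for node, partners in candidates.items():
        if link_method == "mutual":
            # Keep a partner only if `node` is also in the partner's top_n list.
            partners = [p for p in partners if node in candidates[p]]
        for partner in partners[:max_links]:
            edges.add(tuple(sorted((node, partner))))
    return edges


# Symmetric all-vs-all toy scores for four spectra.
toy_scores = {
    "a": {"a": 1.0, "b": 0.9, "c": 0.8, "d": 0.75},
    "b": {"a": 0.9, "b": 1.0, "c": 0.6, "d": 0.95},
    "c": {"a": 0.8, "b": 0.6, "c": 1.0, "d": 0.5},
    "d": {"a": 0.75, "b": 0.95, "c": 0.5, "d": 1.0},
}

single = select_links(toy_scores, top_n=1, max_links=1, link_method="single")
mutual = select_links(toy_scores, top_n=1, max_links=1, link_method="mutual")
print(sorted(single))  # [('a', 'b'), ('a', 'c'), ('b', 'd')]
print(sorted(mutual))  # [('b', 'd')]
```

In the ‘single’ result, node "a" ends up with two links although max_links is 1, because the second one is an incoming link from "c". With ‘mutual’, only the pair that appears in both nodes' top-n lists survives.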

create_network(scores: Scores, score_name: str = None)[source]

Create a network from the given top-n similarity values. Expects that the similarities in scores come from an all-vs-all comparison that includes all possible pairs.

Parameters: scores – Matchms Scores object containing all spectra and pair similarities for generating a network.

export_to_file(filename: str, graph_format: str = 'graphml')[source]

Save the network to a file with chosen format.

Parameters:

filename – Path to file to write to.
graph_format – Format in which to store the network. Supported formats are: “cyjs”, “gexf”, “gml”, “graphml”, “json”. Default is “graphml”.

export_to_graphml(filename: str)[source]

Save the network as .graphml file.

Parameters: filename – Specify filename for exporting the graph.

graph: Graph | None: NetworkX graph. Set after calling create_network()

matchms.networking.get_top_hits(scores: Scores, identifier_key: str = 'spectrum_id', top_n: int = 25, search_by: str = 'queries', score_name: str = None, ignore_diagonal: bool = False) → Tuple[dict, dict][source]

Get top_n highest scores (and indices) for every entry.

Parameters:

scores – Matchms Scores object containing all similarities.
identifier_key – Metadata key for the unique identifier of each spectrum in scores. Also used to name the network nodes. Default is ‘spectrum_id’.
top_n – Return the indices and scores for the top_n highest scores. Scores of a spectrum with itself (the diagonal of scores.scores) are left out only when ignore_diagonal is set to True.
search_by – Choose between ‘queries’ and ‘references’. This decides whether the top_n matches are collected and returned for every spectrum in scores.queries or for every spectrum in scores.references.
score_name – Name of the score that should be used (if scores contains multiple different scores).
ignore_diagonal – Set to True if scores.scores is symmetric (i.e. if references and queries were the same) and if scores between spectra with themselves should be excluded.
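
As a rough illustration of what "top_n highest scores (and indices) for every entry" means, the following pure-Python sketch works on a plain nested list instead of a Scores object. The function name top_hits_by_query, the list-of-lists input, and the order of the two returned dictionaries are assumptions for illustration and do not reproduce the matchms implementation.

```python
from typing import Dict, List, Sequence, Tuple


def top_hits_by_query(score_matrix: Sequence[Sequence[float]],
                      identifiers: Sequence[str],
                      top_n: int = 25,
                      ignore_diagonal: bool = False
                      ) -> Tuple[Dict[str, List[int]], Dict[str, List[float]]]:
    """Hypothetical helper: per row, the column indices and values of the
    top_n highest scores, highest first."""
    indices_by_id: Dict[str, List[int]] = {}
    scores_by_id: Dict[str, List[float]] = {}
    for row, identifier in enumerate(identifiers):
        columns = [col for col in range(len(score_matrix[row]))
                   if not (ignore_diagonal and col == row)]
        # Stable sort: equal scores keep their original column order.
        columns.sort(key=lambda col: score_matrix[row][col], reverse=True)
        best = columns[:top_n]
        indices_by_id[identifier] = best
        scores_by_id[identifier] = [score_matrix[row][col] for col in best]
    return indices_by_id, scores_by_id


# Symmetric toy matrix: queries and references are the same three spectra.
ids = ["one", "two", "three"]
matrix = [
    [1.0, 0.4, 0.9],
    [0.4, 1.0, 0.2],
    [0.9, 0.2, 1.0],
]

hit_indices, hit_scores = top_hits_by_query(matrix, ids, top_n=2,
                                            ignore_diagonal=True)
print(hit_indices)  # {'one': [2, 1], 'two': [0, 2], 'three': [0, 1]}
print(hit_scores)   # {'one': [0.9, 0.4], 'two': [0.4, 0.2], 'three': [0.9, 0.2]}
```

Without ignore_diagonal=True, each spectrum's self-score of 1.0 would occupy the first slot of its own top_n list, which is rarely useful for a symmetric all-vs-all comparison.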

Submodules