matchms.networking package
Functions for creating and analysing spectral networks
- class matchms.networking.SimilarityNetwork(identifier_key: str = 'spectrum_id', top_n: int = 20, max_links: int = 10, score_cutoff: float = 0.7, link_method: str = 'single', keep_unconnected_nodes: bool = True)
Bases: object
Create a spectral network from spectrum similarities.
For example:

```python
import numpy as np
from matchms import Spectrum, calculate_scores
from matchms.similarity import ModifiedCosine
from matchms.networking import SimilarityNetwork

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={"precursor_mz": 100.0, "test_id": "one"})
spectrum_2 = Spectrum(mz=np.array([104.9, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={"precursor_mz": 105.0, "test_id": "two"})

# Use factory to construct a similarity function
modified_cosine = ModifiedCosine(tolerance=0.2)
spectrums = [spectrum_1, spectrum_2]
scores = calculate_scores(spectrums, spectrums, modified_cosine)

ms_network = SimilarityNetwork(identifier_key="test_id")
ms_network.create_network(scores, score_name="ModifiedCosine_score")
nodes = list(ms_network.graph.nodes())
nodes.sort()
print(nodes)
```

Should output:

```
['one', 'two']
```
- __init__(identifier_key: str = 'spectrum_id', top_n: int = 20, max_links: int = 10, score_cutoff: float = 0.7, link_method: str = 'single', keep_unconnected_nodes: bool = True)
- Parameters:
identifier_key – Metadata key holding the unique identifier of each spectrum in scores. It is also used to name the network nodes. Default is ‘spectrum_id’.
top_n – Consider an edge between spectrumA and spectrumB if the score falls into the top_n for spectrumA or spectrumB (link_method=“single”), or into the top_n for both spectrumA and spectrumB (link_method=“mutual”). Of those potential links, only max_links will be kept, so top_n must be >= max_links.
max_links – Maximum number of links to add per node. Default = 10. Because of incoming links, the total number of links per node can be higher. The links are populated by looping over the query spectra. Important side note: the max_links restriction is strict, so if several scores around the max_links cut are equal, still only max_links links are added. This can result in some random variation, because sorting spectra with equal scores yields a random order of those elements.
score_cutoff – Threshold for the given similarities. Edges are only created for similarities > score_cutoff. Default = 0.7.
link_method – Choose between ‘single’ and ‘mutual’. ‘single’ will add all links based on individual nodes. ‘mutual’ will only add a link if it appears in the top_n list of both nodes. Default is ‘single’.
keep_unconnected_nodes – If set to True (default), all spectra will be included as nodes, even if they have no edges to other spectra. If set to False, all nodes without edges will be removed.
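The interplay of top_n, max_links, score_cutoff and link_method can be sketched in plain Python. This is a hypothetical illustration of the rules described above, not matchms's actual implementation: the function name select_links, the square list-of-lists score matrix and the (i, j) index-pair edge keys are all assumptions made for the sketch.

```python
def select_links(scores, top_n=20, max_links=10, score_cutoff=0.7, link_method="single"):
    """Pick undirected edges from a square all-vs-all similarity matrix.

    Hypothetical sketch of the documented parameter rules (not matchms code).
    Returns a dict mapping (i, j) index pairs with i < j to their score.
    """
    if top_n < max_links:
        raise ValueError("top_n must be >= max_links")
    if link_method not in ("single", "mutual"):
        raise ValueError("link_method must be 'single' or 'mutual'")
    n = len(scores)
    # Rank all other spectra by descending score and keep each node's top_n.
    top = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: scores[i][j], reverse=True)
        top.append(others[:top_n])

    edges = {}
    for i in range(n):  # loop over the "query" spectra
        added = 0
        for j in top[i]:
            if added == max_links:
                break  # strict cap on links added from this node
            if scores[i][j] <= score_cutoff:
                break  # list is sorted, so all remaining scores are lower too
            if link_method == "mutual" and i not in top[j]:
                continue  # "mutual" also requires i to be in j's top_n
            edges[(min(i, j), max(i, j))] = scores[i][j]
            added += 1
    return edges


S = [[1.0, 0.9, 0.8],
     [0.9, 1.0, 0.75],
     [0.8, 0.75, 1.0]]
print(select_links(S, top_n=1, max_links=1))
print(select_links(S, top_n=1, max_links=1, link_method="mutual"))
```

With top_n=1 and max_links=1, “single” gives node 0 two edges (one it adds itself, one added from node 2), which is why the total number of links per node can exceed max_links; “mutual” keeps only the pair that ranks each other first. Python's stable sort makes ties deterministic in this sketch, unlike the random ordering noted for max_links.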
- create_network(scores: Scores, score_name: str | None = None)
Create a network from the given top-n similarity values. Expects the similarities in scores to come from an all-vs-all comparison that includes all possible pairs.
- Parameters:
scores – matchms Scores object containing all spectra and their pairwise similarities, used to generate the network.
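After the links are chosen, nodes are named via identifier_key and, if keep_unconnected_nodes is False, isolated nodes are dropped. The sketch below is a hypothetical stand-in that uses a plain adjacency dict rather than the graph object exposed as ms_network.graph; the name build_network, the per-spectrum metadata list and the (i, j) index-pair edge format are assumptions.

```python
def build_network(metadata, edges, identifier_key="spectrum_id", keep_unconnected_nodes=True):
    """Return an undirected adjacency dict {node_name: {neighbor_name: score}}.

    Hypothetical sketch (not matchms code).
    metadata: one metadata dict per spectrum, in score-matrix order.
    edges: dict mapping (i, j) index pairs to similarity scores.
    """
    # Node names come from each spectrum's metadata under identifier_key.
    names = [entry[identifier_key] for entry in metadata]
    if len(set(names)) != len(names):
        raise ValueError(f"values for {identifier_key!r} must be unique")
    graph = {name: {} for name in names}
    for (i, j), score in edges.items():
        graph[names[i]][names[j]] = score
        graph[names[j]][names[i]] = score
    if not keep_unconnected_nodes:
        # Drop spectra that ended up without any edge.
        graph = {name: nbrs for name, nbrs in graph.items() if nbrs}
    return graph


meta = [{"test_id": "one"}, {"test_id": "two"}, {"test_id": "three"}]
print(build_network(meta, {(0, 1): 0.9}, identifier_key="test_id"))
print(build_network(meta, {(0, 1): 0.9}, identifier_key="test_id", keep_unconnected_nodes=False))
```

The uniqueness check reflects the requirement that identifier_key holds a unique identifier per spectrum: duplicate names would silently merge nodes.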
- export_to_file(filename: str, graph_format: str = 'graphml')
Save the network to a file with chosen format.
- Parameters:
filename – Path to file to write to.
graph_format – Format in which to store the network graph. Supported formats are: “cyjs”, “gexf”, “gml”, “graphml”, “json”. Default is “graphml”.
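The default “graphml” format is an XML dialect. To show roughly what such a file contains, the standard-library sketch below serializes a small graph to GraphML. It is not the writer matchms uses; the helper name to_graphml and the edge attribute name “weight” are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """Serialize an undirected graph to a minimal GraphML string.

    Hypothetical sketch. nodes: iterable of node ids;
    edges: dict {(source, target): score}.
    """
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare one edge attribute that holds the similarity score.
    ET.SubElement(root, "key", {"id": "d0", "for": "edge",
                                "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "undirected"})
    for node in nodes:
        ET.SubElement(graph, "node", {"id": str(node)})
    for (source, target), score in edges.items():
        edge = ET.SubElement(graph, "edge", {"source": str(source), "target": str(target)})
        ET.SubElement(edge, "data", {"key": "d0"}).text = repr(float(score))
    return ET.tostring(root, encoding="unicode")


print(to_graphml(["one", "two"], {("one", "two"): 0.9}))
```

A GraphML-aware reader, such as networkx's read_graphml, should recover the two nodes and the weighted edge from a file with this structure.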