matchms.networking.SimilarityNetwork module

class matchms.networking.SimilarityNetwork.SimilarityNetwork(identifier_key: str = 'spectrum_id', top_n: int = 20, max_links: int = 10, score_cutoff: float = 0.7, link_method: str = 'single', keep_unconnected_nodes: bool = True)[source]

Bases: object

Create a spectral network from spectrum similarities.

For example:

import numpy as np
from matchms import Spectrum, calculate_scores
from matchms.similarity import ModifiedCosine
from matchms.networking import SimilarityNetwork

spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
                      intensities=np.array([0.7, 0.2, 0.1]),
                      metadata={"precursor_mz": 100.0,
                                "test_id": "one"})
spectrum_2 = Spectrum(mz=np.array([104.9, 140, 190.]),
                      intensities=np.array([0.4, 0.2, 0.1]),
                      metadata={"precursor_mz": 105.0,
                                "test_id": "two"})

# Use factory to construct a similarity function
modified_cosine = ModifiedCosine(tolerance=0.2)
spectrums = [spectrum_1, spectrum_2]
scores = calculate_scores(spectrums, spectrums, modified_cosine)
ms_network = SimilarityNetwork(identifier_key="test_id")
ms_network.create_network(scores)

nodes = list(ms_network.graph.nodes())
nodes.sort()
print(nodes)

Should output:

['one', 'two']
__init__(identifier_key: str = 'spectrum_id', top_n: int = 20, max_links: int = 10, score_cutoff: float = 0.7, link_method: str = 'single', keep_unconnected_nodes: bool = True)[source]
Parameters
  • identifier_key – Metadata key for the unique identifier of each spectrum in scores. Will also be used for naming the network nodes. Default is ‘spectrum_id’.

  • top_n – Consider an edge between spectrumA and spectrumB if the score falls into the top_n for spectrumA or spectrumB (link_method=”single”), or into the top_n for both spectrumA and spectrumB (link_method=”mutual”). From those potential links, only max_links will be kept, so top_n must be >= max_links. Default = 20.

  • max_links – Maximum number of links to add per node. Default = 10. Due to incoming links, the total number of links per node can be higher. The links are populated by looping over the query spectrums. Important side note: the max_links restriction is strict, which means that if several scores around the max_links boundary are equal, still only max_links links will be added. This can result in some random variation, because sorting spectra with equal scores results in a random order of such elements.

  • score_cutoff – Threshold for given similarities. Edges/Links will only be made for similarities > score_cutoff. Default = 0.7.

  • link_method – Choose between ‘single’ and ‘mutual’. ‘single’ will add all links based on individual nodes. ‘mutual’ will only add a link if that link appears in the given top-n list for both nodes.

  • keep_unconnected_nodes – If set to True (default), all spectra will be included as nodes even if they have no connections/edges to other spectra. If set to False, all nodes without connections will be removed.
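
The interplay of top_n, max_links, score_cutoff and link_method described above can be illustrated with a small pure-Python sketch. This is not the matchms implementation: the function name select_links, the nested-dict score layout and the toy scores are hypothetical, and applying max_links after the link_method filter is an assumption about the order of operations.

```python
def select_links(scores, top_n=20, max_links=10, score_cutoff=0.7,
                 link_method="single"):
    """Return a set of undirected edges chosen from all-vs-all scores.

    scores: dict mapping node -> {other node -> similarity}.
    Illustrative only; not the matchms implementation.
    """
    if link_method not in ("single", "mutual"):
        raise ValueError("link_method must be 'single' or 'mutual'")

    # Top-n partners per node, best score first (self-matches excluded).
    top = {}
    for node, row in scores.items():
        others = [other for other in row if other != node]
        others.sort(key=lambda other: row[other], reverse=True)
        top[node] = others[:top_n]

    edges = set()
    for node, candidates in top.items():
        # Only similarities strictly above the cutoff qualify.
        kept = [o for o in candidates if scores[node][o] > score_cutoff]
        if link_method == "mutual":
            # Require the link to be in the top-n list of both nodes.
            kept = [o for o in kept if node in top[o]]
        # Assumption: max_links is applied after the link_method filter.
        for other in kept[:max_links]:
            edges.add(tuple(sorted((node, other))))
    return edges


# Hypothetical symmetric toy scores for four spectra.
PAIRS = {("a", "b"): 0.9, ("a", "c"): 0.8, ("a", "d"): 0.75,
         ("b", "c"): 0.6, ("b", "d"): 0.95, ("c", "d"): 0.3}
SCORES = {node: {} for node in "abcd"}
for (u, v), score in PAIRS.items():
    SCORES[u][v] = score
    SCORES[v][u] = score

print(sorted(select_links(SCORES, top_n=1, max_links=1, link_method="single")))
# [('a', 'b'), ('a', 'c'), ('b', 'd')]
print(sorted(select_links(SCORES, top_n=1, max_links=1, link_method="mutual")))
# [('b', 'd')]
```

In the ‘single’ case, node a ends up with two links even though max_links is 1, because c adds an incoming link to a. With ‘mutual’, only the b/d pair survives, since each is the top-ranked partner of the other.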

create_network(scores: matchms.Scores)[source]

Create a network from the given top-n similarity values. Expects that the similarities given in scores come from an all-vs-all comparison including all possible pairs.

Parameters

scores – Matchms Scores object containing all spectrums and pair similarities for generating a network.

export_to_graphml(filename: str)[source]

Save the network as a .graphml file.

Parameters

filename – Specify filename for exporting the graph.

graph: Optional[networkx.classes.graph.Graph]

NetworkX graph. Set after calling create_network().