matchms.similarity.BaseSimilarity module

class matchms.similarity.BaseSimilarity.BaseSimilarity[source]

Bases: ABC

Similarity function base class.

When building a custom similarity measure, inherit from this class and implement pair() (the only abstract method), plus any other methods you want to customize.

is_commutative: Whether the similarity function is commutative, meaning that the order of spectra does not matter: similarity(A, B) == similarity(B, A). Default is True.

score_datatype: NumPy dtype of a single score value. Examples are np.float64 for scalar scores or a structured dtype such as np.dtype([("score", np.float64), ("matches", np.int64)]) for multi-field scores.

score_fields: Names of the score fields. For scalar scores this should usually be ("score",). For structured scores, this should match the dtype field names, for instance ("score", "matches").

property is_structured_score: bool: Return True if this similarity uses a structured score dtype.

matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, score_fields: Sequence[str] | None = None, progress_bar: bool = True)[source]

Calculate a dense similarity matrix.

Parameters:

spectra_1 – First collection of spectra.
spectra_2 – Second collection of spectra. If None, spectra_1 is compared against itself; for commutative similarities this self-comparison automatically uses a symmetric optimization.
score_fields – Score fields to return. None returns all available fields. For scalar scores, only ("score",) is valid. For structured scores, this can be a subset such as ("score",).
progress_bar – When True, show a progress bar. Default is True.

Returns:

Dense score result wrapped in a Scores container.

Return type:

Scores
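The symmetric optimization mentioned for spectra_2 can be pictured with the sketch below. It shows the general idea only and is not the matchms implementation: plain floats stand in for Spectrum objects, and toy_similarity and symmetric_matrix are hypothetical names.

```python
import numpy as np


def toy_similarity(a: float, b: float) -> float:
    """Hypothetical commutative score: 1 / (1 + |a - b|)."""
    return 1.0 / (1.0 + abs(a - b))


def symmetric_matrix(items):
    """Fill a square score matrix by computing each unordered pair once."""
    n = len(items)
    scores = np.zeros((n, n), dtype=np.float64)
    for i in range(n):
        for j in range(i, n):
            value = toy_similarity(items[i], items[j])
            scores[i, j] = value
            scores[j, i] = value  # mirror instead of recomputing
    return scores


values = [100.0, 100.5, 250.0]
scores = symmetric_matrix(values)
```

Because similarity(A, B) == similarity(B, A) for a commutative measure, only the upper triangle (including the diagonal) is evaluated, which roughly halves the number of pair computations.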

abstractmethod pair(spectrum_1: Spectrum, spectrum_2: Spectrum)[source]

Calculate the similarity for one pair of spectra.

Parameters:

spectrum_1 – First spectrum.
spectrum_2 – Second spectrum.

Returns:

Similarity result for one pair. The returned value should be compatible with self.score_datatype.

Return type:

score

Examples

Scalar score: return np.asarray(score, dtype=self.score_datatype)
Structured score: return np.asarray((score, matches), dtype=self.score_datatype)
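To make the structured return value concrete, here is a minimal standalone sketch of what a pair() body might compute. It deliberately avoids importing matchms: pair_sketch, SCORE_DTYPE, the tolerance parameter, and the peak-matching rule are all hypothetical, and lists of m/z values stand in for Spectrum objects.

```python
import numpy as np

# Hypothetical structured score dtype with two fields.
SCORE_DTYPE = np.dtype([("score", np.float64), ("matches", np.int64)])


def pair_sketch(mz_1, mz_2, tolerance=0.1):
    """Hypothetical pair(): fraction of peaks in mz_1 with a partner in mz_2."""
    matches = sum(
        any(abs(a - b) <= tolerance for b in mz_2)
        for a in mz_1
    )
    score = matches / len(mz_1) if len(mz_1) else 0.0
    # Pack both values into a single 0-d structured array.
    return np.asarray((score, matches), dtype=SCORE_DTYPE)


result = pair_sketch([100.0, 150.0, 200.0], [100.05, 200.3])
print(result["score"], result["matches"])
```

The result is a 0-d array whose fields can be read by name, which is what makes it compatible with a structured score_datatype.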

score_datatype: alias of float64

sparse_matrix(spectra_1, spectra_2=None, idx_row=None, idx_col=None, score_fields=None, score_filter=None, progress_bar: bool = True)[source]: Sparse score computation is not available for this similarity.

to_dict() → dict[source]: Return a dictionary representation of the similarity function.

class matchms.similarity.BaseSimilarity.BaseSimilarityWithSparse[source]

Bases: BaseSimilarity

Base similarity class with a default sparse implementation.

This class extends BaseSimilarity by providing a default implementation of sparse_matrix() that applies a score filter to the dense results.

Subclasses can override keep_score() to define the default filtering behavior, and users can also pass a custom score_filter=… to sparse_matrix() for per-call control.

property is_structured_score: bool: Return True if this similarity uses a structured score dtype.

keep_score(score) → bool[source]

Return whether a score should be retained in sparse outputs.

This defines the default sparse retention behavior. Users can override it per call by passing score_filter=... to sparse_matrix().

Default behavior: for a scalar score, keep if score != 0; for a structured score, keep if all fields are non-zero.
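The default retention rule can be sketched in a few lines of NumPy. This is an illustration of the stated behavior, not the library source; keep_score_sketch is a hypothetical name.

```python
import numpy as np


def keep_score_sketch(score) -> bool:
    """Mimic the documented default: drop zeros, keep everything else."""
    score = np.asarray(score)
    if score.dtype.names is None:
        # Scalar score: keep if non-zero.
        return bool(score != 0)
    # Structured score: keep only if every field is non-zero.
    return all(bool(score[name] != 0) for name in score.dtype.names)


dtype = np.dtype([("score", np.float64), ("matches", np.int64)])
print(keep_score_sketch(np.float64(0.3)))               # True
print(keep_score_sketch(np.float64(0.0)))               # False
print(keep_score_sketch(np.array((0.5, 3), dtype=dtype)))  # True
print(keep_score_sketch(np.array((0.5, 0), dtype=dtype)))  # False
```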

matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, score_fields: Sequence[str] | None = None, progress_bar: bool = True)

Calculate a dense similarity matrix.

Parameters:

spectra_1 – First collection of spectra.
spectra_2 – Second collection of spectra. If None, spectra_1 is compared against itself; for commutative similarities this self-comparison automatically uses a symmetric optimization.
score_fields – Score fields to return. None returns all available fields. For scalar scores, only ("score",) is valid. For structured scores, this can be a subset such as ("score",).
progress_bar – When True, show a progress bar. Default is True.

Returns:

Dense score result wrapped in a Scores container.

Return type:

Scores

abstractmethod pair(spectrum_1: Spectrum, spectrum_2: Spectrum)

Calculate the similarity for one pair of spectra.

Parameters:

spectrum_1 – First spectrum.
spectrum_2 – Second spectrum.

Returns:

Similarity result for one pair. The returned value should be compatible with self.score_datatype.

Return type:

score

Examples

Scalar score: return np.asarray(score, dtype=self.score_datatype)
Structured score: return np.asarray((score, matches), dtype=self.score_datatype)

score_datatype: alias of float64

sparse_matrix(spectra_1: Sequence[Spectrum], spectra_2: Sequence[Spectrum] | None = None, idx_row: ArrayLike | None = None, idx_col: ArrayLike | None = None, score_fields: Sequence[str] | None = None, score_filter: Callable[[ndarray], bool] | None = None, progress_bar: bool = True)[source]

Calculate sparse similarity results.

Filtering is applied to the full score before score field projection.
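The ordering matters because a filter may depend on a field that is not returned. The NumPy-only sketch below illustrates this under stated assumptions: the dense array, min_matches_filter, and the threshold of 2 are hypothetical, and the loop is not how matchms stores sparse results.

```python
import numpy as np

dtype = np.dtype([("score", np.float64), ("matches", np.int64)])
dense = np.array(
    [[(0.9, 5), (0.4, 1)],
     [(0.0, 0), (0.7, 3)]],
    dtype=dtype,
)


def min_matches_filter(score) -> bool:
    """Hypothetical score_filter: needs the 'matches' field to decide."""
    return bool(score["matches"] >= 2)


rows, cols, kept = [], [], []
for i in range(dense.shape[0]):
    for j in range(dense.shape[1]):
        full_score = dense[i, j]  # all fields are still present here
        if min_matches_filter(full_score):
            rows.append(i)
            cols.append(j)
            # Project to the requested field only after filtering.
            kept.append(float(full_score["score"]))
```

Had the projection to ("score",) happened first, the filter could no longer see the matches field.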

Parameters:

spectra_1 – First collection of spectra.
spectra_2 – Second collection of spectra. If None, compare spectra_1 against itself.
idx_row – Row indices of pairs to compute. If None and idx_col is also None, all pairwise comparisons are considered and only retained scores are stored.
idx_col – Column indices of pairs to compute. Must have the same shape as idx_row.
score_fields – Score fields to return. None returns all available fields. For scalar scores, only ("score",) is valid. For structured scores, this can be a subset such as ("score",).
score_filter – Optional callable receiving the full score and returning whether it should be retained. If None, keep_score() is used.
progress_bar – When True, show a progress bar. Default is True.

Returns:

Sparse score result wrapped in a Scores container.

Return type:

Scores
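One way to read idx_row and idx_col is as paired coordinates, where the i-th entries of both arrays together name one pair to compute. The sketch below assumes that reading; it is not taken from the matchms source. Floats stand in for spectra, and toy_similarity is a hypothetical score.

```python
import numpy as np


def toy_similarity(a: float, b: float) -> float:
    """Hypothetical score: 1 / (1 + |a - b|)."""
    return 1.0 / (1.0 + abs(a - b))


values_1 = [100.0, 200.0, 300.0]  # stand-ins for spectra_1
values_2 = [100.0, 250.0]         # stand-ins for spectra_2

# Three requested pairs: (0, 0), (2, 1) and (1, 1).
idx_row = np.array([0, 2, 1])
idx_col = np.array([0, 1, 1])
if idx_row.shape != idx_col.shape:
    raise ValueError("idx_row and idx_col must have the same shape")

# Compute only the requested pairs instead of the full 3 x 2 grid.
selected = np.array(
    [toy_similarity(values_1[r], values_2[c]) for r, c in zip(idx_row, idx_col)]
)
```

Restricting the computation to a known set of index pairs avoids evaluating comparisons that would never be stored.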

to_dict() → dict: Return a dictionary representation of the similarity function.