matchms.utils module

class matchms.utils.ChemMock(seq)[source]

Bases: collections.UserString

__call__(*args, **kwargs)[source]

Call self as a function.

__init__(seq)

Initialize self. See help(type(self)) for accurate signature.

count(value) → integer: return number of occurrences of value.
index(value[, start[, stop]]) → integer: return first index of value.

Raises ValueError if the value is not present.

Supporting start and stop arguments is optional, but recommended.

maketrans(x, y=None, z=None, /)

Return a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will then be converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.

matchms.utils.clean_adduct(adduct: str) → str[source]

Clean adduct and make it consistent in style. Will transform adduct strings of type ‘M+H+’ to ‘[M+H]+’.

Parameters

adduct – Input adduct string to be cleaned/edited.
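The bracketing step described above can be sketched in plain Python. This is only an illustration of the ‘M+H+’ to ‘[M+H]+’ transformation, not the matchms implementation: the helper name bracket_adduct and its regular expression are hypothetical, and the real clean_adduct may handle more input styles.

```python
import re


def bracket_adduct(adduct: str) -> str:
    """Hypothetical helper: wrap an unbracketed adduct and keep its charge outside."""
    adduct = adduct.strip()
    if adduct.startswith("["):
        # Already bracketed, so leave it untouched.
        return adduct
    # Split into a body and a trailing charge such as '+', '-' or '2+'.
    match = re.fullmatch(r"(.*?)(\d*[+-])", adduct)
    if match is None:
        # No trailing charge found; return the input unchanged.
        return adduct
    body, charge = match.groups()
    return f"[{body}]{charge}"


print(bracket_adduct("M+H+"))    # [M+H]+
print(bracket_adduct("[M+H]+"))  # [M+H]+
```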

matchms.utils.convert_inchi_to_inchikey(inchi: str) → Optional[str][source]

Convert inchi to inchikey using rdkit.

matchms.utils.convert_inchi_to_smiles(inchi: str) → Optional[str][source]

Convert inchi to smiles using rdkit.

matchms.utils.convert_smiles_to_inchi(smiles: str) → Optional[str][source]

Convert smiles to inchi using rdkit.

matchms.utils.derive_fingerprint_from_inchi(inchi: str, fingerprint_type: str, nbits: int) → numpy.ndarray[source]

Calculate molecule fingerprint based on given inchi (using rdkit). Requires conda package rdkit to be installed.

Parameters
  • inchi – Input InChI to derive fingerprint from.

  • fingerprint_type – Determine method for deriving molecular fingerprints. Supported choices are ‘daylight’, ‘morgan1’, ‘morgan2’, ‘morgan3’.

  • nbits – Dimension or number of bits of generated fingerprint.

Returns

fingerprint – Molecular fingerprint.

Return type

numpy.ndarray

matchms.utils.derive_fingerprint_from_smiles(smiles: str, fingerprint_type: str, nbits: int) → numpy.ndarray[source]

Calculate molecule fingerprint based on given smiles (using rdkit). Requires conda package rdkit to be installed.

Parameters
  • smiles – Input smiles to derive fingerprint from.

  • fingerprint_type – Determine method for deriving molecular fingerprints. Supported choices are ‘daylight’, ‘morgan1’, ‘morgan2’, ‘morgan3’.

  • nbits – Dimension or number of bits of generated fingerprint.

Returns

fingerprint – Molecular fingerprint.

Return type

numpy.ndarray

matchms.utils.filter_none(iterable: Iterable) → Iterable[source]

Filter iterable to remove ‘None’ elements.

Args:

iterable (Iterable): Iterable to filter.

Returns:

Iterable: Filtered iterable.
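As a minimal sketch of this kind of filtering, the following plain-Python helper drops only elements that are None and keeps other falsy values such as 0 or the empty string. The name drop_none is hypothetical and the code is an illustration under that assumption, not the matchms source.

```python
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def drop_none(items: Iterable[Optional[T]]) -> Iterator[T]:
    """Hypothetical helper: lazily yield every element that is not None."""
    return (item for item in items if item is not None)


# Falsy values other than None (0 and "") are kept.
cleaned = list(drop_none([1, None, 0, "", None, "a"]))
print(cleaned)  # [1, 0, '', 'a']
```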

matchms.utils.get_common_keys(first: List[str], second: List[str]) → List[str][source]

Get common elements of two sets of strings in a case-insensitive way.

Args:

first (List[str]): First list of strings.

second (List[str]): List of strings to search for matches.

Returns:

List[str]: List of common elements, ignoring the case of the elements in the first list.
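One way to picture a case-insensitive match between two lists of strings is sketched below. It assumes that both sides are compared in lower case and that matches are reported with the spelling used in the first list; the helper name common_keys_ignore_case is hypothetical and the exact matching rule in matchms may differ.

```python
from typing import List


def common_keys_ignore_case(first: List[str], second: List[str]) -> List[str]:
    """Hypothetical helper: elements of `first` that occur in `second`, ignoring case."""
    # Lower-case the second list once so each lookup is cheap.
    second_lower = {item.lower() for item in second}
    # Keep the original spelling and order of the first list.
    return [item for item in first if item.lower() in second_lower]


shared = common_keys_ignore_case(["Name", "SMILES", "mz"], ["name", "smiles", "inchi"])
print(shared)  # ['Name', 'SMILES']
```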

matchms.utils.get_first_common_element(first: Iterable[str], second: Iterable[str]) → str[source]

Get first common element from two lists. Returns ‘None’ if there are no common elements.
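The lookup can be sketched with next() and a default value. This sketch assumes that "first" means first in the order of the first argument; the helper name first_common is hypothetical and the code is not taken from matchms.

```python
from typing import Iterable, Optional


def first_common(first: Iterable[str], second: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: first element of `first` that also occurs in `second`."""
    candidates = set(second)
    # next() returns the default (None) when the generator yields nothing.
    return next((item for item in first if item in candidates), None)


print(first_common(["a", "b", "c"], ["c", "b"]))  # b
print(first_common(["a"], ["z"]))                 # None
```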

matchms.utils.is_valid_inchi(inchi: str) → bool[source]

Return True if input string is valid InChI.

This function tests whether the string can be read by rdkit as InChI. Requires conda package rdkit to be installed.

Parameters

inchi – Input string to test if it has format of InChI.

matchms.utils.is_valid_inchikey(inchikey: str) → bool[source]

Return True if string has format of inchikey.

Parameters

inchikey – Input string to test if it has the format of an inchikey.
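A format check of this kind can be sketched with a regular expression. The sketch relies on the standard InChIKey layout of 27 characters: 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final uppercase letter. The helper name has_inchikey_format and the pattern are assumptions for illustration; matchms may use a different expression.

```python
import re

# 14 uppercase letters, hyphen, 10 uppercase letters, hyphen, 1 uppercase letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def has_inchikey_format(text: object) -> bool:
    """Hypothetical helper: True if `text` is a string shaped like an InChIKey."""
    return isinstance(text, str) and INCHIKEY_PATTERN.fullmatch(text) is not None


print(has_inchikey_format("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))  # True
print(has_inchikey_format("bsynrymutxbxsq-uhfffaoysa-n"))  # False
```

The check only looks at the shape of the string; it does not confirm that the key corresponds to a real structure.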

matchms.utils.is_valid_smiles(smiles: str) → bool[source]

Return True if input string is valid smiles.

This function tests whether the string can be read by rdkit as smiles. Requires conda package rdkit to be installed.

Parameters

smiles – Input string to test if it can be imported as smiles.

matchms.utils.looks_like_adduct(adduct)[source]

Return True if input string has expected format of an adduct.

matchms.utils.mol_converter(mol_input: str, input_type: str, output_type: str) → Optional[str][source]

Convert molecular representations using rdkit.

Convert from “smiles” or “inchi” to “inchi”, “smiles”, or “inchikey”. Requires conda package rdkit to be installed.

Parameters
  • mol_input – Input data in “inchi” or “smiles” molecular representation.

  • input_type – Define input type: “smiles” for smiles and “inchi” for inchi.

  • output_type – Define output type: “smiles”, “inchi”, or “inchikey”.

Returns

Mol string in output type or None when conversion failure occurs.

matchms.utils.mol_to_fingerprint(mol: ‘’, fingerprint_type: str, nbits: int) → numpy.ndarray[source]

Convert rdkit mol (molecule) to molecular fingerprint. Requires conda package rdkit to be installed.

Parameters
  • mol – Input rdkit molecule.

  • fingerprint_type – Determine method for deriving molecular fingerprints. Supported choices are ‘daylight’, ‘morgan1’, ‘morgan2’, ‘morgan3’.

  • nbits – Dimension or number of bits of generated fingerprint.

Returns

fingerprint – Molecular fingerprint.

Return type

numpy.ndarray