FeaturizeIcoxxlist

class lobsterpy.featurize.core.FeaturizeIcoxxlist(path_to_icoxxlist, path_to_structure, path_to_grosspop=None, bin_width=0.02, interactions_tol=0.001, max_length=6.0, min_length=0.0, n_electrons_scaling=False, normalization='formula_units', are_cobis=False, are_coops=False)

Bases: object

Class to featurize ICOXXLIST.lobster as a bond-weighted distribution function (BWDF).

Parameters:
  • path_to_icoxxlist (str | Path) – path to ICOXXLIST.lobster

  • path_to_structure (str | Path) – path to structure file (e.g., CONTCAR (preferred), POSCAR)

  • path_to_grosspop (str | Path | None) – path to GROSSPOP.lobster

  • bin_width (float) – bin width for the BWDF

  • interactions_tol (float) – tolerance for interactions

  • max_length (float) – maximum bond length for BWDF computation

  • min_length (float) – minimum bond length for BWDF computation

  • n_electrons_scaling (bool) – whether ICOXX values are scaled by the number of electrons. Intended for testing only; it should not affect the results in any meaningful way.

  • normalization (Literal['formula_units', 'area', 'counts', 'none']) – normalization strategy for BWDF

  • are_cobis (bool) – bool indicating if file contains COBI/ICOBI data

  • are_coops (bool) – bool indicating if file contains COOP/ICOOP data
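
A minimal usage sketch, assuming a LOBSTER run directory that holds the ICOHP variant of the list file (`ICOHPLIST.lobster`), a `CONTCAR`, and `GROSSPOP.lobster`. The directory layout and the helpers `featurizer_kwargs` and `build_featurizer` are hypothetical; the keyword values simply repeat the defaults from the signature above.

```python
from pathlib import Path


def featurizer_kwargs(calc_dir):
    """Collect constructor arguments for FeaturizeIcoxxlist (hypothetical helper)."""
    calc_dir = Path(calc_dir)
    return {
        # File names are assumptions about how the run directory is laid out.
        "path_to_icoxxlist": calc_dir / "ICOHPLIST.lobster",
        "path_to_structure": calc_dir / "CONTCAR",
        "path_to_grosspop": calc_dir / "GROSSPOP.lobster",
        # The remaining values repeat the documented defaults.
        "bin_width": 0.02,
        "interactions_tol": 0.001,
        "max_length": 6.0,
        "min_length": 0.0,
        "n_electrons_scaling": False,
        "normalization": "formula_units",
        "are_cobis": False,  # True if the file contains COBI/ICOBI data
        "are_coops": False,  # True if the file contains COOP/ICOOP data
    }


def build_featurizer(calc_dir):
    """Instantiate the featurizer; needs lobsterpy installed and the files present."""
    # Imported lazily so the helper above works without lobsterpy installed.
    from lobsterpy.featurize.core import FeaturizeIcoxxlist

    return FeaturizeIcoxxlist(**featurizer_kwargs(calc_dir))
```

Calling `build_featurizer("path/to/calc")` would then return an object exposing the methods documented below.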

get_icoxx_neighbors_data(site_index=None)

Get the neighbors data with icoxx values for a structure.

Uses a distance-based neighbor list as the reference for mapping the neighbors data.

Parameters:

site_index (int | None) – index of the site for which neighbors data is returned. Default is None (All sites).

Returns:

Neighbors data as a dictionary with the following information

  • “ref_rdf_data”: radial distribution function (RDF) data

  • “input_icoxx_list”: complete ICOXXLIST.lobster data as a list of tuples

  • “mapped_icoxx_data”: ICOXX values mapped to RDF data

  • “missing_interactions”: list of interactions that are present in RDF data but not in ICOXX data

  • “wasserstein_dist_to_rdf”: Wasserstein distance computed between ref_rdf_data and mapped_icoxx_data

Return type:

dict
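
The `wasserstein_dist_to_rdf` entry compares two one-dimensional distributions. As an illustration of the metric only, not of how lobsterpy evaluates it internally, the sketch below computes the first Wasserstein distance between two equally sized, equally weighted samples, which reduces to the mean absolute difference of the sorted values. The function name and the sample distances are hypothetical.

```python
def wasserstein_1d(sample_a, sample_b):
    """First Wasserstein distance between two equally sized 1D samples."""
    if not sample_a or len(sample_a) != len(sample_b):
        raise ValueError("samples must be non-empty and of equal length")
    # For uniform weights, the optimal transport plan pairs sorted values.
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)


# Hypothetical neighbor distances (in angstrom) from an RDF and from mapped ICOXX data.
rdf_distances = [2.0, 2.0, 3.0, 4.0]
icoxx_distances = [2.0, 2.0, 3.0, 3.5]
print(wasserstein_1d(rdf_distances, icoxx_distances))  # 0.125
```

A value of zero would mean the two distance distributions coincide; larger values indicate that the mapped ICOXX data deviate more from the reference neighbor list.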

calc_bwdf()

Compute BWDF from ICOXXLIST.lobster data.

Returns:

BWDF as a dictionary for each atom pair and entire structure

  • “A-B”: BWDF for atom pair A-B, e.g., “Na-Cl”: {“icoxx_binned”: np.array, “icoxx_counts”: np.array}

  • “summed”: BWDF for entire structure, e.g., “summed”: {“icoxx_binned”: np.array, “icoxx_counts”: np.array}

  • “centers”: bin centers for BWDF

  • “edges”: bin edges for BWDF

  • “bin_width”: bin width

  • “wasserstein_dist_to_rdf”: Wasserstein distance between RDF and ICOXX data

Return type:

dict
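
To make the returned layout concrete, here is a rough pure-Python sketch of the binning idea: bond lengths are assigned to bins of width `bin_width` between `min_length` and `max_length`, and the ICOXX values falling in each bin are summed. It assumes that `icoxx_counts` holds the number of bonds per bin and it applies no normalization; the function name and input numbers are hypothetical, and lobsterpy itself returns NumPy arrays and may bin differently.

```python
def bwdf_sketch(bond_lengths, icoxx_values, bin_width=0.02, min_length=0.0, max_length=6.0):
    """Sum ICOXX values into bond-length bins (illustrative, unnormalized)."""
    n_bins = round((max_length - min_length) / bin_width)
    edges = [min_length + i * bin_width for i in range(n_bins + 1)]
    centers = [(edges[i] + edges[i + 1]) / 2 for i in range(n_bins)]
    icoxx_binned = [0.0] * n_bins
    icoxx_counts = [0] * n_bins
    for length, value in zip(bond_lengths, icoxx_values):
        if not (min_length <= length < max_length):
            continue  # bond lies outside the requested range
        index = min(int((length - min_length) / bin_width), n_bins - 1)
        icoxx_binned[index] += value
        icoxx_counts[index] += 1
    return {
        "summed": {"icoxx_binned": icoxx_binned, "icoxx_counts": icoxx_counts},
        "centers": centers,
        "edges": edges,
        "bin_width": bin_width,
    }


# Three hypothetical bonds, binned coarsely so the result is easy to read.
result = bwdf_sketch([2.1, 2.2, 2.8], [-1.0, -0.5, -0.2], bin_width=0.5, max_length=3.0)
print(result["summed"]["icoxx_counts"])  # [0, 0, 0, 0, 2, 1]
```

The two bonds near 2.1 and 2.2 share the bin centered at 2.25, so their ICOXX values add up to -1.5 there.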

calc_site_asymmetry_index(site_index)

Compute the asymmetry index for a site using bond strengths as weights.

Parameters:

site_index (int) – index of the site for which the asymmetry index needs to be computed

Return type:

float

References

    1. F. Belli, E. Zurek, I. Errea, 2025, DOI 10.48550/arXiv.2501.14420

Returns:

Asymmetry index for the site

calc_site_bwdf(site_index)

Compute BWDF from ICOXXLIST.lobster data for a site.

Parameters:

site_index (int) – index of the site for which BWDF needs to be computed

Returns:

BWDF as a dictionary for the site in the following format

  • “X”: BWDF for the site X, e.g., “0”: {“icoxx_binned”: np.array, “icoxx_counts”: np.array}

  • “centers”: bin centers for BWDF

  • “edges”: bin edges for BWDF

  • “bin_width”: bin width

  • “wasserstein_dist_to_rdf”: Wasserstein distance between RDF and ICOXX data

Return type:

dict

calc_label_bwdf(bond_label)

Compute BWDF from ICOXXLIST.lobster data for a bond label.

Parameters:

bond_label (str) – bond label for which BWDF needs to be computed

Returns:

BWDF as a dictionary for the bond label in the following format

  • “X”: BWDF for the bond label X, e.g., “20”: {“icoxx_binned”: np.array, “icoxx_counts”: np.array}

  • “centers”: bin centers for BWDF

  • “edges”: bin edges for BWDF

  • “bin_width”: bin width

  • “wasserstein_dist_to_rdf”: Wasserstein distance between RDF and ICOXX data

Return type:

dict

get_asymmetry_index_stats_df(ids=None)

Return a pandas dataframe with asymmetry index statistical information as columns.

Parameters:

ids (str | None) – set the index name in the pandas dataframe. Default is None.

Returns:

A pandas dataframe object with asymmetry index statistical information as columns. Columns include sum, mean, std, min, and max.

Return type:

DataFrame

get_binned_bwdf_df(ids=None)

Return a pandas dataframe with computed BWDF features as columns.

Parameters:

ids (str | None) – set index name in the pandas dataframe. Default is None.

Returns:

A pandas dataframe object with BWDF as columns. Each column contains the sum of ICOXX values in the corresponding bin.

Return type:

DataFrame

get_site_df(site_index, ids=None)

Return a pandas dataframe with computed BWDF features for a site as columns.

Parameters:
  • site_index (int) – index of the site in a structure for which BWDF needs to be computed

  • ids (str | None) – set the index name in the pandas dataframe. Default is None.

Returns:

A pandas dataframe object with BWDF as columns for a site. Each column contains the sum of ICOXX values in the corresponding bin.

Return type:

DataFrame

get_site_bwdf_stats_df(ids=None)

Return a pandas dataframe with mean and std from sitewise BWDFs.

Parameters:

ids (str | None) – set the index name in the pandas dataframe. Default is None.

Returns:

A pandas dataframe object with BWDF statistical information as columns. The columns include the mean and standard deviation calculated from the sitewise BWDFs stats (i.e., sum, mean, minimum, maximum, std, skewness, and kurtosis).

Return type:

DataFrame

get_pair_bwdf_stats_df(ids=None)

Return a pandas dataframe with statistical info from pairwise BWDFs.

Parameters:

ids (str | None) – set the index name in the pandas dataframe. Default is None.

Returns:

A pandas dataframe object with BWDF statistical information as columns. The columns include the mean and standard deviation calculated from the pairwise BWDFs stats (i.e., sum, mean, minimum, maximum, std, skewness, and kurtosis).

Return type:

DataFrame

get_summed_bwdf_stats_df(ids=None)

Return a pandas dataframe with statistical info from BWDF as columns.

Parameters:

ids (str | None) – set the index name in the pandas dataframe. Default is None.

Returns:

A pandas dataframe object with BWDF statistical information as columns. Columns include sum, mean, std, min, max, skew, kurtosis, weighted mean and weighted std.

Return type:

DataFrame
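
The listed statistics can be sketched in plain Python as follows. This is an illustration under explicit assumptions, not lobsterpy's implementation: population (biased) moments are used, kurtosis is reported as excess kurtosis, and the weighted mean and weighted std are taken over the bin centers with the absolute BWDF values as weights. The function name and sample numbers are hypothetical.

```python
import math


def bwdf_summary_stats(centers, values):
    """Summary statistics of binned BWDF values (illustrative definitions)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with a 1/n prefactor (an assumption of this sketch).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # Assumed weighting: bin centers weighted by |BWDF|.
    weights = [abs(v) for v in values]
    total = sum(weights)
    if total > 0:
        w_mean = sum(w * c for w, c in zip(weights, centers)) / total
        w_std = math.sqrt(sum(w * (c - w_mean) ** 2 for w, c in zip(weights, centers)) / total)
    else:
        w_mean = w_std = float("nan")
    return {
        "sum": sum(values),
        "mean": mean,
        "std": math.sqrt(m2),
        "min": min(values),
        "max": max(values),
        "skew": m3 / m2**1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2**2 - 3.0 if m2 > 0 else 0.0,
        "w_mean": w_mean,
        "w_std": w_std,
    }


stats = bwdf_summary_stats([1.0, 2.0, 3.0], [-1.0, -3.0, 0.0])
print(stats["w_mean"])  # 1.75
```

Under this weighting, the weighted mean reads as a characteristic bond length at which most of the bonding or antibonding weight sits.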

get_stats_df(ids=None, stats_type='summed')

Convenience method to get a pandas dataframe with statistical info from BWDF as columns.

Parameters:
  • ids (str | None) – set the index name in the pandas dataframe. Default is None.

  • stats_type (Literal['atompair', 'site', 'summed', 'all']) –

    type of BWDF stats to be returned. Default is “summed”.

    • “atompair”: compute stats from unique atom pairs BWDFs.

    • “site”: compute stats from site BWDFs.

    • “summed”: compute stats from structure BWDFs.

    • “all”: concatenated dataframe from atompair, site and summed options.

Returns:

A pandas dataframe object with BWDF statistical information as columns.

Return type:

DataFrame
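
As a usage sketch, the helper below requests one dataframe per documented `stats_type` value. It assumes `featurizer` is an already constructed `FeaturizeIcoxxlist` instance; the helper name `collect_stats` and the example identifier are hypothetical.

```python
STATS_TYPES = ("atompair", "site", "summed", "all")


def collect_stats(featurizer, ids=None):
    """Call get_stats_df once for each documented stats_type value."""
    return {
        stats_type: featurizer.get_stats_df(ids=ids, stats_type=stats_type)
        for stats_type in STATS_TYPES
    }


# With a real featurizer, for example:
# frames = collect_stats(featurizer, ids="NaCl")
# frames["summed"] would then hold the structure-level statistics.
```

Since “all” already concatenates the other three options, requesting it alone is enough when a single feature table is wanted.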

get_sorted_bwdf_df(ids=None)

Return a pandas dataframe with BWDF values sorted by distances, ascending.

Parameters:

ids (str | None) – set the index name in the pandas dataframe. Default is None.

Returns:

A pandas dataframe object with binned BWDF values sorted by distance.

Return type:

DataFrame

get_sorted_dist_df(ids=None, mode='negative')

Return a pandas dataframe with distances sorted by BWDF values. Only positive or only negative BWDF values are considered, and the distances are ordered by descending absolute BWDF value.

Parameters:
  • ids (str | None) – set the index name in the pandas dataframe. Default is None.

  • mode (Literal['positive', 'negative']) – must be in (“positive”, “negative”), defines whether BWDF values above or below zero are considered for distance featurization.

Returns:

A pandas dataframe object with binned distances sorted by BWDF values.

Return type:

DataFrame
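
The ordering described above can be sketched in a few lines: keep only the bins whose BWDF value has the requested sign, then list their distances by descending absolute BWDF value. The function name and the numbers are hypothetical, and the sketch works on plain lists rather than the dataframe lobsterpy returns.

```python
def sorted_distances(centers, bwdf_values, mode="negative"):
    """Distances ordered by descending |BWDF|, restricted to one sign."""
    if mode not in ("positive", "negative"):
        raise ValueError("mode must be 'positive' or 'negative'")
    if mode == "negative":
        selected = [(c, v) for c, v in zip(centers, bwdf_values) if v < 0]
    else:
        selected = [(c, v) for c, v in zip(centers, bwdf_values) if v > 0]
    # Largest magnitude first.
    selected.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return [c for c, _ in selected]


centers = [1.9, 2.3, 2.7, 3.1]
values = [-0.4, -2.0, 0.3, -1.1]
print(sorted_distances(centers, values, mode="negative"))  # [2.3, 3.1, 1.9]
print(sorted_distances(centers, values, mode="positive"))  # [2.7]
```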