FeaturizeIcoxxlist
- class lobsterpy.featurize.core.FeaturizeIcoxxlist(path_to_icoxxlist, path_to_structure, path_to_grosspop=None, bin_width=0.02, interactions_tol=0.001, max_length=6.0, min_length=0.0, n_electrons_scaling=False, normalization='formula_units', are_cobis=False, are_coops=False)
Bases: object
Class to featurize ICOXXLIST.lobster as a bond-weighted distribution function (BWDF).
- Parameters:
path_to_icoxxlist (str | Path) – path to ICOXXLIST.lobster
path_to_structure (str | Path) – path to structure file (e.g., CONTCAR (preferred), POSCAR)
path_to_grosspop (str | Path | None) – path to GROSSPOP.lobster
bin_width (float) – bin width for the BWDF
interactions_tol (float) – tolerance for interactions
max_length (float) – maximum bond length for BWDF computation
min_length (float) – minimum bond length for BWDF computation
n_electrons_scaling (bool) – whether ICOXX values should be scaled by the number of electrons. Intended for testing only; it should not affect the results in any meaningful way.
normalization (Literal['formula_units', 'area', 'counts', 'none']) – normalization strategy for BWDF
are_cobis (bool) – bool indicating if file contains COBI/ICOBI data
are_coops (bool) – bool indicating if file contains COOP/ICOOP data
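The constructor arguments above can be collected as keyword arguments and passed to the class. The following is a minimal sketch, not an official recipe: the directory layout and the file names `ICOHPLIST.lobster` and `CONTCAR` are assumptions chosen for illustration, and the helper names `featurizer_kwargs` and `build_featurizer` are hypothetical. All keyword values other than the paths are the defaults from the signature.

```python
from pathlib import Path


def featurizer_kwargs(calc_dir):
    """Collect constructor keyword arguments for one calculation directory.

    The file names below are assumptions about the directory layout.
    """
    calc_dir = Path(calc_dir)
    return {
        "path_to_icoxxlist": calc_dir / "ICOHPLIST.lobster",  # assumed file name
        "path_to_structure": calc_dir / "CONTCAR",  # assumed file name
        "path_to_grosspop": None,
        "bin_width": 0.02,
        "interactions_tol": 0.001,
        "max_length": 6.0,
        "min_length": 0.0,
        "n_electrons_scaling": False,
        "normalization": "formula_units",
        "are_cobis": False,  # the assumed input file holds ICOHP data
        "are_coops": False,
    }


def build_featurizer(calc_dir):
    """Construct the featurizer (hypothetical helper; requires lobsterpy)."""
    # Imported lazily so the kwargs helper can be used without lobsterpy installed.
    from lobsterpy.featurize.core import FeaturizeIcoxxlist

    return FeaturizeIcoxxlist(**featurizer_kwargs(calc_dir))
```

Keeping the arguments in a plain dictionary makes it easy to log or vary a single setting, such as `bin_width`, across many calculations.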
- get_icoxx_neighbors_data(site_index=None)
Get the neighbor data with ICOXX values for a structure.
Uses a distance-based neighbor list as the reference for mapping the neighbor data.
- Parameters:
site_index (int | None) – index of the site for which neighbors data is returned. Default is None (All sites).
- Returns:
Neighbors data as a dictionary with the following information:
"ref_rdf_data": radial distribution function (RDF) data
"input_icoxx_list": complete ICOXXLIST.lobster data in the form of a list of tuples
"mapped_icoxx_data": ICOXX values mapped to RDF data
"missing_interactions": list of interactions that are present in RDF data but not in ICOXX data
"wasserstein_dist_to_rdf": Wasserstein distance computed between ref_rdf_data and mapped_icoxx_data
- Return type:
dict
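To give a feel for what a Wasserstein distance between two distance distributions measures, here is a minimal pure-Python sketch for two histograms on the same equally spaced grid. It assumes non-negative weights and normalizes both histograms to unit mass; whether the library uses this exact formulation for "wasserstein_dist_to_rdf" is not stated here, and the function name `wasserstein_1d` is hypothetical.

```python
def wasserstein_1d(hist_a, hist_b, bin_width):
    """First Wasserstein distance between two histograms on a shared grid.

    Both histograms are normalized to unit mass, and the distance is the
    integrated absolute difference of their cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must share the same bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive total weight")

    cdf_a = cdf_b = 0.0
    distance = 0.0
    for weight_a, weight_b in zip(hist_a, hist_b):
        cdf_a += weight_a / total_a
        cdf_b += weight_b / total_b
        distance += abs(cdf_a - cdf_b) * bin_width
    return distance
```

A value of zero means the two normalized histograms coincide; larger values mean that weight has to be moved over longer distances to turn one into the other.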
- calc_bwdf()
Compute BWDF from ICOXXLIST.lobster data.
- Returns:
BWDF as a dictionary for each atom pair and the entire structure:
"A-B": BWDF for atom pair A-B, e.g., "Na-Cl": {"icoxx_binned": np.array, "icoxx_counts": np.array}
"summed": BWDF for the entire structure, e.g., "summed": {"icoxx_binned": np.array, "icoxx_counts": np.array}
"centers": bin centers for BWDF
"edges": bin edges for BWDF
"bin_width": bin width
"wasserstein_dist_to_rdf": Wasserstein distance between RDF and ICOXX data
- Return type:
dict
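The idea behind the binned BWDF can be illustrated without the library: each bond contributes its ICOXX value to the bin that contains its bond length. The sketch below is an assumption-laden illustration, not the library's implementation; it uses plain lists instead of NumPy arrays, applies no normalization, and the function name `bin_icoxx` and the `(length, icoxx)` input format are hypothetical. The dictionary keys mirror those listed above.

```python
def bin_icoxx(bonds, bin_width=0.02, min_length=0.0, max_length=6.0):
    """Accumulate ICOXX values into equally spaced bond-length bins.

    bonds is an iterable of (bond_length, icoxx_value) pairs.
    """
    n_bins = round((max_length - min_length) / bin_width)
    edges = [min_length + i * bin_width for i in range(n_bins + 1)]
    centers = [min_length + (i + 0.5) * bin_width for i in range(n_bins)]
    binned = [0.0] * n_bins
    counts = [0] * n_bins

    for length, icoxx in bonds:
        if not (min_length <= length < max_length):
            continue  # bonds outside the window are ignored
        # Clamp the index to guard against floating-point round-off at the top edge.
        index = min(int((length - min_length) / bin_width), n_bins - 1)
        binned[index] += icoxx
        counts[index] += 1

    return {
        "summed": {"icoxx_binned": binned, "icoxx_counts": counts},
        "centers": centers,
        "edges": edges,
        "bin_width": bin_width,
    }
```

For example, two bonds of length 0.6 and 0.7 fall into the same bin when `bin_width=0.5`, so their ICOXX values are summed and the count for that bin is 2.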
- calc_site_asymmetry_index(site_index)
Compute the asymmetry index for a site using bond strengths as weights.
- Parameters:
site_index (int) – index of the site for which the asymmetry index needs to be computed
- Returns:
Asymmetry index for the site
- Return type:
float
References
Belli, E. Zurek, I. Errea, 2025, DOI 10.48550/arXiv.2501.14420
- calc_site_bwdf(site_index)
Compute BWDF from ICOXXLIST.lobster data for a site.
- Parameters:
site_index (int) – index of the site for which BWDF needs to be computed
- Returns:
BWDF as a dictionary for the site in the following format:
"X": BWDF for the site X, e.g., "0": {"icoxx_binned": np.array, "icoxx_counts": np.array}
"centers": bin centers for BWDF
"edges": bin edges for BWDF
"bin_width": bin width
"wasserstein_dist_to_rdf": Wasserstein distance between RDF and ICOXX data
- Return type:
dict
- calc_label_bwdf(bond_label)
Compute BWDF from ICOXXLIST.lobster data for a bond label.
- Parameters:
bond_label (str) – bond label for which BWDF needs to be computed
- Returns:
BWDF as a dictionary for the bond label in the following format:
"X": BWDF for the bond label X, e.g., "20": {"icoxx_binned": np.array, "icoxx_counts": np.array}
"centers": bin centers for BWDF
"edges": bin edges for BWDF
"bin_width": bin width
"wasserstein_dist_to_rdf": Wasserstein distance between RDF and ICOXX data
- Return type:
dict
- get_asymmetry_index_stats_df(ids=None)
Return a pandas dataframe with asymmetry index statistical information as columns.
- Parameters:
ids (str | None) – set the index name in the pandas dataframe. Default is None.
- Returns:
A pandas dataframe object with asymmetry index statistical information as columns. Columns include sum, mean, std, min, and max.
- Return type:
DataFrame
- get_binned_bwdf_df(ids=None)
Return a pandas dataframe with computed BWDF features as columns.
- Parameters:
ids (str | None) – set index name in the pandas dataframe. Default is None.
- Returns:
A pandas dataframe object with BWDF as columns. Each column contains the sum of ICOXX values in the corresponding bin.
- Return type:
DataFrame
- get_site_df(site_index, ids=None)
Return a pandas dataframe with computed BWDF features for a site as columns.
- Parameters:
site_index (int) – index of the site in a structure for which BWDF needs to be computed
ids (str | None) – set the index name in the pandas dataframe. Default is None.
- Returns:
A pandas dataframe object with BWDF as columns for a site. Each column contains the sum of ICOXX values in the corresponding bin.
- Return type:
DataFrame
- get_site_bwdf_stats_df(ids=None)
Return a pandas dataframe with the mean and standard deviation from site-wise BWDFs.
- Parameters:
ids (str | None) – set the index name in the pandas dataframe. Default is None.
- Returns:
A pandas dataframe object with BWDF statistical information as columns. The columns include the mean and standard deviation calculated from the sitewise BWDFs stats (i.e., sum, mean, minimum, maximum, std, skewness, and kurtosis).
- Return type:
DataFrame
- get_pair_bwdf_stats_df(ids=None)
Return a pandas dataframe with statistical info from pairwise BWDFs.
- Parameters:
ids (str | None) – set the index name in the pandas dataframe. Default is None.
- Returns:
A pandas dataframe object with BWDF statistical information as columns. The columns include the mean and standard deviation calculated from the pairwise BWDFs stats (i.e., sum, mean, minimum, maximum, std, skewness, and kurtosis).
- Return type:
DataFrame
- get_summed_bwdf_stats_df(ids=None)
Return a pandas dataframe with statistical info from BWDF as columns.
- Parameters:
ids (str | None) – set the index name in the pandas dataframe. Default is None.
- Returns:
A pandas dataframe object with BWDF statistical information as columns. Columns include sum, mean, std, min, max, skew, kurtosis, weighted mean and weighted std.
- Return type:
DataFrame
- get_stats_df(ids=None, stats_type='summed')
Convenience method to get a pandas dataframe with statistical info from BWDF as columns.
- Parameters:
ids (str | None) – set the index name in the pandas dataframe. Default is None.
stats_type (Literal['atompair', 'site', 'summed', 'all']) – type of BWDF stats to be returned. Default is "summed".
"atompair": compute stats from unique atom pair BWDFs.
"site": compute stats from site BWDFs.
"summed": compute stats from structure BWDFs.
"all": concatenated dataframe from the atompair, site and summed options.
- Returns:
A pandas dataframe object with BWDF statistical information as columns.
- Return type:
DataFrame
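The `stats_type` options can be compared side by side by calling `get_stats_df` once per option. The helper below is a small sketch under the assumption that `featurizer` is an already constructed `FeaturizeIcoxxlist` instance; the name `collect_stats` is hypothetical and not part of the library.

```python
STATS_TYPES = ("atompair", "site", "summed", "all")


def collect_stats(featurizer, ids=None, stats_types=("atompair", "site", "summed")):
    """Call get_stats_df once per stats_type and return the results keyed by type."""
    frames = {}
    for stats_type in stats_types:
        if stats_type not in STATS_TYPES:
            raise ValueError(f"unknown stats_type: {stats_type!r}")
        # ids sets the index name of the returned dataframe.
        frames[stats_type] = featurizer.get_stats_df(ids=ids, stats_type=stats_type)
    return frames
```

Keeping the three frames separate makes it easy to inspect which feature columns come from which BWDF before deciding whether the concatenated "all" frame is needed.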
- get_sorted_bwdf_df(ids=None)
Return a pandas dataframe with BWDF values sorted by distances, ascending.
- Parameters:
ids (str | None) – set the index name in the pandas dataframe. Default is None.
- Returns:
A pandas dataframe object with binned BWDF values sorted by distance.
- Return type:
DataFrame
- get_sorted_dist_df(ids=None, mode='negative')
Return a pandas dataframe with distances sorted by BWDF values in descending order of absolute value, considering either only positive or only negative BWDF values.
- Parameters:
ids (str | None) – set the index name in the pandas dataframe. Default is None.
mode (Literal['positive', 'negative']) – either "positive" or "negative"; defines whether BWDF values above or below zero are considered for distance featurization.
- Returns:
A pandas dataframe object with binned distances sorted by BWDF values.
- Return type:
DataFrame
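The ordering described for this method can be sketched in plain Python: keep only the bins whose BWDF value has the requested sign, then order their distances by decreasing absolute BWDF value. This is an illustration of the sorting rule only, under the assumption that bins with a value of exactly zero are dropped; the function name `sort_distances_by_bwdf` is hypothetical, and the library's actual dataframe layout is not reproduced.

```python
def sort_distances_by_bwdf(centers, bwdf, mode="negative"):
    """Return bin centers ordered by descending absolute BWDF value.

    Only bins with BWDF values below zero (mode="negative") or above
    zero (mode="positive") are kept.
    """
    if mode not in ("positive", "negative"):
        raise ValueError("mode must be 'positive' or 'negative'")

    if mode == "negative":
        selected = [(c, v) for c, v in zip(centers, bwdf) if v < 0]
    else:
        selected = [(c, v) for c, v in zip(centers, bwdf) if v > 0]

    # Largest magnitude first; list.sort is stable, so ties keep their input order.
    selected.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return [center for center, _ in selected]
```

With ICOHP-like data, where bonding interactions are negative, `mode="negative"` therefore lists the distances of the strongest bonding contributions first.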