Hierarchical plotting#

This example shows how to combine all the results of a sustainability summary query into interactive hierarchical plots.

The following supporting files are required for this example:

sustainability-bom-2412.xml

For help on constructing an XML BoM, see BoM examples.

Info:

This example uses an input file that is in the 24/12 XML BoM format. This structure requires Granta MI Restricted Substances and Sustainability Reports 2025 R2 or later.

To run this example with an older version of the reports bundle, use sustainability-bom-2301.xml instead. Some sections of this example will produce different results from the published example when this BoM is used.

Run a sustainability summary query#

[1]:

from ansys.grantami.bomanalytics import Connection, queries

MASS_UNIT = "kg"
ENERGY_UNIT = "MJ"
DISTANCE_UNIT = "km"

server_url = "http://my_grantami_server/mi_servicelayer"
cxn = Connection(server_url).with_credentials("user_name", "password").connect()

xml_file_path = "../supporting-files/sustainability-bom-2412.xml"
with open(xml_file_path) as f:
    bom = f.read()

sustainability_summary_query = (
    queries.BomSustainabilitySummaryQuery()
    .with_bom(bom)
    .with_units(mass=MASS_UNIT, energy=ENERGY_UNIT, distance=DISTANCE_UNIT)
)
sustainability_summary = cxn.run(sustainability_summary_query)

Tabulated data#

To plot data hierarchically, first create a dataframe that combines all the data. See the other notebooks in this section for more detail on converting these properties to dataframes.

[2]:

import pandas as pd

EE_HEADER = f"EE [{ENERGY_UNIT}]"
CC_HEADER = f"CC [{MASS_UNIT}]"


def create_dataframe_record(item, parent):
    record = {
        "Parent": parent,
        EE_HEADER: item.embodied_energy.value,
        CC_HEADER: item.climate_change.value,
    }

    if parent == "Material":
        record["Name"] = item.identity
    elif parent == "Processes":
        try:  # Joining and finishing processes
            record["Name"] = item.name
        except AttributeError:  # Primary and secondary processes
            record["Name"] = f"{item.process_name} - {item.material_identity}"
    else:
        record["Name"] = item.name
    return record


records = []
records.extend(
    [
        create_dataframe_record(item, "")
        for item in sustainability_summary.phases_summary
    ]
)
records.extend(
    [
        create_dataframe_record(item, "Material")
        for item in sustainability_summary.material_details
    ]
)
records.extend(
    [
        create_dataframe_record(item, "Transport")
        for item in sustainability_summary.transport_details
    ]
)
records.extend(
    [
        create_dataframe_record(item, "Processes")
        for item in (
            sustainability_summary.primary_processes_details +
            sustainability_summary.secondary_processes_details +
            sustainability_summary.joining_and_finishing_processes_details
        )
    ]
)

df = pd.DataFrame.from_records(records)
df.head()

[2]:

	Parent	EE [MJ]	CC [kg]	Name
0		333.680522	32.013029	Material
1		159.474427	9.026297	Processes
2		99.896940	6.967125	Transport
3	Material	153.040336	11.963111	stainless-astm-cn-7ms-cast
4	Material	117.711949	15.533614	beryllium-beralcast191-cast

Many rows in the dataframe contribute little to the overall sustainability impact of the product. Define a function that aggregates all rows contributing less than 5% of their phase’s embodied energy into a single “Other” row.

[3]:

def sort_and_aggregate_small_values(df: pd.DataFrame) -> pd.DataFrame:
    # Define the criterion
    total_embodied_energy = df[EE_HEADER].sum()
    criterion = df[EE_HEADER] / total_embodied_energy < 0.05

    # Find rows that meet the criterion
    small_rows = df.loc[criterion]

    # If no rows met the aggregation criterion, return the original dataframe and exit
    if len(small_rows) == 0:
        return df

    # Aggregate the rows to a new "Other" row
    df_below_5_pct = small_rows.sum(numeric_only=True).to_frame().T
    df_below_5_pct["Name"] = "Other"

    # Sort all rows that do not meet the criterion by embodied energy
    df_over_5_pct = df.loc[~(criterion)].sort_values(by=EE_HEADER, ascending=False)

    # Concatenate the rows together
    df_aggregated = pd.concat([df_over_5_pct, df_below_5_pct], ignore_index=True)
    return df_aggregated
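
To see the effect of the 5% threshold in isolation, the same logic can be tried on a small, made-up dataframe. The sketch below restates the function in condensed form so that it runs on its own. The `threshold` parameter, the column name, the row names, and the values are all assumptions for illustration, not data from the query above.

```python
import pandas as pd

EE = "EE [MJ]"  # Assumed column header, mirroring EE_HEADER above


def aggregate_small(df: pd.DataFrame, threshold: float = 0.05) -> pd.DataFrame:
    """Condensed restatement of the aggregation logic, for illustration only."""
    is_small = df[EE] / df[EE].sum() < threshold
    if not is_small.any():
        return df
    # Sum the small rows into a single "Other" row
    other = df.loc[is_small].sum(numeric_only=True).to_frame().T
    other["Name"] = "Other"
    # Keep the remaining rows, largest first
    large = df.loc[~is_small].sort_values(by=EE, ascending=False)
    return pd.concat([large, other], ignore_index=True)


# Hypothetical values: two rows fall below 5% of the 100 MJ total
toy = pd.DataFrame(
    {
        "Name": ["part-a", "part-b", "part-c", "part-d"],
        EE: [30.0, 66.0, 3.0, 1.0],
    }
)
result = aggregate_small(toy)
print(result)
```

Here `part-c` and `part-d` contribute 3% and 1% of the total, so they are merged into one “Other” row of 4 MJ that follows the two larger rows.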

Apply this function to each sustainability phase, then tidy up the resulting dataframe.

[4]:

# Apply the function
df_aggregated = df.groupby("Parent").apply(sort_and_aggregate_small_values, include_groups=False)

# Convert the grouped dataframe back into a dataframe with a single index
df_aggregated.reset_index(inplace=True, level="Parent", drop=False)

# Rename the "Other" rows created by the function to include the parent name in the stage name
df_aggregated["Name"] = df_aggregated.apply(
    lambda x: f"Other {x['Parent']}" if x["Name"] == "Other" else x["Name"],
    axis="columns",
)

# Reset the top-level numeric index
df_aggregated.reset_index(inplace=True, drop=True)

# Display the result
df_aggregated.head(10)

[4]:

	Parent	EE [MJ]	CC [kg]	Name
0		333.680522	32.013029	Material
1		159.474427	9.026297	Processes
2		99.896940	6.967125	Transport
3	Material	153.040336	11.963111	stainless-astm-cn-7ms-cast
4	Material	117.711949	15.533614	beryllium-beralcast191-cast
5	Material	62.928237	4.516303	steel-1010-annealed
6	Processes	74.662587	4.505396	Primary processing, Casting - stainless-astm-c...
7	Processes	51.088224	2.488701	Primary processing, Casting - steel-1010-annealed
8	Processes	25.943981	1.655270	Primary processing, Metal extrusion, hot - ste...
9	Processes	7.779635	0.376929	Other Processes

Sunburst chart#

A sunburst chart presents hierarchical data radially.

[5]:

import plotly.graph_objects as go

fig = go.Figure(
    go.Sunburst(
        labels=df_aggregated["Name"],
        parents=df_aggregated["Parent"],
        values=df_aggregated[EE_HEADER],
        branchvalues="total",
    ),
    layout_title_text=f"Embodied Energy [{ENERGY_UNIT}]",
)
fig.show()
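
With `branchvalues="total"`, each entry in `values` is taken to be the total for that node including all of its descendants, so a parent's value should not be smaller than the sum of its children's values. Charts built from inconsistent totals are commonly reported to render blank. The sketch below is one possible pre-plot check. The helper name `find_inconsistent_parents` and the tolerance are assumptions for illustration, and the sample values are the rounded Material figures from the aggregated table above.

```python
import math
from collections import defaultdict


def find_inconsistent_parents(labels, parents, values, rel_tol=1e-6):
    """Return labels whose value is smaller than the sum of their children's values."""
    value_of = dict(zip(labels, values))
    child_sums = defaultdict(float)
    for parent, value in zip(parents, values):
        if parent:  # An empty parent marks a top-level node
            child_sums[parent] += value

    # Allow for floating-point rounding when comparing the totals
    return [
        parent
        for parent, total in child_sums.items()
        if total > value_of[parent]
        and not math.isclose(total, value_of[parent], rel_tol=rel_tol)
    ]


labels = [
    "Material",
    "stainless-astm-cn-7ms-cast",
    "beryllium-beralcast191-cast",
    "steel-1010-annealed",
]
parents = ["", "Material", "Material", "Material"]
values = [333.680522, 153.040336, 117.711949, 62.928237]

print(find_inconsistent_parents(labels, parents, values))  # []
```

The tolerance matters because summing rounded floating-point values can overshoot the parent's value by a tiny amount even when the data is consistent.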

Icicle chart#

An icicle chart presents hierarchical data as rectangular sectors.

[6]:

fig = go.Figure(
    go.Icicle(
        labels=df_aggregated["Name"],
        parents=df_aggregated["Parent"],
        values=df_aggregated[EE_HEADER],
        branchvalues="total",
    ),
    layout_title_text=f"Embodied Energy [{ENERGY_UNIT}]",
)
fig.show()

Sankey diagram#

Sankey diagrams represent data as a network of nodes and links, with the relative sizes of these nodes and links representing their contributions to the flow of some quantity. In Plotly, Sankey diagrams require nodes and links to be defined explicitly.
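
As a rough illustration of what defining nodes and links explicitly involves: nodes are identified by their position in the label list, and each link refers to its source and target by those integer positions. The pure-Python sketch below builds these lists from a few hypothetical (name, parent, value) rows. The rows and variable names are assumptions for illustration and do not come from the query above.

```python
# Hypothetical rows: (name, parent, value). An empty parent marks the root node.
rows = [
    ("Product", "", 10.0),
    ("Material", "Product", 7.0),
    ("Transport", "Product", 3.0),
    ("steel", "Material", 7.0),
]

# Nodes are identified by their position in the label list
labels = [name for name, _, _ in rows]
index_of = {name: i for i, name in enumerate(labels)}

# Each non-root row becomes one link flowing from the row to its parent
sources, targets, link_values = [], [], []
for name, parent, value in rows:
    if not parent:
        continue  # The root node has no parent to link to
    sources.append(index_of[name])
    targets.append(index_of[parent])
    link_values.append(value)

print(sources)      # [1, 2, 3]
print(targets)      # [0, 0, 1]
print(link_values)  # [7.0, 3.0, 7.0]
```

These three lists correspond to the `source`, `target`, and `value` entries of the `link` argument of `go.Sankey`, and `labels` corresponds to the `label` entry of `node`.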

First, create a dataframe to store the node data. Start from a copy of the dataframe used for the previous plots.

[7]:

node_df = df_aggregated.copy()

Replace empty parent cells with a reference to a new “Product” row. The new row will be created in the next cell.

[8]:

node_df["Parent"] = node_df["Parent"].replace("", "Product")

Add a new row to represent the entire product. Values for this row are computed based on the sum of all nodes that are direct children of this row.

[9]:

product_row = {
    "Name": "Product",
    # Sum the contributions for all rows which are a direct child of 'Product'
    EE_HEADER: node_df.loc[node_df["Parent"] == "Product", EE_HEADER].sum(),
    CC_HEADER: node_df.loc[node_df["Parent"] == "Product", CC_HEADER].sum(),
    "Parent": "",
}

# Add the row to the end of the dataframe
node_df.loc[len(node_df)] = product_row

Define colors for each node type in the Sankey diagram by mapping a built-in Plotly color swatch to node names. First, attempt to get the color for a node based on its name. If this fails, use the name of the parent node instead.

[10]:

import plotly.express as px

color_map = {
    "Product": px.colors.qualitative.Pastel1[0],
    "Material": px.colors.qualitative.Pastel1[1],
    "Transport": px.colors.qualitative.Pastel1[2],
    "Processes": px.colors.qualitative.Pastel1[3],
}


def get_node_color(x):
    name = x["Name"]
    parent = x["Parent"]

    try:
        return color_map[name]
    except KeyError:
        return color_map[parent]


node_df["Color"] = node_df.apply(get_node_color, axis=1)
node_df.head()

[10]:

	Parent	EE [MJ]	CC [kg]	Name	Color
0	Product	333.680522	32.013029	Material	rgb(179,205,227)
1	Product	159.474427	9.026297	Processes	rgb(222,203,228)
2	Product	99.896940	6.967125	Transport	rgb(204,235,197)
3	Material	153.040336	11.963111	stainless-astm-cn-7ms-cast	rgb(179,205,227)
4	Material	117.711949	15.533614	beryllium-beralcast191-cast	rgb(179,205,227)
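
The `try`/`except` fallback in `get_node_color` is worth keeping rather than shortening to `color_map.get(name, color_map[parent])`. The shorter form evaluates the fallback eagerly, so it raises `KeyError` for the “Product” row, whose parent is an empty string. The sketch below demonstrates this with placeholder color strings, which are assumptions for illustration and not the Pastel1 values.

```python
# Placeholder colors, for illustration only
color_map = {"Product": "red", "Material": "blue"}


def color_with_fallback(name: str, parent: str) -> str:
    """Look up a color by node name, falling back to the parent's color."""
    try:
        return color_map[name]
    except KeyError:
        return color_map[parent]


print(color_with_fallback("Product", ""))        # red
print(color_with_fallback("steel", "Material"))  # blue

# The eager variant fails for the root node, because color_map[""] is
# evaluated before .get() is called
try:
    color_map.get("Product", color_map[""])
    eager_lookup_failed = False
except KeyError:
    eager_lookup_failed = True
print(eager_lookup_failed)  # True
```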

Next, create a dataframe to store the link information.

Each row in this dataframe represents a link on the Sankey diagram. All links have a ‘source’ and a ‘target’, and nodes may function as a source, as a target, or as both.

[11]:

link_df = pd.DataFrame()

Copy the row index values from the node dataframe to the “Source” column in the new dataframe. Skip the “Product” row, since this node does not act as the source for any links.

[12]:

# Store all nodes which act as sources in a variable for repeated use
source_nodes = node_df[node_df["Name"] != "Product"]

link_df["Source"] = source_nodes.index

Create a “Target” column by using the node dataframe as a cross-reference to infer the hierarchy.

[13]:

link_df["Target"] = source_nodes["Parent"].apply(lambda x: node_df.index[node_df["Name"] == x].values[0])

The size of each link is the embodied energy of its source node, and the color of each link is the color of its target node. Assigning the value column directly works because the link dataframe shares its index labels, in the same order, with the corresponding rows of the node dataframe.

[14]:

link_df["Value"] = node_df[EE_HEADER]
link_df["Color"] = link_df["Target"].apply(lambda x: node_df.iloc[x]["Color"])
link_df.head()

[14]:

	Source	Target	Value	Color
0	0	14	333.680522	rgb(251,180,174)
1	1	14	159.474427	rgb(251,180,174)
2	2	14	99.896940	rgb(251,180,174)
3	3	0	153.040336	rgb(179,205,227)
4	4	0	117.711949	rgb(179,205,227)
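
The assignment of the “Value” column relies on pandas aligning a Series to the dataframe's index when it is assigned as a column: labels that exist in the Series but not in the dataframe are dropped. The toy example below, with made-up names and values, shows the extra “Product” row being left out.

```python
import pandas as pd

# Hypothetical node table; the last row plays the role of the "Product" node
nodes = pd.DataFrame(
    {"Name": ["Material", "Transport", "Product"], "EE": [7.0, 3.0, 10.0]}
)

# Links exist only for the first two nodes, so the index is 0..1
links = pd.DataFrame({"Source": [0, 1]})

# Column assignment aligns on index labels; label 2 has no match and is dropped
links["Value"] = nodes["EE"]
print(links)
```

If the two dataframes did not share index labels, the alignment would silently produce missing values, so an explicit lookup by the “Source” column would be a safer choice in that case.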

Finally, create the Sankey diagram.

[15]:

fig = go.Figure(
    go.Sankey(
        valueformat=".0f",
        # Use the configured unit rather than a hard-coded suffix
        valuesuffix=f" {ENERGY_UNIT}",
        node=dict(
            pad=15,
            thickness=15,
            line=dict(color="black", width=0.5),
            label=node_df["Name"],
            color=node_df["Color"],
        ),
        link=dict(
            source=link_df["Source"],
            target=link_df["Target"],
            value=link_df["Value"],
            color=link_df["Color"],
        ),
    ),
    layout_title_text=f"Embodied Energy [{ENERGY_UNIT}]",
)
fig.show()