

Hierarchical plotting#

This example shows how to combine all the results of a sustainability summary query into interactive hierarchical plots.

This example requires the supporting file sustainability-bom-2412.xml, which is read in the first code cell below.

For help on constructing an XML BoM, see BoM examples.

Info:

This example uses an input file that is in the 24/12 XML BoM format. This structure requires Granta MI Restricted Substances and Sustainability Reports 2025 R2 or later.

To run this example with an older version of the reports bundle, use sustainability-bom-2301.xml instead. Some sections of this example will produce different results from the published example when this BoM is used.

Run a sustainability summary query#

[1]:
from ansys.grantami.bomanalytics import Connection, queries

MASS_UNIT = "kg"
ENERGY_UNIT = "MJ"
DISTANCE_UNIT = "km"

server_url = "http://my_grantami_server/mi_servicelayer"
cxn = Connection(server_url).with_credentials("user_name", "password").connect()

xml_file_path = "../supporting-files/sustainability-bom-2412.xml"
with open(xml_file_path) as f:
    bom = f.read()

sustainability_summary_query = (
    queries.BomSustainabilitySummaryQuery()
    .with_bom(bom)
    .with_units(mass=MASS_UNIT, energy=ENERGY_UNIT, distance=DISTANCE_UNIT)
)
sustainability_summary = cxn.run(sustainability_summary_query)

Tabulated data#

To plot data hierarchically, first create a dataframe that combines all the results. See the other notebooks in this section for more detail on converting these properties to dataframes.

[2]:
import pandas as pd

EE_HEADER = f"EE [{ENERGY_UNIT}]"
CC_HEADER = f"CC [{MASS_UNIT}]"


def create_dataframe_record(item, parent):
    record = {
        "Parent": parent,
        EE_HEADER: item.embodied_energy.value,
        CC_HEADER: item.climate_change.value,
    }

    if parent == "Material":
        record["Name"] = item.identity
    elif parent == "Processes":
        try:  # Joining and finishing processes
            record["Name"] = item.name
        except AttributeError:  # Primary and secondary processes
            record["Name"] = f"{item.process_name} - {item.material_identity}"
    else:
        record["Name"] = item.name
    return record


records = []
records.extend(
    [
        create_dataframe_record(item, "")
        for item in sustainability_summary.phases_summary
    ]
)
records.extend(
    [
        create_dataframe_record(item, "Material")
        for item in sustainability_summary.material_details
    ]
)
records.extend(
    [
        create_dataframe_record(item, "Transport")
        for item in sustainability_summary.transport_details
    ]
)
records.extend(
    [
        create_dataframe_record(item, "Processes")
        for item in (
            sustainability_summary.primary_processes_details +
            sustainability_summary.secondary_processes_details +
            sustainability_summary.joining_and_finishing_processes_details
        )
    ]
)

df = pd.DataFrame.from_records(records)
df.head()
[2]:
   Parent    EE [MJ]     CC [kg]    Name
0            333.680522  32.013029  Material
1            159.474427   9.026297  Processes
2             99.896940   6.967125  Transport
3  Material  153.040336  11.963111  stainless-astm-cn-7ms-cast
4  Material  117.711949  15.533614  beryllium-beralcast191-cast

Many of the rows in the dataframe are small relative to the overall sustainability impact of the product. Define a function that aggregates all rows contributing less than 5% of their phase’s embodied energy into a single row, and sorts the remaining rows by embodied energy.

[3]:
def sort_and_aggregate_small_values(df: pd.DataFrame) -> pd.DataFrame:
    # Define the criterion
    total_embodied_energy = df[EE_HEADER].sum()
    criterion = df[EE_HEADER] / total_embodied_energy < 0.05

    # Find rows that meet the criterion
    small_rows = df.loc[criterion]

    # If no rows met the aggregation criterion, return the original dataframe and exit
    if len(small_rows) == 0:
        return df

    # Aggregate the rows to a new "Other" row
    df_below_5_pct = small_rows.sum(numeric_only=True).to_frame().T
    df_below_5_pct["Name"] = "Other"

    # Sort all rows that do not meet the criterion by embodied energy
    df_over_5_pct = df.loc[~(criterion)].sort_values(by=EE_HEADER, ascending=False)

    # Concatenate the rows together
    df_aggregated = pd.concat([df_over_5_pct, df_below_5_pct], ignore_index=True)
    return df_aggregated
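
As a quick illustration of what this function does, the following sketch applies the same steps to a small made-up dataframe. The names and values in `toy` are invented for illustration, and the column header is hard-coded here rather than built from ENERGY_UNIT.

```python
import pandas as pd

EE = "EE [MJ]"  # hard-coded header, assumed for this sketch only

# Made-up data: "c" and "d" each contribute less than 5% of the total (100 MJ)
toy = pd.DataFrame({"Name": ["a", "b", "c", "d"], EE: [60.0, 37.0, 2.0, 1.0]})

# Same criterion as above: share of the total embodied energy below 5%
is_small = toy[EE] / toy[EE].sum() < 0.05

# Sum the small rows into a single "Other" row
other = toy.loc[is_small].sum(numeric_only=True).to_frame().T
other["Name"] = "Other"

# Keep the remaining rows, sorted by embodied energy, and append "Other"
large = toy.loc[~is_small].sort_values(by=EE, ascending=False)
result = pd.concat([large, other], ignore_index=True)
print(result)
```

In this sketch the two small rows are merged into one "Other" row of 3 MJ, so the column total is the same before and after aggregation.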

Apply this function to each sustainability phase, and then perform some additional tidying up of the dataframe.

[4]:
# Apply the function
df_aggregated = df.groupby("Parent").apply(sort_and_aggregate_small_values, include_groups=False)

# Convert the grouped dataframe back into a dataframe with a single index
df_aggregated.reset_index(inplace=True, level="Parent", drop=False)

# Rename the "Other" rows created by the function to include the parent name in the stage name
df_aggregated["Name"] = df_aggregated.apply(
    lambda x: f"Other {x['Parent']}" if x["Name"] == "Other" else x["Name"],
    axis="columns",
)

# Reset the top-level numeric index
df_aggregated.reset_index(inplace=True, drop=True)

# Display the result
df_aggregated.head(10)
[4]:
   Parent     EE [MJ]     CC [kg]    Name
0             333.680522  32.013029  Material
1             159.474427   9.026297  Processes
2              99.896940   6.967125  Transport
3  Material   153.040336  11.963111  stainless-astm-cn-7ms-cast
4  Material   117.711949  15.533614  beryllium-beralcast191-cast
5  Material    62.928237   4.516303  steel-1010-annealed
6  Processes   74.662587   4.505396  Primary processing, Casting - stainless-astm-c...
7  Processes   51.088224   2.488701  Primary processing, Casting - steel-1010-annealed
8  Processes   25.943981   1.655270  Primary processing, Metal extrusion, hot - ste...
9  Processes    7.779635   0.376929  Other Processes
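
The `include_groups` argument used above is only available in pandas 2.2 and later. Where an older pandas version has to be used, one possible alternative is to loop over the groups explicitly. The following is a minimal sketch of that pattern, not the method used in this example: the dataframe is made up, and `process_group` is a hypothetical placeholder standing in for sort_and_aggregate_small_values.

```python
import pandas as pd

EE = "EE [MJ]"  # hard-coded header, assumed for this sketch only


def process_group(group: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical placeholder that only sorts the rows. In the example
    # above, this role is played by sort_and_aggregate_small_values.
    return group.sort_values(by=EE, ascending=False)


# Made-up data with two top-level rows and two children of "Material"
toy = pd.DataFrame(
    {
        "Parent": ["", "", "Material", "Material"],
        "Name": ["Transport", "Material", "aluminium", "steel"],
        EE: [50.0, 100.0, 30.0, 70.0],
    }
)

pieces = []
for parent, group in toy.groupby("Parent"):
    # Drop the grouping column before processing, then add it back
    processed = process_group(group.drop(columns="Parent"))
    processed.insert(0, "Parent", parent)
    pieces.append(processed)

# Concatenate the groups into one dataframe with a fresh numeric index
toy_aggregated = pd.concat(pieces, ignore_index=True)
print(toy_aggregated)
```

The result already has "Parent" as an ordinary column and a single numeric index, so no further index resetting is needed with this approach.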

Sunburst chart#

A sunburst chart presents hierarchical data radially.

[5]:
import plotly.graph_objects as go

fig = go.Figure(
    go.Sunburst(
        labels=df_aggregated["Name"],
        parents=df_aggregated["Parent"],
        values=df_aggregated[EE_HEADER],
        branchvalues="total",
    ),
    layout_title_text=f"Embodied Energy [{ENERGY_UNIT}]",
)
fig.show()
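
With `branchvalues="total"`, Plotly treats each parent's value as the total for all of its descendants, so the children of a phase should not sum to more than the phase itself. If a chart unexpectedly renders empty, it can be worth checking this. The following sketch shows one way to do so on a made-up dataframe; the data, the tolerance, and the variable names are assumptions for illustration.

```python
import pandas as pd

EE = "EE [MJ]"  # hard-coded header, assumed for this sketch only

# Made-up hierarchy: two phases, each with children that sum to the phase
toy = pd.DataFrame(
    {
        "Parent": ["", "", "Material", "Material", "Transport"],
        "Name": ["Material", "Transport", "steel", "Other Material", "truck"],
        EE: [100.0, 50.0, 70.0, 30.0, 50.0],
    }
)

is_top_level = toy["Parent"] == ""

# Value declared on each phase row, indexed by phase name
declared = toy.loc[is_top_level].set_index("Name")[EE]

# Sum of the child rows of each phase
children = toy.loc[~is_top_level].groupby("Parent")[EE].sum()

# Compare only the phases that actually have children
difference = (declared.reindex(children.index) - children).abs()
is_consistent = bool((difference < 1e-6).all())
print(is_consistent)
```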

Icicle chart#

An icicle chart presents hierarchical data as rectangular sectors.

[6]:
fig = go.Figure(
    go.Icicle(
        labels=df_aggregated["Name"],
        parents=df_aggregated["Parent"],
        values=df_aggregated[EE_HEADER],
        branchvalues="total",
    ),
    layout_title_text=f"Embodied Energy [{ENERGY_UNIT}]",
)
fig.show()

Sankey diagram#

Sankey diagrams represent data as a network of nodes and links, with the relative sizes of these nodes and links representing their contributions to the flow of some quantity. In plotly, Sankey diagrams require nodes and links to be defined explicitly.

First, create a dataframe to store the node data. Start from a copy of the dataframe used for the previous plots.

[7]:
node_df = df_aggregated.copy()

Replace empty parent cells with a reference to a new “Product” row. The new row will be created in the next cell.

[8]:
node_df["Parent"] = df_aggregated["Parent"].replace("", "Product")

Add a new row to represent the entire product. Values for this row are computed based on the sum of all nodes that are direct children of this row.

[9]:
product_row = {
    "Name": "Product",

    # Sum the contributions for all rows which are a child of 'Product'
    EE_HEADER: sum(node_df[node_df["Parent"] == "Product"][EE_HEADER]),
    CC_HEADER: sum(node_df[node_df["Parent"] == "Product"][CC_HEADER]),
    "Parent": "",
}

# Add the row to the end of the dataframe
node_df.loc[len(node_df)] = product_row
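
The two steps above (pointing top-level rows at "Product", then appending the "Product" row) can also be written with `pd.concat`. The following is a sketch of that alternative on a made-up dataframe, not a replacement for the cells above; the data and variable names are assumptions.

```python
import pandas as pd

EE = "EE [MJ]"  # hard-coded header, assumed for this sketch only

# Made-up nodes: two top-level phases and one child
toy_nodes = pd.DataFrame(
    {
        "Parent": ["", "", "Material"],
        "Name": ["Material", "Transport", "steel"],
        EE: [100.0, 50.0, 100.0],
    }
)

# Point the top-level rows at the new root node
toy_nodes["Parent"] = toy_nodes["Parent"].replace("", "Product")

# Build the root row from the sum of its direct children
is_child_of_root = toy_nodes["Parent"] == "Product"
root = pd.DataFrame(
    [{"Parent": "", "Name": "Product", EE: toy_nodes.loc[is_child_of_root, EE].sum()}]
)

# Append the root row at the end, keeping a contiguous numeric index
toy_nodes = pd.concat([toy_nodes, root], ignore_index=True)
print(toy_nodes)
```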

Define colors for each node type in the Sankey diagram by mapping a built-in Plotly color swatch to node names. First, attempt to get the color for a node based on its name. If this fails, use the name of the parent node instead.

[10]:
import plotly.express as px

color_map = {
    "Product": px.colors.qualitative.Pastel1[0],
    "Material": px.colors.qualitative.Pastel1[1],
    "Transport": px.colors.qualitative.Pastel1[2],
    "Processes": px.colors.qualitative.Pastel1[3],
}


def get_node_color(x):
    name = x["Name"]
    parent = x["Parent"]

    try:
        return color_map[name]
    except KeyError:
        return color_map[parent]


node_df["Color"] = node_df.apply(get_node_color, axis=1)
node_df.head()
[10]:
   Parent    EE [MJ]     CC [kg]    Name                         Color
0  Product   333.680522  32.013029  Material                     rgb(179,205,227)
1  Product   159.474427   9.026297  Processes                    rgb(222,203,228)
2  Product    99.896940   6.967125  Transport                    rgb(204,235,197)
3  Material  153.040336  11.963111  stainless-astm-cn-7ms-cast   rgb(179,205,227)
4  Material  117.711949  15.533614  beryllium-beralcast191-cast  rgb(179,205,227)
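
The name-then-parent lookup raises a KeyError if neither the node nor its parent is in the color map, which could happen if a deeper hierarchy were plotted. One possible variation uses `dict.get` with a fallback color. In this sketch the function name, its signature, and the "lightgrey" default are assumptions; the color strings are copied from the output above.

```python
# Color strings copied from the outputs shown in this example
color_map = {
    "Product": "rgb(251,180,174)",
    "Material": "rgb(179,205,227)",
    "Transport": "rgb(204,235,197)",
    "Processes": "rgb(222,203,228)",
}


def lookup_node_color(name: str, parent: str, default: str = "lightgrey") -> str:
    # Try the node name first, then the parent name, then the default
    return color_map.get(name, color_map.get(parent, default))


print(lookup_node_color("Material", "Product"))
print(lookup_node_color("steel-1010-annealed", "Material"))
print(lookup_node_color("unknown", "also-unknown"))
```

The first two calls return the color for "Material" (directly, then via the parent), and the last call falls back to the default.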

Next, create a dataframe to store the link information.

Each row in this dataframe represents a link on the Sankey diagram. All links have a ‘source’ and a ‘target’, and nodes may function as a source, as a target, or as both.

[11]:
link_df = pd.DataFrame()

Copy the row index values from the node dataframe to the “Source” column in the new dataframe. Skip the “Product” row, since this node does not act as the source for any links.

[12]:
# Store all nodes which act as sources in a variable for repeated use
source_nodes = node_df[node_df["Name"] != "Product"]

link_df["Source"] = source_nodes.index

Create a “Target” column by using the node dataframe as a cross-reference to infer the hierarchy.

[13]:
link_df["Target"] = source_nodes["Parent"].apply(lambda x: node_df.index[node_df["Name"] == x].values[0])

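The size of each link is the size of its source node, and the color of each link is the color of its target node. The link dataframe has the same index, in the same order, as the non-“Product” rows of the node dataframe, so the node values can be assigned directly: pandas aligns them on the index, and the “Product” row, which has no matching link, is ignored.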

[14]:
link_df["Value"] = node_df[EE_HEADER]
link_df["Color"] = link_df["Target"].apply(lambda x: node_df.iloc[x]["Color"])
link_df.head()
[14]:
   Source  Target  Value       Color
0  0       14      333.680522  rgb(251,180,174)
1  1       14      159.474427  rgb(251,180,174)
2  2       14       99.896940  rgb(251,180,174)
3  3       0       153.040336  rgb(179,205,227)
4  4       0       117.711949  rgb(179,205,227)

Finally, create the Sankey diagram.

[15]:
fig = go.Figure(
    go.Sankey(
        valueformat=".0f",
        valuesuffix=f" {ENERGY_UNIT}",
        node=dict(
            pad=15,
            thickness=15,
            line=dict(color="black", width=0.5),
            label=node_df["Name"],
            color=node_df["Color"],
        ),
        link=dict(
            source=link_df["Source"],
            target=link_df["Target"],
            value=link_df["Value"],
            color=link_df["Color"],
        ),
    ),
    layout_title_text=f"Embodied Energy [{ENERGY_UNIT}]",
)
fig.show()