{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "20e94bfd",
   "metadata": {},
   "source": [
    "# Hierarchical plotting\n",
    "\n",
    "This example shows how to combine all the results of a sustainability summary query into interactive hierarchical\n",
    "plots.\n",
    "\n",
    "The following supporting files are required for this example:\n",
    "\n",
    "* [sustainability-bom-2412.xml](../supporting-files/sustainability-bom-2412.xml)\n",
    "\n",
    "For help on constructing an XML BoM, see [BoM examples](../6_BoMs/index.rst)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8433b873",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Info:**\n",
    "\n",
    "This example uses an input file that is in the 24/12 XML BoM format. This structure requires Granta MI Restricted\n",
    "Substances and Sustainability Reports 2025 R2 or later.\n",
    "\n",
    "To run this example with an older version of the reports bundle, use\n",
    "[sustainability-bom-2301.xml](../supporting-files/sustainability-bom-2301.xml) instead. Some sections of this example\n",
    "will produce different results from the published example when this BoM is used.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05278fc8",
   "metadata": {},
   "source": [
    "## Run a sustainability summary query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0692c79c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ansys.grantami.bomanalytics import Connection, queries\n",
    "\n",
    "MASS_UNIT = \"kg\"\n",
    "ENERGY_UNIT = \"MJ\"\n",
    "DISTANCE_UNIT = \"km\"\n",
    "\n",
    "server_url = \"http://my_grantami_server/mi_servicelayer\"\n",
    "cxn = Connection(server_url).with_credentials(\"user_name\", \"password\").connect()\n",
    "\n",
    "xml_file_path = \"../supporting-files/sustainability-bom-2412.xml\"\n",
    "with open(xml_file_path) as f:\n",
    "    bom = f.read()\n",
    "\n",
    "sustainability_summary_query = (\n",
    "    queries.BomSustainabilitySummaryQuery()\n",
    "    .with_bom(bom)\n",
    "    .with_units(mass=MASS_UNIT, energy=ENERGY_UNIT, distance=DISTANCE_UNIT)\n",
    ")\n",
    "sustainability_summary = cxn.run(sustainability_summary_query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c500aef",
   "metadata": {},
   "source": [
    "## Tabulated data\n",
    "\n",
    "To plot data hierarchically, first create a dataframe that aggregates all the data. See the other notebooks in\n",
    "this section for more detail on converting these properties to dataframes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8e569eb",
   "metadata": {
    "lines_to_end_of_cell_marker": 0,
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "EE_HEADER = f\"EE [{ENERGY_UNIT}]\"\n",
    "CC_HEADER = f\"CC [{MASS_UNIT}]\"\n",
    "\n",
    "\n",
    "def create_dataframe_record(item, parent):\n",
    "    record = {\n",
    "        \"Parent\": parent,\n",
    "        EE_HEADER: item.embodied_energy.value,\n",
    "        CC_HEADER: item.climate_change.value,\n",
    "    }\n",
    "\n",
    "    if parent == \"Material\":\n",
    "        record[\"Name\"] = item.identity\n",
    "    elif parent == \"Processes\":\n",
    "        try:  # Joining and finishing processes\n",
    "            record[\"Name\"] = item.name\n",
    "        except AttributeError:  # Primary and secondary processes\n",
    "            record[\"Name\"] = f\"{item.process_name} - {item.material_identity}\"\n",
    "    else:\n",
    "        record[\"Name\"] = item.name\n",
    "    return record\n",
    "\n",
    "\n",
    "records = []\n",
    "records.extend(\n",
    "    [\n",
    "        create_dataframe_record(item, \"\")\n",
    "        for item in sustainability_summary.phases_summary\n",
    "    ]\n",
    ")\n",
    "records.extend(\n",
    "    [\n",
    "        create_dataframe_record(item, \"Material\")\n",
    "        for item in sustainability_summary.material_details\n",
    "    ]\n",
    ")\n",
    "records.extend(\n",
    "    [\n",
    "        create_dataframe_record(item, \"Transport\")\n",
    "        for item in sustainability_summary.transport_details\n",
    "    ]\n",
    ")\n",
    "records.extend(\n",
    "    [\n",
    "        create_dataframe_record(item, \"Processes\")\n",
    "        for item in (\n",
    "            sustainability_summary.primary_processes_details +\n",
    "            sustainability_summary.secondary_processes_details +\n",
    "            sustainability_summary.joining_and_finishing_processes_details\n",
    "        )\n",
    "    ]\n",
    ")\n",
    "\n",
    "df = pd.DataFrame.from_records(records)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9085afb2",
   "metadata": {},
   "source": [
    "Many rows in the dataframe contribute very little to the overall sustainability impact of the product. Define a\n",
    "function that aggregates all rows contributing less than 5% of the embodied energy of their phase into a single\n",
    "\"Other\" row and sorts the remaining rows by embodied energy in descending order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "573729a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def sort_and_aggregate_small_values(df: pd.DataFrame) -> pd.DataFrame:\n",
    "    # Flag rows that contribute less than 5% of the total embodied energy in this dataframe\n",
    "    total_embodied_energy = df[EE_HEADER].sum()\n",
    "    criterion = df[EE_HEADER] / total_embodied_energy < 0.05\n",
    "\n",
    "    # Find rows that meet the criterion\n",
    "    small_rows = df.loc[criterion]\n",
    "\n",
    "    # If no rows met the aggregation criterion, return the original dataframe and exit\n",
    "    if len(small_rows) == 0:\n",
    "        return df\n",
    "\n",
    "    # Aggregate the rows to a new \"Other\" row\n",
    "    df_below_5_pct = small_rows.sum(numeric_only=True).to_frame().T\n",
    "    df_below_5_pct[\"Name\"] = \"Other\"\n",
    "\n",
    "    # Sort all rows that do not meet the criterion by embodied energy\n",
    "    df_over_5_pct = df.loc[~(criterion)].sort_values(by=EE_HEADER, ascending=False)\n",
    "\n",
    "    # Concatenate the rows together\n",
    "    df_aggregated = pd.concat([df_over_5_pct, df_below_5_pct], ignore_index=True)\n",
    "    return df_aggregated"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f6dc36e",
   "metadata": {},
   "source": [
    "Apply this function to each sustainability phase, and then perform some additional tidying up of\n",
    "the dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3ef8a415",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply the function\n",
    "df_aggregated = df.groupby(\"Parent\").apply(sort_and_aggregate_small_values, include_groups=False)\n",
    "\n",
    "# Convert the grouped dataframe back into a dataframe with a single index\n",
    "df_aggregated.reset_index(inplace=True, level=\"Parent\", drop=False)\n",
    "\n",
    "# Rename the \"Other\" rows created by the function to include the parent name in the stage name\n",
    "df_aggregated[\"Name\"] = df_aggregated.apply(\n",
    "    lambda x: f\"Other {x['Parent']}\" if x[\"Name\"] == \"Other\" else x[\"Name\"],\n",
    "    axis=\"columns\",\n",
    ")\n",
    "\n",
    "# Reset the top-level numeric index\n",
    "df_aggregated.reset_index(inplace=True, drop=True)\n",
    "\n",
    "# Display the result\n",
    "df_aggregated.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d927c65e",
   "metadata": {},
   "source": [
    "## Sunburst chart"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94764194",
   "metadata": {},
   "source": [
    "A sunburst chart presents hierarchical data radially."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73ea6043",
   "metadata": {},
   "outputs": [],
   "source": [
    "import plotly.graph_objects as go\n",
    "\n",
    "fig = go.Figure(\n",
    "    go.Sunburst(\n",
    "        labels=df_aggregated[\"Name\"],\n",
    "        parents=df_aggregated[\"Parent\"],\n",
    "        values=df_aggregated[EE_HEADER],\n",
    "        branchvalues=\"total\",\n",
    "    ),\n",
    "    layout_title_text=f\"Embodied Energy [{ENERGY_UNIT}]\",\n",
    ")\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74c477f8",
   "metadata": {},
   "source": [
    "## Icicle chart"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30920bfc",
   "metadata": {},
   "source": [
    "An icicle chart presents hierarchical data as rectangular sectors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "091cd93e",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = go.Figure(\n",
    "    go.Icicle(\n",
    "        labels=df_aggregated[\"Name\"],\n",
    "        parents=df_aggregated[\"Parent\"],\n",
    "        values=df_aggregated[EE_HEADER],\n",
    "        branchvalues=\"total\",\n",
    "    ),\n",
    "    layout_title_text=f\"Embodied Energy [{ENERGY_UNIT}]\",\n",
    ")\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f753b8bd",
   "metadata": {},
   "source": [
    "## Sankey diagram"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2b0b3ec",
   "metadata": {},
   "source": [
    "Sankey diagrams represent data as a network of nodes and links, with the relative sizes of these nodes and links\n",
    "representing their contributions to the flow of some quantity. In Plotly, Sankey diagrams require nodes and links to\n",
    "be defined explicitly.\n",
    "\n",
    "First, create a dataframe to store the node data. Start from a copy of the dataframe used for the previous plots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20bbe9f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "node_df = df_aggregated.copy()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46c30046",
   "metadata": {},
   "source": [
    "Replace empty parent cells with a reference to a new \"Product\" row. The new row will be created in the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d32f2519",
   "metadata": {},
   "outputs": [],
   "source": [
    "node_df[\"Parent\"] = df_aggregated[\"Parent\"].replace(\"\", \"Product\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "533522e5",
   "metadata": {},
   "source": [
    "Add a new row to represent the entire product. Values for this row are computed based on the sum of all nodes that are\n",
    "direct children of this row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65a91dfe",
   "metadata": {},
   "outputs": [],
   "source": [
    "product_row = {\n",
    "    \"Name\": \"Product\",\n",
    "\n",
    "    # Sum the contributions for all rows which are a child of 'Product'\n",
    "    EE_HEADER: sum(node_df[node_df[\"Parent\"] == \"Product\"][EE_HEADER]),\n",
    "    CC_HEADER: sum(node_df[node_df[\"Parent\"] == \"Product\"][CC_HEADER]),\n",
    "    \"Parent\": \"\",\n",
    "}\n",
    "\n",
    "# Add the row to the end of the dataframe\n",
    "node_df.loc[len(node_df)] = product_row"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "556ae91d",
   "metadata": {},
   "source": [
    "Define colors for each node type in the Sankey diagram by mapping a built-in Plotly color swatch to node names. First,\n",
    "attempt to get the color for a node based on its name. If this fails, use the name of the parent node instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80b5e7ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "import plotly.express as px\n",
    "\n",
    "color_map = {\n",
    "    \"Product\": px.colors.qualitative.Pastel1[0],\n",
    "    \"Material\": px.colors.qualitative.Pastel1[1],\n",
    "    \"Transport\": px.colors.qualitative.Pastel1[2],\n",
    "    \"Processes\": px.colors.qualitative.Pastel1[3],\n",
    "}\n",
    "\n",
    "\n",
    "def get_node_color(x):\n",
    "    name = x[\"Name\"]\n",
    "    parent = x[\"Parent\"]\n",
    "\n",
    "    try:\n",
    "        return color_map[name]\n",
    "    except KeyError:\n",
    "        return color_map[parent]\n",
    "\n",
    "\n",
    "node_df[\"Color\"] = node_df.apply(get_node_color, axis=1)\n",
    "node_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c1224aa",
   "metadata": {},
   "source": [
    "Next, create a dataframe to store the link information.\n",
    "\n",
    "Each row in this dataframe represents a link on the Sankey diagram. All links have a 'source' and a 'target', and\n",
    "nodes may function as a source, as a target, or as both."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2df54cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "link_df = pd.DataFrame()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fb37a8b",
   "metadata": {},
   "source": [
    "Copy the row index values from the node dataframe to the \"Source\" column in the new dataframe. Skip the \"Product\" row,\n",
    "since this node does not act as the source for any links."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5ff3e855",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Store all nodes which act as sources in a variable for repeated use\n",
    "source_nodes = node_df[node_df[\"Name\"] != \"Product\"]\n",
    "\n",
    "link_df[\"Source\"] = source_nodes.index"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5453e3ac",
   "metadata": {},
   "source": [
    "Create a \"Target\" column by using the node dataframe as a cross-reference to infer the hierarchy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c2de5ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "link_df[\"Target\"] = source_nodes[\"Parent\"].apply(lambda x: node_df.index[node_df[\"Name\"] == x].values[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66d550d6",
   "metadata": {},
   "source": [
    "The size of each link is the embodied energy of its source node, and the color of each link is the color of its\n",
    "target node. The link dataframe has the same index as the non-\"Product\" rows of the node dataframe, so values can be\n",
    "copied across directly and pandas aligns them by index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a54e8135",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Assignment aligns on the index, so the \"Product\" row (which has no link) is ignored\n",
    "link_df[\"Value\"] = node_df[EE_HEADER]\n",
    "link_df[\"Color\"] = link_df[\"Target\"].apply(lambda x: node_df.loc[x, \"Color\"])\n",
    "link_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a2ccfcc",
   "metadata": {},
   "source": [
    "Finally, create the Sankey diagram."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d992773",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = go.Figure(\n",
    "    go.Sankey(\n",
    "        valueformat=\".0f\",\n",
    "        valuesuffix=f\" {ENERGY_UNIT}\",\n",
    "        node=dict(\n",
    "            pad=15,\n",
    "            thickness=15,\n",
    "            line=dict(color=\"black\", width=0.5),\n",
    "            label=node_df[\"Name\"],\n",
    "            color=node_df[\"Color\"],\n",
    "        ),\n",
    "        link=dict(\n",
    "            source=link_df[\"Source\"],\n",
    "            target=link_df[\"Target\"],\n",
    "            value=link_df[\"Value\"],\n",
    "            color=link_df[\"Color\"],\n",
    "        ),\n",
    "    ),\n",
    "    layout_title_text=f\"Embodied Energy [{ENERGY_UNIT}]\",\n",
    ")\n",
    "fig.show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}