diff --git a/lesson7_files/EDA_Refactored.ipynb b/lesson7_files/EDA_Refactored.ipynb
index 6571c8a..96983a9 100644
--- a/lesson7_files/EDA_Refactored.ipynb
+++ b/lesson7_files/EDA_Refactored.ipynb
@@ -4,22 +4,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "# E-commerce Business Analytics Dashboard\n",
+ "# E-commerce Business Intelligence Analysis\n",
"\n",
- "A comprehensive analysis of e-commerce sales data focusing on business performance metrics, customer satisfaction, and operational efficiency.\n",
+ "A comprehensive exploratory data analysis of e-commerce sales data focusing on business metrics, performance trends, and strategic insights.\n",
"\n",
"## Table of Contents\n",
- "\n",
"1. [Introduction & Business Objectives](#introduction)\n",
- "2. [Data Loading & Configuration](#data-loading)\n",
- "3. [Data Dictionary](#data-dictionary)\n",
- "4. [Data Preparation & Transformation](#data-preparation)\n",
- "5. [Business Metrics Analysis](#business-metrics)\n",
- " - [Revenue Performance Analysis](#revenue-analysis)\n",
- " - [Product Category Performance](#product-analysis)\n",
- " - [Geographic Performance Analysis](#geographic-analysis)\n",
- " - [Customer Experience Analysis](#customer-analysis)\n",
- "6. [Summary of Key Observations](#summary)\n",
+ "2. [Configuration & Setup](#configuration)\n",
+ "3. [Data Loading & Validation](#data-loading)\n",
+ "4. [Data Dictionary](#data-dictionary)\n",
+ "5. [Revenue Performance Analysis](#revenue-analysis)\n",
+ "6. [Product Category Performance](#product-analysis)\n",
+ "7. [Geographic Performance Analysis](#geographic-analysis)\n",
+ "8. [Customer Experience Analysis](#customer-analysis)\n",
+ "9. [Executive Summary & Recommendations](#summary)\n",
"\n",
"---"
]
@@ -30,71 +28,56 @@
"source": [
"## 1. Introduction & Business Objectives {#introduction}\n",
"\n",
- "This analysis provides insights into e-commerce business performance through comprehensive examination of sales data. The primary objectives are:\n",
- "\n",
- "- **Revenue Performance**: Analyze total revenue, growth trends, and order patterns\n",
- "- **Product Strategy**: Identify top-performing categories and optimization opportunities\n",
- "- **Geographic Insights**: Understand regional performance variations\n",
- "- **Customer Satisfaction**: Evaluate delivery performance and review metrics\n",
- "- **Operational Efficiency**: Assess delivery times and fulfillment quality\n",
- "\n",
- "### Analysis Configuration\n",
- "\n",
- "The analysis can be configured for different time periods by adjusting the parameters below:"
+ "This analysis examines e-commerce operations through a set of key performance indicators (KPIs) and the business questions they are meant to answer.\n",
+ "\n",
+ "### Primary Business Questions:\n",
+ "- How has revenue performance changed over time?\n",
+ "- Which product categories drive the most revenue?\n",
+ "- What are the geographic patterns in sales performance?\n",
+ "- How satisfied are customers with our service?\n",
+ "- What delivery performance metrics indicate operational efficiency?\n",
+ "\n",
+ "### Key Performance Indicators (KPIs):\n",
+ "- **Revenue Metrics**: Total revenue, growth rates, average order value\n",
+ "- **Product Performance**: Category revenue distribution, top performers\n",
+ "- **Geographic Insights**: State-level performance analysis\n",
+ "- **Customer Satisfaction**: Review scores and delivery experience\n",
+ "- **Operational Efficiency**: Delivery times and fulfillment rates"
]
},
{
- "cell_type": "code",
- "execution_count": 1,
+ "cell_type": "markdown",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Analysis Period: 2023\n",
- "Comparison Period: 2022\n",
- "Month Filter: Full Year\n"
- ]
- }
- ],
"source": [
- "# Analysis Configuration\n",
- "ANALYSIS_YEAR = 2023\n",
- "COMPARISON_YEAR = 2022\n",
- "ANALYSIS_MONTH = None # Set to specific month (1-12) or None for full year\n",
- "DATA_PATH = 'ecommerce_data/'\n",
+ "## 2. Configuration & Setup {#configuration}\n",
"\n",
- "print(f\"Analysis Period: {ANALYSIS_YEAR}\")\n",
- "print(f\"Comparison Period: {COMPARISON_YEAR}\")\n",
- "if ANALYSIS_MONTH:\n",
- " print(f\"Month Filter: {ANALYSIS_MONTH}\")\n",
- "else:\n",
- " print(\"Month Filter: Full Year\")"
+ "Configure analysis parameters to customize the time period and comparison metrics."
]
},
{
- "cell_type": "markdown",
+ "cell_type": "code",
+ "execution_count": null,
"metadata": {},
+ "outputs": [],
"source": [
- "## 2. Data Loading & Configuration {#data-loading}\n",
+ "# Analysis Configuration Parameters\n",
+ "ANALYSIS_YEAR = 2023 # Primary year to analyze\n",
+ "COMPARISON_YEAR = 2022 # Comparison year for growth calculations\n",
+ "ANALYSIS_MONTH = None # Specific month (1-12) or None for full year\n",
+ "DATA_PATH = 'ecommerce_data/'\n",
"\n",
- "Loading all required datasets and initializing the analysis framework."
+ "# Display configuration\n",
+ "print(f\"Analysis Period: {ANALYSIS_YEAR}\")\n",
+ "print(f\"Comparison Period: {COMPARISON_YEAR}\")\n",
+ "print(f\"Month Filter: {'Full Year' if ANALYSIS_MONTH is None else f'Month {ANALYSIS_MONTH}'}\")\n",
+ "print(f\"Data Source: {DATA_PATH}\")"
]
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Libraries imported successfully\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"# Import required libraries\n",
"import pandas as pd\n",
@@ -111,1874 +94,672 @@
"from business_metrics import BusinessMetricsCalculator, MetricsVisualizer, print_metrics_summary\n",
"\n",
"# Configure display options\n",
- "warnings.filterwarnings('ignore')\n",
- "plt.style.use('default')\n",
- "sns.set_palette(\"husl\")\n",
"pd.set_option('display.max_columns', None)\n",
- "pd.set_option('display.precision', 2)\n",
+ "pd.set_option('display.float_format', '{:.2f}'.format)\n",
+ "warnings.filterwarnings('ignore')\n",
+ "plt.style.use('seaborn-v0_8')\n",
+ "\n",
+ "print(\"Libraries and modules loaded successfully\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3. Data Loading & Validation {#data-loading}\n",
"\n",
- "print(\"Libraries imported successfully\")"
+ "Load and validate all e-commerce datasets using the modular data loading framework."
]
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Loaded orders: 10000 records\n",
- "Loaded order_items: 16047 records\n",
- "Loaded products: 6000 records\n",
- "Loaded customers: 8000 records\n",
- "Loaded reviews: 6571 records\n",
- "Loaded payments: 14091 records\n",
- "Dataset Summary:\n",
- "==================================================\n",
- "ORDERS:\n",
- " Rows: 10,000\n",
- " Columns: 11\n",
- " Memory: 2.9 MB\n",
- " Date Range: 2021-12-31 to 2024-01-01\n",
- "\n",
- "ORDER_ITEMS:\n",
- " Rows: 16,047\n",
- " Columns: 8\n",
- " Memory: 4.2 MB\n",
- "\n",
- "REVIEWS:\n",
- " Rows: 6,571\n",
- " Columns: 7\n",
- " Memory: 2.3 MB\n",
- "\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
- "# Load and process all data\n",
+ "# Load and process all data using the data loader module\n",
+ "print(\"Loading e-commerce datasets...\")\n",
"loader, processed_data = load_and_process_data(DATA_PATH)\n",
"\n",
"# Display data summary\n",
- "data_summary = loader.get_data_summary()\n",
- "print(\"Dataset Summary:\")\n",
+ "summary = loader.get_data_summary()\n",
+ "print(\"\\nDataset Summary:\")\n",
"print(\"=\" * 50)\n",
- "for dataset, info in data_summary.items():\n",
- "    print(f\"{dataset.upper()}:\")\n",
+ "for dataset_name, info in summary.items():\n",
+ "    print(f\"{dataset_name.upper()}:\")\n",
"    print(f\" Rows: {info['rows']:,}\")\n",
"    print(f\" Columns: {info['columns']}\")\n",
- "    print(f\" Memory: {info['memory_usage_mb']:.1f} MB\")\n",
+ "    print(f\" Memory Usage: {info['memory_usage_mb']:.1f} MB\")\n",
"    if info['date_range']:\n",
"        print(f\" Date Range: {info['date_range']['start'].date()} to {info['date_range']['end'].date()}\")\n",
"    print()"
]
},
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 3. Data Dictionary {#data-dictionary}\n",
- "\n",
- "### Key Business Terms and Column Definitions\n",
- "\n",
- "| Column | Description | Business Impact |\n",
- "|--------|-------------|----------------|\n",
- "| **order_id** | Unique identifier for each customer order | Primary key for order-level analysis |\n",
- "| **price** | Item price excluding shipping | Core revenue metric |\n",
- "| **freight_value** | Shipping cost for the item | Additional revenue and cost analysis |\n",
- "| **order_status** | Current status of the order | Operational efficiency indicator |\n",
- "| **order_purchase_timestamp** | When the order was placed | Time-based analysis and trends |\n",
- "| **order_delivered_customer_date** | When order was delivered to customer | Delivery performance metric |\n",
- "| **product_category_name** | Product category classification | Product strategy and inventory planning |\n",
- "| **customer_state** | Customer's state location | Geographic market analysis |\n",
- "| **review_score** | Customer satisfaction rating (1-5) | Customer experience indicator |\n",
- "\n",
- "### Calculated Metrics\n",
- "\n",
- "- **Total Revenue**: Sum of all item prices for delivered orders\n",
- "- **Average Order Value (AOV)**: Average total value per order\n",
- "- **Delivery Days**: Time between order placement and delivery\n",
- "- **Revenue Growth**: Year-over-year percentage change in revenue\n",
- "- **Customer Satisfaction**: Distribution and average of review scores"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 4. Data Preparation & Transformation {#data-preparation}\n",
- "\n",
- "Creating the comprehensive sales dataset for analysis with configurable time filters."
- ]
- },
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Analysis Dataset Summary:\n",
- "Total Records: 7,448\n",
- "Unique Orders: 4,635\n",
- "Date Range: 2023-01-01 to 2023-12-31\n",
- "Total Revenue: $3,360,294.74\n",
- "\n",
- "Available columns: ['order_id', 'price', 'purchase_year', 'purchase_month', 'product_category_name', 'customer_state', 'review_score', 'delivery_days']\n",
- "\n",
- "Sample Data:\n"
- ]
- },
- {
- "data": {
- "text/html": [
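- "<div>\n",
- "<table border=\"1\" class=\"dataframe\">\n",
- "  <thead>\n",
- "    <tr style=\"text-align: right;\">\n",
- "      <th></th>\n",
- "      <th>order_id</th>\n",
- "      <th>price</th>\n",
- "      <th>purchase_year</th>\n",
- "      <th>purchase_month</th>\n",
- "      <th>product_category_name</th>\n",
- "      <th>customer_state</th>\n",
- "      <th>review_score</th>\n",
- "      <th>delivery_days</th>\n",
- "    </tr>\n",
- "  </thead>\n",
- "  <tbody>\n",
- "    <tr>\n",
- "      <th>0</th>\n",
- "      <td>ord_5fa044951857e02fd1347b47</td>\n",
- "      <td>111.91</td>\n",
- "      <td>2023</td>\n",
- "      <td>4</td>\n",
- "      <td>grocery_gourmet_food</td>\n",
- "      <td>TN</td>\n",
- "      <td>5.0</td>\n",
- "      <td>6</td>\n",
- "    </tr>\n",
- "    <tr>\n",
- "      <th>1</th>\n",
- "      <td>ord_5fa044951857e02fd1347b47</td>\n",
- "      <td>878.42</td>\n",
- "      <td>2023</td>\n",
- "      <td>4</td>\n",
- "      <td>electronics</td>\n",
- "      <td>TN</td>\n",
- "      <td>5.0</td>\n",
- "      <td>6</td>\n",
- "    </tr>\n",
- "    <tr>\n",
- "      <th>2</th>\n",
- "      <td>ord_43b53981d951f855231d09ec</td>\n",
- "      <td>749.83</td>\n",
- "      <td>2023</td>\n",
- "      <td>12</td>\n",
- "      <td>sports_outdoors</td>\n",
- "      <td>FL</td>\n",
- "      <td>5.0</td>\n",
- "      <td>9</td>\n",
- "    </tr>\n",
- "    <tr>\n",
- "      <th>3</th>\n",
- "      <td>ord_e60b1e267fd32d93c4d0745b</td>\n",
- "      <td>361.54</td>\n",
- "      <td>2023</td>\n",
- "      <td>4</td>\n",
- "      <td>home_garden</td>\n",
- "      <td>PA</td>\n",
- "      <td>5.0</td>\n",
- "      <td>11</td>\n",
- "    </tr>\n",
- "    <tr>\n",
- "      <th>4</th>\n",
- "      <td>ord_e60b1e267fd32d93c4d0745b</td>\n",
- "      <td>25.59</td>\n",
- "      <td>2023</td>\n",
- "      <td>4</td>\n",
- "      <td>grocery_gourmet_food</td>\n",
- "      <td>PA</td>\n",
- "      <td>5.0</td>\n",
- "      <td>11</td>\n",
- "    </tr>\n",
- "  </tbody>\n",
- "</table>\n",
- "</div>"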
- ],
- "text/plain": [
- " order_id price purchase_year purchase_month \\\n",
- "0 ord_5fa044951857e02fd1347b47 111.91 2023 4 \n",
- "1 ord_5fa044951857e02fd1347b47 878.42 2023 4 \n",
- "2 ord_43b53981d951f855231d09ec 749.83 2023 12 \n",
- "3 ord_e60b1e267fd32d93c4d0745b 361.54 2023 4 \n",
- "4 ord_e60b1e267fd32d93c4d0745b 25.59 2023 4 \n",
- "\n",
- " product_category_name customer_state review_score delivery_days \n",
- "0 grocery_gourmet_food TN 5.0 6 \n",
- "1 electronics TN 5.0 6 \n",
- "2 sports_outdoors FL 5.0 9 \n",
- "3 home_garden PA 5.0 11 \n",
- "4 grocery_gourmet_food PA 5.0 11 "
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
+ "outputs": [],
"source": [
- "# Create sales dataset for analysis period\n",
+ "# Create comprehensive sales dataset for analysis\n",
+ "print(\"Creating comprehensive sales dataset...\")\n",
"sales_data = loader.create_sales_dataset(\n",
- " year_filter=ANALYSIS_YEAR,\n",
+ " year_filter=None, # Load all years for comparison\n",
" month_filter=ANALYSIS_MONTH,\n",
- " status_filter='delivered'\n",
+ " status_filter='delivered' # Focus on completed orders\n",
")\n",
"\n",
- "print(f\"Analysis Dataset Summary:\")\n",
- "print(f\"Total Records: {len(sales_data):,}\")\n",
- "print(f\"Unique Orders: {sales_data['order_id'].nunique():,}\")\n",
- "print(f\"Date Range: {sales_data['order_purchase_timestamp'].min().date()} to {sales_data['order_purchase_timestamp'].max().date()}\")\n",
- "print(f\"Total Revenue: ${sales_data['price'].sum():,.2f}\")\n",
- "\n",
- "# Display sample of the dataset - only show available columns\n",
- "available_columns = ['order_id', 'price', 'purchase_year', 'purchase_month']\n",
- "optional_columns = ['product_category_name', 'customer_state', 'review_score', 'delivery_days']\n",
+ "print(f\"Sales dataset created with {len(sales_data):,} records\")\n",
+ "print(f\"Date range: {sales_data['order_purchase_timestamp'].min().date()} to {sales_data['order_purchase_timestamp'].max().date()}\")\n",
"\n",
- "# Add optional columns if they exist\n",
- "for col in optional_columns:\n",
- " if col in sales_data.columns:\n",
- " available_columns.append(col)\n",
+ "# Display sample data\n",
+ "print(\"\\nSample sales data:\")\n",
+ "display(sales_data.head())\n",
"\n",
- "print(f\"\\nAvailable columns: {available_columns}\")\n",
- "print(\"\\nSample Data:\")\n",
- "display(sales_data[available_columns].head())"
+ "print(\"\\nData quality check:\")\n",
+ "print(f\"Unique orders: {sales_data['order_id'].nunique():,}\")\n",
+ "print(f\"Unique products: {sales_data['product_id'].nunique():,}\")\n",
+ "print(f\"Unique customers: {sales_data['customer_id'].nunique():,}\")\n",
+ "print(f\"Average order value (all years): ${sales_data.groupby('order_id')['price'].sum().mean():.2f}\")"
]
},
{
- "cell_type": "code",
- "execution_count": 5,
+ "cell_type": "markdown",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Comparison Dataset (2022):\n",
- "Total Records: 7,641\n",
- "Unique Orders: 4,749\n",
- "Total Revenue: $3,445,076.96\n"
- ]
- }
- ],
"source": [
- "# Create comparison dataset if comparison year is specified\n",
- "comparison_data = None\n",
- "if COMPARISON_YEAR:\n",
- " comparison_data = loader.create_sales_dataset(\n",
- " year_filter=COMPARISON_YEAR,\n",
- " month_filter=ANALYSIS_MONTH,\n",
- " status_filter='delivered'\n",
- " )\n",
- " \n",
- " print(f\"Comparison Dataset ({COMPARISON_YEAR}):\")\n",
- " print(f\"Total Records: {len(comparison_data):,}\")\n",
- " print(f\"Unique Orders: {comparison_data['order_id'].nunique():,}\")\n",
- " print(f\"Total Revenue: ${comparison_data['price'].sum():,.2f}\")\n",
+ "## 4. Data Dictionary {#data-dictionary}\n",
"\n",
- "# Create combined dataset for year-over-year analysis\n",
- "if COMPARISON_YEAR:\n",
- " combined_data = loader.create_sales_dataset(\n",
- " month_filter=ANALYSIS_MONTH,\n",
- " status_filter='delivered'\n",
- " )\n",
- " # Filter to only include analysis and comparison years\n",
- " combined_data = combined_data[\n",
- " combined_data['purchase_year'].isin([ANALYSIS_YEAR, COMPARISON_YEAR])\n",
- " ]\n",
- "else:\n",
- " combined_data = sales_data"
+ "Understanding key columns and business terms used throughout the analysis."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## 5. Business Metrics Analysis {#business-metrics}\n",
+ "### Key Business Metrics\n",
+ "\n",
+ "| Metric | Definition | Business Significance |\n",
+ "|--------|------------|----------------------|\n",
+ "| **Revenue** | Sum of delivered order item prices | Core business performance indicator |\n",
+ "| **Average Order Value (AOV)** | Total revenue divided by number of orders | Customer spending behavior metric |\n",
+ "| **Revenue Growth Rate** | Year-over-year percentage change in revenue | Business expansion indicator |\n",
+ "| **Customer Satisfaction** | Average review score (1-5 scale) | Service quality measurement |\n",
+ "| **Delivery Performance** | Average days from purchase to delivery | Operational efficiency indicator |\n",
+ "\n",
+ "### Data Schema\n",
+ "\n",
+ "| Column | Description | Data Type |\n",
+ "|--------|-------------|----------|\n",
+ "| `order_id` | Unique identifier for each order | String |\n",
+ "| `customer_id` | Unique identifier for each customer | String |\n",
+ "| `product_id` | Unique identifier for each product | String |\n",
+ "| `price` | Item price in USD | Float |\n",
+ "| `order_purchase_timestamp` | Date and time of order placement | Datetime |\n",
+ "| `order_status` | Current status of the order | String |\n",
+ "| `product_category_name` | Product category classification | String |\n",
+ "| `customer_state` | Customer's state location | String |\n",
+ "| `review_score` | Customer satisfaction rating (1-5) | Float (integer-valued, e.g. 5.0) |\n",
+ "| `delivery_days` | Days from order to delivery | Integer |\n",
+ "\n",
+ "### Order Status Categories\n",
+ "- **delivered**: Successfully completed orders (primary focus)\n",
+ "- **shipped**: Orders in transit\n",
+ "- **processing**: Orders being prepared\n",
+ "- **canceled**: Canceled orders\n",
+ "- **pending**: Orders awaiting processing\n",
+ "- **returned**: Returned orders"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 5. Revenue Performance Analysis {#revenue-analysis}\n",
"\n",
- "Comprehensive analysis of key business performance indicators."
+ "Comprehensive analysis of revenue trends, growth patterns, and key financial metrics."
]
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "============================================================\n",
- "BUSINESS METRICS SUMMARY - 2023\n",
- "============================================================\n",
- "\n",
- "REVENUE PERFORMANCE:\n",
- " Total Revenue: $3,360,294.74\n",
- " Total Orders: 4,635\n",
- " Average Order Value: $724.98\n",
- " Revenue Growth: -2.5%\n",
- " Order Growth: -2.4%\n",
- "\n",
- "CUSTOMER SATISFACTION:\n",
- " Average Review Score: 4.10/5.0\n",
- " High Satisfaction (4+): 51.6%\n",
- "\n",
- "DELIVERY PERFORMANCE:\n",
- " Average Delivery Time: 8.0 days\n",
- " Fast Delivery (≤3 days): 7.2%\n",
- "============================================================\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
- "# Initialize metrics calculator\n",
- "metrics_calc = BusinessMetricsCalculator(combined_data)\n",
+ "# Initialize business metrics calculator\n",
+ "metrics_calculator = BusinessMetricsCalculator(sales_data)\n",
"\n",
- "# Generate comprehensive report\n",
- "business_report = metrics_calc.generate_comprehensive_report(\n",
+ "# Calculate comprehensive revenue metrics\n",
+ "print(f\"Calculating revenue metrics for {ANALYSIS_YEAR}...\")\n",
+ "revenue_metrics = metrics_calculator.calculate_revenue_metrics(\n",
" current_year=ANALYSIS_YEAR,\n",
" previous_year=COMPARISON_YEAR\n",
")\n",
"\n",
- "# Print executive summary\n",
- "print_metrics_summary(business_report)"
+ "# Display revenue performance summary\n",
+ "print(\"\\nREVENUE PERFORMANCE SUMMARY\")\n",
+ "print(\"=\" * 40)\n",
+ "print(f\"Analysis Period: {ANALYSIS_YEAR}\")\n",
+ "print(f\"Total Revenue: ${revenue_metrics['total_revenue']:,.2f}\")\n",
+ "print(f\"Total Orders: {revenue_metrics['total_orders']:,}\")\n",
+ "print(f\"Average Order Value: ${revenue_metrics['average_order_value']:.2f}\")\n",
+ "print(f\"Total Items Sold: {revenue_metrics['total_items_sold']:,}\")\n",
+ "\n",
+ "if COMPARISON_YEAR and 'revenue_growth_rate' in revenue_metrics:\n",
+ "    print(f\"\\nCOMPARISON TO {COMPARISON_YEAR}:\")\n",
+ "    print(f\"Revenue Growth: {revenue_metrics['revenue_growth_rate']:+.2f}%\")\n",
+ "    print(f\"Order Growth: {revenue_metrics['order_growth_rate']:+.2f}%\")\n",
+ "    print(f\"AOV Growth: {revenue_metrics['aov_growth_rate']:+.2f}%\")\n",
+ "\n",
+ "print(\"\\n\" + \"=\"*40)"
]
},
{
- "cell_type": "markdown",
+ "cell_type": "code",
+ "execution_count": null,
"metadata": {},
+ "outputs": [],
"source": [
- "### 5.1 Revenue Performance Analysis {#revenue-analysis}\n",
+ "# Monthly revenue trend analysis\n",
+ "monthly_trends = metrics_calculator.calculate_monthly_trends(ANALYSIS_YEAR)\n",
+ "\n",
+ "print(f\"Monthly Revenue Trends for {ANALYSIS_YEAR}:\")\n",
+ "display(monthly_trends)\n",
"\n",
- "Analyzing overall revenue trends, growth patterns, and key performance indicators."
+ "# Calculate average monthly growth\n",
+ "avg_monthly_growth = monthly_trends['revenue_growth'].mean()\n",
+ "print(f\"\\nAverage Monthly Growth Rate: {avg_monthly_growth:.2f}%\")"
]
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "DETAILED REVENUE ANALYSIS - 2023\n",
- "==================================================\n",
- "Total Revenue: $3,360,294.74\n",
- "Total Orders: 4,635\n",
- "Total Items Sold: 7,448\n",
- "Average Order Value: $724.98\n",
- "\n",
- "YEAR-OVER-YEAR COMPARISON:\n",
- "Revenue Growth: -2.46%\n",
- "Order Growth: -2.40%\n",
- "AOV Growth: -0.06%\n",
- "\n",
- "⚠️ Negative revenue growth requires attention\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
- "# Revenue metrics deep dive\n",
- "revenue_metrics = business_report['revenue_metrics']\n",
- "\n",
- "print(f\"DETAILED REVENUE ANALYSIS - {ANALYSIS_YEAR}\")\n",
- "print(\"=\" * 50)\n",
- "print(f\"Total Revenue: ${revenue_metrics['total_revenue']:,.2f}\")\n",
- "print(f\"Total Orders: {revenue_metrics['total_orders']:,}\")\n",
- "print(f\"Total Items Sold: {revenue_metrics['total_items_sold']:,}\")\n",
- "print(f\"Average Order Value: ${revenue_metrics['average_order_value']:,.2f}\")\n",
- "\n",
- "if COMPARISON_YEAR and 'revenue_growth_rate' in revenue_metrics:\n",
- " print(f\"\\nYEAR-OVER-YEAR COMPARISON:\")\n",
- " print(f\"Revenue Growth: {revenue_metrics['revenue_growth_rate']:+.2f}%\")\n",
- " print(f\"Order Growth: {revenue_metrics['order_growth_rate']:+.2f}%\")\n",
- " print(f\"AOV Growth: {revenue_metrics['aov_growth_rate']:+.2f}%\")\n",
- " \n",
- " # Growth interpretation\n",
- " if revenue_metrics['revenue_growth_rate'] > 0:\n",
- " print(\"\\n✅ Positive revenue growth indicates business expansion\")\n",
- " else:\n",
- " print(\"\\n⚠️ Negative revenue growth requires attention\")"
+ "# Create revenue trend visualization\n",
+ "fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 12))\n",
+ "\n",
+ "# Monthly revenue trend\n",
+ "ax1.plot(monthly_trends['month'], monthly_trends['revenue'], \n",
+ " marker='o', linewidth=3, markersize=8, color='#1f77b4')\n",
+ "ax1.set_title(f'Monthly Revenue Trend - {ANALYSIS_YEAR}', fontsize=16, fontweight='bold', pad=20)\n",
+ "ax1.set_xlabel('Month', fontsize=12)\n",
+ "ax1.set_ylabel('Revenue ($)', fontsize=12)\n",
+ "ax1.grid(True, alpha=0.3)\n",
+ "ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}'))\n",
+ "\n",
+ "# Add data labels\n",
+ "for i, v in enumerate(monthly_trends['revenue']):\n",
+ "    ax1.annotate(f'${v:,.0f}', (monthly_trends['month'].iloc[i], v), \n",
+ "                 textcoords=\"offset points\", xytext=(0,10), ha='center', fontsize=9)\n",
+ "\n",
+ "# Monthly growth rate\n",
+ "growth = monthly_trends.dropna(subset=['revenue_growth'])  # keep months and growth rates aligned\n",
+ "ax2.bar(growth['month'], growth['revenue_growth'], color=['green' if g > 0 else 'red' for g in growth['revenue_growth']], alpha=0.7)\n",
+ "ax2.set_title(f'Month-over-Month Revenue Growth Rate - {ANALYSIS_YEAR}', fontsize=16, fontweight='bold', pad=20)\n",
+ "ax2.set_xlabel('Month', fontsize=12)\n",
+ "ax2.set_ylabel('Growth Rate (%)', fontsize=12)\n",
+ "ax2.grid(True, alpha=0.3)\n",
+ "ax2.axhline(y=0, color='black', linestyle='-', alpha=0.5)\n",
+ "\n",
+ "plt.tight_layout()\n",
+ "plt.show()"
]
},
{
- "cell_type": "code",
- "execution_count": 8,
+ "cell_type": "markdown",
"metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\n",
- "MONTHLY PERFORMANCE INSIGHTS:\n",
- "Best Revenue Month: Month 9 ($303,793)\n",
- "Lowest Revenue Month: Month 4 ($253,795)\n",
- "Average Monthly Growth: -0.39%\n",
- "Revenue Volatility (Std Dev): $17,285\n"
- ]
- }
- ],
"source": [
- "# Monthly revenue trend visualization\n",
- "visualizer = MetricsVisualizer(business_report)\n",
- "revenue_fig = visualizer.plot_revenue_trend(figsize=(14, 8))\n",
- "plt.show()\n",
- "\n",
- "# Monthly trends analysis\n",
- "monthly_trends = business_report['monthly_trends']\n",
- "print(f\"\\nMONTHLY PERFORMANCE INSIGHTS:\")\n",
- "print(f\"Best Revenue Month: Month {monthly_trends.loc[monthly_trends['revenue'].idxmax(), 'month']} (${monthly_trends['revenue'].max():,.0f})\")\n",
- "print(f\"Lowest Revenue Month: Month {monthly_trends.loc[monthly_trends['revenue'].idxmin(), 'month']} (${monthly_trends['revenue'].min():,.0f})\")\n",
- "print(f\"Average Monthly Growth: {monthly_trends['revenue_growth'].mean():.2f}%\")\n",
- "print(f\"Revenue Volatility (Std Dev): ${monthly_trends['revenue'].std():,.0f}\")"
+ "### Revenue Performance Insights\n",
+ "\n",
+ "**Key Findings:**\n",
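+ "- With the default configuration (2023 vs. 2022), revenue fell 2.46% to \\$3,360,294.74 as delivered orders dropped 2.40% (4,635 vs. 4,749)\n",
+ "- Monthly revenue ranged from \\$253,795 (April) to \\$303,793 (September), and month-over-month growth averaged -0.39%\n",
+ "- Average order value was nearly flat at \\$724.98 (-0.06%), so the decline came from order volume rather than basket size"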
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 5.2 Product Category Performance {#product-analysis}\n",
+ "## 6. Product Category Performance {#product-analysis}\n",
"\n",
- "Understanding which product categories drive the most revenue and identifying growth opportunities."
+ "Analysis of product category revenue distribution, top performers, and market share insights."
]
},
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "TOP PRODUCT CATEGORIES - 2023\n",
- "==================================================\n",
- "electronics $ 1,401,359 ( 41.7%)\n",
- "home_garden $ 862,653 ( 25.7%)\n",
- "sports_outdoors $ 278,845 ( 8.3%)\n",
- "automotive $ 247,707 ( 7.4%)\n",
- "clothing_shoes_jewelry $ 232,745 ( 6.9%)\n",
- "toys_games $ 70,164 ( 2.1%)\n",
- "health_personal_care $ 65,370 ( 1.9%)\n",
- "tools_home_improvement $ 54,280 ( 1.6%)\n",
- "beauty_personal_care $ 49,213 ( 1.5%)\n",
- "books_media $ 38,559 ( 1.1%)\n"
- ]
- },
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\n",
- "CATEGORY INSIGHTS:\n",
- "Total Product Categories: 13\n",
- "Top 5 Categories Revenue Share: 90.0%\n",
- "Market Concentration: High\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
- "# Product category analysis\n",
- "if 'error' not in business_report['product_performance']:\n",
- "    product_data = business_report['product_performance']\n",
- "    \n",
- "    print(f\"TOP PRODUCT CATEGORIES - {ANALYSIS_YEAR}\")\n",
+ "# Analyze product category performance\n",
+ "print(f\"Analyzing product performance for {ANALYSIS_YEAR}...\")\n",
+ "product_performance = metrics_calculator.analyze_product_performance(ANALYSIS_YEAR, top_n=13)\n",
+ "\n",
+ "if 'error' not in product_performance:\n",
+ "    print(\"\\nTOP 10 PRODUCT CATEGORIES BY REVENUE:\")\n",
"    print(\"=\" * 50)\n",
+ "    top_categories = product_performance['top_categories'].head(10)\n",
"    \n",
- "    top_categories = product_data['top_categories'].head(10)\n",
"    for idx, row in top_categories.iterrows():\n",
- "        print(f\"{row['product_category_name']:<25} ${row['total_revenue']:>10,.0f} ({row['revenue_share']:>5.1f}%)\")\n",
+ "        print(f\"{row['product_category_name'].replace('_', ' ').title()}:\")\n",
+ "        print(f\" Revenue: ${row['total_revenue']:,.2f} ({row['revenue_share']:.1f}% of total)\")\n",
+ "        print(f\" Items Sold: {row['items_sold']:,}\")\n",
+ "        print(f\" Avg Item Price: ${row['avg_item_price']:.2f}\")\n",
+ "        print(f\" Unique Orders: {row['unique_orders']:,}\")\n",
+ "        print()\n",
"    \n",
- "    # Category performance visualization\n",
- "    category_fig = visualizer.plot_category_performance(top_n=10, figsize=(14, 10))\n",
- "    plt.show()\n",
+ "    # Display full category table\n",
+ "    print(\"COMPLETE CATEGORY PERFORMANCE:\")\n",
+ "    display(product_performance['all_categories'])\n",
+ "else:\n",
+ "    print(\"Product category data not available\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Product category revenue visualization\n",
+ "if 'error' not in product_performance:\n",
+ "    top_10_categories = product_performance['top_categories'].head(10)\n",
+ "    \n",
+ "    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 8))\n",
+ "    \n",
+ "    # Horizontal bar chart for top categories\n",
+ "    colors = plt.cm.Blues(np.linspace(0.4, 0.8, len(top_10_categories)))\n",
+ "    bars = ax1.barh(range(len(top_10_categories)), top_10_categories['total_revenue'], color=colors)\n",
+ "    ax1.set_yticks(range(len(top_10_categories)))\n",
+ "    ax1.set_yticklabels([cat.replace('_', ' ').title() for cat in top_10_categories['product_category_name']])\n",
+ "    ax1.set_xlabel('Revenue ($)', fontsize=12)\n",
+ "    ax1.set_title(f'Top 10 Product Categories by Revenue - {ANALYSIS_YEAR}', fontsize=14, fontweight='bold')\n",
+ "    ax1.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}'))\n",
+ "    \n",
+ "    # Add value labels\n",
+ "    for i, v in enumerate(top_10_categories['total_revenue']):\n",
+ "        ax1.text(v + max(top_10_categories['total_revenue']) * 0.01, i, f'${v:,.0f}', \n",
+ "                 va='center', fontsize=10)\n",
+ "    \n",
+ "    # Revenue share pie chart\n",
+ "    top_5_for_pie = top_10_categories.head(5)\n",
+ "    other_revenue = product_performance['top_categories'].iloc[5:]['total_revenue'].sum()  # every returned category beyond the top 5, not only ranks 6-10\n",
+ "    \n",
+ "    pie_data = list(top_5_for_pie['total_revenue']) + [other_revenue]\n",
+ "    pie_labels = list(top_5_for_pie['product_category_name'].str.replace('_', ' ').str.title()) + ['Other Categories']\n",
"    \n",
- "    # Category insights\n",
- "    total_categories = len(product_data['all_categories'])\n",
- "    top_5_share = top_categories.head(5)['revenue_share'].sum()\n",
+ "    wedges, texts, autotexts = ax2.pie(pie_data, labels=pie_labels, autopct='%1.1f%%', \n",
+ "                                       colors=plt.cm.Set3(np.linspace(0, 1, len(pie_data))))\n",
+ "    ax2.set_title(f'Revenue Share by Top Categories - {ANALYSIS_YEAR}', fontsize=14, fontweight='bold')\n",
"    \n",
- "    print(f\"\\nCATEGORY INSIGHTS:\")\n",
- "    print(f\"Total Product Categories: {total_categories}\")\n",
- "    print(f\"Top 5 Categories Revenue Share: {top_5_share:.1f}%\")\n",
- "    print(f\"Market Concentration: {'High' if top_5_share > 70 else 'Moderate' if top_5_share > 50 else 'Low'}\")\n",
+ "    plt.tight_layout()\n",
+ "    plt.show()\n",
"else:\n",
- "    print(\"Product category data not available for analysis\")"
+ "    print(\"Cannot create product visualizations - category data not available\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 5.3 Geographic Performance Analysis {#geographic-analysis}\n",
+ "### Product Category Insights\n",
"\n",
- "Analyzing sales performance across different geographic regions to identify market opportunities."
+ "**Key Findings:**\n",
+ "- The bar chart ranks the top 10 categories by revenue, showing which market segments contribute most\n",
+ "- The pie chart shows how top-10 revenue is split; a large share for the first few slices indicates dependency on those categories\n",
+ "- Average item prices can differ widely across categories, which may reflect different market positioning"
]
},
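The revenue-concentration point above can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation: the `revenue_shares` helper and the sample figures are hypothetical, and the "Other" bucket here covers every category outside the top N, so the percentages are shares of total revenue.

```python
from typing import Dict, List, Tuple


def revenue_shares(revenue_by_category: Dict[str, float], top_n: int = 5) -> List[Tuple[str, float]]:
    """Return (label, percent of total revenue) for the top_n categories plus one 'Other' bucket."""
    total = sum(revenue_by_category.values())
    if total <= 0:
        raise ValueError("total revenue must be positive")
    ranked = sorted(revenue_by_category.items(), key=lambda kv: kv[1], reverse=True)
    shares = [(name, value / total * 100) for name, value in ranked[:top_n]]
    # 'Other' covers every category outside the top_n, not just the next few.
    other = sum(value for _, value in ranked[top_n:])
    if other > 0:
        shares.append(("Other", other / total * 100))
    return shares


# Hypothetical figures, for illustration only.
example = {"electronics": 500.0, "home_garden": 200.0, "sports": 150.0, "toys": 100.0, "books": 50.0}
shares = revenue_shares(example, top_n=2)
for label, pct in shares:
    print(f"{label:<12} {pct:5.1f}%")
```

In the notebook, the same idea would apply to the full category table rather than only the top 10 rows, assuming such a table is available.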
{
- "cell_type": "code",
- "execution_count": 10,
+ "cell_type": "markdown",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "GEOGRAPHIC PERFORMANCE - 2023\n",
- "==================================================\n",
- "TOP 10 STATES BY REVENUE:\n",
- "CA $ 537,881 ( 769 orders, AOV: $ 699)\n",
- "TX $ 403,303 ( 561 orders, AOV: $ 719)\n",
- "FL $ 310,196 ( 431 orders, AOV: $ 720)\n",
- "NY $ 253,112 ( 384 orders, AOV: $ 659)\n",
- "IL $ 185,931 ( 260 orders, AOV: $ 715)\n",
- "PA $ 173,261 ( 225 orders, AOV: $ 770)\n",
- "OH $ 157,620 ( 200 orders, AOV: $ 788)\n",
- "GA $ 146,415 ( 190 orders, AOV: $ 771)\n",
- "MI $ 132,326 ( 210 orders, AOV: $ 630)\n",
- "NC $ 128,315 ( 177 orders, AOV: $ 725)\n"
- ]
- },
- {
- "data": {
- "application/vnd.plotly.v1+json": {
- "config": {
- "plotlyServerURL": "https://plot.ly"
- },
- "data": [
- {
- "coloraxis": "coloraxis",
- "geo": "geo",
- "hovertemplate": "state=%{location} Revenue ($)=%{z} ",
- "locationmode": "USA-states",
- "locations": [
- "CA",
- "TX",
- "FL",
- "NY",
- "IL",
- "PA",
- "OH",
- "GA",
- "MI",
- "NC",
- "NJ",
- "VA",
- "MA",
- "AZ",
- "WA",
- "MD",
- "IN",
- "MO",
- "WI",
- "TN"
- ],
- "name": "",
- "type": "choropleth",
- "z": {
- "bdata": "rkfhejFqIEFcj8L1nZ0YQcP1KFzP7hJBzczMzL7lDkHrUbgeV7IGQQrXo3BlJgVBKVyPwp09A0EzMzMzdd8BQc3MzMwsJwBBcT0K16tT/0Bcj8L12Bz/QHE9CtePsflASOF6FHIs+UBmZmZm0nD4QIXrUbiqOfdAFa5H4TYp9UAUrkfh2hD1QDMzMzOzr/JA9ihcjx6E8kDsUbgezXLyQA==",
- "dtype": "f8"
- }
- }
- ],
- "layout": {
- "coloraxis": {
- "colorbar": {
- "title": {
- "text": "Revenue ($)"
- }
- },
- "colorscale": [
- [
- 0,
- "rgb(247,251,255)"
- ],
- [
- 0.125,
- "rgb(222,235,247)"
- ],
- [
- 0.25,
- "rgb(198,219,239)"
- ],
- [
- 0.375,
- "rgb(158,202,225)"
- ],
- [
- 0.5,
- "rgb(107,174,214)"
- ],
- [
- 0.625,
- "rgb(66,146,198)"
- ],
- [
- 0.75,
- "rgb(33,113,181)"
- ],
- [
- 0.875,
- "rgb(8,81,156)"
- ],
- [
- 1,
- "rgb(8,48,107)"
- ]
- ]
- },
- "geo": {
- "center": {},
- "domain": {
- "x": [
- 0,
- 1
- ],
- "y": [
- 0,
- 1
- ]
- },
- "scope": "usa",
- "showcoastlines": true,
- "showframe": false
- },
- "legend": {
- "tracegroupgap": 0
- },
- "template": {
- "data": {
- "bar": [
- {
- "error_x": {
- "color": "#2a3f5f"
- },
- "error_y": {
- "color": "#2a3f5f"
- },
- "marker": {
- "line": {
- "color": "#E5ECF6",
- "width": 0.5
- },
- "pattern": {
- "fillmode": "overlay",
- "size": 10,
- "solidity": 0.2
- }
- },
- "type": "bar"
- }
- ],
- "barpolar": [
- {
- "marker": {
- "line": {
- "color": "#E5ECF6",
- "width": 0.5
- },
- "pattern": {
- "fillmode": "overlay",
- "size": 10,
- "solidity": 0.2
- }
- },
- "type": "barpolar"
- }
- ],
- "carpet": [
- {
- "aaxis": {
- "endlinecolor": "#2a3f5f",
- "gridcolor": "white",
- "linecolor": "white",
- "minorgridcolor": "white",
- "startlinecolor": "#2a3f5f"
- },
- "baxis": {
- "endlinecolor": "#2a3f5f",
- "gridcolor": "white",
- "linecolor": "white",
- "minorgridcolor": "white",
- "startlinecolor": "#2a3f5f"
- },
- "type": "carpet"
- }
- ],
- "choropleth": [
- {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- },
- "type": "choropleth"
- }
- ],
- "contour": [
- {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- },
- "colorscale": [
- [
- 0,
- "#0d0887"
- ],
- [
- 0.1111111111111111,
- "#46039f"
- ],
- [
- 0.2222222222222222,
- "#7201a8"
- ],
- [
- 0.3333333333333333,
- "#9c179e"
- ],
- [
- 0.4444444444444444,
- "#bd3786"
- ],
- [
- 0.5555555555555556,
- "#d8576b"
- ],
- [
- 0.6666666666666666,
- "#ed7953"
- ],
- [
- 0.7777777777777778,
- "#fb9f3a"
- ],
- [
- 0.8888888888888888,
- "#fdca26"
- ],
- [
- 1,
- "#f0f921"
- ]
- ],
- "type": "contour"
- }
- ],
- "contourcarpet": [
- {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- },
- "type": "contourcarpet"
- }
- ],
- "heatmap": [
- {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- },
- "colorscale": [
- [
- 0,
- "#0d0887"
- ],
- [
- 0.1111111111111111,
- "#46039f"
- ],
- [
- 0.2222222222222222,
- "#7201a8"
- ],
- [
- 0.3333333333333333,
- "#9c179e"
- ],
- [
- 0.4444444444444444,
- "#bd3786"
- ],
- [
- 0.5555555555555556,
- "#d8576b"
- ],
- [
- 0.6666666666666666,
- "#ed7953"
- ],
- [
- 0.7777777777777778,
- "#fb9f3a"
- ],
- [
- 0.8888888888888888,
- "#fdca26"
- ],
- [
- 1,
- "#f0f921"
- ]
- ],
- "type": "heatmap"
- }
- ],
- "histogram": [
- {
- "marker": {
- "pattern": {
- "fillmode": "overlay",
- "size": 10,
- "solidity": 0.2
- }
- },
- "type": "histogram"
- }
- ],
- "histogram2d": [
- {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- },
- "colorscale": [
- [
- 0,
- "#0d0887"
- ],
- [
- 0.1111111111111111,
- "#46039f"
- ],
- [
- 0.2222222222222222,
- "#7201a8"
- ],
- [
- 0.3333333333333333,
- "#9c179e"
- ],
- [
- 0.4444444444444444,
- "#bd3786"
- ],
- [
- 0.5555555555555556,
- "#d8576b"
- ],
- [
- 0.6666666666666666,
- "#ed7953"
- ],
- [
- 0.7777777777777778,
- "#fb9f3a"
- ],
- [
- 0.8888888888888888,
- "#fdca26"
- ],
- [
- 1,
- "#f0f921"
- ]
- ],
- "type": "histogram2d"
- }
- ],
- "histogram2dcontour": [
- {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- },
- "colorscale": [
- [
- 0,
- "#0d0887"
- ],
- [
- 0.1111111111111111,
- "#46039f"
- ],
- [
- 0.2222222222222222,
- "#7201a8"
- ],
- [
- 0.3333333333333333,
- "#9c179e"
- ],
- [
- 0.4444444444444444,
- "#bd3786"
- ],
- [
- 0.5555555555555556,
- "#d8576b"
- ],
- [
- 0.6666666666666666,
- "#ed7953"
- ],
- [
- 0.7777777777777778,
- "#fb9f3a"
- ],
- [
- 0.8888888888888888,
- "#fdca26"
- ],
- [
- 1,
- "#f0f921"
- ]
- ],
- "type": "histogram2dcontour"
- }
- ],
- "mesh3d": [
- {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- },
- "type": "mesh3d"
- }
- ],
- "parcoords": [
- {
- "line": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "type": "parcoords"
- }
- ],
- "pie": [
- {
- "automargin": true,
- "type": "pie"
- }
- ],
- "scatter": [
- {
- "fillpattern": {
- "fillmode": "overlay",
- "size": 10,
- "solidity": 0.2
- },
- "type": "scatter"
- }
- ],
- "scatter3d": [
- {
- "line": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "marker": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "type": "scatter3d"
- }
- ],
- "scattercarpet": [
- {
- "marker": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "type": "scattercarpet"
- }
- ],
- "scattergeo": [
- {
- "marker": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "type": "scattergeo"
- }
- ],
- "scattergl": [
- {
- "marker": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "type": "scattergl"
- }
- ],
- "scattermap": [
- {
- "marker": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "type": "scattermap"
- }
- ],
- "scattermapbox": [
- {
- "marker": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "type": "scattermapbox"
- }
- ],
- "scatterpolar": [
- {
- "marker": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "type": "scatterpolar"
- }
- ],
- "scatterpolargl": [
- {
- "marker": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "type": "scatterpolargl"
- }
- ],
- "scatterternary": [
- {
- "marker": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "type": "scatterternary"
- }
- ],
- "surface": [
- {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- },
- "colorscale": [
- [
- 0,
- "#0d0887"
- ],
- [
- 0.1111111111111111,
- "#46039f"
- ],
- [
- 0.2222222222222222,
- "#7201a8"
- ],
- [
- 0.3333333333333333,
- "#9c179e"
- ],
- [
- 0.4444444444444444,
- "#bd3786"
- ],
- [
- 0.5555555555555556,
- "#d8576b"
- ],
- [
- 0.6666666666666666,
- "#ed7953"
- ],
- [
- 0.7777777777777778,
- "#fb9f3a"
- ],
- [
- 0.8888888888888888,
- "#fdca26"
- ],
- [
- 1,
- "#f0f921"
- ]
- ],
- "type": "surface"
- }
- ],
- "table": [
- {
- "cells": {
- "fill": {
- "color": "#EBF0F8"
- },
- "line": {
- "color": "white"
- }
- },
- "header": {
- "fill": {
- "color": "#C8D4E3"
- },
- "line": {
- "color": "white"
- }
- },
- "type": "table"
- }
- ]
- },
- "layout": {
- "annotationdefaults": {
- "arrowcolor": "#2a3f5f",
- "arrowhead": 0,
- "arrowwidth": 1
- },
- "autotypenumbers": "strict",
- "coloraxis": {
- "colorbar": {
- "outlinewidth": 0,
- "ticks": ""
- }
- },
- "colorscale": {
- "diverging": [
- [
- 0,
- "#8e0152"
- ],
- [
- 0.1,
- "#c51b7d"
- ],
- [
- 0.2,
- "#de77ae"
- ],
- [
- 0.3,
- "#f1b6da"
- ],
- [
- 0.4,
- "#fde0ef"
- ],
- [
- 0.5,
- "#f7f7f7"
- ],
- [
- 0.6,
- "#e6f5d0"
- ],
- [
- 0.7,
- "#b8e186"
- ],
- [
- 0.8,
- "#7fbc41"
- ],
- [
- 0.9,
- "#4d9221"
- ],
- [
- 1,
- "#276419"
- ]
- ],
- "sequential": [
- [
- 0,
- "#0d0887"
- ],
- [
- 0.1111111111111111,
- "#46039f"
- ],
- [
- 0.2222222222222222,
- "#7201a8"
- ],
- [
- 0.3333333333333333,
- "#9c179e"
- ],
- [
- 0.4444444444444444,
- "#bd3786"
- ],
- [
- 0.5555555555555556,
- "#d8576b"
- ],
- [
- 0.6666666666666666,
- "#ed7953"
- ],
- [
- 0.7777777777777778,
- "#fb9f3a"
- ],
- [
- 0.8888888888888888,
- "#fdca26"
- ],
- [
- 1,
- "#f0f921"
- ]
- ],
- "sequentialminus": [
- [
- 0,
- "#0d0887"
- ],
- [
- 0.1111111111111111,
- "#46039f"
- ],
- [
- 0.2222222222222222,
- "#7201a8"
- ],
- [
- 0.3333333333333333,
- "#9c179e"
- ],
- [
- 0.4444444444444444,
- "#bd3786"
- ],
- [
- 0.5555555555555556,
- "#d8576b"
- ],
- [
- 0.6666666666666666,
- "#ed7953"
- ],
- [
- 0.7777777777777778,
- "#fb9f3a"
- ],
- [
- 0.8888888888888888,
- "#fdca26"
- ],
- [
- 1,
- "#f0f921"
- ]
- ]
- },
- "colorway": [
- "#636efa",
- "#EF553B",
- "#00cc96",
- "#ab63fa",
- "#FFA15A",
- "#19d3f3",
- "#FF6692",
- "#B6E880",
- "#FF97FF",
- "#FECB52"
- ],
- "font": {
- "color": "#2a3f5f"
- },
- "geo": {
- "bgcolor": "white",
- "lakecolor": "white",
- "landcolor": "#E5ECF6",
- "showlakes": true,
- "showland": true,
- "subunitcolor": "white"
- },
- "hoverlabel": {
- "align": "left"
- },
- "hovermode": "closest",
- "mapbox": {
- "style": "light"
- },
- "paper_bgcolor": "white",
- "plot_bgcolor": "#E5ECF6",
- "polar": {
- "angularaxis": {
- "gridcolor": "white",
- "linecolor": "white",
- "ticks": ""
- },
- "bgcolor": "#E5ECF6",
- "radialaxis": {
- "gridcolor": "white",
- "linecolor": "white",
- "ticks": ""
- }
- },
- "scene": {
- "xaxis": {
- "backgroundcolor": "#E5ECF6",
- "gridcolor": "white",
- "gridwidth": 2,
- "linecolor": "white",
- "showbackground": true,
- "ticks": "",
- "zerolinecolor": "white"
- },
- "yaxis": {
- "backgroundcolor": "#E5ECF6",
- "gridcolor": "white",
- "gridwidth": 2,
- "linecolor": "white",
- "showbackground": true,
- "ticks": "",
- "zerolinecolor": "white"
- },
- "zaxis": {
- "backgroundcolor": "#E5ECF6",
- "gridcolor": "white",
- "gridwidth": 2,
- "linecolor": "white",
- "showbackground": true,
- "ticks": "",
- "zerolinecolor": "white"
- }
- },
- "shapedefaults": {
- "line": {
- "color": "#2a3f5f"
- }
- },
- "ternary": {
- "aaxis": {
- "gridcolor": "white",
- "linecolor": "white",
- "ticks": ""
- },
- "baxis": {
- "gridcolor": "white",
- "linecolor": "white",
- "ticks": ""
- },
- "bgcolor": "#E5ECF6",
- "caxis": {
- "gridcolor": "white",
- "linecolor": "white",
- "ticks": ""
- }
- },
- "title": {
- "x": 0.05
- },
- "xaxis": {
- "automargin": true,
- "gridcolor": "white",
- "linecolor": "white",
- "ticks": "",
- "title": {
- "standoff": 15
- },
- "zerolinecolor": "white",
- "zerolinewidth": 2
- },
- "yaxis": {
- "automargin": true,
- "gridcolor": "white",
- "linecolor": "white",
- "ticks": "",
- "title": {
- "standoff": 15
- },
- "zerolinecolor": "white",
- "zerolinewidth": 2
- }
- }
- },
- "title": {
- "font": {
- "size": 16
- },
- "text": "Revenue by State - 2023",
- "x": 0.5
- }
- }
- }
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\n",
- "GEOGRAPHIC INSIGHTS:\n",
- "States with Sales: 20\n",
- "Top 5 States Revenue Share: 50.3%\n",
- "Highest AOV State: IN ($814)\n",
- "Geographic Diversity: Low\n"
- ]
- }
- ],
"source": [
- "# Geographic analysis\n",
- "geo_data = business_report['geographic_performance']\n",
+ "## 7. Geographic Performance Analysis {#geographic-analysis}\n",
"\n",
- "if 'error' not in geo_data.columns:\n",
- " print(f\"GEOGRAPHIC PERFORMANCE - {ANALYSIS_YEAR}\")\n",
- " print(\"=\" * 50)\n",
+ "State-level analysis of sales performance, revenue distribution, and market penetration."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Analyze geographic performance\n",
+ "print(f\"Analyzing geographic performance for {ANALYSIS_YEAR}...\")\n",
+ "geographic_performance = metrics_calculator.analyze_geographic_performance(ANALYSIS_YEAR)\n",
+ "\n",
+ "if 'error' not in geographic_performance.columns:\n",
+ " print(\"\\nTOP 15 STATES BY REVENUE:\")\n",
+ " print(\"=\" * 40)\n",
+ " top_states = geographic_performance.head(15)\n",
" \n",
- " # Top performing states\n",
- " top_states = geo_data.head(10)\n",
- " print(\"TOP 10 STATES BY REVENUE:\")\n",
" for idx, row in top_states.iterrows():\n",
- " print(f\"{row['state']:<3} ${row['revenue']:>10,.0f} ({row['orders']:>5,} orders, AOV: ${row['avg_order_value']:>7,.0f})\")\n",
+ " print(f\"{row['state']}:\")\n",
+ " print(f\" Revenue: ${row['revenue']:,.2f}\")\n",
+ " print(f\" Orders: {row['orders']:,}\")\n",
+ " print(f\" Avg Order Value: ${row['avg_order_value']:.2f}\")\n",
+ " print()\n",
+ " \n",
+ " # Geographic summary statistics\n",
+ " print(\"GEOGRAPHIC PERFORMANCE SUMMARY:\")\n",
+ " print(f\"Total States with Sales: {len(geographic_performance)}\")\n",
+ " print(f\"Top State Revenue: ${geographic_performance.iloc[0]['revenue']:,.2f}\")\n",
+ " print(f\"Median State Revenue: ${geographic_performance['revenue'].median():,.2f}\")\n",
+ " print(f\"Revenue Concentration (Top 5 States): {(geographic_performance.head(5)['revenue'].sum() / geographic_performance['revenue'].sum() * 100):.1f}%\")\n",
+ "else:\n",
+ " print(\"Geographic data not available for analysis\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Geographic visualizations\n",
+ "if 'error' not in geographic_performance.columns:\n",
+ " # Create choropleth map using Plotly\n",
+ " fig_map = px.choropleth(\n",
+ " geographic_performance,\n",
+ " locations='state',\n",
+ " color='revenue',\n",
+ " locationmode='USA-states',\n",
+ " scope='usa',\n",
+ " title=f'Revenue by State - {ANALYSIS_YEAR}',\n",
+ " color_continuous_scale='Blues',\n",
+ " labels={'revenue': 'Revenue ($)'}\n",
+ " )\n",
+ " \n",
+ " fig_map.update_layout(\n",
+ " title_font_size=16,\n",
+ " title_x=0.5,\n",
+ " geo=dict(showframe=False, showcoastlines=True),\n",
+ " width=1000,\n",
+ " height=600\n",
+ " )\n",
" \n",
- " # Geographic heatmap\n",
- " geo_fig = visualizer.plot_geographic_heatmap()\n",
- " geo_fig.show()\n",
+ " fig_map.show()\n",
" \n",
- " # Geographic insights\n",
- " total_states = len(geo_data)\n",
- " top_5_revenue = top_states.head(5)['revenue'].sum()\n",
- " total_revenue = geo_data['revenue'].sum()\n",
- " top_5_share = (top_5_revenue / total_revenue) * 100\n",
+ " # Top states bar chart\n",
+ " top_10_states = geographic_performance.head(10)\n",
" \n",
- " print(f\"\\nGEOGRAPHIC INSIGHTS:\")\n",
- " print(f\"States with Sales: {total_states}\")\n",
- " print(f\"Top 5 States Revenue Share: {top_5_share:.1f}%\")\n",
- " print(f\"Highest AOV State: {geo_data.loc[geo_data['avg_order_value'].idxmax(), 'state']} (${geo_data['avg_order_value'].max():,.0f})\")\n",
- " print(f\"Geographic Diversity: {'High' if total_states > 40 else 'Moderate' if total_states > 20 else 'Low'}\")\n",
+ " plt.figure(figsize=(15, 8))\n",
+ " bars = plt.bar(top_10_states['state'], top_10_states['revenue'], \n",
+ " color='skyblue', alpha=0.8)\n",
+ " plt.title(f'Top 10 States by Revenue - {ANALYSIS_YEAR}', fontsize=16, fontweight='bold', pad=20)\n",
+ " plt.xlabel('State', fontsize=12)\n",
+ " plt.ylabel('Revenue ($)', fontsize=12)\n",
+ " plt.xticks(rotation=45)\n",
+ " plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}'))\n",
+ " plt.grid(True, alpha=0.3)\n",
+ " \n",
+ " # Add value labels\n",
+ " for bar, value in zip(bars, top_10_states['revenue']):\n",
+ " plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + max(top_10_states['revenue']) * 0.01,\n",
+ " f'${value:,.0f}', ha='center', va='bottom', fontsize=10, rotation=0)\n",
+ " \n",
+ " plt.tight_layout()\n",
+ " plt.show()\n",
"else:\n",
- " print(\"Geographic data not available for analysis\")"
+ " print(\"Cannot create geographic visualizations - location data not available\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Geographic Performance Insights\n",
+ "\n",
+ "**Key Findings:**\n",
+ "- The top-5 revenue concentration figure shows how dependent sales are on a few states; states with little or no revenue are candidates for expansion\n",
+ "- Revenue differences between states may reflect population, regional preferences, or market maturity; this analysis alone cannot separate these causes\n",
+ "- Average order value by state shows where customers spend more per order, independent of order volume"
]
},
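As a rough sketch of the state-level metrics discussed above, the following plain-Python example derives revenue, order count, and average order value per state, then a top-k concentration share. The column names (`state`, `revenue`, `orders`, `avg_order_value`) mirror those printed by the notebook; the order records and the `top_share` helper are hypothetical, and how `analyze_geographic_performance` actually computes its result is not shown in this diff.

```python
from collections import defaultdict

# Hypothetical (order_id, state, order_total) records; not taken from the notebook's data.
orders = [
    ("o1", "CA", 120.0),
    ("o2", "CA", 80.0),
    ("o3", "TX", 150.0),
    ("o4", "NY", 50.0),
]

# Accumulate revenue and order count per state.
totals = defaultdict(lambda: {"revenue": 0.0, "orders": 0})
for _order_id, state, amount in orders:
    totals[state]["revenue"] += amount
    totals[state]["orders"] += 1

# One row per state, sorted by revenue (highest first), with average order value.
by_state = sorted(
    (
        {
            "state": state,
            "revenue": t["revenue"],
            "orders": t["orders"],
            "avg_order_value": t["revenue"] / t["orders"],
        }
        for state, t in totals.items()
    ),
    key=lambda row: row["revenue"],
    reverse=True,
)


def top_share(rows, k):
    """Percent of total revenue contributed by the k highest-revenue rows."""
    total = sum(r["revenue"] for r in rows)
    return sum(r["revenue"] for r in rows[:k]) / total * 100


print(by_state[0])
print(f"Top-2 revenue share: {top_share(by_state, 2):.1f}%")
```

Sorting before taking the top k matters: the concentration figure is only meaningful if the rows are ordered by revenue.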
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 5.4 Customer Experience Analysis {#customer-analysis}\n",
+ "## 8. Customer Experience Analysis {#customer-analysis}\n",
"\n",
- "Evaluating customer satisfaction through review scores and delivery performance metrics."
+ "Comprehensive analysis of customer satisfaction, delivery performance, and service quality metrics."
]
},
{
"cell_type": "code",
- "execution_count": 11,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "CUSTOMER SATISFACTION ANALYSIS - 2023\n",
- "==================================================\n",
- "Average Review Score: 4.10/5.0\n",
- "Total Reviews: 3,225\n",
- "5-Star Reviews: 34.7%\n",
- "4+ Star Reviews: 51.6%\n",
- "Low Satisfaction (1-2 stars): 7.2%\n"
- ]
- },
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\n",
- "SATISFACTION INSIGHTS:\n",
- "Overall Satisfaction Level: Good\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"# Customer satisfaction analysis\n",
- "satisfaction_metrics = business_report['customer_satisfaction']\n",
+ "print(f\"Analyzing customer satisfaction for {ANALYSIS_YEAR}...\")\n",
+ "satisfaction_metrics = metrics_calculator.analyze_customer_satisfaction(ANALYSIS_YEAR)\n",
"\n",
"if 'error' not in satisfaction_metrics:\n",
- " print(f\"CUSTOMER SATISFACTION ANALYSIS - {ANALYSIS_YEAR}\")\n",
- " print(\"=\" * 50)\n",
+ " print(\"\\nCUSTOMER SATISFACTION METRICS:\")\n",
+ " print(\"=\" * 40)\n",
" print(f\"Average Review Score: {satisfaction_metrics['avg_review_score']:.2f}/5.0\")\n",
" print(f\"Total Reviews: {satisfaction_metrics['total_reviews']:,}\")\n",
" print(f\"5-Star Reviews: {satisfaction_metrics['score_5_percentage']:.1f}%\")\n",
" print(f\"4+ Star Reviews: {satisfaction_metrics['score_4_plus_percentage']:.1f}%\")\n",
- " print(f\"Low Satisfaction (1-2 stars): {satisfaction_metrics['score_1_2_percentage']:.1f}%\")\n",
- " \n",
- " # Review distribution visualization\n",
- " review_fig = visualizer.plot_review_distribution(figsize=(12, 6))\n",
- " plt.show()\n",
- " \n",
- " # Satisfaction insights\n",
- " avg_score = satisfaction_metrics['avg_review_score']\n",
- " satisfaction_level = 'Excellent' if avg_score >= 4.5 else 'Good' if avg_score >= 4.0 else 'Fair' if avg_score >= 3.5 else 'Poor'\n",
- " \n",
- " print(f\"\\nSATISFACTION INSIGHTS:\")\n",
- " print(f\"Overall Satisfaction Level: {satisfaction_level}\")\n",
- " if satisfaction_metrics['score_4_plus_percentage'] >= 80:\n",
- " print(\"✅ Strong customer satisfaction (80%+ give 4+ stars)\")\n",
- " elif satisfaction_metrics['score_1_2_percentage'] > 10:\n",
- " print(\"⚠️ Significant dissatisfaction detected (>10% give 1-2 stars)\")\n",
+ " print(f\"Low Satisfaction (1-2 Stars): {satisfaction_metrics['score_1_2_percentage']:.1f}%\")\n",
"else:\n",
- " print(\"Customer satisfaction data not available for analysis\")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "DELIVERY PERFORMANCE ANALYSIS - 2023\n",
- "==================================================\n",
- "Average Delivery Time: 8.0 days\n",
- "Median Delivery Time: 8.0 days\n",
- "Fast Delivery (≤3 days): 7.2%\n",
- "Slow Delivery (>7 days): 55.6%\n",
- "\n",
- "DELIVERY INSIGHTS:\n",
- "Delivery Performance Rating: Poor\n",
- "⚠️ High percentage of slow deliveries needs attention\n",
- "\n",
- "DELIVERY-SATISFACTION CORRELATION:\n",
- "Average satisfaction score: 4.10\n",
- "Fast delivery rate: 7.2%\n"
- ]
- }
- ],
- "source": [
+ " print(\"Customer satisfaction data not available\")\n",
+ "\n",
"# Delivery performance analysis\n",
- "delivery_metrics = business_report['delivery_performance']\n",
+ "print(f\"\\nAnalyzing delivery performance for {ANALYSIS_YEAR}...\")\n",
+ "delivery_metrics = metrics_calculator.analyze_delivery_performance(ANALYSIS_YEAR)\n",
"\n",
"if 'error' not in delivery_metrics:\n",
- " print(f\"DELIVERY PERFORMANCE ANALYSIS - {ANALYSIS_YEAR}\")\n",
- " print(\"=\" * 50)\n",
+ " print(\"\\nDELIVERY PERFORMANCE METRICS:\")\n",
+ " print(\"=\" * 40)\n",
" print(f\"Average Delivery Time: {delivery_metrics['avg_delivery_days']:.1f} days\")\n",
" print(f\"Median Delivery Time: {delivery_metrics['median_delivery_days']:.1f} days\")\n",
" print(f\"Fast Delivery (≤3 days): {delivery_metrics['fast_delivery_percentage']:.1f}%\")\n",
" print(f\"Slow Delivery (>7 days): {delivery_metrics['slow_delivery_percentage']:.1f}%\")\n",
- " \n",
- " # Delivery performance evaluation\n",
- " avg_delivery = delivery_metrics['avg_delivery_days']\n",
- " delivery_rating = 'Excellent' if avg_delivery <= 3 else 'Good' if avg_delivery <= 5 else 'Fair' if avg_delivery <= 7 else 'Poor'\n",
- " \n",
- " print(f\"\\nDELIVERY INSIGHTS:\")\n",
- " print(f\"Delivery Performance Rating: {delivery_rating}\")\n",
- " \n",
- " if delivery_metrics['fast_delivery_percentage'] >= 30:\n",
- " print(\"✅ Strong fast delivery capability\")\n",
- " if delivery_metrics['slow_delivery_percentage'] > 20:\n",
- " print(\"⚠️ High percentage of slow deliveries needs attention\")\n",
- " \n",
- " # Delivery speed impact on satisfaction\n",
- " if 'error' not in satisfaction_metrics:\n",
- " print(f\"\\nDELIVERY-SATISFACTION CORRELATION:\")\n",
- " # This would require more detailed analysis of the relationship\n",
- " print(f\"Average satisfaction score: {satisfaction_metrics['avg_review_score']:.2f}\")\n",
- " print(f\"Fast delivery rate: {delivery_metrics['fast_delivery_percentage']:.1f}%\")\n",
"else:\n",
- " print(\"Delivery performance data not available for analysis\")"
+ " print(\"Delivery performance data not available\")"
]
},
{
- "cell_type": "markdown",
+ "cell_type": "code",
+ "execution_count": null,
"metadata": {},
+ "outputs": [],
"source": [
- "## 6. Summary of Key Observations {#summary}\n",
+ "# Create customer experience visualizations\n",
+ "fig, axes = plt.subplots(2, 2, figsize=(16, 12))\n",
+ "\n",
+ "# Filter once up front so every panel can use it, even if the satisfaction check below fails\n",
+ "current_year_data = sales_data[sales_data['purchase_year'] == ANALYSIS_YEAR]\n",
+ "\n",
+ "# Review score distribution\n",
+ "if 'error' not in satisfaction_metrics:\n",
+ "    # Get review distribution from sales data (one row per order)\n",
+ "    review_counts = current_year_data.drop_duplicates('order_id')['review_score'].value_counts().sort_index()\n",
+ " \n",
+ " colors = ['#d62728', '#ff7f0e', '#ffbb78', '#2ca02c', '#1f77b4']\n",
+ " bars = axes[0,0].bar(review_counts.index, review_counts.values, color=colors)\n",
+ " axes[0,0].set_title(f'Review Score Distribution - {ANALYSIS_YEAR}', fontsize=14, fontweight='bold')\n",
+ " axes[0,0].set_xlabel('Review Score', fontsize=12)\n",
+ " axes[0,0].set_ylabel('Number of Reviews', fontsize=12)\n",
+ " axes[0,0].grid(True, alpha=0.3)\n",
+ " \n",
+ " # Add percentage labels\n",
+ " total_reviews = review_counts.sum()\n",
+ " for bar, count in zip(bars, review_counts.values):\n",
+ " percentage = count / total_reviews * 100\n",
+ " axes[0,0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + total_reviews * 0.01,\n",
+ " f'{percentage:.1f}%', ha='center', va='bottom', fontsize=10)\n",
+ "else:\n",
+ " axes[0,0].text(0.5, 0.5, 'Review data not available', ha='center', va='center', \n",
+ " transform=axes[0,0].transAxes, fontsize=14)\n",
"\n",
- "### Executive Summary\n",
+ "# Delivery time distribution\n",
+ "if 'error' not in delivery_metrics:\n",
+ " delivery_data = current_year_data.drop_duplicates('order_id')['delivery_days'].dropna()\n",
+ " \n",
+ " axes[0,1].hist(delivery_data, bins=30, color='skyblue', alpha=0.7, edgecolor='black')\n",
+ " axes[0,1].axvline(delivery_data.mean(), color='red', linestyle='--', linewidth=2, \n",
+ " label=f'Mean: {delivery_data.mean():.1f} days')\n",
+ " axes[0,1].axvline(delivery_data.median(), color='green', linestyle='--', linewidth=2,\n",
+ " label=f'Median: {delivery_data.median():.1f} days')\n",
+ " axes[0,1].set_title(f'Delivery Time Distribution - {ANALYSIS_YEAR}', fontsize=14, fontweight='bold')\n",
+ " axes[0,1].set_xlabel('Delivery Days', fontsize=12)\n",
+ " axes[0,1].set_ylabel('Number of Orders', fontsize=12)\n",
+ " axes[0,1].legend()\n",
+ " axes[0,1].grid(True, alpha=0.3)\n",
+ "else:\n",
+ " axes[0,1].text(0.5, 0.5, 'Delivery data not available', ha='center', va='center', \n",
+ " transform=axes[0,1].transAxes, fontsize=14)\n",
+ "\n",
+ "# Satisfaction by delivery speed\n",
+ "if 'error' not in satisfaction_metrics and 'error' not in delivery_metrics:\n",
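+ "    # Categorize delivery times (include_lowest=True keeps same-day, 0-day deliveries in the first bin)\n",
+ "    order_data = current_year_data.drop_duplicates('order_id').copy()\n",
+ "    order_data['delivery_category'] = pd.cut(\n",
+ "        order_data['delivery_days'],\n",
+ "        bins=[0, 3, 7, float('inf')],\n",
+ "        labels=['1-3 days', '4-7 days', '8+ days'],\n",
+ "        include_lowest=True\n",
+ "    )\n",
+ "    \n",
+ "    # observed=False keeps all three categories so the bar colors below stay aligned\n",
+ "    satisfaction_by_speed = order_data.groupby('delivery_category', observed=False)['review_score'].mean()\n",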
+ " \n",
+ " bars = axes[1,0].bar(range(len(satisfaction_by_speed)), satisfaction_by_speed.values, \n",
+ " color=['#2ca02c', '#ff7f0e', '#d62728'])\n",
+ " axes[1,0].set_xticks(range(len(satisfaction_by_speed)))\n",
+ " axes[1,0].set_xticklabels(satisfaction_by_speed.index)\n",
+ " axes[1,0].set_title(f'Average Review Score by Delivery Speed - {ANALYSIS_YEAR}', fontsize=14, fontweight='bold')\n",
+ " axes[1,0].set_xlabel('Delivery Time Category', fontsize=12)\n",
+ " axes[1,0].set_ylabel('Average Review Score', fontsize=12)\n",
+ " axes[1,0].set_ylim(0, 5)\n",
+ " axes[1,0].grid(True, alpha=0.3)\n",
+ " \n",
+ " # Add value labels\n",
+ " for bar, value in zip(bars, satisfaction_by_speed.values):\n",
+ " axes[1,0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.05,\n",
+ " f'{value:.2f}', ha='center', va='bottom', fontsize=11)\n",
+ "else:\n",
+ " axes[1,0].text(0.5, 0.5, 'Insufficient data for analysis', ha='center', va='center', \n",
+ " transform=axes[1,0].transAxes, fontsize=14)\n",
"\n",
- "Based on the comprehensive analysis of the e-commerce data, here are the key findings and recommendations:"
+ "# Order status distribution (one row per order, consistent with the panels above)\n",
+ "status_data = sales_data[sales_data['purchase_year'] == ANALYSIS_YEAR].drop_duplicates('order_id')['order_status'].value_counts()\n",
+ "wedges, texts, autotexts = axes[1,1].pie(status_data.values, labels=status_data.index, autopct='%1.1f%%',\n",
+ " colors=plt.cm.Set3(np.linspace(0, 1, len(status_data))))\n",
+ "axes[1,1].set_title(f'Order Status Distribution - {ANALYSIS_YEAR}', fontsize=14, fontweight='bold')\n",
+ "\n",
+ "plt.tight_layout()\n",
+ "plt.show()"
]
},
{
- "cell_type": "code",
- "execution_count": 13,
+ "cell_type": "markdown",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "EXECUTIVE SUMMARY - 2023 BUSINESS PERFORMANCE\n",
- "============================================================\n",
- "\n",
- "📊 FINANCIAL PERFORMANCE:\n",
- " • Total Revenue: $3,360,295\n",
- " • Total Orders: 4,635\n",
- " • Average Order Value: $725\n",
- " • Revenue Growth: 📉 -2.5% vs 2022\n",
- "\n",
- "🛍️ PRODUCT PERFORMANCE:\n",
- " • Top Category: electronics ($1,401,359)\n",
- " • Category Market Share: 41.7%\n",
- "\n",
- "🗺️ GEOGRAPHIC PERFORMANCE:\n",
- " • Top Market: CA ($537,881)\n",
- " • Active Markets: 20 states\n",
- "\n",
- "⭐ CUSTOMER EXPERIENCE:\n",
- " • Average Rating: 4.1/5.0\n",
- " • High Satisfaction: 52% (4+ stars)\n",
- " • Average Delivery: 8.0 days\n",
- " • Fast Delivery: 7% (≤3 days)\n",
- "\n",
- "============================================================\n"
- ]
- }
- ],
"source": [
- "# Generate executive summary\n",
- "print(f\"EXECUTIVE SUMMARY - {ANALYSIS_YEAR} BUSINESS PERFORMANCE\")\n",
- "print(\"=\" * 60)\n",
- "\n",
- "# Key metrics summary\n",
- "revenue_metrics = business_report['revenue_metrics']\n",
- "print(f\"\\n📊 FINANCIAL PERFORMANCE:\")\n",
- "print(f\" • Total Revenue: ${revenue_metrics['total_revenue']:,.0f}\")\n",
- "print(f\" • Total Orders: {revenue_metrics['total_orders']:,}\")\n",
- "print(f\" • Average Order Value: ${revenue_metrics['average_order_value']:,.0f}\")\n",
- "\n",
- "if 'revenue_growth_rate' in revenue_metrics:\n",
- " growth_direction = \"📈\" if revenue_metrics['revenue_growth_rate'] > 0 else \"📉\"\n",
- " print(f\" • Revenue Growth: {growth_direction} {revenue_metrics['revenue_growth_rate']:+.1f}% vs {COMPARISON_YEAR}\")\n",
- "\n",
- "# Product insights\n",
- "if 'error' not in business_report['product_performance']:\n",
- " top_category = business_report['product_performance']['top_categories'].iloc[0]\n",
- " print(f\"\\n🛍️ PRODUCT PERFORMANCE:\")\n",
- " print(f\" • Top Category: {top_category['product_category_name']} (${top_category['total_revenue']:,.0f})\")\n",
- " print(f\" • Category Market Share: {top_category['revenue_share']:.1f}%\")\n",
- "\n",
- "# Geographic insights\n",
- "geo_data = business_report['geographic_performance']\n",
- "if 'error' not in geo_data.columns:\n",
- " top_state = geo_data.iloc[0]\n",
- " print(f\"\\n🗺️ GEOGRAPHIC PERFORMANCE:\")\n",
- " print(f\" • Top Market: {top_state['state']} (${top_state['revenue']:,.0f})\")\n",
- " print(f\" • Active Markets: {len(geo_data)} states\")\n",
- "\n",
- "# Customer experience\n",
- "if 'error' not in business_report['customer_satisfaction']:\n",
- " satisfaction = business_report['customer_satisfaction']\n",
- " print(f\"\\n⭐ CUSTOMER EXPERIENCE:\")\n",
- " print(f\" • Average Rating: {satisfaction['avg_review_score']:.1f}/5.0\")\n",
- " print(f\" • High Satisfaction: {satisfaction['score_4_plus_percentage']:.0f}% (4+ stars)\")\n",
- "\n",
- "if 'error' not in business_report['delivery_performance']:\n",
- " delivery = business_report['delivery_performance']\n",
- " print(f\" • Average Delivery: {delivery['avg_delivery_days']:.1f} days\")\n",
- " print(f\" • Fast Delivery: {delivery['fast_delivery_percentage']:.0f}% (≤3 days)\")\n",
- "\n",
- "print(f\"\\n\" + \"=\" * 60)"
+ "### Customer Experience Insights\n",
+ "\n",
+ "**Key Findings:**\n",
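+ "- The average review score and the share of 1-2 star reviews summarize overall service quality and flag areas for improvement\n",
+ "- The review-score-by-delivery-speed chart shows whether slower deliveries coincide with lower ratings; this is an association, not proof of causation\n",
+ "- If fast deliveries (1-3 days) show the highest average score, delivery speed is a likely lever for improving satisfaction\n",
+ "- The order status distribution shows the share of orders delivered versus other statuses, a rough indicator of operational efficiency"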
]
},
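One way to check the delivery-speed finding above, rather than assume it, is to bucket orders by delivery time and compare average review scores, then compute a correlation coefficient. The sketch below uses plain Python with hypothetical `(delivery_days, review_score)` pairs; the bucket labels and the `delivery_category` and `pearson` helpers are illustrative names, not part of the notebook.

```python
from statistics import mean

# Hypothetical (delivery_days, review_score) pairs, one per order.
orders = [(0, 5), (2, 5), (3, 4), (5, 4), (7, 3), (9, 2), (12, 1)]


def delivery_category(days):
    """Bucket delivery time; same-day (0-day) deliveries count as fast."""
    if days <= 3:
        return "0-3 days"
    if days <= 7:
        return "4-7 days"
    return "8+ days"


# Average review score per delivery bucket.
scores_by_category = {}
for days, score in orders:
    scores_by_category.setdefault(delivery_category(days), []).append(score)
avg_score = {cat: mean(scores) for cat, scores in scores_by_category.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


r = pearson([d for d, _ in orders], [s for _, s in orders])
print(avg_score)
print(f"Correlation between delivery days and review score: {r:.2f}")
```

A negative coefficient would be consistent with slower deliveries receiving lower ratings, but it does not by itself show that delivery time causes the difference.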
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "### Strategic Recommendations\n",
+ "## 9. Executive Summary & Recommendations {#summary}\n",
"\n",
- "Based on the analysis results, here are the key strategic recommendations:"
+ "Strategic insights and actionable recommendations based on comprehensive business intelligence analysis."
]
},
{
"cell_type": "code",
- "execution_count": 14,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "STRATEGIC RECOMMENDATIONS\n",
- "========================================\n",
- "1. 🔴 PRIORITY: Address negative revenue growth through customer acquisition and retention strategies\n",
- "2. 📦 Consider diversifying product portfolio to reduce dependency on top categories\n",
- "3. 🚚 PRIORITY: Optimize logistics to reduce average delivery time\n",
- "4. ⚡ Invest in fast delivery capabilities to improve customer experience\n",
- "5. 🗺️ Explore expansion opportunities in underserved geographic markets\n",
- "\n",
- "========================================\n",
- "Analysis completed for 2023\n",
- "Comparison baseline: 2022\n",
- "Generated on: 2025-08-05 04:55:02\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
- "# Generate recommendations based on analysis\n",
- "print(\"STRATEGIC RECOMMENDATIONS\")\n",
- "print(\"=\" * 40)\n",
+ "# Generate comprehensive business report\n",
+ "print(\"Generating comprehensive business intelligence report...\")\n",
+ "comprehensive_report = metrics_calculator.generate_comprehensive_report(\n",
+ " current_year=ANALYSIS_YEAR,\n",
+ " previous_year=COMPARISON_YEAR\n",
+ ")\n",
"\n",
- "recommendations = []\n",
- "\n",
- "# Revenue-based recommendations\n",
- "if 'revenue_growth_rate' in revenue_metrics:\n",
- " if revenue_metrics['revenue_growth_rate'] < 0:\n",
- " recommendations.append(\"🔴 PRIORITY: Address negative revenue growth through customer acquisition and retention strategies\")\n",
- " elif revenue_metrics['revenue_growth_rate'] < 5:\n",
- " recommendations.append(\"🟡 Focus on accelerating growth through market expansion and product diversification\")\n",
- " else:\n",
- " recommendations.append(\"🟢 Maintain strong growth momentum while optimizing operational efficiency\")\n",
- "\n",
- "# Product recommendations\n",
- "if 'error' not in business_report['product_performance']:\n",
- " top_5_share = business_report['product_performance']['top_categories'].head(5)['revenue_share'].sum()\n",
- " if top_5_share > 70:\n",
- " recommendations.append(\"📦 Consider diversifying product portfolio to reduce dependency on top categories\")\n",
- " else:\n",
- " recommendations.append(\"📦 Leverage balanced product portfolio to explore cross-selling opportunities\")\n",
- "\n",
- "# Customer experience recommendations\n",
- "if 'error' not in business_report['customer_satisfaction']:\n",
- " satisfaction = business_report['customer_satisfaction']\n",
- " if satisfaction['avg_review_score'] < 4.0:\n",
- " recommendations.append(\"⭐ PRIORITY: Improve customer satisfaction through quality and service enhancements\")\n",
- " if satisfaction['score_1_2_percentage'] > 10:\n",
- " recommendations.append(\"⚠️ Address root causes of customer dissatisfaction to reduce negative reviews\")\n",
- "\n",
- "# Delivery recommendations\n",
- "if 'error' not in business_report['delivery_performance']:\n",
- " delivery = business_report['delivery_performance']\n",
- " if delivery['avg_delivery_days'] > 7:\n",
- " recommendations.append(\"🚚 PRIORITY: Optimize logistics to reduce average delivery time\")\n",
- " if delivery['fast_delivery_percentage'] < 20:\n",
- " recommendations.append(\"⚡ Invest in fast delivery capabilities to improve customer experience\")\n",
- "\n",
- "# Geographic recommendations\n",
- "geo_data = business_report['geographic_performance']\n",
- "if 'error' not in geo_data.columns:\n",
- " if len(geo_data) < 30:\n",
- " recommendations.append(\"🗺️ Explore expansion opportunities in underserved geographic markets\")\n",
- "\n",
- "# Display recommendations\n",
- "for i, rec in enumerate(recommendations, 1):\n",
- " print(f\"{i}. {rec}\")\n",
- "\n",
- "if not recommendations:\n",
- " print(\"✅ Business performance appears strong across all analyzed metrics\")\n",
- "\n",
- "print(\"\\n\" + \"=\" * 40)\n",
- "print(f\"Analysis completed for {ANALYSIS_YEAR}\")\n",
- "if COMPARISON_YEAR:\n",
- " print(f\"Comparison baseline: {COMPARISON_YEAR}\")\n",
- "print(f\"Generated on: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}\")"
+ "# Print formatted summary\n",
+ "print_metrics_summary(comprehensive_report)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "---\n",
+ "### Strategic Recommendations\n",
+ "\n",
+ "Based on the comprehensive analysis, here are key strategic recommendations:\n",
"\n",
- "## Analysis Configuration Summary\n",
+ "#### Revenue Optimization\n",
+ "- **Focus on High-Performing Categories**: Invest marketing resources in top revenue-generating product categories\n",
+ "- **Average Order Value Enhancement**: Implement cross-selling and upselling strategies to increase AOV\n",
+ "- **Seasonal Planning**: Develop targeted campaigns for months showing consistent growth patterns\n",
"\n",
- "This notebook provides a comprehensive, configurable framework for e-commerce business analysis. Key features:\n",
+ "#### Geographic Expansion\n",
+ "- **Market Penetration**: Focus on underperforming states with high potential\n",
+ "- **Regional Customization**: Tailor product offerings based on regional preferences and performance\n",
+ "- **Logistics Optimization**: Improve delivery networks in high-revenue states\n",
"\n",
- "- **Configurable Time Periods**: Easily adjust analysis and comparison years\n",
- "- **Modular Architecture**: Reusable data loading and metrics calculation modules\n",
- "- **Comprehensive Metrics**: Revenue, product, geographic, and customer experience analysis\n",
- "- **Visual Insights**: Interactive charts and geographic visualizations\n",
- "- **Strategic Recommendations**: Data-driven business insights and action items\n",
+ "#### Customer Experience Enhancement\n",
+ "- **Delivery Speed Improvement**: Prioritize faster delivery options to boost customer satisfaction\n",
+ "- **Quality Control**: Address factors contributing to low review scores (1-2 stars)\n",
+ "- **Customer Retention**: Develop loyalty programs for high-satisfaction customers\n",
"\n",
- "### Next Steps\n",
+ "#### Operational Efficiency\n",
+ "- **Fulfillment Optimization**: Reduce delivery times, especially for orders taking more than 7 days\n",
+ "- **Inventory Management**: Ensure adequate stock for high-performing product categories\n",
+ "- **Process Improvement**: Streamline order processing to reduce cancellation rates\n",
"\n",
- "1. **Regular Monitoring**: Schedule monthly/quarterly runs of this analysis\n",
- "2. **Deeper Segmentation**: Analyze specific customer segments or product lines\n",
- "3. **Predictive Analytics**: Implement forecasting models for future planning\n",
- "4. **A/B Testing**: Design experiments to test strategic recommendations\n",
- "5. **Real-time Dashboards**: Create live dashboards for ongoing monitoring\n",
+ "### Key Performance Indicators (KPIs) to Monitor\n",
+ "\n",
+ "1. **Revenue Growth Rate**: Target consistent month-over-month growth\n",
+ "2. **Customer Satisfaction Score**: Maintain average review score above 4.0\n",
+ "3. **Delivery Performance**: Achieve >30% of orders delivered within 3 days\n",
+ "4. **Order Fulfillment Rate**: Maintain >95% successful delivery rate\n",
+ "5. **Geographic Diversification**: Reduce revenue concentration risk across states"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Next Steps and Action Items\n",
+ "\n",
+ "1. **Data Collection Enhancement**\n",
+ " - Implement more granular tracking for customer journey analysis\n",
+ " - Add product profitability data for comprehensive category analysis\n",
+ " - Include customer acquisition costs for ROI calculations\n",
+ "\n",
+ "2. **Analysis Automation**\n",
+ " - Schedule regular execution of this analysis for ongoing monitoring\n",
+ " - Set up alerting for significant performance changes\n",
+ " - Create executive dashboards for real-time KPI tracking\n",
+ "\n",
+ "3. **Advanced Analytics**\n",
+ " - Implement predictive models for demand forecasting\n",
+ " - Develop customer segmentation for targeted marketing\n",
+ " - Create churn prediction models for retention strategies\n",
"\n",
"---\n",
"\n",
- "*This analysis framework is designed to be easily maintained and extended for future business intelligence needs.*"
+ "**Analysis Configuration Summary:**\n",
+ "- **Primary Analysis Period:** the value of `ANALYSIS_YEAR`\n",
+ "- **Comparison Period:** the value of `COMPARISON_YEAR`\n",
+ "- **Data Source:** the path set in `DATA_PATH`\n",
+ "- **Analysis Scope:** full year when `ANALYSIS_MONTH` is `None`, otherwise the selected month\n",
+ "\n",
+ "*This analysis was generated using a configurable framework that can be easily adapted for different time periods and business requirements.*"
]
}
],
@@ -1998,9 +779,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.13.5"
+ "version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
-}
+}
\ No newline at end of file
diff --git a/lesson7_files/README.md b/lesson7_files/README.md
index ce2b484..86be9b8 100644
--- a/lesson7_files/README.md
+++ b/lesson7_files/README.md
@@ -77,7 +77,12 @@ data_analysis/
4. **Run the applications**:
- **For Streamlit Dashboard**:
+ **For Professional Dashboard (NEW)**:
+ ```bash
+ streamlit run dashboard_new.py
+ ```
+
+ **For Original Dashboard**:
```bash
streamlit run dashboard.py
```
@@ -89,30 +94,93 @@ data_analysis/
## Usage Guide
-### Streamlit Dashboard
+### Professional Streamlit Dashboard (NEW)
+
+#### dashboard_new.py - Converted from EDA_Refactored.ipynb
+
+1. **Launch the professional dashboard**:
+ ```bash
+ streamlit run dashboard_new.py
+ ```
+
+2. **Professional Dashboard Layout**:
+ The new dashboard follows the layout specified in `convert-to-dashboard.md`:
+
+ **Header Section**:
+ - **Title**: "E-commerce Analytics Dashboard" (left-aligned)
+ - **Year Filter**: Dropdown selector (right-aligned) that applies globally to all visualizations
+
+ **KPI Row** (4 cards with uniform heights):
+ - **Total Revenue**: with trend indicator vs previous year (green/red arrows)
+ - **Monthly Growth**: average month-over-month growth percentage
+ - **Average Order Value**: with trend indicator vs previous year (green/red arrows)
+ - **Total Orders**: with trend indicator vs previous year (green/red arrows)
+
+ **Charts Grid** (2x2 layout):
+ - **Revenue Trend**: Line chart with solid line (current year) vs dashed line (previous year), grid lines, Y-axis as $300K format
+ - **Top 10 Categories**: Horizontal bar chart with blue gradient, sorted descending, values as $300K/$2M
+ - **Revenue by State**: US Choropleth map with blue color scale
+ - **Satisfaction vs Delivery**: Bar chart showing review scores by delivery time buckets (1-3 days, 4-7 days, 8+ days)
+
+ **Bottom Row** (2 cards with uniform heights):
+ - **Average Delivery Time**: Large number with trend indicator
+ - **Review Score**: Large number with star rating display and "Average Review Score" subtitle
+
+3. **Professional Features**:
+ - **Real-time filtering**: All charts update automatically when year is changed
+ - **Trend indicators**: Green/red arrows with two decimal places showing performance changes
+ - **Professional styling**: Clean, business-ready interface without icons
+ - **Plotly charts**: All visualizations use Plotly for professional presentation
+ - **Formatted values**: Currency displayed as $300K, $2M for readability
+ - **Grid lines**: Added to charts for easier reading
+ - **Uniform card heights**: Consistent sizing across rows
+ - **Data caching**: Optimized performance with Streamlit caching
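
A minimal sketch of the value formatting listed above, assuming amounts are plain floats. It is illustrative only: `format_compact_currency` and `format_trend` are hypothetical names, loosely modeled on `format_currency` and `format_trend_indicator` in `dashboard_new.py`, and the trailing-zero trimming is an assumption made here so that round millions print as `$2M`.

```python
def format_compact_currency(value: float) -> str:
    """Render a dollar amount with a K or M suffix (hypothetical helper)."""
    for threshold, suffix in ((1_000_000, "M"), (1_000, "K")):
        if abs(value) >= threshold:
            scaled = value / threshold
            # Keep at most one decimal and drop a trailing ".0",
            # so 2_000_000 becomes "$2M" and 2_500_000 becomes "$2.5M".
            text = f"{scaled:.1f}".rstrip("0").rstrip(".")
            return f"${text}{suffix}"
    return f"${value:.0f}"


def format_trend(current: float, previous: float) -> str:
    """Return an arrow plus the percent change, shown with two decimals."""
    if previous == 0:
        # No baseline to compare against.
        return "N/A"
    change = (current - previous) / previous * 100
    arrow = "↑" if change >= 0 else "↓"
    return f"{arrow} {change:.2f}%"


if __name__ == "__main__":
    print(format_compact_currency(300_000))
    print(format_compact_currency(2_000_000))
    print(format_trend(110, 100))
    print(format_trend(95, 100))
```

The color of the indicator (green for a non-negative change, red otherwise) can be chosen from the sign of the same `change` value.
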
+
+### Original Streamlit Dashboard
-1. **Launch the dashboard**:
+#### dashboard.py - Original Implementation
+
+1. **Launch the original dashboard**:
```bash
streamlit run dashboard.py
```
-2. **Navigate the interface**:
- - Use the **year filter** in the top-right to select analysis period
- - View **KPI cards** showing key metrics with trend indicators
- - Explore **interactive charts** in the 2x2 grid layout
- - Monitor **customer experience metrics** in the bottom row
+2. **Original Dashboard Layout**:
+ The dashboard follows a professional layout with these sections:
+
+ **Header Row**:
+ - **Title**: "E-commerce Analytics Dashboard" (left-aligned)
+ - **Year Filter**: Dropdown selector (right-aligned) that applies globally to all visualizations
+
+ **KPI Row** (4 cards):
+ - **Total Revenue**: with trend indicator vs previous year
+ - **Monthly Growth**: average month-over-month growth percentage
+ - **Average Order Value**: with trend indicator vs previous year
+ - **Total Orders**: with trend indicator vs previous year
+
+ **Charts Grid** (2x2 layout):
+ - **Revenue Trend**: Line chart showing current year (solid line) vs previous year (dashed line)
+ - **Top 10 Categories**: Horizontal bar chart with blue gradient, sorted descending
+ - **Revenue by State**: US Choropleth map with blue color scale
+ - **Satisfaction vs Delivery**: Bar chart showing review scores by delivery time buckets
+
+ **Bottom Row** (2 cards):
+ - **Average Delivery Time**: with trend indicator
+ - **Review Score**: Large number with star rating display
3. **Dashboard Features**:
- **Real-time filtering**: All charts update automatically when year is changed
- - **Professional styling**: Clean, business-ready interface
- - **Trend indicators**: Green/red arrows showing performance changes
- - **Formatted values**: Currency displayed as $300K, $2M for readability
+ - **Professional styling**: Clean, business-ready interface with uniform card heights
+ - **Trend indicators**: Green/red arrows with two decimal places showing performance changes
+ - **Formatted values**: Currency displayed as $300K, $2M for readability; no icons are used
+ - **Interactive charts**: All visualizations built with Plotly for professional presentation
+ - **Grid lines**: Added to charts for easier reading as specified
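
The Monthly Growth card reports an average of month-over-month changes. A rough sketch of that calculation, assuming a list of monthly revenue totals in calendar order. `average_monthly_growth` is a hypothetical helper; the project's own calculation (in `business_metrics.py` for `dashboard_new.py`) may differ, for example in how it treats missing months.

```python
from typing import Optional, Sequence


def average_monthly_growth(monthly_revenue: Sequence[float]) -> Optional[float]:
    """Average month-over-month growth in percent (hypothetical helper).

    Pairs each month with the one before it, skips pairs whose earlier
    month is zero (growth is undefined there), and averages the rest.
    Returns None when no valid pair exists.
    """
    growth_rates = [
        (current - previous) / previous * 100
        for previous, current in zip(monthly_revenue, monthly_revenue[1:])
        if previous != 0
    ]
    if not growth_rates:
        return None
    return sum(growth_rates) / len(growth_rates)


if __name__ == "__main__":
    revenue = [100.0, 110.0, 121.0]  # made-up figures, 10% growth each month
    print(f"{average_monthly_growth(revenue):.2f}%")
```

Averaging percentage changes weights every month equally, so one small-revenue month with a large swing can dominate the figure.
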
### Notebook Analysis
1. **Open the refactored notebook**: `EDA_Refactored.ipynb`
-2. **Configure analysis parameters** in the first code cell:
+2. **Configure analysis parameters** in the Configuration & Setup section:
```python
ANALYSIS_YEAR = 2023 # Year to analyze
COMPARISON_YEAR = 2022 # Comparison year (optional)
@@ -122,6 +190,37 @@ data_analysis/
3. **Run all cells** to generate the complete analysis
+#### EDA_Refactored.ipynb Features
+
+The refactored notebook provides a professional, well-structured analysis with:
+
+**Notebook Structure:**
+- **Table of Contents**: Clickable navigation between sections
+- **Introduction & Business Objectives**: Clear analysis goals and KPIs
+- **Configuration & Setup**: Flexible parameters for any time period
+- **Data Dictionary**: Comprehensive explanation of business terms and metrics
+- **Modular Analysis Sections**: Revenue, Product, Geographic, and Customer Experience
+- **Executive Summary**: Strategic recommendations and actionable insights
+
+**Technical Improvements:**
+- **Configurable Framework**: Easily analyze any year/month combination
+- **Module Integration**: Uses existing data_loader.py and business_metrics.py
+- **Clean Code**: Eliminates pandas warnings and follows best practices
+- **Professional Visualizations**: Business-ready charts with proper formatting
+- **Comprehensive Documentation**: Detailed explanations and insights
+
+**Analysis Sections:**
+1. **Revenue Performance**: Total revenue, growth rates, monthly trends, AOV analysis
+2. **Product Categories**: Top performers, revenue share, market analysis
+3. **Geographic Performance**: State-level revenue, market penetration, choropleth maps
+4. **Customer Experience**: Satisfaction scores, delivery performance, correlation analysis
+
+**Key Benefits:**
+- **Reusable**: Works with any date range without code modifications
+- **Maintainable**: Clear structure for future analysts
+- **Business-Oriented**: Focuses on actionable insights rather than technical details
+- **Professional Output**: Publication-ready visualizations and reports
+
### Advanced Configuration
#### Analyzing Specific Time Periods
@@ -282,26 +381,29 @@ def plot_custom_metric(self, data):
## Dashboard Features
-### Layout Structure
-- **Header**: Title with year selection filter (applies globally)
-- **KPI Row**: 4 metric cards with trend indicators
+### Layout Structure (Exact Implementation)
+- **Header**: Title left-aligned, Year filter right-aligned (applies globally to all charts)
+- **KPI Row**: 4 metric cards with trend indicators showing two decimal places
- Total Revenue, Monthly Growth, Average Order Value, Total Orders
- Color-coded trends (green for positive, red for negative)
-- **Charts Grid**: 2x2 interactive visualization layout
- - Revenue trend (current vs previous year)
- - Top 10 product categories bar chart
- - US state choropleth map
- - Customer satisfaction vs delivery time analysis
-- **Bottom Row**: Customer experience metrics
- - Average delivery time with trend
- - Review score with star rating
+ - Uniform card heights maintained across the row
+- **Charts Grid**: 2x2 interactive Plotly visualization layout
+ - Revenue trend with solid line (current) and dashed line (previous year)
+ - Top 10 product categories horizontal bar chart with blue gradient
+ - US state choropleth map with blue color scale
+ - Customer satisfaction vs delivery time bar chart (1-3 days, 4-7 days, 8+ days buckets)
+- **Bottom Row**: Customer experience metrics with uniform card heights
+ - Average delivery time with trend indicator
+ - Review score with large number display and star rating
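
The delivery buckets and the star display described above could be sketched as follows. This is an assumption-laden illustration in plain Python, not the dashboard's code: `delivery_bucket`, `average_score_by_bucket`, and `star_rating` are hypothetical names, and `dashboard_new.py` itself builds the buckets with `pandas.cut`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def delivery_bucket(days: float) -> str:
    """Map a delivery time in days to one of the three chart buckets."""
    if days <= 3:
        return "1-3 days"
    if days <= 7:
        return "4-7 days"
    return "8+ days"


def average_score_by_bucket(orders: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Average review score per bucket from (delivery_days, review_score) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [score sum, order count]
    for days, score in orders:
        entry = totals[delivery_bucket(days)]
        entry[0] += score
        entry[1] += 1
    return {bucket: total / count for bucket, (total, count) in totals.items()}


def star_rating(score: float, max_stars: int = 5) -> str:
    """Filled stars for the rounded score, padded with empty stars."""
    filled = max(0, min(max_stars, int(round(score))))
    return "★" * filled + "☆" * (max_stars - filled)


if __name__ == "__main__":
    sample = [(2, 5), (3, 4), (5, 4), (10, 2)]  # made-up orders
    print(average_score_by_bucket(sample))
    print(star_rating(4.2))
```

Note that this sketch places same-day (0-day) deliveries in the first bucket, which is a choice made here rather than something the layout description specifies.
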
### Technical Features
-- **Real-time Filtering**: All visualizations update automatically
-- **Professional Styling**: Business-ready interface with uniform card heights
+- **Y-axis Formatting**: Currency values formatted as $300K, $2M instead of $300,000
+- **No Icons**: Professional styling without icon usage as specified
+- **Grid Lines**: Added to charts for easier reading
+- **Real-time Filtering**: All visualizations update automatically when year filter changes
+- **Professional Styling**: Business-ready interface with consistent formatting
- **Plotly Charts**: Interactive, publication-quality visualizations
-- **Responsive Design**: Adapts to different screen sizes
-- **Error Handling**: Graceful handling of missing data
+- **Error Handling**: Graceful handling of missing data with appropriate fallbacks
## Future Enhancements
diff --git a/lesson7_files/convert-to-dashboard.md b/lesson7_files/convert-to-dashboard.md
new file mode 100644
index 0000000..b246986
--- /dev/null
+++ b/lesson7_files/convert-to-dashboard.md
@@ -0,0 +1,60 @@
+# Streamlit Dashboard Conversion Requirements
+
+Convert @EDA_Refactored.ipynb into a professional Streamlit dashboard with the following exact layout.
+
+
+## Layout Structure
+
+### **Header**
+- Title + date range filter (applies globally)
+ - **Title**: aligned left
+ - **Date range filter**: aligned right
+
+### **KPI Row**
+- 4 cards: **Total Revenue**, **Monthly Growth**, **Average Order Value**, **Total Orders**
+- Trend indicators for:
+ - Total Revenue
+ - Average Order Value
+ - Total Orders
+- Color coding:
+ - Red for negative trends
+ - Green for positive trends
+
+### **Charts Grid** (2x2 layout)
+
+#### Revenue Trend Line Chart
+- Solid line for the current period
+- Dashed line for the previous period
+- Add grid lines for easier reading
+- Y-axis formatting: `$300K` instead of `$300,000`
+
+#### Top 10 Categories Bar Chart
+- Sorted descending
+- Blue gradient (light shade for lower values)
+- Value formatting: `$300K` and `$2M`
+
+#### Revenue by State (US Choropleth Map)
+- Color-coded by revenue amount
+- Blue gradient
+
+#### Satisfaction vs Delivery Time (Bar Chart)
+- **X-axis**: Delivery time buckets (1–3 days, 4–7 days, etc.)
+- **Y-axis**: Average review score
+
+### **Bottom Row**
+- **Average Delivery Time** with trend indicator
+- **Review Score**:
+ - Large number with stars
+ - Subtitle: `"Average Review Score"`
+
+
+## Key Requirements
+- Use **Plotly** for all charts
+- Ensure charts update correctly when filters change
+- Apply professional styling with trend arrows and colors
+- **Do not use icons**
+- Maintain **uniform card heights** for each row
+- Show **two decimal places** for each trend indicator
+- Include:
+ - `requirements.txt`
+ - `README.md`
\ No newline at end of file
diff --git a/lesson7_files/dashboard.py b/lesson7_files/dashboard.py
index 87bacca..0ed15d4 100644
--- a/lesson7_files/dashboard.py
+++ b/lesson7_files/dashboard.py
@@ -186,12 +186,25 @@ def create_revenue_trend_chart(current_data, previous_data, current_year, previo
yaxis_title="Revenue"
)
+ # Custom Y-axis formatting function
+ def format_yticks(value):
+ if value >= 1e6:
+ return f"${value/1e6:.0f}M"
+ elif value >= 1e3:
+ return f"${value/1e3:.0f}K"
+ else:
+ return f"${value:.0f}"
+
fig.update_layout(
showlegend=True,
hovermode='x unified',
plot_bgcolor='white',
xaxis=dict(showgrid=True, gridcolor='#f0f0f0'),
- yaxis=dict(showgrid=True, gridcolor='#f0f0f0', tickformat='$,.0f'),
+ yaxis=dict(
+ showgrid=True,
+ gridcolor='#f0f0f0',
+ tickformat='$,.0s' # d3 SI-prefix format: ticks render with k/M suffixes (e.g. $300k, $2M)
+ ),
height=350,
margin=dict(t=50, b=50, l=50, r=50)
)
@@ -230,7 +243,7 @@ def create_category_chart(sales_data):
xaxis_title="Revenue",
yaxis_title="",
plot_bgcolor='white',
- xaxis=dict(showgrid=True, gridcolor='#f0f0f0', tickformat='$,.0f'),
+ xaxis=dict(showgrid=True, gridcolor='#f0f0f0', tickformat='$,.0s'),
yaxis=dict(showgrid=False),
height=350,
margin=dict(t=50, b=50, l=150, r=50)
@@ -337,14 +350,14 @@ def main():
st.error("Failed to load data. Please check your data files.")
return
- # Header with title and date filters
- col1, col2, col3 = st.columns([2, 1, 1])
+ # Header with title and date filters - aligned as requested
+ col1, spacer, col2 = st.columns([3, 1, 2])
with col1:
- st.title("📊 E-commerce Analytics Dashboard")
+ st.title("E-commerce Analytics Dashboard")
with col2:
- # Get available years from data
+ # Year filter aligned to the right
orders_data = processed_data['orders']
available_years = sorted(orders_data['purchase_year'].unique(), reverse=True)
@@ -359,19 +372,9 @@ def main():
index=default_year_index,
key="year_filter"
)
-
- with col3:
- # Month filter
- month_options = ['All Months'] + [f'Month {i}' for i in range(1, 13)]
- selected_month_display = st.selectbox(
- "Select Month",
- options=month_options,
- index=0,
- key="month_filter"
- )
- # Convert display to actual month number
- selected_month = None if selected_month_display == 'All Months' else int(selected_month_display.split(' ')[1])
+ # Month filter removed as per requirements - using year only
+ selected_month = None
# Create datasets based on selected year and month
current_data = loader.create_sales_dataset(
@@ -504,16 +507,16 @@ def main():
""", unsafe_allow_html=True)
with bottom_col2:
- # Review score
+ # Review Score with large number and stars as specified
if 'review_score' in current_data.columns:
avg_review = current_data['review_score'].mean()
- stars = "★" * int(round(avg_review))
+ stars = "★" * int(round(avg_review)) + "☆" * (5 - int(round(avg_review)))
st.markdown(f"""
-
Average Review Score
-
{avg_review:.1f}/5.0
-
{stars}
+
{avg_review:.1f}
+
{stars}
+
Average Review Score
""", unsafe_allow_html=True)
else:
diff --git a/lesson7_files/dashboard_new.py b/lesson7_files/dashboard_new.py
new file mode 100644
index 0000000..4817047
--- /dev/null
+++ b/lesson7_files/dashboard_new.py
@@ -0,0 +1,493 @@
+"""
+Professional E-commerce Analytics Dashboard
+Converted from EDA_Refactored.ipynb with exact layout specifications
+"""
+
+import streamlit as st
+import pandas as pd
+import numpy as np
+import plotly.express as px
+import plotly.graph_objects as go
+from plotly.subplots import make_subplots
+import warnings
+from datetime import datetime, date
+from data_loader import EcommerceDataLoader, load_and_process_data
+from business_metrics import BusinessMetricsCalculator
+
+# Configure page
+st.set_page_config(
+ page_title="E-commerce Analytics Dashboard",
+ page_icon="📊",
+ layout="wide",
+ initial_sidebar_state="collapsed"
+)
+
+# Custom CSS for professional styling
+st.markdown("""
+
+""", unsafe_allow_html=True)
+
+# Initialize session state
+if 'data_loaded' not in st.session_state:
+ st.session_state.data_loaded = False
+
+@st.cache_data
+def load_dashboard_data():
+ """Load and cache dashboard data"""
+ loader = EcommerceDataLoader('ecommerce_data/')
+ loader.load_raw_data()
+ processed_data = loader.process_all_data()
+ sales_data = loader.create_sales_dataset(
+ year_filter=None,
+ month_filter=None,
+ status_filter='delivered'
+ )
+ return sales_data
+
+def format_currency(value, compact=True):
+ """Format currency values"""
+ if compact:
+ if value >= 1_000_000:
+ return f"${value/1_000_000:.1f}M"
+ elif value >= 1_000:
+ return f"${value/1_000:.0f}K"
+ else:
+ return f"${value:.0f}"
+ else:
+ return f"${value:,.2f}"
+
+def format_trend_indicator(current, previous):
+ """Format trend indicator with arrow and color"""
+ if previous == 0:
+ return "N/A", "color: #6c757d;"
+
+ change = ((current - previous) / previous) * 100
+ arrow = "↑" if change >= 0 else "↓"
+ color = "color: #28a745;" if change >= 0 else "color: #dc3545;"
+
+ return f"{arrow} {change:.2f}%", color
+
+def create_revenue_trend_chart(sales_data, current_year, previous_year):
+ """Create revenue trend line chart"""
+ # Current year data
+ current_data = sales_data[sales_data['purchase_year'] == current_year]
+ current_monthly = current_data.groupby('purchase_month')['price'].sum().reset_index()
+ current_monthly.columns = ['Month', 'Revenue']
+
+ # Previous year data
+ previous_data = sales_data[sales_data['purchase_year'] == previous_year]
+ previous_monthly = previous_data.groupby('purchase_month')['price'].sum().reset_index()
+ previous_monthly.columns = ['Month', 'Revenue']
+
+ fig = go.Figure()
+
+ # Current year (solid line)
+ fig.add_trace(go.Scatter(
+ x=current_monthly['Month'],
+ y=current_monthly['Revenue'],
+ mode='lines+markers',
+ name=f'{current_year}',
+ line=dict(color='#1f77b4', width=3),
+ marker=dict(size=8)
+ ))
+
+ # Previous year (dashed line)
+ fig.add_trace(go.Scatter(
+ x=previous_monthly['Month'],
+ y=previous_monthly['Revenue'],
+ mode='lines+markers',
+ name=f'{previous_year}',
+ line=dict(color='#ff7f0e', width=3, dash='dash'),
+ marker=dict(size=8)
+ ))
+
+ fig.update_layout(
+ title='Monthly Revenue Trend',
+ xaxis_title='Month',
+ yaxis_title='Revenue',
+ showlegend=True,
+ height=400,
+ xaxis=dict(showgrid=True, gridcolor='rgba(0,0,0,0.1)'),
+ yaxis=dict(
+ showgrid=True,
+ gridcolor='rgba(0,0,0,0.1)',
+ tickformat='$,.0s'
+ ),
+ plot_bgcolor='white'
+ )
+
+ return fig
+
+def create_top_categories_chart(sales_data, year):
+ """Create top 10 categories pie chart"""
+ year_data = sales_data[sales_data['purchase_year'] == year]
+
+ if 'product_category_name' not in year_data.columns:
+ fig = go.Figure()
+ fig.add_annotation(text="Product category data not available",
+ x=0.5, y=0.5, showarrow=False)
+ return fig
+
+ category_revenue = year_data.groupby('product_category_name')['price'].sum().sort_values(ascending=False).head(10)
+
+ # Create labels with formatted category names
+ labels = [cat.replace('_', ' ').title() for cat in category_revenue.index]
+
+ # Create diverse color palette for pie chart
+ colors = px.colors.qualitative.Set3[:len(category_revenue)]
+
+ fig = go.Figure(data=[go.Pie(
+ labels=labels,
+ values=category_revenue.values,
+ marker=dict(colors=colors),
+ textinfo='label+percent',
+ textposition='auto',
+ hovertemplate='%{label}<br>Revenue: %{customdata}<br>Percentage: %{percent}<extra></extra>',
+ customdata=[format_currency(val) for val in category_revenue.values]
+ )])
+
+ fig.update_layout(
+ title='Top 10 Product Categories by Revenue',
+ height=400,
+ showlegend=False, # Remove legend since labels are on the pie slices
+ plot_bgcolor='white'
+ )
+
+ return fig
+
+def create_state_map(sales_data, year):
+ """Create US choropleth map"""
+ year_data = sales_data[sales_data['purchase_year'] == year]
+
+ if 'customer_state' not in year_data.columns:
+ fig = go.Figure()
+ fig.add_annotation(text="Geographic data not available",
+ x=0.5, y=0.5, showarrow=False)
+ return fig
+
+ state_revenue = year_data.groupby('customer_state')['price'].sum().reset_index()
+ state_revenue.columns = ['State', 'Revenue']
+
+ fig = px.choropleth(
+ state_revenue,
+ locations='State',
+ color='Revenue',
+ locationmode='USA-states',
+ scope='usa',
+ title='Revenue by State',
+ color_continuous_scale='Blues',
+ labels={'Revenue': 'Revenue ($)'}
+ )
+
+ fig.update_layout(
+ height=400,
+ geo=dict(showframe=False, showcoastlines=True)
+ )
+
+ return fig
+
+def create_satisfaction_delivery_chart(sales_data, year):
+ """Create satisfaction vs delivery time chart"""
+ year_data = sales_data[sales_data['purchase_year'] == year]
+
+ if 'review_score' not in year_data.columns or 'delivery_days' not in year_data.columns:
+ fig = go.Figure()
+ fig.add_annotation(text="Review or delivery data not available",
+ x=0.5, y=0.5, showarrow=False)
+ return fig
+
+ # Remove duplicates for order-level analysis
+ order_data = year_data.drop_duplicates('order_id').copy()
+ order_data = order_data.dropna(subset=['delivery_days', 'review_score'])
+
+ # Create delivery time buckets
+ order_data['delivery_bucket'] = pd.cut(
+ order_data['delivery_days'],
+ bins=[0, 3, 7, float('inf')],
+ labels=['1-3 days', '4-7 days', '8+ days'],
+ include_lowest=True # keep same-day (0-day) deliveries in the first bucket
+ )
+
+ satisfaction_by_speed = order_data.groupby('delivery_bucket')['review_score'].mean().reset_index()
+
+ fig = go.Figure(go.Bar(
+ x=satisfaction_by_speed['delivery_bucket'],
+ y=satisfaction_by_speed['review_score'],
+ marker_color=['#2ca02c', '#ff7f0e', '#d62728'],
+ text=[f'{score:.2f}' for score in satisfaction_by_speed['review_score']],
+ textposition='outside'
+ ))
+
+ fig.update_layout(
+ title='Average Review Score by Delivery Speed',
+ xaxis_title='Delivery Time',
+ yaxis_title='Average Review Score',
+ height=400,
+ yaxis=dict(range=[0, 5]),
+ plot_bgcolor='white'
+ )
+
+ return fig
+
+def main():
+ # Load data
+ if not st.session_state.data_loaded:
+ with st.spinner("Loading data..."):
+ sales_data = load_dashboard_data()
+ st.session_state.sales_data = sales_data
+ st.session_state.data_loaded = True
+
+ sales_data = st.session_state.sales_data
+
+ # Available years for filtering
+ available_years = sorted(sales_data['purchase_year'].unique(), reverse=True)
+
+ # Header with title and date filter
+ col1, col2 = st.columns([3, 1])
+
+ with col1:
+ st.markdown('E-commerce Analytics Dashboard', unsafe_allow_html=True)
+
+ with col2:
+ # Set 2023 as default year
+ default_index = 0
+ if 2023 in available_years:
+ default_index = available_years.index(2023)
+
+ selected_year = st.selectbox(
+ "Select Year",
+ available_years,
+ index=default_index,
+ key="year_filter"
+ )
+
+ current_year = selected_year
+ previous_year = current_year - 1 if current_year - 1 in available_years else None
+
+ # Initialize metrics calculator
+ metrics_calculator = BusinessMetricsCalculator(sales_data)
+
+ # Calculate metrics
+ revenue_metrics = metrics_calculator.calculate_revenue_metrics(current_year, previous_year)
+ monthly_trends = metrics_calculator.calculate_monthly_trends(current_year)
+ satisfaction_metrics = metrics_calculator.analyze_customer_satisfaction(current_year)
+ delivery_metrics = metrics_calculator.analyze_delivery_performance(current_year)
+
+ # KPI Row - 4 cards
+ st.markdown("### Key Performance Indicators")
+ kpi_col1, kpi_col2, kpi_col3, kpi_col4 = st.columns(4)
+
+ with kpi_col1:
+ trend_text, trend_color = format_trend_indicator(
+ revenue_metrics['total_revenue'],
+ revenue_metrics.get('previous_year_revenue', 0)
+ ) if previous_year else ("N/A", "color: #6c757d;")
+
+ st.markdown(f"""
+ Total Revenue
+ {format_currency(revenue_metrics['total_revenue'])}
+ {trend_text}
+ """, unsafe_allow_html=True)
+
+ with kpi_col2:
+ avg_growth = monthly_trends['revenue_growth'].mean()
+ st.markdown(f"""
+ Monthly Growth
+ {avg_growth:.2f}%
+ Average
+ """, unsafe_allow_html=True)
+
+ with kpi_col3:
+ trend_text, trend_color = format_trend_indicator(
+ revenue_metrics['average_order_value'],
+ revenue_metrics.get('previous_year_aov', 0)
+ ) if previous_year else ("N/A", "color: #6c757d;")
+
+ st.markdown(f"""
+ Average Order Value
+ {format_currency(revenue_metrics['average_order_value'], False)}
+ {trend_text}
+ """, unsafe_allow_html=True)
+
+ with kpi_col4:
+ trend_text, trend_color = format_trend_indicator(
+ revenue_metrics['total_orders'],
+ revenue_metrics.get('previous_year_orders', 0)
+ ) if previous_year else ("N/A", "color: #6c757d;")
+
+ st.markdown(f"""
+ Total Orders
+ {revenue_metrics['total_orders']:,}
+ {trend_text}
+ """, unsafe_allow_html=True)
+
+ st.markdown(" ", unsafe_allow_html=True)
+
+ # Charts Grid - 2x2 layout
+ st.markdown("### Performance Analytics")
+
+ chart_row1_col1, chart_row1_col2 = st.columns(2)
+ chart_row2_col1, chart_row2_col2 = st.columns(2)
+
+ with chart_row1_col1:
+ if previous_year:
+ fig_revenue = create_revenue_trend_chart(sales_data, current_year, previous_year)
+ st.plotly_chart(fig_revenue, use_container_width=True)
+ else:
+ st.info("Previous year data not available for trend comparison")
+
+ with chart_row1_col2:
+ fig_categories = create_top_categories_chart(sales_data, current_year)
+ st.plotly_chart(fig_categories, use_container_width=True)
+
+ with chart_row2_col1:
+ fig_map = create_state_map(sales_data, current_year)
+ st.plotly_chart(fig_map, use_container_width=True)
+
+ with chart_row2_col2:
+ fig_satisfaction = create_satisfaction_delivery_chart(sales_data, current_year)
+ st.plotly_chart(fig_satisfaction, use_container_width=True)
+
+ st.markdown(" ", unsafe_allow_html=True)
+
+ # Bottom Row - 2 cards
+ st.markdown("### Customer Experience Metrics")
+ bottom_col1, bottom_col2 = st.columns(2)
+
+ with bottom_col1:
+ if 'error' not in delivery_metrics:
+ # Calculate trend for delivery time (assuming previous year comparison)
+ if previous_year:
+ prev_delivery = metrics_calculator.analyze_delivery_performance(previous_year)
+ if 'error' not in prev_delivery:
+ trend_text, trend_color = format_trend_indicator(
+ delivery_metrics['avg_delivery_days'],
+ prev_delivery['avg_delivery_days']
+ )
+ else:
+ trend_text, trend_color = "N/A", "color: #6c757d;"
+ else:
+ trend_text, trend_color = "N/A", "color: #6c757d;"
+
+ st.markdown(f"""
+ Average Delivery Time
+ {delivery_metrics['avg_delivery_days']:.1f} days
+ {trend_text}
+ """, unsafe_allow_html=True)
+ else:
+ st.markdown("""
+ Average Delivery Time
+ N/A
+ Data not available
+ """, unsafe_allow_html=True)
+
+ with bottom_col2:
+ if 'error' not in satisfaction_metrics:
+ # Create star rating display
+ score = satisfaction_metrics['avg_review_score']
+ # Filled stars for the whole part of the score, padded with empty stars
+ # so the display always uses the same 5-star scale as the N/A card
+ full_stars = min(int(score), 5)
+ stars = "★" * full_stars + "☆" * (5 - full_stars)
+
+ st.markdown(f"""
+ {score:.2f}
+ {stars}
+ Average Review Score
+ """, unsafe_allow_html=True)
+ else:
+ st.markdown("""
+ N/A
+ ☆☆☆☆☆
+ Average Review Score
+ """, unsafe_allow_html=True)
+
+if __name__ == "__main__":
+ main()
\ No newline at end of file
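
A note on the trend indicators used in `main()`: `format_trend_indicator` is defined outside this excerpt, so its behavior is not shown here. The delivery-time card passes day counts through the same helper as the revenue cards, yet a rise in delivery days is presumably bad news while a rise in revenue is good. The sketch below is a hypothetical stand-in, not the author's implementation: the name `trend_label`, the `higher_is_better` flag, and the green and red color values are all assumptions. It shows one way the direction could be made explicit:

```python
from typing import Optional, Tuple

# Color values are assumptions, except the grey, which matches the "N/A" style in main()
GREEN = "color: #28a745;"
RED = "color: #dc3545;"
GREY = "color: #6c757d;"


def trend_label(
    current: float,
    previous: Optional[float],
    higher_is_better: bool = True,
) -> Tuple[str, str]:
    """Return (text, css_style) describing the change from previous to current."""
    # Without a usable baseline there is no meaningful percentage change
    if previous is None or previous == 0:
        return "N/A", GREY

    change_pct = (current - previous) / previous * 100
    if change_pct == 0:
        return "→ 0.00%", GREY

    arrow = "↗" if change_pct > 0 else "↘"
    # An increase is an improvement only for metrics where higher is better
    improved = (change_pct > 0) == higher_is_better
    return f"{arrow} {abs(change_pct):.2f}%", GREEN if improved else RED
```

Under this sketch, the delivery card would call `trend_label(current_days, previous_days, higher_is_better=False)`, so a drop in delivery time renders in green.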
diff --git a/lesson7_files/prompt.md b/lesson7_files/prompt.md
new file mode 100644
index 0000000..8df2b89
--- /dev/null
+++ b/lesson7_files/prompt.md
@@ -0,0 +1,58 @@
+# EDA Notebook Refactoring Plan
+
+The @EDA.ipynb notebook contains an exploratory data analysis of the e-commerce data in @ecommerce_data, focusing on sales metrics for 2023.
+Keep the same analysis and graphs, and improve the structure and documentation of the notebook. 💡
+
+## Review Checklist
+
+Identify:
+- What business metrics are currently calculated
+- What visualizations are created
+- What data transformations are performed
+- Any code quality issues or inefficiencies
+
+
+## **Refactoring Requirements**
+
+### 1. Notebook Structure & Documentation
+- Add proper documentation and markdown cells with clear headers and a brief explanation for each section
+- Organize into logical sections:
+ 1. **Introduction & Business Objectives**
+ 2. **Data Loading & Configuration**
+ 3. **Data Preparation & Transformation**
+ 4. **Business Metrics Calculation** (revenue, product, geographic, customer experience analysis)
+ 5. **Summary of Observations**
+- Add a table of contents at the beginning
+- Include a data dictionary explaining key columns and business terms
+
+
+### 2. Code Quality Improvements
+- Create reusable functions with docstrings
+- Implement consistent naming and formatting
+- Create separate Python files:
+ - `business_metrics.py` → business metric calculations only
+ - `data_loader.py` → loading, processing, and cleaning the data
+
+
+### 3. Enhanced Visualizations
+Improve all plots with:
+- Clear and descriptive titles
+- Proper axis labels with units
+- Legends where needed
+- Appropriate chart types for the data
+- Include date range in plot titles or captions
+- Use consistent, business-oriented color schemes
+
+
+### 4. Configurable Analysis Framework
+- The current notebook computes metrics for a fixed date range (entire year of 2023 compared to 2022)
+- Refactor to:
+ - Filter data by configurable **month** and **year**
+ - Implement general-purpose metric calculations
+
+
+## **Deliverables Expected**
+- Refactored Jupyter notebook: `EDA_Refactored.ipynb` with all improvements
+- Python modules: `business_metrics.py` and `data_loader.py` with documented functions
+- Requirements file: `requirements.txt` listing all dependencies
+- `README.md` section explaining how to use the refactored analysis
\ No newline at end of file
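
Section 4 of the prompt asks for filtering by a configurable month and year. One possible shape for that configuration is sketched below using only the standard library. `AnalysisPeriod` and its method names are hypothetical and do not appear in the repository:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AnalysisPeriod:
    """Hypothetical config object: a year, optionally narrowed to one month."""

    year: int
    month: Optional[int] = None  # None means "the whole year"

    def __post_init__(self) -> None:
        if self.month is not None and not 1 <= self.month <= 12:
            raise ValueError(f"month must be 1-12, got {self.month}")

    def contains(self, d: date) -> bool:
        """True if the date falls inside this period."""
        return d.year == self.year and (self.month is None or d.month == self.month)

    def previous_year(self) -> "AnalysisPeriod":
        """Same month (or whole year), one year earlier, for comparisons."""
        return AnalysisPeriod(self.year - 1, self.month)


# Illustrative order dates, not taken from the dataset
orders = [date(2022, 3, 9), date(2023, 3, 14), date(2023, 7, 2)]
march_2023 = AnalysisPeriod(2023, month=3)
current = [d for d in orders if march_2023.contains(d)]
baseline = [d for d in orders if march_2023.previous_year().contains(d)]
```

With a DataFrame, the same check could be expressed as a boolean mask on `purchase_year` (the column used in `main()`) plus a month column, whose name would depend on what `data_loader.py` produces.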
diff --git a/lesson7_files/requirements.txt b/lesson7_files/requirements.txt
index 54ae25b..01b4beb 100644
--- a/lesson7_files/requirements.txt
+++ b/lesson7_files/requirements.txt
@@ -5,4 +5,7 @@ seaborn>=0.11.0
plotly>=5.0.0
streamlit>=1.28.0
jupyter>=1.0.0
-ipykernel>=6.0.0
\ No newline at end of file
+ipykernel>=6.0.0
+nbformat>=5.0.0
+scipy>=1.9.0
+altair>=4.2.0
\ No newline at end of file