Vendor & Sales Analytics

Author

Christian Lira Gonzalez

Stack: T-SQL · Python · pandas · matplotlib
Source: Northwind Database (SQL Server 2022)

1 Executive Summary

This analysis addresses six business questions across three operational domains using Northwind transaction and fulfillment data. The goal is to equip procurement, finance, and logistics teams with evidence-based insights to prioritize supplier relationships, monitor revenue momentum, and evaluate carrier reliability.

  1. Which suppliers drive the most revenue?
  2. How does each supplier rank within their product category?
  3. How has monthly revenue trended?
  4. What is the month-over-month growth rate and 3-month rolling average?
  5. Are orders being shipped on time?
  6. Which shippers have the longest delays?

Key findings:

2 Business Context

Northwind is a fictional wholesale food distributor. Understanding supplier revenue contribution helps procurement teams prioritize vendor relationships, negotiate contracts, and identify concentration risk, where revenue depends heavily on a small number of suppliers.

3 Section 1: Supplier Revenue Analysis

3.1 Business Questions

  1. Which suppliers drive the most revenue?
  2. How does each supplier rank within their product category?

3.2 Context

For a wholesale distributor like Northwind, supplier relationships are the foundation of the business. Not all suppliers contribute equally, a small number typically drive the majority of revenue, while others serve niche product categories with lower overall volume.

Understanding supplier revenue concentration helps procurement teams:

  • Prioritize vendor relationships and contract negotiations
  • Identify dependency risk where revenue relies on few suppliers
  • Compare supplier performance within the same product category, a supplier may rank low overall but dominate their category

3.3 Data Model

This analysis joins three tables:

Table Role
Suppliers Vendor identity: SupplierID, CompanyName
Products Catalog: links suppliers to categories
Order Details Line-item transactions: quantity, unit price, discount
Categories Category names: Beverages, Dairy, Seafood, etc.

3.4 Revenue Formula

Revenue is calculated at the line-item level: net_revenue = unit_price × quantity × (1 - discount)

Discount is captured separately to show both the gross revenue potential and the actual net revenue per supplier after markdowns.


3.5 Query Design

The query uses a 3-CTE pattern each step has a single responsibility, making the logic easier to read, debug, and extend.

CTE Responsibility
SupplierProducts Resolves supplier → product → category relationships
SupplierOrders Flattens to order line items with pricing
SupplierTotals Aggregates revenue, quantity, and discount per supplier

Two window functions are applied on the aggregated results:

  • RANK() OVER (ORDER BY total_order_revenue DESC) overall rank across all suppliers
  • RANK() OVER (PARTITION BY CategoryName ORDER BY total_order_revenue DESC) rank within each product category

The PARTITION BY CategoryName is the key analytical addition it resets the rank for each category independently, allowing direct comparison of suppliers competing in the same product segment.

Code
#Load libraries
import os
from pathlib import Path
import pyodbc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.dates as mdates
from matplotlib.patches import Patch
Code
# Establish All Database Connections
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;"
    "DATABASE=Northwind;"
    "Trusted_Connection=yes;"
)

# Import supplier_ranking query from SQL files in dir
supplier_ranking_query = Path("sql/01_supplier_ranking.sql").read_text()
supplier_ranking = pd.read_sql(supplier_ranking_query, conn)

# Import sales_trends query from SQL files in dir
sales_trends = Path("sql/02_sales_trends.sql").read_text()
sales_trends = pd.read_sql(sales_trends, conn)

# Import fulfillment Analysis query from SQL files in dir
fulfillment = Path("sql/03_fulfillment_analysis.sql").read_text()
fulfillment = pd.read_sql(fulfillment, conn)

# Import Late Orders query from SQL files in dir
order_timing = Path("sql/04_late_orders.sql").read_text()
order_timing = pd.read_sql(order_timing, conn)

conn.close()
Code
WITH MonthlySales AS (
    SELECT
        DATEFROMPARTS(YEAR(o.OrderDate), MONTH(o.OrderDate), 1) AS SaleMonth,
        SUM(od.UnitPrice * od.Quantity * (1 - od.Discount))     AS MonthlyRevenue
    FROM Orders o
    JOIN [Order Details] od ON o.OrderID = od.OrderID
    WHERE o.OrderDate IS NOT NULL
    GROUP BY DATEFROMPARTS(YEAR(o.OrderDate), MONTH(o.OrderDate), 1)
)
SELECT
    SaleMonth,
    ROUND(MonthlyRevenue, 2)                                          AS MonthlyRevenue,
    ROUND(LAG(MonthlyRevenue, 1) OVER (ORDER BY SaleMonth), 2)       AS PriorMonthRevenue,
    ROUND(
        (MonthlyRevenue - LAG(MonthlyRevenue, 1) OVER (ORDER BY SaleMonth))
        / NULLIF(LAG(MonthlyRevenue, 1) OVER (ORDER BY SaleMonth), 0) * 100
    , 1)                                                              AS MoMGrowthPct,
    ROUND(
        AVG(MonthlyRevenue) OVER (
            ORDER BY SaleMonth
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        )
    , 2)                                                              AS Rolling3MonthAvg
FROM MonthlySales
ORDER BY SaleMonth;

3.6 Visualizations

The chart below ranks the top 15 suppliers by total net revenue. Bars are colored by product category, revealing which categories dominate the top revenue positions and where supplier concentration is highest.

Code
fig, ax = plt.subplots(figsize=(12, 7))

# Take top 15 by revenue rank
top15 = supplier_ranking.head(15).sort_values("total_order_revenue", ascending=True)

# Color bars by category
categories = top15["CategoryName"].unique()
color_map = {
    "Beverages":      "#2196F3",
    "Dairy Products": "#4CAF50",
    "Meat/Poultry":   "#F44336",
    "Grains/Cereals": "#FF9800",
    "Confections":    "#9C27B0",
    "Produce":        "#00BCD4",
    "Condiments":     "#795548",
    "Seafood":        "#607D8B",
}
colors = top15["CategoryName"].map(color_map)

bars = ax.barh(
    top15["SupplierName"],
    top15["total_order_revenue"],
    color=colors,
    edgecolor="white",
    linewidth=0.5,
    height=0.7
)

# Revenue labels on bars
for bar in bars:
    width = bar.get_width()
    ax.text(
        width + 500,
        bar.get_y() + bar.get_height() / 2,
        f"${width:,.0f}",
        va="center",
        ha="left",
        fontsize=9,
        color="#333333"
    )

# Formatting
ax.set_xlabel("Total Order Revenue (USD)", fontsize=11, labelpad=10)
ax.set_title("Top 15 Suppliers by Revenue — Northwind Database", fontsize=14, fontweight="bold", pad=15)
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.tick_params(axis="y", labelsize=9)

# Legend for categories
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor=color_map[cat], label=cat) for cat in color_map if cat in top15["CategoryName"].values]
ax.legend(handles=legend_elements, loc="lower right", fontsize=9, framealpha=0.7)

plt.tight_layout()
plt.show()

Aux joyeux ecclésiastiques leads all suppliers by a significant margin with $153,691 in total net revenue, nearly 30% ahead of the second-ranked Gai pâturage at $117,981 both representing Beverages and Dairy Products respectively. A notable drop-off follows at position three, where Plutzer Lebensmittelgroßmärkte AG (Meat/Poultry) generates $80,369, roughly half the top supplier’s revenue. The remaining twelve suppliers cluster tightly between $26,591 and $50,255, suggesting the revenue tail is relatively flat after the top three. Category-wise, Beverages and Dairy Products dominate the upper ranks, while Confections (purple) appears three times in the mid-tier, and Seafood and Condiments hold the bottom two positions with New England Seafood Cannery at $26,591 and New Orleans Cajun Delights at $31,168.

Code
# ── Chart 2: Revenue by Category (aggregated) ───────────────────────────────
fig, ax = plt.subplots(figsize=(10, 6))

category_totals = (
    supplier_ranking
    .groupby("CategoryName")["total_order_revenue"]
    .sum()
    .sort_values(ascending=True)
)

colors_cat = [color_map.get(cat, "#999999") for cat in category_totals.index]

bars = ax.barh(
    category_totals.index,
    category_totals.values,
    color=colors_cat,
    edgecolor="white",
    linewidth=0.5,
    height=0.6
)

for bar in bars:
    width = bar.get_width()
    ax.text(
        width + 500,
        bar.get_y() + bar.get_height() / 2,
        f"${width:,.0f}",
        va="center",
        ha="left",
        fontsize=9,
        color="#333333"
    )

ax.set_xlabel("Total Revenue (USD)", fontsize=11, labelpad=10)
ax.set_title("Total Revenue by Product Category — Northwind Database", fontsize=14, fontweight="bold", pad=15)
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.tick_params(axis="y", labelsize=10)

plt.tight_layout()
plt.savefig("outputs/01_category_revenue.png", dpi=150, bbox_inches="tight")
plt.show()

Beverages is the undisputed top-performing category with $267,868 in total revenue, followed closely by Dairy Products at $234,507 together these two categories account for over 38% of all revenue across the eight segments. Confections ($167,357) and Meat/Poultry ($163,022) form a tight second tier, separated by less than $5,000. Seafood ($131,262) and Condiments ($106,047) occupy the middle ground, while Produce ($99,985) and Grains/Cereals ($95,745) sit at the bottom, both falling just below the $100,000 mark. The spread between the highest and lowest categories is nearly 2.8×, indicating meaningful structural concentration at the category level.

Code
# ── Chart 3: Overall vs Category Rank — Quadrant Analysis ───────────────────
fig, ax = plt.subplots(figsize=(12, 8))

# Calculate medians for quadrant lines
median_overall = supplier_ranking["revenue_rank"].median()
median_category = supplier_ranking["Category_rank"].median()

# Plot points colored by category
for cat, group in supplier_ranking.groupby("CategoryName"):
    ax.scatter(
        group["revenue_rank"],
        group["Category_rank"],
        label=cat,
        color=color_map.get(cat, "#999999"),
        s=100,
        alpha=0.85,
        edgecolors="white",
        linewidth=0.5,
        zorder=3
    )

# Draw quadrant lines
ax.axvline(x=median_overall, color="#cccccc", linestyle="--", linewidth=1, alpha=0.7, zorder=1)
ax.axhline(y=median_category, color="#cccccc", linestyle="--", linewidth=1, alpha=0.7, zorder=1)

# Quadrant labels
ax.text(median_overall * 0.3, median_category * 0.3 - 0.3,
        "★ Overall & Category\nLeaders",
        fontsize=9, color="#2ecc71", fontweight="bold", alpha=0.8)
ax.text(median_overall * 1.3, median_category * 0.3 - 0.3,
        "◆ Niche Category\nSpecialists",
        fontsize=9, color="#3498db", fontweight="bold", alpha=0.8)
ax.text(median_overall * 0.3, median_category * 1.4,
        "▲ Overall Leaders\nMid Category",
        fontsize=9, color="#e67e22", fontweight="bold", alpha=0.8)
ax.text(median_overall * 1.3, median_category * 1.4,
        "▼ Underperformers",
        fontsize=9, color="#e74c3c", fontweight="bold", alpha=0.8)

# Annotate top 5 overall
for _, row in supplier_ranking[supplier_ranking["revenue_rank"] <= 5].iterrows():
    ax.annotate(
        row["SupplierName"].split()[0],
        (row["revenue_rank"], row["Category_rank"]),
        textcoords="offset points",
        xytext=(6, 4),
        fontsize=8,
        color="#333333"
    )

ax.set_xlabel("Overall Revenue Rank", fontsize=11, labelpad=10)
ax.set_ylabel("Category Revenue Rank", fontsize=11, labelpad=10)
ax.set_title("Overall vs Category Rank by Supplier — Northwind Database",
             fontsize=14, fontweight="bold", pad=15)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.legend(fontsize=9, framealpha=0.7, loc="upper left")

plt.tight_layout()
plt.savefig("outputs/01_rank_quadrant.png", dpi=150, bbox_inches="tight")
plt.show()

A scatter plot positioning every supplier at the intersection of their overall revenue rank (x-axis) and their within-category rank (y-axis). Median lines divide the chart into four quadrants. The top-5 overall suppliers are individually annotated.

Quadrant Label Interpretation
Low overall rank, low category rank ★ Overall & Category Leaders Dominant performers across both dimensions highest strategic value
High overall rank, low category rank ◆ Niche Category Specialists Lead their category but have lower total volume valuable within segment
Low overall rank, high category rank ▲ Overall Leaders, Mid Category High total revenue but face strong competition within their category
High overall rank, high category rank ▼ Underperformers Lag on both dimensions candidates for review

The quadrant plot reveals a clear cluster of top performers in the bottom-left corner, where Aux joyeux ecclésiastiques, Gai pâturage, Plutzer, and Pasta Buttini all hold both low overall ranks (1–5) and category rank 1 confirming them as true Overall & Category Leaders. Formaggi Fortini sits just outside this elite group at overall rank ~8 and category rank 2, representing a strong Dairy Products contender. The upper-left region (Overall Leaders, Mid Category) is sparsely populated, meaning few suppliers achieve high global revenue while ranking only mid-tier within their category. The right half of the chart is densely populated with Underperformers and Niche Specialists, with several Condiments and Seafood suppliers clustering between overall ranks 30–50 and category ranks 5–8 high on both axes and therefore the lowest strategic priority. One notable Niche Specialist appears around overall rank 38–40 with category rank 1, indicating a supplier who dominates their product segment despite a modest global revenue position.

5 Section 3: Fulfillment & Shipper Analysis

Timely fulfillment is a key driver of customer satisfaction in wholesale distribution. This section evaluates the three active shippers, Federal Shipping, Speedy Express, and United Package, across on-time rate, average ship time, freight spend, and severity of delays.

5.1 Data Model

Table Role
Shippers ShipperID, CompanyName, carrier identity
Orders OrderDate, ShippedDate, RequiredDate, Freight, ShipVia, fulfillment timing and cost

5.2 Query Design

A single CTE (ShipperDetails) joins Shippers to Orders and computes all metrics in a single pass using conditional aggregation:

  • AvgDaysToShip average of DATEDIFF(DAY, OrderDate, ShippedDate) per shipper
  • OnTimeOrders / OnTimePct count and share of orders where ShippedDate ≤ RequiredDate
  • LatePct share of orders where ShippedDate > RequiredDate
  • AvgFreightExp and TotalSpentPerShipper freight cost analysis ### Shipper Performance Summary
Shipper Avg Days to Ship On-Time Performance Late Orders Total Freight
Speedy Express 7.5 days Fastest 15 Lowest
Federal Shipping 8.6 days Best overall consistency 15 Mid
United Package 9.2 days Lowest on-time rate 27 Highest

5.3 Visualizations & Findings

Code
-- USE Northwind;
-- GO

WITH ShipperDetails AS (
    SELECT 
        s.ShipperID, 
        s.CompanyName,
        COUNT(DISTINCT o.OrderID) AS OrderCount,
        ROUND(AVG(CAST(DATEDIFF(DAY, o.OrderDate, o.ShippedDate) AS FLOAT)),1) AS AvgDaysToShip,
        SUM(CASE WHEN o.ShippedDate <= RequiredDate THEN 1 ELSE 0 END) AS OnTimeOrders,
        ROUND(100 * SUM(CASE WHEN o.ShippedDate <= o.RequiredDate THEN 1 ELSE 0 END) /NULLIF(COUNT(o.OrderID),0),2) AS OnTimePct,
        ROUND(100 * SUM(CASE WHEN o.ShippedDate > RequiredDate THEN 1 ELSE 0 END) /NULLIF(COUNT(o.OrderID),0),2) AS LatePct,
        AVG(o.Freight) AS AvgFreightExp,
        SUM(o.Freight) AS TotalSpentPerShipper
    FROM Shippers AS s
    JOIN Orders AS o
        ON s.ShipperID = o.ShipVia
    GROUP BY s.ShipperID, s.CompanyName
)
SELECT *
FROM ShipperDetails
ORDER BY AvgDaysToShip
Code
fig, ax = plt.subplots(figsize=(10, 6))

companies = fulfillment['CompanyName']
on_time = fulfillment['OnTimeOrders']
late_orders = fulfillment['OrderCount'] - fulfillment['OnTimeOrders']

x = np.arange(len(companies))
width = 0.35

bars_on = ax.bar(x - width/2, on_time, width,
                 label='On Time', color='#27AE60')
bars_late = ax.bar(x + width/2, late_orders, width,
                   label='Late', color='#C0392B')

# Value labels on bars
for bar in bars_on:
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 2,
            str(int(bar.get_height())),
            ha='center', va='bottom', fontsize=9, color='#27AE60')

for bar in bars_late:
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 2,
            str(int(bar.get_height())),
            ha='center', va='bottom', fontsize=9, color='#C0392B')

ax.grid(axis='y', linestyle=':', linewidth=0.5, alpha=0.3, color='gray')
ax.set_xticks(x)
ax.set_xticklabels(companies)
ax.set_xlabel("Shipper", fontsize=11, labelpad=10)
ax.set_ylabel("Order Count", fontsize=11, labelpad=10)
ax.set_title("On-Time vs Late Orders by Shipper — Northwind Database",
             fontsize=14, fontweight="bold", pad=15)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.legend(fontsize=9, framealpha=0.7)

plt.tight_layout()
plt.savefig("outputs/03_ontime_vs_late.png", dpi=150, bbox_inches="tight")
plt.show()

United Package carries the highest total order volume at 326 orders (299 on-time, 27 late), making it the dominant carrier by throughput. Federal Shipping handled 255 orders with only 15 late an on-time rate of approximately 94% the strongest reliability performance of the three. Speedy Express processed 249 orders with 16 late, yielding a very similar on-time rate of ~93.6%, essentially tied with Federal Shipping on reliability despite handling slightly fewer orders. The gap in late order counts between Federal Shipping/Speedy Express (15–16) and United Package (27) is notable: United Package generates nearly twice the late deliveries of either peer, suggesting that its higher volume may be straining fulfillment capacity and depressing on-time performance.

Code
fig, ax = plt.subplots(figsize=(10, 6))

# Color bars by company
companies = fulfillment['CompanyName'].unique()
color_map = {
    'Federal Shipping': "#1F4E79",
    'Speedy Express':   "#2E74B5",
    'United Package':   "#BDD7EE"
}

fulfillment_sorted = fulfillment.sort_values('TotalSpentPerShipper', ascending=True)

companies = fulfillment_sorted['CompanyName']
spending = fulfillment_sorted['TotalSpentPerShipper']
colors = fulfillment_sorted['CompanyName'].map(color_map)

bars = ax.bar(
    companies,
    spending,
    color=colors,
    width=0.5)

# Value labels
for bar in bars:
    ax.text(bar.get_x() + bar.get_width()/2,
            bar.get_height() + 200,
            f'${bar.get_height():,.0f}',
            ha='center', va='bottom',
            fontsize=9, fontweight='bold'
            )

ax.grid(axis='y', linestyle=':', linewidth=0.5, alpha=0.3, color='gray')
ax.set_xlabel("Shipper", fontsize=11, labelpad=10)
ax.set_ylabel("Total Freight Spend", fontsize=11, labelpad=10)
ax.set_title("Total Freight Spend By Shipper - Northwind Database", fontsize=14, fontweight='bold', pad=15)
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'${x:,.0f}'))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig("outputs/03_freight_by_shipper.png", dpi=150, bbox_inches="tight")
plt.show()

Freight spend scales closely with order volume across all three carriers. United Package, as the highest-volume shipper, also incurs the largest freight cost at $28,245 — 74% more than Speedy Express ($16,185) and 38% more than Federal Shipping ($20,513). Federal Shipping sits in the middle both in volume and cost, while Speedy Express is the most economical option despite its near-identical on-time rate to Federal Shipping. From a cost-efficiency perspective, Speedy Express delivers comparable reliability at the lowest freight cost, whereas United Package’s combination of the highest spend and the most late orders makes it the least favorable carrier on a cost-per-reliable-delivery basis.

Code
fig, ax = plt.subplots(figsize=(10,6))

# Color bars by company
companies = fulfillment['CompanyName'].unique()
color_map = {
    'Federal Shipping': "#1F4E79",
    'Speedy Express':   "#2E74B5",
    'United Package':   "#BDD7EE"
}

fulfillment_sorted = fulfillment.sort_values('AvgDaysToShip', ascending=True)

companies = fulfillment_sorted['CompanyName']
shipping = fulfillment_sorted['AvgDaysToShip']
colors = fulfillment_sorted['CompanyName'].map(color_map)


bars = ax.bar(
    companies,
    shipping,
    color=colors,
    width=0.5)

# Value labels
for bar in bars:
    ax.text(bar.get_x() + bar.get_width()/2,
            bar.get_height() + 0.1,
            f'{bar.get_height():,.1f} days',
            ha='center', va='bottom',
            fontsize=9, fontweight='bold'
            )

ax.grid(axis='y', linestyle=':', linewidth=0.5, alpha=0.3, color='gray')
ax.set_xlabel("Shipper", fontsize=11, labelpad=10)
ax.set_ylabel("Average Days to Ship", fontsize=11, labelpad=10)
ax.set_title("Average Days to Ship By Shipper - Northwind Database", fontsize=14, fontweight='bold', pad=15)
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'{x:,.0f}'))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig("outputs/03_avg_time_to_ship.png", dpi=150, bbox_inches="tight")
plt.show()

Speedy Express is the fastest shipper at 7.5 days average that is 1.7 days faster than United Package at 9.2 days, an 18% difference. Federal Shipping falls between at 8.6 days. While the gap between shippers is relatively narrow, for time-sensitive orders the choice of carrier has meaningful impact on customer experience. Notably, United Package handles the highest order volume (326 orders) despite having both the slowest average ship time and the lowest on-time rate, suggesting capacity constraints may be affecting performance.

6 Section 4: Late Order Severity Analysis

While Section 3 measures fulfillment performance across all orders, this section isolates only the orders that missed their required delivery date, where ShippedDate > RequiredDate, and measures how severe those delays were per carrier. Frequency and severity are distinct failure modes: a shipper can have few late orders but extreme delays, or many late orders with quick recovery. Both dimensions matter for evaluating carrier risk.

6.1 Data Model

Table Role
Shippers ShipperID, CompanyName, carrier identity
Orders RequiredDate, ShippedDate, ShipVia, deadline and actual delivery

6.2 Query Design Late Orders Isolation

A single CTE (ShippingDetails) filters to late orders only using WHERE o.ShippedDate > o.RequiredDate, then computes DATEDIFF(DAY, o.RequiredDate, o.ShippedDate) as DaysLate per order. The outer query returns the full detail sorted by worst delay, allowing both per-order inspection and downstream aggregation by shipper.

WITH ShippingDetails AS (
    SELECT 
        s.ShipperID, 
        s.CompanyName,
        o.OrderID,
        o.OrderDate,
        o.RequiredDate,
        o.ShippedDate,
        DATEDIFF(DAY, o.RequiredDate, o.ShippedDate) AS DaysLate
    FROM Shippers AS s
    JOIN Orders AS o
        ON s.ShipperID = o.ShipVia
    WHERE o.ShippedDate > o.RequiredDate
)
SELECT *
FROM ShippingDetails
ORDER BY DaysLate DESC;

The WHERE clause is the critical filter on-time orders never enter the CTE, so all downstream aggregation reflects delay severity only. ORDER BY DaysLate DESC surfaces the worst offenders first for immediate visibility.

6.3 Visualizations

Code
WITH ShippingDetails AS (
    SELECT 
        s.ShipperID, 
        s.CompanyName,
        o.OrderID,
        o.OrderDate,
        o.RequiredDate,
        o.ShippedDate,
        DATEDIFF(DAY, o.RequiredDate, o.ShippedDate) AS DaysLate
    FROM Shippers AS s
    JOIN Orders AS o
        ON s.ShipperID = o.ShipVia
    WHERE o.ShippedDate > o.RequiredDate
)
SELECT *
FROM ShippingDetails
ORDER BY DaysLate DESC;
Code
fig, ax = plt.subplots(figsize=(10, 5))

late_by_shipper = order_timing.groupby('CompanyName').agg(
    late_count=('OrderID', 'count'),
    avg_days_late=('DaysLate', 'mean'),
    max_days_late=('DaysLate', 'max')
).reset_index().sort_values('avg_days_late', ascending=False)


late_sorted = late_by_shipper.sort_values('avg_days_late', ascending=True)

color_map = {
    'Federal Shipping': "#1F4E79",
    'Speedy Express':   "#2E74B5",
    'United Package':   "#BDD7EE"
}

colors = late_sorted['CompanyName'].map(color_map)

bars = ax.barh(late_sorted['CompanyName'],
               late_sorted['avg_days_late'],
               color=colors,
               height=0.5)

# Value labels
for bar in bars:
    ax.text(bar.get_width() + 0.1,
            bar.get_y() + bar.get_height()/2,
            f'{bar.get_width():.1f} days',
            va='center', fontsize=9, fontweight='bold')

ax.grid(axis='x', linestyle=':', linewidth=0.5, alpha=0.3, color='gray')
ax.set_xlabel("Average Days Late", fontsize=11, labelpad=10)
ax.set_title("Average Days Late When Orders Miss Deadline — Northwind Database",
             fontsize=14, fontweight='bold', pad=15)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig("outputs/04_avg_days_late_by_shipper.png", dpi=150, bbox_inches="tight")
plt.show()

While Chart “On Time VS Late Orders by Shippers” identified United Package as having the lowest on-time rate with 27 late orders, the most of any shipper Chart “Average Days Late When Orders Miss Deadline” reveals a more nuanced picture of delay severity. Speedy Express, despite fewer late orders (15), averages 8.1 days past the required date when delays do occur 45% more severe than United Package and Federal Shipping, both averaging 5.6 days late. Federal Shipping performs most consistently across all three metrics: fewest late orders (15), tied for lowest average delay (5.6 days), and the best on-time rate from Chart 8. The data suggests two distinct failure modes, United Package fails frequently but recovers quickly, while Speedy Express fails rarely but severely. For time-critical shipments, Federal Shipping is the lowest-risk carrier across all fulfillment dimensions.

7 Conclusions & Recommendations

7.1 Supplier Revenue

  • Concentrate contract renewal efforts on the top 5–6 suppliers, which collectively account for the majority of net revenue.
  • Use the Overall vs. Category Rank quadrant to identify Niche Specialists, suppliers who lead their segment and may warrant priority even if their global rank appears modest.
  • Monitor concentration risk: heavy reliance on a small number of Beverages and Dairy Products suppliers creates vulnerability if any one relationship is disrupted.

7.3 Fulfillment & Carriers

  • Federal Shipping is the lowest-risk carrier across all fulfillment dimensions: fewest late orders (tied), lowest average delay (5.6 days), and competitive ship speed.
  • For time-critical shipments, avoid Speedy Express despite its speed advantage, when it misses deadlines, delays are 45% more severe than its peers.
  • Investigate United Package’s capacity constraints: its high volume combined with the most late deliveries suggests it may be approaching fulfillment limits.
  • Consider renegotiating freight contracts with United Package, which carries both the highest order volume and the highest total freight spend.

8 Appendix Technical Notes

8.1 Stack & Environment

Component Details
Database Northwind (SQL Server 2022)
Query Language T-SQL with CTEs and window functions
Analysis Python 3 · pandas · numpy
Visualization matplotlib · matplotlib.ticker · matplotlib.dates
Connectivity pyodbc (ODBC Driver 17 for SQL Server)

8.2 Chart Index

# Chart Title Section
1 Top 15 Suppliers by Revenue Section 1
2 Total Revenue by Product Category Section 1
3 Overall vs. Category Rank Quadrant Section 1
4 Monthly Revenue Trend Section 2
5 3-Month Rolling Average vs. Monthly Revenue Section 2
6 Month-over-Month Revenue Growth Section 2
7 On-Time vs. Late Orders by Shipper Section 3
8 Total Freight Spend by Shipper Section 3
9 Average Days to Ship by Shipper Section 3
10 Average Days Late When Orders Miss Deadline Section 4