Vendor & Sales Analytics

Author

Christian Lira Gonzalez

Stack: T-SQL · Python · pandas · matplotlib
Source: Northwind Database (SQL Server 2022)

1 Executive Summary

This analysis addresses six business questions across three operational domains using Northwind transaction and fulfillment data. The goal is to equip procurement, finance, and logistics teams with evidence-based insights to prioritize supplier relationships, monitor revenue momentum, and evaluate carrier reliability.

Which suppliers drive the most revenue?
How does each supplier rank within their product category?
How has monthly revenue trended?
What is the month-over-month growth rate and 3-month rolling average?
Are orders being shipped on time?
Which shippers have the longest delays?

Key findings:

2 Business Context

Northwind is a fictional wholesale food distributor. Understanding supplier revenue contribution helps procurement teams prioritize vendor relationships, negotiate contracts, and identify concentration risk, where revenue depends heavily on a small number of suppliers.

3 Section 1: Supplier Revenue Analysis

3.1 Business Questions

Which suppliers drive the most revenue?
How does each supplier rank within their product category?

3.2 Context

For a wholesale distributor like Northwind, supplier relationships are the foundation of the business. Not all suppliers contribute equally, a small number typically drive the majority of revenue, while others serve niche product categories with lower overall volume.

Understanding supplier revenue concentration helps procurement teams:

Prioritize vendor relationships and contract negotiations
Identify dependency risk where revenue relies on few suppliers
Compare supplier performance within the same product category, a supplier may rank low overall but dominate their category

3.3 Data Model

This analysis joins three tables:

Table	Role
`Suppliers`	Vendor identity: SupplierID, CompanyName
`Products`	Catalog: links suppliers to categories
`Order Details`	Line-item transactions: quantity, unit price, discount
`Categories`	Category names: Beverages, Dairy, Seafood, etc.

3.4 Revenue Formula

Revenue is calculated at the line-item level: net_revenue = unit_price × quantity × (1 - discount)

Discount is captured separately to show both the gross revenue potential and the actual net revenue per supplier after markdowns.

3.5 Query Design

The query uses a 3-CTE pattern each step has a single responsibility, making the logic easier to read, debug, and extend.

CTE	Responsibility
`SupplierProducts`	Resolves supplier → product → category relationships
`SupplierOrders`	Flattens to order line items with pricing
`SupplierTotals`	Aggregates revenue, quantity, and discount per supplier

Two window functions are applied on the aggregated results:

RANK() OVER (ORDER BY total_order_revenue DESC) overall rank across all suppliers
RANK() OVER (PARTITION BY CategoryName ORDER BY total_order_revenue DESC) rank within each product category

The PARTITION BY CategoryName is the key analytical addition it resets the rank for each category independently, allowing direct comparison of suppliers competing in the same product segment.

Code

#Load libraries
import os
from pathlib import Path
import pyodbc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.dates as mdates
from matplotlib.patches import Patch

Code

# Establish All Database Connections
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;"
    "DATABASE=Northwind;"
    "Trusted_Connection=yes;"
)

# Import supplier_ranking query from SQL files in dir
supplier_ranking_query = Path("sql/01_supplier_ranking.sql").read_text()
supplier_ranking = pd.read_sql(supplier_ranking_query, conn)

# Import sales_trends query from SQL files in dir
sales_trends = Path("sql/02_sales_trends.sql").read_text()
sales_trends = pd.read_sql(sales_trends, conn)

# Import fulfillment Analysis query from SQL files in dir
fulfillment = Path("sql/03_fulfillment_analysis.sql").read_text()
fulfillment = pd.read_sql(fulfillment, conn)

# Import Late Orders query from SQL files in dir
order_timing = Path("sql/04_late_orders.sql").read_text()
order_timing = pd.read_sql(order_timing, conn)

conn.close()

Code

WITH MonthlySales AS (
    SELECT
        DATEFROMPARTS(YEAR(o.OrderDate), MONTH(o.OrderDate), 1) AS SaleMonth,
        SUM(od.UnitPrice * od.Quantity * (1 - od.Discount))     AS MonthlyRevenue
    FROM Orders o
    JOIN [Order Details] od ON o.OrderID = od.OrderID
    WHERE o.OrderDate IS NOT NULL
    GROUP BY DATEFROMPARTS(YEAR(o.OrderDate), MONTH(o.OrderDate), 1)
)
SELECT
    SaleMonth,
    ROUND(MonthlyRevenue, 2)                                          AS MonthlyRevenue,
    ROUND(LAG(MonthlyRevenue, 1) OVER (ORDER BY SaleMonth), 2)       AS PriorMonthRevenue,
    ROUND(
        (MonthlyRevenue - LAG(MonthlyRevenue, 1) OVER (ORDER BY SaleMonth))
        / NULLIF(LAG(MonthlyRevenue, 1) OVER (ORDER BY SaleMonth), 0) * 100
    , 1)                                                              AS MoMGrowthPct,
    ROUND(
        AVG(MonthlyRevenue) OVER (
            ORDER BY SaleMonth
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        )
    , 2)                                                              AS Rolling3MonthAvg
FROM MonthlySales
ORDER BY SaleMonth;

3.6 Visualizations

The chart below ranks the top 15 suppliers by total net revenue. Bars are colored by product category, revealing which categories dominate the top revenue positions and where supplier concentration is highest.

Code

fig, ax = plt.subplots(figsize=(12, 7))

# Take top 15 by revenue rank
top15 = supplier_ranking.head(15).sort_values("total_order_revenue", ascending=True)

# Color bars by category
categories = top15["CategoryName"].unique()
color_map = {
    "Beverages":      "#2196F3",
    "Dairy Products": "#4CAF50",
    "Meat/Poultry":   "#F44336",
    "Grains/Cereals": "#FF9800",
    "Confections":    "#9C27B0",
    "Produce":        "#00BCD4",
    "Condiments":     "#795548",
    "Seafood":        "#607D8B",
}
colors = top15["CategoryName"].map(color_map)

bars = ax.barh(
    top15["SupplierName"],
    top15["total_order_revenue"],
    color=colors,
    edgecolor="white",
    linewidth=0.5,
    height=0.7
)

# Revenue labels on bars
for bar in bars:
    width = bar.get_width()
    ax.text(
        width + 500,
        bar.get_y() + bar.get_height() / 2,
        f"${width:,.0f}",
        va="center",
        ha="left",
        fontsize=9,
        color="#333333"
    )

# Formatting
ax.set_xlabel("Total Order Revenue (USD)", fontsize=11, labelpad=10)
ax.set_title("Top 15 Suppliers by Revenue — Northwind Database", fontsize=14, fontweight="bold", pad=15)
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.tick_params(axis="y", labelsize=9)

# Legend for categories
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor=color_map[cat], label=cat) for cat in color_map if cat in top15["CategoryName"].values]
ax.legend(handles=legend_elements, loc="lower right", fontsize=9, framealpha=0.7)

plt.tight_layout()
plt.show()

Aux joyeux ecclésiastiques leads all suppliers by a significant margin with $153,691 in total net revenue, nearly 30% ahead of the second-ranked Gai pâturage at $117,981 both representing Beverages and Dairy Products respectively. A notable drop-off follows at position three, where Plutzer Lebensmittelgroßmärkte AG (Meat/Poultry) generates $80,369, roughly half the top supplier’s revenue. The remaining twelve suppliers cluster tightly between $26,591 and $50,255, suggesting the revenue tail is relatively flat after the top three. Category-wise, Beverages and Dairy Products dominate the upper ranks, while Confections (purple) appears three times in the mid-tier, and Seafood and Condiments hold the bottom two positions with New England Seafood Cannery at $26,591 and New Orleans Cajun Delights at $31,168.

Code

# ── Chart 2: Revenue by Category (aggregated) ───────────────────────────────
fig, ax = plt.subplots(figsize=(10, 6))

category_totals = (
    supplier_ranking
    .groupby("CategoryName")["total_order_revenue"]
    .sum()
    .sort_values(ascending=True)
)

colors_cat = [color_map.get(cat, "#999999") for cat in category_totals.index]

bars = ax.barh(
    category_totals.index,
    category_totals.values,
    color=colors_cat,
    edgecolor="white",
    linewidth=0.5,
    height=0.6
)

for bar in bars:
    width = bar.get_width()
    ax.text(
        width + 500,
        bar.get_y() + bar.get_height() / 2,
        f"${width:,.0f}",
        va="center",
        ha="left",
        fontsize=9,
        color="#333333"
    )

ax.set_xlabel("Total Revenue (USD)", fontsize=11, labelpad=10)
ax.set_title("Total Revenue by Product Category — Northwind Database", fontsize=14, fontweight="bold", pad=15)
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"${x:,.0f}"))
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.tick_params(axis="y", labelsize=10)

plt.tight_layout()
plt.savefig("outputs/01_category_revenue.png", dpi=150, bbox_inches="tight")
plt.show()

Beverages is the undisputed top-performing category with $267,868 in total revenue, followed closely by Dairy Products at $234,507 together these two categories account for over 38% of all revenue across the eight segments. Confections ($167,357) and Meat/Poultry ($163,022) form a tight second tier, separated by less than $5,000. Seafood ($131,262) and Condiments ($106,047) occupy the middle ground, while Produce ($99,985) and Grains/Cereals ($95,745) sit at the bottom, both falling just below the $100,000 mark. The spread between the highest and lowest categories is nearly 2.8×, indicating meaningful structural concentration at the category level.

Code

# ── Chart 3: Overall vs Category Rank — Quadrant Analysis ───────────────────
fig, ax = plt.subplots(figsize=(12, 8))

# Calculate medians for quadrant lines
median_overall = supplier_ranking["revenue_rank"].median()
median_category = supplier_ranking["Category_rank"].median()

# Plot points colored by category
for cat, group in supplier_ranking.groupby("CategoryName"):
    ax.scatter(
        group["revenue_rank"],
        group["Category_rank"],
        label=cat,
        color=color_map.get(cat, "#999999"),
        s=100,
        alpha=0.85,
        edgecolors="white",
        linewidth=0.5,
        zorder=3
    )

# Draw quadrant lines
ax.axvline(x=median_overall, color="#cccccc", linestyle="--", linewidth=1, alpha=0.7, zorder=1)
ax.axhline(y=median_category, color="#cccccc", linestyle="--", linewidth=1, alpha=0.7, zorder=1)

# Quadrant labels
ax.text(median_overall * 0.3, median_category * 0.3 - 0.3,
        "★ Overall & Category\nLeaders",
        fontsize=9, color="#2ecc71", fontweight="bold", alpha=0.8)
ax.text(median_overall * 1.3, median_category * 0.3 - 0.3,
        "◆ Niche Category\nSpecialists",
        fontsize=9, color="#3498db", fontweight="bold", alpha=0.8)
ax.text(median_overall * 0.3, median_category * 1.4,
        "▲ Overall Leaders\nMid Category",
        fontsize=9, color="#e67e22", fontweight="bold", alpha=0.8)
ax.text(median_overall * 1.3, median_category * 1.4,
        "▼ Underperformers",
        fontsize=9, color="#e74c3c", fontweight="bold", alpha=0.8)

# Annotate top 5 overall
for _, row in supplier_ranking[supplier_ranking["revenue_rank"] <= 5].iterrows():
    ax.annotate(
        row["SupplierName"].split()[0],
        (row["revenue_rank"], row["Category_rank"]),
        textcoords="offset points",
        xytext=(6, 4),
        fontsize=8,
        color="#333333"
    )

ax.set_xlabel("Overall Revenue Rank", fontsize=11, labelpad=10)
ax.set_ylabel("Category Revenue Rank", fontsize=11, labelpad=10)
ax.set_title("Overall vs Category Rank by Supplier — Northwind Database",
             fontsize=14, fontweight="bold", pad=15)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.legend(fontsize=9, framealpha=0.7, loc="upper left")

plt.tight_layout()
plt.savefig("outputs/01_rank_quadrant.png", dpi=150, bbox_inches="tight")
plt.show()

A scatter plot positioning every supplier at the intersection of their overall revenue rank (x-axis) and their within-category rank (y-axis). Median lines divide the chart into four quadrants. The top-5 overall suppliers are individually annotated.

Quadrant	Label	Interpretation
Low overall rank, low category rank	★ Overall & Category Leaders	Dominant performers across both dimensions highest strategic value
High overall rank, low category rank	◆ Niche Category Specialists	Lead their category but have lower total volume valuable within segment
Low overall rank, high category rank	▲ Overall Leaders, Mid Category	High total revenue but face strong competition within their category
High overall rank, high category rank	▼ Underperformers	Lag on both dimensions candidates for review

The quadrant plot reveals a clear cluster of top performers in the bottom-left corner, where Aux joyeux ecclésiastiques, Gai pâturage, Plutzer, and Pasta Buttini all hold both low overall ranks (1–5) and category rank 1 confirming them as true Overall & Category Leaders. Formaggi Fortini sits just outside this elite group at overall rank ~8 and category rank 2, representing a strong Dairy Products contender. The upper-left region (Overall Leaders, Mid Category) is sparsely populated, meaning few suppliers achieve high global revenue while ranking only mid-tier within their category. The right half of the chart is densely populated with Underperformers and Niche Specialists, with several Condiments and Seafood suppliers clustering between overall ranks 30–50 and category ranks 5–8 high on both axes and therefore the lowest strategic priority. One notable Niche Specialist appears around overall rank 38–40 with category rank 1, indicating a supplier who dominates their product segment despite a modest global revenue position.

4 Section 2: Sales Trends Analysis

This section examines how revenue has evolved over the 23-month dataset (July 1996 – April 1998), identifies seasonal patterns, and measures the velocity and consistency of growth using month-over-month rates and a 3-month rolling average.

4.1 Data Model

Table	Role
`Orders`	OrderDate anchors each transaction to a calendar month
`Order Details`	UnitPrice, Quantity, Discount source of net revenue

4.2 Query Design Window Functions

A single CTE (MonthlySales) aggregates net revenue per calendar month. Three window functions are then applied in the outer SELECT:

LAG(MonthlyRevenue, 1) retrieves the prior month’s revenue for comparison
Month-over-Month Growth % computed as (current − prior) / prior × 100, with NULLIF guarding against division by zero
AVG() OVER (ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) produces the 3-month rolling average, smoothing short-term volatility

Code

-- USE Northwind;
-- GO

# -- Business question: How has monthly revenue trended?
# -- What is the month-over-month growth rate and 3-month rolling average?


WITH MonthlySales AS (
    SELECT
        DATEFROMPARTS(YEAR(o.OrderDate), MONTH(o.OrderDate), 1) AS SaleMonth,
        SUM(od.UnitPrice * od.Quantity * (1 - od.Discount))     AS MonthlyRevenue
    FROM Orders o
    JOIN [Order Details] od ON o.OrderID = od.OrderID
    WHERE o.OrderDate IS NOT NULL
    GROUP BY DATEFROMPARTS(YEAR(o.OrderDate), MONTH(o.OrderDate), 1)
)
SELECT
    SaleMonth,
    ROUND(MonthlyRevenue, 2)                                          AS MonthlyRevenue,
    ROUND(LAG(MonthlyRevenue, 1) OVER (ORDER BY SaleMonth), 2)       AS PriorMonthRevenue,
    ROUND(
        (MonthlyRevenue - LAG(MonthlyRevenue, 1) OVER (ORDER BY SaleMonth))
        / NULLIF(LAG(MonthlyRevenue, 1) OVER (ORDER BY SaleMonth), 0) * 100
    , 1)                                                              AS MoMGrowthPct,
    ROUND(
        AVG(MonthlyRevenue) OVER (
            ORDER BY SaleMonth
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        )
    , 2)                                                              AS Rolling3MonthAvg
FROM MonthlySales
ORDER BY SaleMonth;

Code


sales_trends["SaleMonth"] = pd.to_datetime(sales_trends['SaleMonth'])
sales_trends =sales_trends.sort_values('SaleMonth')

fig, ax = plt.subplots(figsize=(14, 6))

median_overall = sales_trends["MonthlyRevenue"].median()
months = sales_trends["SaleMonth"]
revenue = sales_trends['MonthlyRevenue']

# Line + markers
ax.plot(months, revenue, 
        color='#2E74B5', 
        marker='o', 
        markersize=6, 
        linewidth=2.5, 
        label='Monthly Revenue',
        zorder=3)

# Median line
ax.axhline(y=median_overall, 
           color="#27AE60", 
           linestyle="--", 
           linewidth=1.5, 
           alpha=0.8, 
           zorder=1, 
           label=f'Median ${median_overall:,.0f}')

# Subtle vertical grid — helps trace each point to its month
ax.grid(axis='x', linestyle=':', linewidth=0.5, alpha=0.5, color='gray')
ax.grid(axis='y', linestyle=':', linewidth=0.5, alpha=0.3, color='gray')

# Every month on X axis
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))

# Y axis currency
ax.yaxis.set_major_formatter(mticker.FuncFormatter(
    lambda x, p: f'${x:,.0f}'
))

# Peak annotation
peak_idx = sales_trends['MonthlyRevenue'].idxmax()
peak_row = sales_trends.loc[peak_idx]
ax.annotate(f"Peak\n${peak_row['MonthlyRevenue']:,.0f}",
            xy=(peak_row['SaleMonth'], peak_row['MonthlyRevenue']),
            xytext=(-40, 15),
            textcoords='offset points',
            fontsize=8,
            fontweight='bold',
            color='#2E74B5',
            arrowprops=dict(arrowstyle='->', color='#2E74B5', lw=1.2))

# Low annotation
low_idx = sales_trends['MonthlyRevenue'].idxmin()
low_row = sales_trends.loc[low_idx]
ax.annotate(f"Low\n${low_row['MonthlyRevenue']:,.0f}",
            xy=(low_row['SaleMonth'], low_row['MonthlyRevenue']),
            xytext=(10, -30),
            textcoords='offset points',
            fontsize=8,
            fontweight='bold',
            color='#C0392B',
            arrowprops=dict(arrowstyle='->', color='#C0392B', lw=1.2))

ax.set_xlabel("Month", fontsize=11, labelpad=10)
ax.set_ylabel("Net Revenue", fontsize=11, labelpad=10)
ax.set_title("Monthly Revenue Trend — Northwind Database",
             fontsize=14, fontweight="bold", pad=15)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.legend(fontsize=9, framealpha=0.7, loc="upper left")

ax.text(months.iloc[-1], median_overall + 1500, 
        f'Median ${median_overall:,.0f}', 
        fontsize=8, color='#27AE60', va='bottom')
ax.fill_between(months, revenue, alpha=0.08, color='#2E74B5')

plt.tight_layout()
plt.savefig("outputs/02_Monthly_Revenue.png", dpi=150, bbox_inches="tight")
plt.show()

Revenue opened at $27,861 in July 1996 and followed a broadly upward trajectory over the 23-month period, reaching its all-time peak of $123,799 in April 1998 a 4.44× increase from the starting point. The path was not linear: an early spike to $61,000 in January 1997 was immediately reversed to roughly $39,000 in February, and a mid-series trough of ~$36,000 in June 1997 briefly pulled the line back below the median. The $47,288 median was first crossed in January 1997 and then again consistently from December 1997 onward, with the final five months of the dataset all sitting well above it. The sharp collapse to the series low of $18,334 in May 1998 stands out as a clear anomaly falling 85% from the April peak in a single month, and is most likely a data cutoff or partial-month recording artifact rather than a genuine business event.

Code

fig, ax = plt.subplots(figsize=(14, 6))

months = sales_trends['SaleMonth']
three_month_average = sales_trends['Rolling3MonthAvg']
revenue = sales_trends['MonthlyRevenue']
median_overall = sales_trends["Rolling3MonthAvg"].median()


# Monthly Revenue — thinner, more transparent
ax.plot(months, revenue,
        color='#2E74B5',
        marker='o',
        markersize=4,
        linewidth=1.5,
        alpha=0.6,
        label='Monthly Revenue',
        zorder=2)

# Rolling Average — thicker, more prominent
ax.plot(months, three_month_average,
        color='#3BDDB8',
        marker='o',
        markersize=6,
        linewidth=2.5,
        label='3-Month Rolling Average',
        zorder=3)

# Median line
ax.axhline(y=median_overall, 
           color="#00712F", 
           linestyle="--", 
           linewidth=1.5, 
           alpha=0.8, 
           zorder=1, 
           label=f'Median ${median_overall:,.0f}')

# Subtle vertical grid — helps trace each point to its month
ax.grid(axis='x', linestyle=':', linewidth=0.5, alpha=0.5, color='gray')
ax.grid(axis='y', linestyle=':', linewidth=0.5, alpha=0.3, color='gray')

# Every month on X axis
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))

# Y axis currency
ax.yaxis.set_major_formatter(mticker.FuncFormatter(
    lambda x, p: f'${x:,.0f}'
))

# Peak annotation
peak_idx = sales_trends['MonthlyRevenue'].idxmax()
peak_row = sales_trends.loc[peak_idx]
ax.annotate(f"Peak\n${peak_row['MonthlyRevenue']:,.0f}",
            xy=(peak_row['SaleMonth'], peak_row['MonthlyRevenue']),
            xytext=(-40, 15),
            textcoords='offset points',
            fontsize=8,
            fontweight='bold',
            color='#2E74B5',
            arrowprops=dict(arrowstyle='->', color='#2E74B5', lw=1.2))

# Low annotation
low_idx = sales_trends['MonthlyRevenue'].idxmin()
low_row = sales_trends.loc[low_idx]
ax.annotate(f"Low\n${low_row['MonthlyRevenue']:,.0f}",
            xy=(low_row['SaleMonth'], low_row['MonthlyRevenue']),
            xytext=(10, -30),
            textcoords='offset points',
            fontsize=8,
            fontweight='bold',
            color='#C0392B',
            arrowprops=dict(arrowstyle='->', color='#C0392B', lw=1.2))

ax.set_xlabel("Month", fontsize=11, labelpad=10)
ax.set_ylabel("Net Revenue", fontsize=11, labelpad=10)
ax.set_title("3 Month Average Revenue vs Monthly Revenue Trend — Northwind Database",
             fontsize=14, fontweight="bold", pad=15)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.legend(fontsize=9, framealpha=0.7, loc="upper left")

ax.text(months.iloc[-1], median_overall + 1500, 
        f'Median ${median_overall:,.0f}', 
        fontsize=8, color='#27AE60', va='bottom')
ax.fill_between(months, revenue, alpha=0.08, color='#2E74B5')


plt.tight_layout()
plt.savefig("outputs/02_Rolling3MonthAverage.png", dpi=150, bbox_inches="tight")
plt.show()

The 3-month rolling average (median $48,327) tracks the same general upward arc as raw monthly revenue but absorbs the most volatile swings, making the underlying growth trend easier to read. The most telling divergence appears in the January 1997 spike: while monthly revenue jumped to ~$61,000, the rolling average climbed only to ~$51,000, confirming that the spike was a short-term aberration rather than a sustained shift. Conversely, from December 1997 through April 1998 the rolling average rose steadily from ~$56,000 to ~$110,000, closely shadowing the raw line and signaling that the late-period acceleration was genuine and sustained across multiple months. After the May 1998 collapse to $18,334, the rolling average lands around $82,000, still well above the median, because it is anchored by the two strong preceding months, underscoring why the trailing average is a more reliable planning baseline than any single monthly figure.

Code

fig, ax = plt.subplots(figsize=(14, 6))

months = sales_trends["SaleMonth"]
growth = sales_trends["MoMGrowthPct"]

colors = ['#27AE60' if x > 0 else '#C0392B'
                  for x in sales_trends["MoMGrowthPct"]]

ax.bar(months,
       growth,
       color= colors,
       width=20)

ax.axhline(y=0, color='black', linewidth=1, zorder=2)

# Every month on X axis
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))

# Subtle vertical grid — helps trace each point to its month
ax.grid(axis='x', linestyle=':', linewidth=0.5, alpha=0.5, color='gray')
ax.grid(axis='y', linestyle=':', linewidth=0.5, alpha=0.3, color='gray')

ax.yaxis.set_major_formatter(mticker.FuncFormatter(
    lambda x, p: f'{x:.1f}%'
))
ax.set_xlabel("Month", fontsize=11, labelpad=10)
ax.set_ylabel("MoM Growth %", fontsize=11, labelpad=10)
ax.set_title("Month-over-Month Revenue Growth — Northwind Database",
             fontsize=14, fontweight="bold", pad=15)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

legend_elements = [
    Patch(facecolor='#27AE60', label='Growth'),
    Patch(facecolor='#C0392B', label='Contraction')
]

ax.legend(handles=legend_elements, fontsize=9, 
          framealpha=0.7, loc="upper right")


for bar, value in zip(ax.patches, growth):
    ax.text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + (1 if value > 0 else -2.5),
        f'{value:.1f}%',
        ha='center',
        va='bottom' if value > 0 else 'top',
        fontsize=7,
        color='black'
    )

plt.tight_layout()
plt.savefig("outputs/02_MoM_Growth.png", dpi=150, bbox_inches="tight")
plt.show()

Across the 23 periods measured, growth months (15) outnumber contraction months (8) by nearly 2-to-1, confirming the positive directional bias seen in the trend charts. The most explosive single-month acceleration was December 1997 at +64.0%, followed by three other strong surges: October 1996 (+42.2%), January 1997 (+35.4%), and April 1997 (+37.6%). Contractions were concentrated in four episodes, August 1996 (−8.5%), February 1997 (−37.2%), June 1997 (−32.4%), and November 1997 (−34.8%) all recovering within one to two months, suggesting cyclical demand patterns rather than structural decline. The outlier is May 1998 at −85.2%, which dwarfs every other contraction by a factor of more than two and reinforces the interpretation that this figure reflects an incomplete data period rather than an actual revenue collapse.

5 Section 3: Fulfillment & Shipper Analysis

Timely fulfillment is a key driver of customer satisfaction in wholesale distribution. This section evaluates the three active shippers, Federal Shipping, Speedy Express, and United Package, across on-time rate, average ship time, freight spend, and severity of delays.

5.1 Data Model

Table	Role
`Shippers`	ShipperID, CompanyName, carrier identity
`Orders`	OrderDate, ShippedDate, RequiredDate, Freight, ShipVia, fulfillment timing and cost

5.2 Query Design

A single CTE (ShipperDetails) joins Shippers to Orders and computes all metrics in a single pass using conditional aggregation:

AvgDaysToShip average of DATEDIFF(DAY, OrderDate, ShippedDate) per shipper
OnTimeOrders / OnTimePct count and share of orders where ShippedDate ≤ RequiredDate
LatePct share of orders where ShippedDate > RequiredDate
AvgFreightExp and TotalSpentPerShipper freight cost analysis ### Shipper Performance Summary

Shipper	Avg Days to Ship	On-Time Performance	Late Orders	Total Freight
Speedy Express	7.5 days	Fastest	15	Lowest
Federal Shipping	8.6 days	Best overall consistency	15	Mid
United Package	9.2 days	Lowest on-time rate	27	Highest

5.3 Visualizations & Findings

Code

-- USE Northwind;
-- GO

WITH ShipperDetails AS (
    SELECT 
        s.ShipperID, 
        s.CompanyName,
        COUNT(DISTINCT o.OrderID) AS OrderCount,
        ROUND(AVG(CAST(DATEDIFF(DAY, o.OrderDate, o.ShippedDate) AS FLOAT)),1) AS AvgDaysToShip,
        SUM(CASE WHEN o.ShippedDate <= RequiredDate THEN 1 ELSE 0 END) AS OnTimeOrders,
        ROUND(100 * SUM(CASE WHEN o.ShippedDate <= o.RequiredDate THEN 1 ELSE 0 END) /NULLIF(COUNT(o.OrderID),0),2) AS OnTimePct,
        ROUND(100 * SUM(CASE WHEN o.ShippedDate > RequiredDate THEN 1 ELSE 0 END) /NULLIF(COUNT(o.OrderID),0),2) AS LatePct,
        AVG(o.Freight) AS AvgFreightExp,
        SUM(o.Freight) AS TotalSpentPerShipper
    FROM Shippers AS s
    JOIN Orders AS o
        ON s.ShipperID = o.ShipVia
    GROUP BY s.ShipperID, s.CompanyName
)
SELECT *
FROM ShipperDetails
ORDER BY AvgDaysToShip

Code

fig, ax = plt.subplots(figsize=(10, 6))

companies = fulfillment['CompanyName']
on_time = fulfillment['OnTimeOrders']
late_orders = fulfillment['OrderCount'] - fulfillment['OnTimeOrders']

x = np.arange(len(companies))
width = 0.35

bars_on = ax.bar(x - width/2, on_time, width,
                 label='On Time', color='#27AE60')
bars_late = ax.bar(x + width/2, late_orders, width,
                   label='Late', color='#C0392B')

# Value labels on bars
for bar in bars_on:
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 2,
            str(int(bar.get_height())),
            ha='center', va='bottom', fontsize=9, color='#27AE60')

for bar in bars_late:
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 2,
            str(int(bar.get_height())),
            ha='center', va='bottom', fontsize=9, color='#C0392B')

ax.grid(axis='y', linestyle=':', linewidth=0.5, alpha=0.3, color='gray')
ax.set_xticks(x)
ax.set_xticklabels(companies)
ax.set_xlabel("Shipper", fontsize=11, labelpad=10)
ax.set_ylabel("Order Count", fontsize=11, labelpad=10)
ax.set_title("On-Time vs Late Orders by Shipper — Northwind Database",
             fontsize=14, fontweight="bold", pad=15)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.legend(fontsize=9, framealpha=0.7)

plt.tight_layout()
plt.savefig("outputs/03_ontime_vs_late.png", dpi=150, bbox_inches="tight")
plt.show()

United Package carries the highest total order volume at 326 orders (299 on-time, 27 late), making it the dominant carrier by throughput. Federal Shipping handled 255 orders with only 15 late an on-time rate of approximately 94% the strongest reliability performance of the three. Speedy Express processed 249 orders with 16 late, yielding a very similar on-time rate of ~93.6%, essentially tied with Federal Shipping on reliability despite handling slightly fewer orders. The gap in late order counts between Federal Shipping/Speedy Express (15–16) and United Package (27) is notable: United Package generates nearly twice the late deliveries of either peer, suggesting that its higher volume may be straining fulfillment capacity and depressing on-time performance.

Code

fig, ax = plt.subplots(figsize=(10, 6))

# Color bars by company
companies = fulfillment['CompanyName'].unique()
color_map = {
    'Federal Shipping': "#1F4E79",
    'Speedy Express':   "#2E74B5",
    'United Package':   "#BDD7EE"
}

fulfillment_sorted = fulfillment.sort_values('TotalSpentPerShipper', ascending=True)

companies = fulfillment_sorted['CompanyName']
spending = fulfillment_sorted['TotalSpentPerShipper']
colors = fulfillment_sorted['CompanyName'].map(color_map)

bars = ax.bar(
    companies,
    spending,
    color=colors,
    width=0.5)

# Value labels
for bar in bars:
    ax.text(bar.get_x() + bar.get_width()/2,
            bar.get_height() + 200,
            f'${bar.get_height():,.0f}',
            ha='center', va='bottom',
            fontsize=9, fontweight='bold'
            )

ax.grid(axis='y', linestyle=':', linewidth=0.5, alpha=0.3, color='gray')
ax.set_xlabel("Shipper", fontsize=11, labelpad=10)
ax.set_ylabel("Total Freight Spend", fontsize=11, labelpad=10)
ax.set_title("Total Freight Spend By Shipper - Northwind Database", fontsize=14, fontweight='bold', pad=15)
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'${x:,.0f}'))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig("outputs/03_freight_by_shipper.png", dpi=150, bbox_inches="tight")
plt.show()

Freight spend scales closely with order volume across all three carriers. United Package, as the highest-volume shipper, also incurs the largest freight cost at $28,245 — 74% more than Speedy Express ($16,185) and 38% more than Federal Shipping ($20,513). Federal Shipping sits in the middle both in volume and cost, while Speedy Express is the most economical option despite its near-identical on-time rate to Federal Shipping. From a cost-efficiency perspective, Speedy Express delivers comparable reliability at the lowest freight cost, whereas United Package’s combination of the highest spend and the most late orders makes it the least favorable carrier on a cost-per-reliable-delivery basis.

Code

fig, ax = plt.subplots(figsize=(10,6))

# Color bars by company
companies = fulfillment['CompanyName'].unique()
color_map = {
    'Federal Shipping': "#1F4E79",
    'Speedy Express':   "#2E74B5",
    'United Package':   "#BDD7EE"
}

fulfillment_sorted = fulfillment.sort_values('AvgDaysToShip', ascending=True)

companies = fulfillment_sorted['CompanyName']
shipping = fulfillment_sorted['AvgDaysToShip']
colors = fulfillment_sorted['CompanyName'].map(color_map)


bars = ax.bar(
    companies,
    shipping,
    color=colors,
    width=0.5)

# Value labels
for bar in bars:
    ax.text(bar.get_x() + bar.get_width()/2,
            bar.get_height() + 0.1,
            f'{bar.get_height():,.1f} days',
            ha='center', va='bottom',
            fontsize=9, fontweight='bold'
            )

ax.grid(axis='y', linestyle=':', linewidth=0.5, alpha=0.3, color='gray')
ax.set_xlabel("Shipper", fontsize=11, labelpad=10)
ax.set_ylabel("Average Days to Ship", fontsize=11, labelpad=10)
ax.set_title("Average Days to Ship By Shipper - Northwind Database", fontsize=14, fontweight='bold', pad=15)
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'{x:,.0f}'))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig("outputs/03_avg_time_to_ship.png", dpi=150, bbox_inches="tight")
plt.show()

Speedy Express is the fastest shipper at 7.5 days average that is 1.7 days faster than United Package at 9.2 days, an 18% difference. Federal Shipping falls between at 8.6 days. While the gap between shippers is relatively narrow, for time-sensitive orders the choice of carrier has meaningful impact on customer experience. Notably, United Package handles the highest order volume (326 orders) despite having both the slowest average ship time and the lowest on-time rate, suggesting capacity constraints may be affecting performance.

6 Section 4: Late Order Severity Analysis

While Section 3 measures fulfillment performance across all orders, this section isolates only the orders that missed their required delivery date, where ShippedDate > RequiredDate, and measures how severe those delays were per carrier. Frequency and severity are distinct failure modes: a shipper can have few late orders but extreme delays, or many late orders with quick recovery. Both dimensions matter for evaluating carrier risk.

6.1 Data Model

Table	Role
`Shippers`	ShipperID, CompanyName, carrier identity
`Orders`	RequiredDate, ShippedDate, ShipVia, deadline and actual delivery

6.2 Query Design Late Orders Isolation

A single CTE (ShippingDetails) filters to late orders only using WHERE o.ShippedDate > o.RequiredDate, then computes DATEDIFF(DAY, o.RequiredDate, o.ShippedDate) as DaysLate per order. The outer query returns the full detail sorted by worst delay, allowing both per-order inspection and downstream aggregation by shipper.

WITH ShippingDetails AS (
    SELECT 
        s.ShipperID, 
        s.CompanyName,
        o.OrderID,
        o.OrderDate,
        o.RequiredDate,
        o.ShippedDate,
        DATEDIFF(DAY, o.RequiredDate, o.ShippedDate) AS DaysLate
    FROM Shippers AS s
    JOIN Orders AS o
        ON s.ShipperID = o.ShipVia
    WHERE o.ShippedDate > o.RequiredDate
)
SELECT *
FROM ShippingDetails
ORDER BY DaysLate DESC;

The WHERE clause is the critical filter on-time orders never enter the CTE, so all downstream aggregation reflects delay severity only. ORDER BY DaysLate DESC surfaces the worst offenders first for immediate visibility.

6.3 Visualizations

Code

WITH ShippingDetails AS (
    SELECT 
        s.ShipperID, 
        s.CompanyName,
        o.OrderID,
        o.OrderDate,
        o.RequiredDate,
        o.ShippedDate,
        DATEDIFF(DAY, o.RequiredDate, o.ShippedDate) AS DaysLate
    FROM Shippers AS s
    JOIN Orders AS o
        ON s.ShipperID = o.ShipVia
    WHERE o.ShippedDate > o.RequiredDate
)
SELECT *
FROM ShippingDetails
ORDER BY DaysLate DESC;

Code

fig, ax = plt.subplots(figsize=(10, 5))

late_by_shipper = order_timing.groupby('CompanyName').agg(
    late_count=('OrderID', 'count'),
    avg_days_late=('DaysLate', 'mean'),
    max_days_late=('DaysLate', 'max')
).reset_index().sort_values('avg_days_late', ascending=False)


late_sorted = late_by_shipper.sort_values('avg_days_late', ascending=True)

color_map = {
    'Federal Shipping': "#1F4E79",
    'Speedy Express':   "#2E74B5",
    'United Package':   "#BDD7EE"
}

colors = late_sorted['CompanyName'].map(color_map)

bars = ax.barh(late_sorted['CompanyName'],
               late_sorted['avg_days_late'],
               color=colors,
               height=0.5)

# Value labels
for bar in bars:
    ax.text(bar.get_width() + 0.1,
            bar.get_y() + bar.get_height()/2,
            f'{bar.get_width():.1f} days',
            va='center', fontsize=9, fontweight='bold')

ax.grid(axis='x', linestyle=':', linewidth=0.5, alpha=0.3, color='gray')
ax.set_xlabel("Average Days Late", fontsize=11, labelpad=10)
ax.set_title("Average Days Late When Orders Miss Deadline — Northwind Database",
             fontsize=14, fontweight='bold', pad=15)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig("outputs/04_avg_days_late_by_shipper.png", dpi=150, bbox_inches="tight")
plt.show()

While Chart “On Time VS Late Orders by Shippers” identified United Package as having the lowest on-time rate with 27 late orders, the most of any shipper Chart “Average Days Late When Orders Miss Deadline” reveals a more nuanced picture of delay severity. Speedy Express, despite fewer late orders (15), averages 8.1 days past the required date when delays do occur 45% more severe than United Package and Federal Shipping, both averaging 5.6 days late. Federal Shipping performs most consistently across all three metrics: fewest late orders (15), tied for lowest average delay (5.6 days), and the best on-time rate from Chart 8. The data suggests two distinct failure modes, United Package fails frequently but recovers quickly, while Speedy Express fails rarely but severely. For time-critical shipments, Federal Shipping is the lowest-risk carrier across all fulfillment dimensions.

7 Conclusions & Recommendations

7.1 Supplier Revenue

Concentrate contract renewal efforts on the top 5–6 suppliers, which collectively account for the majority of net revenue.
Use the Overall vs. Category Rank quadrant to identify Niche Specialists, suppliers who lead their segment and may warrant priority even if their global rank appears modest.
Monitor concentration risk: heavy reliance on a small number of Beverages and Dairy Products suppliers creates vulnerability if any one relationship is disrupted.

7.2 Sales Trends

The 4.44× revenue growth over 23 months indicates strong underlying demand; however, the −85% MoM contraction in May 1998 warrants investigation to determine whether it reflects a genuine seasonal dip or a data collection artifact.
The 3-month rolling average is a more reliable planning baseline than raw monthly revenue given visible month-to-month volatility.
December consistently shows strong acceleration; incorporate Q4 seasonality into annual forecasting models.

7.3 Fulfillment & Carriers

Federal Shipping is the lowest-risk carrier across all fulfillment dimensions: fewest late orders (tied), lowest average delay (5.6 days), and competitive ship speed.
For time-critical shipments, avoid Speedy Express despite its speed advantage, when it misses deadlines, delays are 45% more severe than its peers.
Investigate United Package’s capacity constraints: its high volume combined with the most late deliveries suggests it may be approaching fulfillment limits.
Consider renegotiating freight contracts with United Package, which carries both the highest order volume and the highest total freight spend.

8 Appendix Technical Notes

8.1 Stack & Environment

Component	Details
Database	Northwind (SQL Server 2022)
Query Language	T-SQL with CTEs and window functions
Analysis	Python 3 · pandas · numpy
Visualization	matplotlib · matplotlib.ticker · matplotlib.dates
Connectivity	pyodbc (ODBC Driver 17 for SQL Server)

8.2 Chart Index

#	Chart Title	Section
1	Top 15 Suppliers by Revenue	Section 1
2	Total Revenue by Product Category	Section 1
3	Overall vs. Category Rank Quadrant	Section 1
4	Monthly Revenue Trend	Section 2
5	3-Month Rolling Average vs. Monthly Revenue	Section 2
6	Month-over-Month Revenue Growth	Section 2
7	On-Time vs. Late Orders by Shipper	Section 3
8	Total Freight Spend by Shipper	Section 3
9	Average Days to Ship by Shipper	Section 3
10	Average Days Late When Orders Miss Deadline	Section 4