Sondov Engen (@sen), 16d ago
Really cool chart, Terje Tofteberg! It would be awesome if some of the data were added as variables, for example the best/worst performer's name and return. Maybe also the best/worst sector?

Summary

S&P 500 components sized by company market cap and colored by yesterday's return.

Description

S&P 500 Treemap Visualization

This interactive treemap shows the S&P 500 constituents sized by market capitalization and colored by yesterday's return. Each tile represents a company: larger tiles indicate a higher market cap, and colors range from red (losses) through light grey (moves within ±1%) to green (gains).

Click any company tile to see detailed information including company name, GICS sector, and sub-industry classification.

How It Works

Data Collection Pipeline

The visualization is powered by a Python script (ex-spx-constituents.py) that runs daily to fetch fresh market data and update the plot.

1. Fetching S&P 500 Constituents from Wikipedia

The script scrapes the live list of S&P 500 companies from Wikipedia, including company names, GICS sectors, and sub-industries:

from io import StringIO

import pandas as pd
import requests

def spx_tickers() -> tuple[list[str], pd.DataFrame]:
    url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36..."
    }
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    html = StringIO(response.text)

    # Find the constituents table: 490-510 rows and a "Symbol" column
    table = None
    for candidate in pd.read_html(html):
        if 490 <= candidate.shape[0] <= 510 and "Symbol" in candidate.columns:
            table = candidate
            break
    if table is None:
        raise ValueError("S&P 500 constituents table not found on the Wikipedia page")

    # Extract company information; missing optional columns are added as empty strings
    optional_columns = ["Security", "GICS Sector", "GICS Sub-Industry"]
    for col in optional_columns:
        if col not in table.columns:
            table[col] = ""
    columns_to_extract = ["Symbol"] + optional_columns

    company_info = table[columns_to_extract].copy()
    # Yahoo Finance writes share-class tickers with "-" instead of "." (BRK.B -> BRK-B)
    company_info["Symbol"] = company_info["Symbol"].str.replace(".", "-", regex=False)
    company_info = company_info.set_index("Symbol")

    tickers = company_info.index.tolist()
    return tickers, company_info

This approach ensures the list stays current as companies are added to or removed from the index.
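The table heuristic and the ticker normalization can be exercised offline. The sketch below uses synthetic DataFrames in place of the scraped page. The helper names `pick_constituents_table` and `normalize_symbols` are hypothetical and do not come from ex-spx-constituents.py.

```python
import pandas as pd


def pick_constituents_table(tables: list[pd.DataFrame]) -> pd.DataFrame:
    """Return the first table that looks like the S&P 500 constituents list.

    Same heuristic as above: 490-510 rows and a "Symbol" column.
    """
    for table in tables:
        if 490 <= len(table) <= 510 and "Symbol" in table.columns:
            return table
    raise ValueError("no table matched the constituents heuristic")


def normalize_symbols(symbols: pd.Series) -> pd.Series:
    """Convert Wikipedia-style class tickers (BRK.B) to Yahoo style (BRK-B)."""
    return symbols.str.replace(".", "-", regex=False)


# Synthetic stand-ins for pd.read_html output (no network access needed)
changes = pd.DataFrame({"Date": ["2024-01-01"], "Added": ["XYZ"]})
constituents = pd.DataFrame(
    {"Symbol": ["AAPL", "BRK.B", "BF.B"] + [f"T{i}" for i in range(497)]}
)

table = pick_constituents_table([changes, constituents])
symbols = normalize_symbols(table["Symbol"])
print(symbols.head(3).tolist())  # ['AAPL', 'BRK-B', 'BF-B']
```

The small "changes" table is skipped because it fails the row-count test. This is why the size window matters: the Wikipedia page contains more than one table.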

2. Fetching Market Cap from Yahoo Finance

Market capitalization data is fetched using yahooquery in parallel batches for efficiency:

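from concurrent.futures import ThreadPoolExecutor, as_completed

import pandas as pd
from yahooquery import Ticker

def get_mcap(tickers):
    BATCH_SIZE = 51  # Number of tickers per batch
    MAX_THREADS = 10 # Concurrent batches

    def fetch_market_cap_batch(batch_tickers):
        t = Ticker(batch_tickers, validate=False)
        data = t.summary_detail  # {ticker: dict of fields, or an error string}

        batch_caps = {}
        for ticker, detail in data.items():
            if isinstance(detail, dict) and 'marketCap' in detail:
                batch_caps[ticker] = detail['marketCap']
        return batch_caps

    # Split the ticker list into batches of BATCH_SIZE
    batches = [tickers[i:i + BATCH_SIZE] for i in range(0, len(tickers), BATCH_SIZE)]
    all_market_caps = {}

    # Process all batches concurrently and merge the per-batch results
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        futures = {executor.submit(fetch_market_cap_batch, batch): batch
                   for batch in batches}
        for future in as_completed(futures):
            all_market_caps.update(future.result())

    # Market cap per ticker; tickers without a 'marketCap' field are omitted
    return pd.Series(all_market_caps, name="MarketCap")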
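The batch-and-merge pattern can be sketched with the standard library alone. `make_batches`, `fake_fetch` and `gather_caps` are hypothetical names. `fake_fetch` stands in for the yahooquery call so the example runs offline, and it drops some tickers to simulate responses without a `marketCap` field.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_batches(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fake_fetch(batch: list[str]) -> dict[str, float]:
    """Hypothetical stand-in for a yahooquery request: one value per ticker,
    except tickers starting with 'MISSING' (simulating absent market caps)."""
    return {t: float(len(t)) * 1e9 for t in batch if not t.startswith("MISSING")}


def gather_caps(tickers: list[str], batch_size: int, max_threads: int) -> dict[str, float]:
    caps: dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=max_threads) as executor:
        futures = [executor.submit(fake_fetch, b) for b in make_batches(tickers, batch_size)]
        # Batches complete in arbitrary order; dict.update merges them either way
        for future in as_completed(futures):
            caps.update(future.result())
    return caps


tickers = [f"T{i}" for i in range(503)] + ["MISSING1"]
batches = make_batches(tickers, 51)
caps = gather_caps(tickers, batch_size=51, max_threads=10)
print(len(batches), len(batches[-1]), len(caps))  # 10 45 503
```

Because every batch returns a disjoint set of tickers, the completion order does not affect the merged result. This is what makes `as_completed` safe to use here.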

3. Calculating Yesterday's Returns

Daily returns are calculated by downloading two sessions of daily price history and taking the percentage change between the last two closes:

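import pandas as pd
import yfinance as yf

def get_return(tickers, mcap):
    # group_by="ticker" gives (ticker, field) columns, so "Close" sits on level 1
    prices = yf.download(tickers, period="2d", interval="1d",
                         auto_adjust=True, threads=True, group_by="ticker")
    close = prices.xs("Close", level=1, axis=1)
    # Change of the last close vs. the previous one (NaN if either price is missing)
    r = close.pct_change(fill_method=None).iloc[-1]

    df = pd.DataFrame({
        'MarketCap': mcap,
        'ReturnYday': r
    })
    return df, close.index[-1]

# Merge company information with market data
tickers, company_info = spx_tickers()
mcap = get_mcap(tickers)
df, last_date = get_return(tickers, mcap)
df = df.join(company_info, how='left')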
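A worked example of the return calculation and the left join follows. It uses hypothetical tickers and prices, and needs only pandas. The last row of `pct_change` is simply close[t] / close[t-1] - 1. With `fill_method=None`, a missing price produces NaN rather than being forward-filled.

```python
import pandas as pd

# Hypothetical closing prices for two consecutive sessions (rows = dates, columns = tickers)
close = pd.DataFrame(
    {"AAA": [100.0, 102.0], "BBB": [50.0, 49.0], "CCC": [20.0, None]},
    index=pd.to_datetime(["2024-05-01", "2024-05-02"]),
)

# Last row of pct_change: 102/100 - 1 = +2%, 49/50 - 1 = -2%, CCC -> NaN
r = close.pct_change(fill_method=None).iloc[-1]

mcap = pd.Series({"AAA": 3e12, "BBB": 1e11, "CCC": 5e10})
df = pd.DataFrame({"MarketCap": mcap, "ReturnYday": r})

company_info = pd.DataFrame(
    {"Security": ["Alpha Corp", "Beta Inc"], "GICS Sector": ["Industrials", "Utilities"]},
    index=pd.Index(["AAA", "BBB"], name="Symbol"),
)
# Left join keeps every priced ticker; CCC has no company info and gets NaN there
df = df.join(company_info, how="left")
print(df)
```

A NaN return is worth keeping in mind downstream: in the D3 threshold scale, a non-numeric input maps to the scale's "unknown" value rather than to one of the five colors.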

Treemap Visualization (D3.js)

The custom JavaScript code creates the interactive treemap using D3.js v7:

/*
 * Available global variables:
 * `info`   - the novem info object (contains data and metadata)
 * `render` - the novem render object (dark mode, DPI scale, etc.)
 * `node`   - DOM element to render into
 * `width`  - current width of the target
 * `height` - current height of the target
 *
 * Available libraries:
 * `R`      - ramda version 29.0.0
 * `d3`     - d3.js version 7.0
 * `Plot`   - Observable plot 0.6
 */

// Extract and convert data
const headers = info.metadata.header;
const rows = info.data;

const data = rows.map(row => {
  const obj = {};
  headers.forEach((h, i) => obj[h] = row[i]);
  obj.Date = new Date(obj.Date);
  obj.size = +obj.MarketCap;
  obj.value = +obj.ReturnYday;
  return obj;
});

// Color scale: red for losses, green for gains
const color = d3.scaleThreshold()
    .domain([-0.03, -0.01, 0.01, 0.03])
    .range([
        "#F2453D", // strong red (< -3%)
        "#F7A9A7", // mild red (-3% to -1%)
        "#DDDDDD", // neutral (-1% to +1%)
        "#7EDFC0", // mild green (+1% to +3%)
        "#24B588"  // strong green (> +3%)
    ]);

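// Create treemap hierarchy: tile area is proportional to market cap
const root = d3.treemap()
  .size([width, height])
  .padding(1)
  .round(true)(
    d3.hierarchy({children: data})
      .sum(d => d.size)                   // numeric market cap parsed above
      .sort((a, b) => b.value - a.value)  // node.value = summed market cap (not the return)
  );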

const svg = d3.create("svg")
  .attr("viewBox", [0, 0, width, height])
  .attr("width", width)
  .attr("height", height);

const leaf = svg.selectAll("g")
  .data(root.leaves())
  .join("g")
  .attr("transform", d => `translate(${d.x0},${d.y0})`)
  .style("cursor", "pointer");

// Create info display element for click events
const infoDiv = d3.select(node)
  .insert("div", ":first-child")
  .attr("class", "company-info")
  .style("position", "absolute")
  .style("top", "10px")
  .style("left", "10px")
  .style("padding", "12px 16px")
  .style("background", "rgba(255, 255, 255, 0.95)")
  .style("border-radius", "6px")
  .style("box-shadow", "0 2px 8px rgba(0,0,0,0.15)")
  .style("font-family", "system-ui, -apple-system, sans-serif")
  .style("font-size", "20px")
  .style("line-height", "1.6")
  .style("pointer-events", "none")
  .style("opacity", "0")
  .style("transition", "opacity 0.2s");

// Click event handler to display company information
leaf.on("click", function(event, d) {
  event.stopPropagation();

  const ticker = d.data.ticker || "";
  const companyName = d.data.Security || d.data.Company || "";
  const sector = d.data["GICS Sector"] || d.data.Sector || "";
  const subIndustry = d.data["GICS Sub-Industry"] || d.data["Sub-Industry"] || "";
  const returnPct = d.data.ReturnYday * 100;
  const returnStr = (returnPct >= 0 ? "+" : "") + returnPct.toFixed(1) + "%";

  // Format: "AAPL - Apple Inc. +2.5%"
  const line1 = `${ticker}${companyName ? " - " + companyName : ""} ${returnStr}`;
  // Format: "Information Technology - Technology Hardware, Storage & Peripherals"
  const line2 = `${sector}${subIndustry ? " - " + subIndustry : ""}`;

  infoDiv
    .style("opacity", "1")
    .html(`<div style="font-weight: 600; margin-bottom: 4px;">${line1}</div>
           <div style="color: #666;">${line2}</div>`);
});

// Click outside to hide info
svg.on("click", function() {
  infoDiv.style("opacity", "0");
});

// Rectangles with color and shading
leaf.append("rect")
    .attr("fill", d => color(d.data.ReturnYday))
    .attr("width", d => d.x1 - d.x0)
    .attr("height", d => d.y1 - d.y0);

// Smart label sizing based on tile dimensions
// ... (label logic continues)

Key Visualization Features

  • Interactive Click Events: Click any company tile to display:
    • Line 1: Ticker symbol, company name, and yesterday's return (e.g., "AAPL - Apple Inc. +2.5%")
    • Line 2: GICS sector and sub-industry classification (e.g., "Information Technology - Technology Hardware, Storage & Peripherals")
    • Click anywhere outside to dismiss the info display
  • Color Scale: 5-tier threshold scale from red (strong losses) to green (strong gains)
  • Adaptive Labels: Font sizes and visibility adjust based on tile dimensions
  • Smart Text Contrast: Automatically switches between dark and light text based on background color luminance
  • Gradient Shading: Subtle gradient overlay adds depth to the tiles
  • Hover Tooltips: Native browser tooltips show market cap and return percentage
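The label code that implements "Smart Text Contrast" is not shown above. The sketch below assumes one common way to do it: compute the WCAG 2.x relative luminance of the tile color, then pick whichever of black or white text gives the higher contrast ratio. The function names are hypothetical, and the plot's own rule may use a different formula or cutoff.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB color given as '#RRGGBB'."""
    rgb = [int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    # Undo the sRGB transfer curve to get linear channel values
    lin = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4 for c in rgb]
    return 0.2126 * lin[0] + 0.7152 * lin[1] + 0.0722 * lin[2]


def contrast_ratio(l1: float, l2: float) -> float:
    """WCAG contrast ratio between two luminances (1.0 to 21.0)."""
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def text_color_for(background: str) -> str:
    """Pick black or white text, whichever contrasts more with the background."""
    bg = relative_luminance(background)
    return "#000000" if contrast_ratio(bg, 0.0) >= contrast_ratio(bg, 1.0) else "#FFFFFF"


for color in ["#F2453D", "#F7A9A7", "#DDDDDD", "#7EDFC0", "#24B588"]:
    print(color, text_color_for(color))
```

Comparing the two contrast ratios avoids a hard-coded luminance cutoff. It is equivalent to switching at a luminance of about 0.18.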

Automation & Deployment

Docker Configuration

The job runs in a lightweight Python Alpine container:

FROM python:alpine 
WORKDIR /app
COPY ./requirements.txt requirements.txt

RUN apk add --no-cache gcc python3-dev musl-dev linux-headers
RUN pip install -r requirements.txt

COPY . ./
ENTRYPOINT ["ash", "run.sh"]

Scheduling

The job runs daily via cron at 6:43 AM UTC:

43 6 * * *

This timing means the previous trading day's closing data is processed and available well before the US market opens (9:30 AM ET, which is 13:30 or 14:30 UTC depending on daylight saving time).
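The cron fields read minute, hour, day of month, month, day of week. `43 6 * * *` therefore fires at 06:43 UTC every day, weekends included. The standard-library sketch below computes the next scheduled run and checks it against the earliest possible US open in UTC. `next_run` is a hypothetical helper for illustration, not how cron itself is implemented.

```python
from datetime import datetime, timedelta, timezone

RUN_HOUR, RUN_MINUTE = 6, 43       # from "43 6 * * *" (minute first, then hour)
EARLIEST_US_OPEN_UTC = (13, 30)    # 9:30 AM EDT; 14:30 UTC when EST applies


def next_run(now: datetime) -> datetime:
    """First 06:43 UTC slot not earlier than `now` (expects an aware datetime)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=RUN_HOUR, minute=RUN_MINUTE, second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate


print(next_run(datetime(2024, 3, 8, 7, 0, tzinfo=timezone.utc)))  # 2024-03-09 06:43:00+00:00
print((RUN_HOUR, RUN_MINUTE) < EARLIEST_US_OPEN_UTC)               # True
```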

Data Quality Notes

  • For companies with dual share classes (e.g., GOOGL/GOOG, BRK.A/BRK.B), only the larger class is shown but scaled by total company market capitalization
  • Excluded tickers: BRK-A, GOOGL, FOXA (smaller share classes)
  • Data sources:
    • Wikipedia for company names, GICS sectors, and sub-industries
    • Yahoo Finance via yfinance and yahooquery libraries for market caps and prices
  • The plot automatically updates its caption with the current date
  • Missing company information fields are handled gracefully with empty values
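Two of the rules above can be illustrated with a short sketch: dropping the listed share classes, and filling missing company fields with empty values. It assumes both are applied to the merged DataFrame, although the notes do not say where in the script this happens. `filter_and_fill` is a hypothetical helper, and the sizing by total company market cap is not reproduced here.

```python
import pandas as pd

# Smaller share classes excluded so each company appears once (per the notes above)
EXCLUDED = {"BRK-A", "GOOGL", "FOXA"}
TEXT_COLS = ["Security", "GICS Sector", "GICS Sub-Industry"]


def filter_and_fill(df: pd.DataFrame) -> pd.DataFrame:
    """Drop excluded share classes and replace missing text fields with ''."""
    out = df.drop(index=[t for t in df.index if t in EXCLUDED])
    out[TEXT_COLS] = out[TEXT_COLS].fillna("")
    return out


# Placeholder rows; the values are illustrative only
df = pd.DataFrame(
    {
        "MarketCap": [1.0, 1.0, 2.0, 3.0, 3.0],
        "Security": ["Co. X", "Co. X", None, "Co. Y", "Co. Y"],
        "GICS Sector": ["S1", "S1", None, "S2", "S2"],
        "GICS Sub-Industry": ["I1", "I1", None, "I2", "I2"],
    },
    index=["GOOG", "GOOGL", "BRK-B", "FOX", "FOXA"],
)
clean = filter_and_fill(df)
print(clean.index.tolist())  # ['GOOG', 'BRK-B', 'FOX']
```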

Repository: trt/research (hosted on Novem Git)
Job: finance-spx-constituents
Plot Type: Custom (D3.js treemap)
Update Frequency: Daily at 6:43 AM UTC