Cost Tracking and Optimization for AI Applications

Build cost tracking for AI applications with per-request logging, feature attribution, budget alerts, and optimization strategies in Node.js.

Overview

AI features are deceptively expensive. A single LLM call might cost a fraction of a cent, but multiply that by thousands of users hitting your API every hour and you have a billing surprise that makes your CFO question every decision your engineering team has ever made. This article walks through building a production cost tracking system that captures every dollar flowing through your AI features, attributes costs to specific users and features, enforces spending limits, and gives you the data you need to optimize before your cloud bill forces the conversation.

Prerequisites

  • Node.js v18+ installed
  • PostgreSQL database running locally or remotely
  • Basic understanding of LLM APIs (OpenAI, Anthropic, or similar)
  • Familiarity with Express.js
  • Working knowledge of SQL

The Hidden Costs of AI Applications

Most teams dramatically underestimate what AI features actually cost in production. The LLM API call itself is the obvious cost, but it is rarely the full picture. Here is what actually adds up:

LLM API Calls are the line item everyone watches. GPT-4o runs roughly $2.50 per million input tokens and $10 per million output tokens. Claude Sonnet sits at $3/$15. These prices shift constantly, and the gap between "works in development" and "works at scale" is enormous. A prompt that costs $0.003 per call becomes $3,000 per day at a million requests.

Embedding Generation is the silent budget killer. If you are running RAG (retrieval-augmented generation), every document you ingest needs to be embedded. Every user query needs to be embedded at search time. OpenAI's text-embedding-3-small is cheap at $0.02 per million tokens, but high-volume search features can generate millions of embedding calls per day.

Vector Storage costs grow linearly with your dataset. Pinecone, Weaviate, or pgvector all charge for storage and queries. A million 1536-dimension vectors in Pinecone runs about $70/month on the starter plan. That grows fast when you are indexing real document corpuses.

Compute Costs cover your application servers, background workers processing AI tasks, and any GPU instances you run for local models. These are easy to forget when you are focused on API pricing.

Retry and Error Costs are the ones nobody budgets for. Rate limiting, timeouts, and malformed responses all trigger retries. Each retry is a billable API call. A 5% retry rate on an expensive model adds 5% to your AI budget.

Development and Testing Costs add up during iteration. Engineers running prompts against production APIs during development, integration tests hitting real endpoints, staging environments making real API calls. I have seen teams spend $500/month just on developer testing before a feature even launched.
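Before building anything, it helps to sanity-check these numbers with back-of-envelope math. Here is a rough sketch — the traffic, token counts, and retry rate are illustrative assumptions, and prices are expressed per million tokens:

```javascript
// estimate-daily-cost.js — rough planning math, not a substitute for real tracking.
// Prices are per million tokens; retryRate models billable retries (0.05 = 5%).
function estimateDailyCost(callsPerDay, inputTokens, outputTokens, inputPricePerM, outputPricePerM, retryRate) {
    var perCall = (inputTokens * inputPricePerM + outputTokens * outputPricePerM) / 1000000;
    // Retries are billable too, so they scale the whole bill
    return callsPerDay * perCall * (1 + (retryRate || 0));
}

// A million GPT-4o calls/day at ~800 input / 150 output tokens with 5% retries
var daily = estimateDailyCost(1000000, 800, 150, 2.50, 10.00, 0.05);
console.log("$" + daily.toFixed(2) + "/day"); // roughly $3675.00/day

module.exports = { estimateDailyCost: estimateDailyCost };
```

Running this kind of estimate per feature before launch is the cheapest cost optimization you will ever do.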

Building a Cost Tracking System

The foundation is a cost ledger — a database table that records every AI operation with its associated cost, the feature that triggered it, and the user who initiated it.

Database Schema

CREATE TABLE ai_cost_ledger (
    id SERIAL PRIMARY KEY,
    request_id UUID NOT NULL DEFAULT gen_random_uuid(),
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    provider VARCHAR(50) NOT NULL,
    model VARCHAR(100) NOT NULL,
    operation_type VARCHAR(50) NOT NULL,
    feature_name VARCHAR(100) NOT NULL,
    user_id VARCHAR(100),
    input_tokens INTEGER NOT NULL DEFAULT 0,
    output_tokens INTEGER NOT NULL DEFAULT 0,
    total_tokens INTEGER NOT NULL DEFAULT 0,
    input_cost DECIMAL(10, 6) NOT NULL DEFAULT 0,
    output_cost DECIMAL(10, 6) NOT NULL DEFAULT 0,
    total_cost DECIMAL(10, 6) NOT NULL DEFAULT 0,
    latency_ms INTEGER,
    status VARCHAR(20) NOT NULL DEFAULT 'success',
    metadata JSONB DEFAULT '{}'
);

CREATE INDEX idx_cost_ledger_timestamp ON ai_cost_ledger(timestamp);
CREATE INDEX idx_cost_ledger_feature ON ai_cost_ledger(feature_name);
CREATE INDEX idx_cost_ledger_user ON ai_cost_ledger(user_id);
CREATE INDEX idx_cost_ledger_provider ON ai_cost_ledger(provider);

CREATE TABLE ai_budgets (
    id SERIAL PRIMARY KEY,
    scope_type VARCHAR(20) NOT NULL,
    scope_id VARCHAR(100) NOT NULL,
    daily_limit DECIMAL(10, 2),
    monthly_limit DECIMAL(10, 2),
    alert_threshold DECIMAL(3, 2) NOT NULL DEFAULT 0.80,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE(scope_type, scope_id)
);

CREATE TABLE ai_budget_alerts (
    id SERIAL PRIMARY KEY,
    budget_id INTEGER REFERENCES ai_budgets(id),
    alert_type VARCHAR(50) NOT NULL,
    threshold_pct DECIMAL(5, 2) NOT NULL,
    current_spend DECIMAL(10, 4) NOT NULL,
    budget_limit DECIMAL(10, 2) NOT NULL,
    triggered_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    acknowledged BOOLEAN DEFAULT FALSE
);

Pricing Registry

Hardcoding prices into application logic is a maintenance nightmare. Providers change pricing constantly, and you need the ability to update rates without redeploying. Build a pricing registry that maps providers, models, and token types to per-token costs.

// pricing-registry.js
var pricing = {
    openai: {
        "gpt-4o": { input: 2.50 / 1000000, output: 10.00 / 1000000 },
        "gpt-4o-mini": { input: 0.15 / 1000000, output: 0.60 / 1000000 },
        "gpt-4-turbo": { input: 10.00 / 1000000, output: 30.00 / 1000000 },
        "text-embedding-3-small": { input: 0.02 / 1000000, output: 0 },
        "text-embedding-3-large": { input: 0.13 / 1000000, output: 0 }
    },
    anthropic: {
        "claude-sonnet-4-20250514": { input: 3.00 / 1000000, output: 15.00 / 1000000 },
        "claude-haiku-3-20250414": { input: 0.80 / 1000000, output: 4.00 / 1000000 },
        "claude-opus-4-20250514": { input: 15.00 / 1000000, output: 75.00 / 1000000 }
    }
};

function getPricing(provider, model) {
    var providerPricing = pricing[provider];
    if (!providerPricing) {
        console.warn("Unknown provider: " + provider + ". Using zero cost.");
        return { input: 0, output: 0 };
    }
    var modelPricing = providerPricing[model];
    if (!modelPricing) {
        console.warn("Unknown model: " + model + ". Using zero cost.");
        return { input: 0, output: 0 };
    }
    return modelPricing;
}

function calculateCost(provider, model, inputTokens, outputTokens) {
    var rates = getPricing(provider, model);
    var inputCost = inputTokens * rates.input;
    var outputCost = outputTokens * rates.output;
    return {
        inputCost: inputCost,
        outputCost: outputCost,
        totalCost: inputCost + outputCost
    };
}

module.exports = {
    getPricing: getPricing,
    calculateCost: calculateCost,
    pricing: pricing
};
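The registry above still ships rates in code, which works against the no-redeploy goal. One lightweight option is to overlay rates from configuration at startup. This is a sketch; the `AI_PRICING_OVERRIDES` environment variable and its JSON shape are assumptions, not an established convention:

```javascript
// pricing-overrides.js — sketch: overlay per-model rates without a redeploy.
// AI_PRICING_OVERRIDES is an assumed env var holding JSON shaped like the
// pricing object, e.g. {"openai":{"gpt-4o":{"input":0.0000025,"output":0.00001}}}
function applyOverrides(basePricing, overridesJson) {
    var merged = JSON.parse(JSON.stringify(basePricing)); // deep copy, leave base intact
    if (!overridesJson) {
        return merged;
    }
    var overrides = JSON.parse(overridesJson);
    Object.keys(overrides).forEach(function(provider) {
        merged[provider] = merged[provider] || {};
        Object.keys(overrides[provider]).forEach(function(model) {
            merged[provider][model] = overrides[provider][model];
        });
    });
    return merged;
}

module.exports = { applyOverrides: applyOverrides };
```

At startup you might call `applyOverrides(pricing, process.env.AI_PRICING_OVERRIDES)` and use the merged object; the same pattern works if rates live in a database table instead of an env var.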

Per-Request Cost Tracking Middleware

The cost tracker wraps your AI provider calls and automatically logs every request to the ledger. This is the core of the system — every AI call goes through this wrapper so nothing escapes tracking.

// cost-tracker.js
var pg = require("pg");
var pricingRegistry = require("./pricing-registry");
var crypto = require("crypto");

var pool = new pg.Pool({
    connectionString: process.env.POSTGRES_CONNECTION_STRING
});

function trackAICall(options) {
    var requestId = crypto.randomUUID();
    var startTime = Date.now();

    return {
        requestId: requestId,
        record: function(response) {
            var latencyMs = Date.now() - startTime;
            var inputTokens = response.usage ? response.usage.input_tokens || response.usage.prompt_tokens || 0 : 0;
            var outputTokens = response.usage ? response.usage.output_tokens || response.usage.completion_tokens || 0 : 0;
            var totalTokens = inputTokens + outputTokens;

            var costs = pricingRegistry.calculateCost(
                options.provider,
                options.model,
                inputTokens,
                outputTokens
            );

            var entry = {
                request_id: requestId,
                provider: options.provider,
                model: options.model,
                operation_type: options.operationType || "completion",
                feature_name: options.featureName,
                user_id: options.userId || null,
                input_tokens: inputTokens,
                output_tokens: outputTokens,
                total_tokens: totalTokens,
                input_cost: costs.inputCost,
                output_cost: costs.outputCost,
                total_cost: costs.totalCost,
                latency_ms: latencyMs,
                status: "success",
                metadata: JSON.stringify(options.metadata || {})
            };

            return insertLedgerEntry(entry);
        },
        recordError: function(error) {
            var latencyMs = Date.now() - startTime;
            var entry = {
                request_id: requestId,
                provider: options.provider,
                model: options.model,
                operation_type: options.operationType || "completion",
                feature_name: options.featureName,
                user_id: options.userId || null,
                input_tokens: 0,
                output_tokens: 0,
                total_tokens: 0,
                input_cost: 0,
                output_cost: 0,
                total_cost: 0,
                latency_ms: latencyMs,
                status: "error",
                metadata: JSON.stringify({ error: error.message })
            };

            return insertLedgerEntry(entry);
        }
    };
}

function insertLedgerEntry(entry) {
    var query = "INSERT INTO ai_cost_ledger " +
        "(request_id, provider, model, operation_type, feature_name, user_id, " +
        "input_tokens, output_tokens, total_tokens, input_cost, output_cost, " +
        "total_cost, latency_ms, status, metadata) " +
        "VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15)";

    var values = [
        entry.request_id, entry.provider, entry.model,
        entry.operation_type, entry.feature_name, entry.user_id,
        entry.input_tokens, entry.output_tokens, entry.total_tokens,
        entry.input_cost, entry.output_cost, entry.total_cost,
        entry.latency_ms, entry.status, entry.metadata
    ];

    return pool.query(query, values).catch(function(err) {
        console.error("Failed to log AI cost:", err.message);
    });
}

module.exports = {
    trackAICall: trackAICall,
    pool: pool
};

Wrapping Provider Calls

Here is how you wrap actual OpenAI and Anthropic calls so every request is tracked automatically:

// ai-client.js
var OpenAI = require("openai");
var Anthropic = require("@anthropic-ai/sdk");
var costTracker = require("./cost-tracker");

var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
var anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

function callOpenAI(options) {
    var tracker = costTracker.trackAICall({
        provider: "openai",
        model: options.model || "gpt-4o-mini",
        operationType: "completion",
        featureName: options.featureName,
        userId: options.userId,
        metadata: options.metadata
    });

    return openai.chat.completions.create({
        model: options.model || "gpt-4o-mini",
        messages: options.messages,
        max_tokens: options.maxTokens || 1000,
        temperature: options.temperature !== undefined ? options.temperature : 0.7
    }).then(function(response) {
        tracker.record({
            usage: {
                prompt_tokens: response.usage.prompt_tokens,
                completion_tokens: response.usage.completion_tokens
            }
        });
        return response;
    }).catch(function(error) {
        tracker.recordError(error);
        throw error;
    });
}

function callAnthropic(options) {
    var tracker = costTracker.trackAICall({
        provider: "anthropic",
        model: options.model || "claude-haiku-3-20250414",
        operationType: "completion",
        featureName: options.featureName,
        userId: options.userId,
        metadata: options.metadata
    });

    return anthropic.messages.create({
        model: options.model || "claude-haiku-3-20250414",
        max_tokens: options.maxTokens || 1000,
        messages: options.messages
    }).then(function(response) {
        tracker.record({
            usage: {
                input_tokens: response.usage.input_tokens,
                output_tokens: response.usage.output_tokens
            }
        });
        return response;
    }).catch(function(error) {
        tracker.recordError(error);
        throw error;
    });
}

function generateEmbedding(options) {
    var tracker = costTracker.trackAICall({
        provider: "openai",
        model: options.model || "text-embedding-3-small",
        operationType: "embedding",
        featureName: options.featureName,
        userId: options.userId
    });

    return openai.embeddings.create({
        model: options.model || "text-embedding-3-small",
        input: options.input
    }).then(function(response) {
        tracker.record({
            usage: { prompt_tokens: response.usage.prompt_tokens, completion_tokens: 0 }
        });
        return response;
    }).catch(function(error) {
        tracker.recordError(error);
        throw error;
    });
}

module.exports = {
    callOpenAI: callOpenAI,
    callAnthropic: callAnthropic,
    generateEmbedding: generateEmbedding
};

Cost Attribution by Feature

Knowing your total AI spend is table stakes. The real insight comes from knowing which features drive that spend. When every call carries a featureName tag, you can run queries that tell you exactly where money goes.

// cost-queries.js
var pool = require("./cost-tracker").pool;

function getCostByFeature(startDate, endDate) {
    var query = "SELECT feature_name, " +
        "COUNT(*) as request_count, " +
        "SUM(total_tokens) as total_tokens, " +
        "SUM(total_cost) as total_cost, " +
        "AVG(total_cost) as avg_cost_per_request, " +
        "AVG(latency_ms) as avg_latency_ms " +
        "FROM ai_cost_ledger " +
        "WHERE timestamp BETWEEN $1 AND $2 AND status = 'success' " +
        "GROUP BY feature_name " +
        "ORDER BY total_cost DESC";

    return pool.query(query, [startDate, endDate]).then(function(result) {
        return result.rows;
    });
}

function getCostByUser(startDate, endDate, limit) {
    var query = "SELECT user_id, " +
        "COUNT(*) as request_count, " +
        "SUM(total_cost) as total_cost, " +
        "AVG(total_cost) as avg_cost_per_request, " +
        "array_agg(DISTINCT feature_name) as features_used " +
        "FROM ai_cost_ledger " +
        "WHERE timestamp BETWEEN $1 AND $2 " +
        "AND user_id IS NOT NULL AND status = 'success' " +
        "GROUP BY user_id " +
        "ORDER BY total_cost DESC " +
        "LIMIT $3";

    return pool.query(query, [startDate, endDate, limit || 50]).then(function(result) {
        return result.rows;
    });
}

function getDailyTrend(days) {
    var query = "SELECT DATE(timestamp) as day, " +
        "COUNT(*) as request_count, " +
        "SUM(total_cost) as total_cost, " +
        "SUM(total_tokens) as total_tokens " +
        "FROM ai_cost_ledger " +
        "WHERE timestamp > NOW() - make_interval(days => $1) " +
        "AND status = 'success' " +
        "GROUP BY DATE(timestamp) " +
        "ORDER BY day ASC";

    return pool.query(query, [days]).then(function(result) {
        return result.rows;
    });
}

function getCostByModel(startDate, endDate) {
    var query = "SELECT provider, model, " +
        "COUNT(*) as request_count, " +
        "SUM(input_tokens) as total_input_tokens, " +
        "SUM(output_tokens) as total_output_tokens, " +
        "SUM(total_cost) as total_cost " +
        "FROM ai_cost_ledger " +
        "WHERE timestamp BETWEEN $1 AND $2 AND status = 'success' " +
        "GROUP BY provider, model " +
        "ORDER BY total_cost DESC";

    return pool.query(query, [startDate, endDate]).then(function(result) {
        return result.rows;
    });
}

module.exports = {
    getCostByFeature: getCostByFeature,
    getCostByUser: getCostByUser,
    getDailyTrend: getDailyTrend,
    getCostByModel: getCostByModel
};

Cost Attribution by User

Identifying expensive users is not about penalizing anyone — it is about understanding usage patterns. Some users legitimately need heavy AI features. Others might be abusing a free tier or triggering unnecessary calls through UI behavior your team did not anticipate.

The getCostByUser query above gives you the top spenders. Pair that with user plan data and you can calculate whether each user is profitable:

function getUserProfitability(startDate, endDate) {
    var query = "SELECT l.user_id, " +
        "SUM(l.total_cost) as ai_cost, " +
        "u.plan_type, u.monthly_revenue " +
        "FROM ai_cost_ledger l " +
        "JOIN users u ON l.user_id = u.id " +
        "WHERE l.timestamp BETWEEN $1 AND $2 " +
        "AND l.status = 'success' AND l.user_id IS NOT NULL " +
        "GROUP BY l.user_id, u.plan_type, u.monthly_revenue " +
        "HAVING SUM(l.total_cost) > u.monthly_revenue * 0.3 " +
        "ORDER BY SUM(l.total_cost) DESC";

    return pool.query(query, [startDate, endDate]).then(function(result) {
        return result.rows;
    });
}

That query finds users whose AI costs exceed 30% of their monthly revenue — a threshold you should tune based on your margins. When AI costs eat more than a third of what a user pays you, it is time to either optimize their usage path or adjust pricing.

Implementing Spending Limits and Budget Alerts

Budget enforcement needs to happen before the API call, not after. Checking spend after you have already made the call is like checking your bank balance after swiping the card.

// budget-manager.js
var pool = require("./cost-tracker").pool;
var EventEmitter = require("events");

var budgetEvents = new EventEmitter();

function checkBudget(scopeType, scopeId) {
    var budgetQuery = "SELECT * FROM ai_budgets WHERE scope_type = $1 AND scope_id = $2";
    var spendQuery;

    if (scopeType === "user") {
        spendQuery = "SELECT " +
            "COALESCE(SUM(CASE WHEN timestamp > DATE_TRUNC('day', NOW()) THEN total_cost ELSE 0 END), 0) as daily_spend, " +
            "COALESCE(SUM(CASE WHEN timestamp > DATE_TRUNC('month', NOW()) THEN total_cost ELSE 0 END), 0) as monthly_spend " +
            "FROM ai_cost_ledger WHERE user_id = $1 AND status = 'success'";
    } else {
        spendQuery = "SELECT " +
            "COALESCE(SUM(CASE WHEN timestamp > DATE_TRUNC('day', NOW()) THEN total_cost ELSE 0 END), 0) as daily_spend, " +
            "COALESCE(SUM(CASE WHEN timestamp > DATE_TRUNC('month', NOW()) THEN total_cost ELSE 0 END), 0) as monthly_spend " +
            "FROM ai_cost_ledger WHERE feature_name = $1 AND status = 'success'";
    }

    return Promise.all([
        pool.query(budgetQuery, [scopeType, scopeId]),
        pool.query(spendQuery, [scopeId])
    ]).then(function(results) {
        var budget = results[0].rows[0];
        var spend = results[1].rows[0];

        if (!budget) {
            return { allowed: true, reason: "no_budget_configured" };
        }

        var dailySpend = parseFloat(spend.daily_spend);
        var monthlySpend = parseFloat(spend.monthly_spend);

        // Check hard limits
        if (budget.daily_limit && dailySpend >= parseFloat(budget.daily_limit)) {
            return { allowed: false, reason: "daily_limit_exceeded", spend: dailySpend, limit: budget.daily_limit };
        }
        if (budget.monthly_limit && monthlySpend >= parseFloat(budget.monthly_limit)) {
            return { allowed: false, reason: "monthly_limit_exceeded", spend: monthlySpend, limit: budget.monthly_limit };
        }

        // Check alert thresholds
        var threshold = parseFloat(budget.alert_threshold);
        if (budget.daily_limit && dailySpend >= parseFloat(budget.daily_limit) * threshold) {
            budgetEvents.emit("alert", {
                scopeType: scopeType,
                scopeId: scopeId,
                alertType: "daily_threshold",
                thresholdPct: threshold * 100,
                currentSpend: dailySpend,
                limit: budget.daily_limit
            });
        }
        if (budget.monthly_limit && monthlySpend >= parseFloat(budget.monthly_limit) * threshold) {
            budgetEvents.emit("alert", {
                scopeType: scopeType,
                scopeId: scopeId,
                alertType: "monthly_threshold",
                thresholdPct: threshold * 100,
                currentSpend: monthlySpend,
                limit: budget.monthly_limit
            });
        }

        return { allowed: true, dailySpend: dailySpend, monthlySpend: monthlySpend };
    });
}

function recordAlert(alert) {
    var query = "INSERT INTO ai_budget_alerts " +
        "(budget_id, alert_type, threshold_pct, current_spend, budget_limit) " +
        "SELECT id, $1, $2, $3, $4 FROM ai_budgets " +
        "WHERE scope_type = $5 AND scope_id = $6";

    return pool.query(query, [
        alert.alertType, alert.thresholdPct,
        alert.currentSpend, alert.limit,
        alert.scopeType, alert.scopeId
    ]);
}

budgetEvents.on("alert", function(alert) {
    recordAlert(alert);
    console.warn("BUDGET ALERT: " + alert.scopeType + "/" + alert.scopeId +
        " - " + alert.alertType + " at " + alert.thresholdPct + "% " +
        "($" + alert.currentSpend.toFixed(4) + " / $" + alert.limit + ")");
});

module.exports = {
    checkBudget: checkBudget,
    budgetEvents: budgetEvents
};
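To actually enforce this before the provider call, checkBudget needs to sit in the request path. Here is a sketch of an Express-style middleware; the dependency is injected so the gate is easy to test, and the req.user shape is an assumption about your auth middleware:

```javascript
// budget-gate.js — sketch: refuse the request before any billable call is made.
// checkBudget is injected (e.g. from budget-manager.js) rather than required directly.
function createBudgetGate(checkBudget) {
    return function budgetGate(req, res, next) {
        var userId = req.user && req.user.id;
        if (!userId) {
            return next(); // no user context; rely on feature-level budgets instead
        }
        checkBudget("user", userId).then(function(result) {
            if (!result.allowed) {
                // 429 mirrors rate-limit semantics for API clients
                return res.status(429).json({
                    error: "AI budget exceeded",
                    reason: result.reason
                });
            }
            next();
        }).catch(next);
    };
}

module.exports = { createBudgetGate: createBudgetGate };
```

Mounted in front of your AI routes — for example `app.use("/api/ai", createBudgetGate(budgetManager.checkBudget))` — no request reaches a provider once its budget is spent.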

Cost Optimization Strategies

Tracking costs is step one. Reducing them is where you actually save money. Here are the strategies that deliver the biggest impact.

Model Downgrading

Not every request needs your most expensive model. A classification task that GPT-4o handles for $0.003 can often be done by GPT-4o-mini for $0.0002 with comparable accuracy. Build a model selector that picks the right model for the job:

// model-selector.js
var modelTiers = {
    simple: { openai: "gpt-4o-mini", anthropic: "claude-haiku-3-20250414" },
    standard: { openai: "gpt-4o", anthropic: "claude-sonnet-4-20250514" },
    complex: { openai: "gpt-4o", anthropic: "claude-opus-4-20250514" }
};

function selectModel(provider, taskComplexity, budgetRemaining) {
    // If budget is getting tight, downgrade automatically
    if (budgetRemaining !== undefined && budgetRemaining < 0.5) {
        console.log("Budget low ($" + budgetRemaining.toFixed(2) + " remaining). Forcing simple tier.");
        return modelTiers.simple[provider];
    }

    return modelTiers[taskComplexity || "standard"][provider];
}

module.exports = { selectModel: selectModel, modelTiers: modelTiers };

Response Caching

If the same prompt produces the same output, cache it. This is especially effective for classification, extraction, and summarization where inputs repeat.

// ai-cache.js
var crypto = require("crypto");
var pool = require("./cost-tracker").pool;

function getCacheKey(provider, model, messages) {
    var payload = JSON.stringify({ provider: provider, model: model, messages: messages });
    return crypto.createHash("sha256").update(payload).digest("hex");
}

function getCached(provider, model, messages) {
    var key = getCacheKey(provider, model, messages);
    var query = "SELECT response_data FROM ai_response_cache " +
        "WHERE cache_key = $1 AND expires_at > NOW()";

    return pool.query(query, [key]).then(function(result) {
        if (result.rows.length > 0) {
            return JSON.parse(result.rows[0].response_data);
        }
        return null;
    });
}

function setCache(provider, model, messages, response, ttlSeconds) {
    var key = getCacheKey(provider, model, messages);
    var ttl = ttlSeconds || 3600;
    var query = "INSERT INTO ai_response_cache (cache_key, response_data, expires_at) " +
        "VALUES ($1, $2, NOW() + make_interval(secs => $3)) " +
        "ON CONFLICT (cache_key) DO UPDATE SET response_data = EXCLUDED.response_data, " +
        "expires_at = EXCLUDED.expires_at";

    return pool.query(query, [key, JSON.stringify(response), ttl]);
}

module.exports = { getCached: getCached, setCache: setCache };
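The cache functions above read and write an ai_response_cache table that is not part of the earlier schema. A minimal version might look like this — response_data is stored as TEXT here to match the JSON.parse call in getCached:

```sql
CREATE TABLE ai_response_cache (
    cache_key CHAR(64) PRIMARY KEY,
    response_data TEXT NOT NULL,
    expires_at TIMESTAMPTZ NOT NULL
);

-- Lets a periodic cleanup job purge expired rows cheaply
CREATE INDEX idx_response_cache_expires ON ai_response_cache(expires_at);
```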

Prompt Compression

Long prompts cost more. Stripping unnecessary whitespace, removing verbose instructions, and using concise system messages can cut input token counts by 20-40%.

function compressPrompt(text) {
    // Remove excessive whitespace
    var compressed = text.replace(/\n{3,}/g, "\n\n");
    compressed = compressed.replace(/[ \t]+/g, " ");
    compressed = compressed.trim();
    return compressed;
}

function estimateTokens(text) {
    // Rough estimate: ~4 characters per token for English text
    return Math.ceil(text.length / 4);
}

Analyzing Cost Trends

Daily and weekly trends tell you whether costs are growing linearly with users (healthy) or exponentially (problem). The getDailyTrend function above gives you raw data. Here is how to compute growth rate:

function analyzeCostGrowth(trendData) {
    if (trendData.length < 14) {
        return { status: "insufficient_data", message: "Need at least 14 days of data" };
    }

    var recentWeek = trendData.slice(-7);
    var previousWeek = trendData.slice(-14, -7);

    var recentTotal = recentWeek.reduce(function(sum, day) {
        return sum + parseFloat(day.total_cost);
    }, 0);

    var previousTotal = previousWeek.reduce(function(sum, day) {
        return sum + parseFloat(day.total_cost);
    }, 0);

    var growthRate = previousTotal > 0
        ? ((recentTotal - previousTotal) / previousTotal) * 100
        : 0;

    var avgDailyCost = recentTotal / 7;
    var projectedMonthly = avgDailyCost * 30;

    return {
        recentWeekTotal: recentTotal,
        previousWeekTotal: previousTotal,
        weekOverWeekGrowth: growthRate.toFixed(2) + "%",
        avgDailyCost: avgDailyCost.toFixed(4),
        projectedMonthlyCost: projectedMonthly.toFixed(2),
        status: growthRate > 20 ? "warning" : "healthy"
    };
}

Cost-Per-Outcome Analysis

The most important metric is not cost per API call — it is cost per successful outcome. If your AI-powered search feature costs $0.05 per query but users find what they need on the first try, that is better than a $0.01 feature that takes five attempts.

function getCostPerOutcome(featureName, startDate, endDate) {
    var query = "SELECT " +
        "COUNT(*) FILTER (WHERE status = 'success') as successful_calls, " +
        "COUNT(*) FILTER (WHERE status = 'error') as failed_calls, " +
        "SUM(total_cost) as total_cost, " +
        "SUM(total_cost) FILTER (WHERE status = 'success') as success_cost, " +
        "CASE WHEN COUNT(*) FILTER (WHERE status = 'success') > 0 " +
        "THEN SUM(total_cost) / COUNT(*) FILTER (WHERE status = 'success') " +
        "ELSE 0 END as cost_per_success, " +
        "CASE WHEN COUNT(*) FILTER (WHERE (metadata->>'user_satisfied')::boolean = true) > 0 " +
        "THEN SUM(total_cost) / COUNT(*) FILTER (WHERE (metadata->>'user_satisfied')::boolean = true) " +
        "ELSE 0 END as cost_per_satisfied_user " +
        "FROM ai_cost_ledger " +
        "WHERE feature_name = $1 AND timestamp BETWEEN $2 AND $3";

    return pool.query(query, [featureName, startDate, endDate]).then(function(result) {
        return result.rows[0];
    });
}

Forecasting Future Costs

Historical data makes forecasting straightforward. Linear regression on daily costs gives you a reasonable projection. For early-stage products, factor in your user growth rate:

function forecastCosts(trendData, userGrowthRateMonthly) {
    var days = trendData.length;
    var totalCost = 0;
    for (var i = 0; i < days; i++) {
        totalCost += parseFloat(trendData[i].total_cost);
    }

    // Least-squares slope of daily cost over day index 1..days
    var xMean = (days + 1) / 2;
    var yMean = totalCost / days;
    var numerator = 0;
    var denominator = 0;
    for (var j = 0; j < days; j++) {
        var x = j + 1;
        var y = parseFloat(trendData[j].total_cost);
        numerator += (x - xMean) * (y - yMean);
        denominator += (x - xMean) * (x - xMean);
    }
    var slope = denominator > 0 ? numerator / denominator : 0;

    var growthMultiplier = userGrowthRateMonthly ? (1 + userGrowthRateMonthly / 100) : 1;

    var forecast30 = 0;
    var forecast90 = 0;
    for (var d = 1; d <= 90; d++) {
        // Project forward from the end of the observed window; costs cannot go negative
        var projectedDaily = Math.max(0, yMean + slope * (days + d - xMean));
        var monthsOut = Math.floor(d / 30);
        var adjustedDaily = projectedDaily * Math.pow(growthMultiplier, monthsOut);
        if (d <= 30) forecast30 += adjustedDaily;
        forecast90 += adjustedDaily;
    }

    return {
        next30Days: forecast30.toFixed(2),
        next90Days: forecast90.toFixed(2),
        dailyTrend: slope > 0 ? "increasing" : "decreasing",
        dailyTrendRate: "$" + Math.abs(slope).toFixed(4) + "/day"
    };
}

Comparing Actual vs Budgeted Costs

Set budgets at the beginning of each month and compare actuals as they come in:

function getBudgetComparison(month, year) {
    var startDate = new Date(year, month - 1, 1);
    var endDate = new Date(year, month, 0, 23, 59, 59);

    var query = "SELECT b.scope_type, b.scope_id, b.monthly_limit, " +
        "COALESCE(SUM(l.total_cost), 0) as actual_spend, " +
        "CASE WHEN b.monthly_limit > 0 " +
        "THEN (COALESCE(SUM(l.total_cost), 0) / b.monthly_limit) * 100 " +
        "ELSE 0 END as pct_of_budget " +
        "FROM ai_budgets b " +
        "LEFT JOIN ai_cost_ledger l ON (" +
        "  (b.scope_type = 'feature' AND l.feature_name = b.scope_id) OR " +
        "  (b.scope_type = 'user' AND l.user_id = b.scope_id)" +
        ") AND l.timestamp BETWEEN $1 AND $2 AND l.status = 'success' " +
        "GROUP BY b.scope_type, b.scope_id, b.monthly_limit " +
        "ORDER BY COALESCE(SUM(l.total_cost), 0) DESC";

    return pool.query(query, [startDate, endDate]).then(function(result) {
        return result.rows.map(function(row) {
            return {
                scope: row.scope_type + "/" + row.scope_id,
                budget: parseFloat(row.monthly_limit),
                actual: parseFloat(row.actual_spend),
                pctUsed: parseFloat(row.pct_of_budget).toFixed(1) + "%",
                status: parseFloat(row.pct_of_budget) > 100 ? "over_budget" :
                    parseFloat(row.pct_of_budget) > 80 ? "warning" : "on_track"
            };
        });
    });
}

Complete Working Example: Cost Dashboard

Here is the Express.js dashboard that ties everything together. It exposes API endpoints for the cost data and serves a management dashboard.

// routes/cost-dashboard.js
var express = require("express");
var router = express.Router();
var costQueries = require("../cost-queries");
var budgetManager = require("../budget-manager");
var pool = require("../cost-tracker").pool;

// Dashboard overview
router.get("/", function(req, res) {
    var now = new Date();
    var startOfMonth = new Date(now.getFullYear(), now.getMonth(), 1);
    var endOfDay = new Date(now.getFullYear(), now.getMonth(), now.getDate(), 23, 59, 59);

    Promise.all([
        costQueries.getCostByFeature(startOfMonth, endOfDay),
        costQueries.getCostByUser(startOfMonth, endOfDay, 20),
        costQueries.getDailyTrend(30),
        costQueries.getCostByModel(startOfMonth, endOfDay)
    ]).then(function(results) {
        var featureCosts = results[0];
        var userCosts = results[1];
        var dailyTrend = results[2];
        var modelCosts = results[3];

        var totalSpend = featureCosts.reduce(function(sum, f) {
            return sum + parseFloat(f.total_cost);
        }, 0);

        res.json({
            period: {
                start: startOfMonth.toISOString(),
                end: endOfDay.toISOString()
            },
            summary: {
                totalSpend: totalSpend.toFixed(4),
                totalRequests: featureCosts.reduce(function(sum, f) {
                    return sum + parseInt(f.request_count);
                }, 0),
                avgCostPerRequest: featureCosts.length > 0
                    ? (totalSpend / featureCosts.reduce(function(s, f) { return s + parseInt(f.request_count); }, 0)).toFixed(6)
                    : "0"
            },
            costByFeature: featureCosts,
            topUsers: userCosts,
            dailyTrend: dailyTrend,
            costByModel: modelCosts
        });
    }).catch(function(err) {
        res.status(500).json({ error: err.message });
    });
});

// Budget status
router.get("/budgets", function(req, res) {
    var now = new Date();
    var month = parseInt(req.query.month, 10) || (now.getMonth() + 1);
    var year = parseInt(req.query.year, 10) || now.getFullYear();
    var monthStart = new Date(year, month - 1, 1);
    var monthEnd = new Date(year, month, 1);

    var query = "SELECT b.*, " +
        "COALESCE(SUM(l.total_cost), 0) as current_spend " +
        "FROM ai_budgets b " +
        "LEFT JOIN ai_cost_ledger l ON (" +
        "  (b.scope_type = 'feature' AND l.feature_name = b.scope_id) OR " +
        "  (b.scope_type = 'user' AND l.user_id = b.scope_id)" +
        ") AND l.timestamp >= $1 AND l.timestamp < $2 " +
        "AND l.status = 'success' " +
        "GROUP BY b.id";

    pool.query(query, [monthStart, monthEnd]).then(function(result) {
        res.json({
            budgets: result.rows.map(function(row) {
                var monthlyPct = row.monthly_limit
                    ? (parseFloat(row.current_spend) / parseFloat(row.monthly_limit) * 100).toFixed(1)
                    : null;
                return {
                    scope: row.scope_type + "/" + row.scope_id,
                    monthlyLimit: row.monthly_limit,
                    currentSpend: parseFloat(row.current_spend).toFixed(4),
                    percentUsed: monthlyPct ? monthlyPct + "%" : "no limit",
                    alertThreshold: (parseFloat(row.alert_threshold) * 100) + "%"
                };
            })
        });
    }).catch(function(err) {
        res.status(500).json({ error: err.message });
    });
});

// Set or update budget
router.post("/budgets", function(req, res) {
    var body = req.body;
    var query = "INSERT INTO ai_budgets (scope_type, scope_id, daily_limit, monthly_limit, alert_threshold) " +
        "VALUES ($1, $2, $3, $4, $5) " +
        "ON CONFLICT (scope_type, scope_id) DO UPDATE SET " +
        "daily_limit = $3, monthly_limit = $4, alert_threshold = $5";

    pool.query(query, [
        body.scopeType, body.scopeId,
        body.dailyLimit || null, body.monthlyLimit || null,
        body.alertThreshold || 0.80
    ]).then(function() {
        res.json({ success: true });
    }).catch(function(err) {
        res.status(500).json({ error: err.message });
    });
});

// Recent alerts
router.get("/alerts", function(req, res) {
    var query = "SELECT a.*, b.scope_type, b.scope_id " +
        "FROM ai_budget_alerts a " +
        "JOIN ai_budgets b ON a.budget_id = b.id " +
        "ORDER BY a.triggered_at DESC LIMIT 50";

    pool.query(query).then(function(result) {
        res.json({ alerts: result.rows });
    }).catch(function(err) {
        res.status(500).json({ error: err.message });
    });
});

// Forecast endpoint
router.get("/forecast", function(req, res) {
    var growthRate = parseFloat(req.query.growth_rate) || 10;

    costQueries.getDailyTrend(60).then(function(trendData) {
        if (trendData.length < 14) {
            return res.status(400).json({ error: "Need at least 14 days of data for forecasting" });
        }

        var days = trendData.length;
        var totalCost = 0;

        for (var i = 0; i < days; i++) {
            totalCost += parseFloat(trendData[i].total_cost);
        }

        var avgCost = totalCost / days;
        var growthMultiplier = 1 + growthRate / 100;

        res.json({
            historicalAvgDaily: avgCost.toFixed(4),
            projectedNext30Days: (avgCost * 30 * growthMultiplier).toFixed(2),
            // Compound the monthly growth month by month across the quarter
            projectedNext90Days: (avgCost * 30 * (growthMultiplier +
                Math.pow(growthMultiplier, 2) + Math.pow(growthMultiplier, 3))).toFixed(2),
            assumptions: {
                userGrowthRate: growthRate + "% monthly",
                basedOnDays: days
            }
        });
    }).catch(function(err) {
        res.status(500).json({ error: err.message });
    });
});

module.exports = router;

Register the dashboard in your main Express app:

// In app.js
var costDashboard = require("./routes/cost-dashboard");
app.use("/admin/costs", costDashboard);

Chargebacks and Internal Billing

For organizations with multiple teams sharing AI infrastructure, chargebacks attribute costs to the team that generated them. Tag each request with a team_id in the metadata and run chargeback reports monthly:

function getChargebackReport(month, year) {
    var startDate = new Date(year, month - 1, 1);
    var endDate = new Date(year, month, 0, 23, 59, 59);

    var query = "SELECT metadata->>'team_id' as team_id, " +
        "SUM(total_cost) as total_cost, " +
        "COUNT(*) as request_count, " +
        "array_agg(DISTINCT feature_name) as features_used " +
        "FROM ai_cost_ledger " +
        "WHERE timestamp BETWEEN $1 AND $2 AND status = 'success' " +
        "AND metadata->>'team_id' IS NOT NULL " +
        "GROUP BY metadata->>'team_id' " +
        "ORDER BY total_cost DESC";

    return pool.query(query, [startDate, endDate]).then(function(result) {
        return result.rows;
    });
}
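Once the rows come back, a short formatter turns them into a per-team summary suitable for an email or chat notification. A sketch (the field names match the query above; the notifier itself is left out):

```javascript
// Render chargeback rows into one summary line per team
function formatChargebackSummary(rows) {
    return rows.map(function(row) {
        return row.team_id + ": $" + parseFloat(row.total_cost).toFixed(2) +
            " across " + row.request_count + " requests (" +
            row.features_used.join(", ") + ")";
    }).join("\n");
}
```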

Automated Cost Optimization

When budgets get tight, the system should automatically downgrade models rather than rejecting requests entirely. This provides graceful degradation:

function getOptimizedModel(provider, preferredModel, featureName) {
    return budgetManager.checkBudget("feature", featureName).then(function(result) {
        if (!result.allowed) {
            return null; // Hard-blocked
        }

        // If budget allows but spending is high, downgrade
        if (result.monthlySpend && result.monthlyLimit) {
            var pctUsed = result.monthlySpend / parseFloat(result.monthlyLimit);
            if (pctUsed > 0.9) {
                console.log("Budget > 90% for " + featureName + ". Downgrading to cheapest model.");
                return provider === "openai" ? "gpt-4o-mini" : "claude-3-5-haiku-20241022";
            }
            if (pctUsed > 0.7) {
                console.log("Budget > 70% for " + featureName + ". Using standard tier.");
                return provider === "openai" ? "gpt-4o" : "claude-sonnet-4-20250514";
            }
        }

        return preferredModel;
    });
}
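The threshold logic is a pure function of budget utilization, which makes it easy to unit test without touching the database. A sketch of the same tiering extracted out (model names are whatever your registry uses):

```javascript
// Map budget utilization (0..1) to a model choice, independent of any I/O
function pickModelTier(pctUsed, preferredModel, standardModel, cheapModel) {
    if (pctUsed > 0.9) return cheapModel;
    if (pctUsed > 0.7) return standardModel;
    return preferredModel;
}
```

`getOptimizedModel` could then call this after resolving the budget, keeping the I/O and the downgrade policy separately testable.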

Common Issues and Troubleshooting

1. Token counts do not match provider invoices

Expected monthly cost: $142.50
Actual provider invoice: $187.32
Discrepancy: $44.82 (31.4%)

This usually happens because your tracking misses retried requests, background jobs that bypass the wrapper, or development/staging traffic hitting production APIs. Audit your codebase for any direct provider SDK calls that skip the cost tracker. Also check for webhook-triggered AI calls and cron jobs.

2. Budget check race conditions under high concurrency

Error: Budget exceeded - daily limit $5.00, current spend $7.34

When multiple requests check the budget simultaneously, they all see the same "under budget" state and all proceed. The fix is to use PostgreSQL advisory locks or Redis distributed locks around budget checks:

// Derive a stable 32-bit integer lock key from the scope string
function hashCode(str) {
    var hash = 0;
    for (var i = 0; i < str.length; i++) {
        hash = ((hash << 5) - hash + str.charCodeAt(i)) | 0;
    }
    return hash;
}

// Note: pg_advisory_xact_lock is held until the transaction ends, so call
// this on a dedicated client inside BEGIN/COMMIT; through the pool each
// statement auto-commits and the lock would be released immediately.
function checkBudgetWithLock(client, scopeType, scopeId) {
    var lockId = Math.abs(hashCode(scopeType + ":" + scopeId));

    return client.query("SELECT pg_advisory_xact_lock($1)", [lockId]).then(function() {
        return checkBudget(scopeType, scopeId);
    });
}

3. PostgreSQL connection pool exhaustion from high-volume logging

Error: TimeoutError: Connection terminated due to connection timeout
  at Pool._pulseQueue (node_modules/pg-pool/index.js:182:27)

Every AI call creates a database insert for the ledger. At high volume, this overwhelms the connection pool. Batch inserts using a local buffer that flushes every few seconds instead of writing on every request:

var buffer = [];
var FLUSH_INTERVAL = 5000;
var BATCH_SIZE = 100;

setInterval(function() {
    if (buffer.length === 0) return;
    var batch = buffer.splice(0, BATCH_SIZE);
    var values = batch.map(function(entry, i) {
        var offset = i * 15;
        return "(" + Array.from({ length: 15 }, function(_, j) {
            return "$" + (offset + j + 1);
        }).join(",") + ")";
    }).join(",");

    var params = batch.reduce(function(acc, entry) {
        return acc.concat(Object.values(entry));
    }, []);

    pool.query("INSERT INTO ai_cost_ledger (...) VALUES " + values, params)
        .catch(function(err) {
            // Logging must never take down the app; requeue the batch on failure
            console.error("Cost ledger flush failed: " + err.message);
            buffer = batch.concat(buffer);
        });
}, FLUSH_INTERVAL);
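The placeholder builder is the easiest part of this to get subtly wrong: an off-by-one in the `$n` numbering shifts every column in the batch. Extracting it into a pure function makes it testable on its own (15 here matches the ledger column count assumed above):

```javascript
// Build "($1,$2,...),($16,...)"-style VALUES placeholders for a batch insert
function buildPlaceholders(rowCount, columnCount) {
    var rows = [];
    for (var i = 0; i < rowCount; i++) {
        var cols = [];
        for (var j = 0; j < columnCount; j++) {
            cols.push("$" + (i * columnCount + j + 1));
        }
        rows.push("(" + cols.join(",") + ")");
    }
    return rows.join(",");
}
```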

4. Stale pricing registry causes cost calculation drift

Warning: Unknown model: gpt-4o-2025-08-06. Using zero cost.

Providers release new model versions constantly. When your registry does not recognize a model string, it logs zero cost, making your tracking useless. Set up a fallback that estimates cost from the model family and logs a warning for operations teams to update the registry:

function getPricingWithFallback(provider, model) {
    var exact = pricing[provider] && pricing[provider][model];
    if (exact) return exact;

    // Try matching model family
    var familyMatch = Object.keys(pricing[provider] || {}).find(function(key) {
        return model.startsWith(key.split("-").slice(0, 2).join("-"));
    });

    if (familyMatch) {
        console.warn("Using fallback pricing for " + model + " (matched " + familyMatch + ")");
        return pricing[provider][familyMatch];
    }

    console.error("CRITICAL: No pricing found for " + provider + "/" + model);
    return { input: 0, output: 0 };
}
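With a pricing entry in hand, the per-request cost is a token-weighted sum. A sketch, assuming the registry stores prices as dollars per million tokens (as in the rates quoted earlier):

```javascript
// Cost in dollars for one request, given per-million-token prices
function calculateCost(pricingEntry, inputTokens, outputTokens) {
    return (inputTokens / 1000000) * pricingEntry.input +
        (outputTokens / 1000000) * pricingEntry.output;
}
```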

Best Practices

  • Track every AI call without exception. If a single code path bypasses the cost tracker, your numbers will drift from reality. Use the wrapper pattern and lint for direct SDK imports.

  • Store costs in the highest precision your database supports. Individual API calls cost fractions of a cent. DECIMAL(10,6) gives you six decimal places, enough to track sub-penny costs accurately across millions of records.

  • Separate cost tracking from budget enforcement. Tracking is an append-only log that should never fail or block requests. Budget enforcement is a gatekeeper that can reject requests. Keep these concerns in different code paths so a logging failure never blocks production traffic.

  • Update your pricing registry within 24 hours of provider price changes. Set a calendar reminder for provider pricing pages or subscribe to their changelogs. Stale prices make your entire tracking system unreliable.

  • Build cost awareness into your development workflow. Add estimated cost per request to your API documentation. Include cost impact in pull request reviews for AI features. Make cost a first-class engineering metric alongside latency and error rate.

  • Cache aggressively for deterministic operations. Classification, extraction, and any prompt where the same input should produce the same output are cache candidates. A 30% cache hit rate on your most expensive model saves real money.

  • Run cost reports weekly, not monthly. Monthly reports surface problems too late. A weekly review lets you catch cost spikes from new features, traffic surges, or prompt regressions before they compound into budget-breaking bills.

  • Set budget alerts at 50%, 75%, and 90% thresholds. The 50% alert gives you early warning. The 75% alert triggers investigation. The 90% alert triggers automatic model downgrading. The 100% limit is your hard stop.

  • Measure cost per outcome, not just cost per call. A feature that costs $0.10 per successful task completion is cheaper than one that costs $0.02 per call but takes six calls to succeed ($0.12 per outcome). Optimize for business outcomes, not API metrics.
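The last point reduces to a simple metric: total spend divided by successful completions. A sketch, assuming each ledger entry carries a cost and a success flag (field names are illustrative):

```javascript
// Cost per successful outcome across a set of request records
function costPerOutcome(entries) {
    var totalCost = 0;
    var successes = 0;
    entries.forEach(function(e) {
        totalCost += e.cost;
        if (e.succeeded) successes += 1;
    });
    return successes === 0 ? null : totalCost / successes;
}
```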
