Logging and Observability for LLM Calls
Build comprehensive logging for LLM calls with structured output, PII redaction, tracing, and searchable log storage in Node.js.
When you ship LLM-powered features to production, you quickly discover that traditional application logging is not enough. LLM calls are expensive, non-deterministic, and slow compared to typical API calls — which means you need specialized logging that captures token usage, latency, cost, prompt content, and model behavior in a structured, searchable format. This article walks through building a production-grade logging and observability layer for LLM calls in Node.js, covering everything from structured JSON output and PII redaction to OpenTelemetry tracing and log-based alerting.
Prerequisites
- Node.js v18 or later
- Working knowledge of Express.js and middleware patterns
- Basic familiarity with the OpenAI SDK or similar LLM client
- PostgreSQL 14+ (for queryable log storage)
- A general understanding of structured logging concepts
Install the dependencies you will need:
npm install winston winston-daily-rotate-file node-cron openai uuid pg express @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/sdk-trace-base @opentelemetry/exporter-trace-otlp-http
What to Log for LLM Calls
Standard application logs capture request/response cycles, but LLM calls generate metadata that is critical for debugging, cost management, and compliance. Here is the minimum set of fields I log for every LLM interaction:
| Field | Why It Matters |
|---|---|
| correlation_id | Ties the LLM call to the originating user request |
| model | Model drift detection; cost differs per model |
| prompt_tokens | Input cost tracking |
| completion_tokens | Output cost tracking |
| total_tokens | Budget enforcement |
| latency_ms | SLA monitoring, timeout tuning |
| estimated_cost | Real-time spend visibility |
| temperature | Reproducibility; explains output variance |
| max_tokens | Helps debug truncated responses |
| status | success, error, timeout, rate_limited |
| error_message | The actual failure reason from the provider |
| prompt_hash | Deduplication without storing raw prompts |
| response_length | Quick indicator of response quality |
Missing any of these fields means you are flying blind. I have debugged production issues where the root cause was a model parameter change that only showed up because we logged temperature and max_tokens consistently.
Structured Logging Format
Unstructured log lines like "Called GPT-4, got response in 2.3s" are useless at scale. You need JSON with consistent field names that tools can parse, index, and query.
Here is the log schema I use:
var llmLogEntry = {
timestamp: new Date().toISOString(),
level: "info",
service: "llm-gateway",
correlation_id: "req-abc123",
trace_id: "trace-xyz789",
span_id: "span-001",
event: "llm.completion",
model: "gpt-4o",
provider: "openai",
prompt_tokens: 1240,
completion_tokens: 356,
total_tokens: 1596,
latency_ms: 2340,
estimated_cost_usd: 0.0247,
temperature: 0.7,
max_tokens: 1024,
status: "success",
prompt_hash: "sha256:a1b2c3d4...",
response_length: 1423,
user_id: "user-redacted",
endpoint: "/api/summarize",
error: null
};
Every field is present on every log entry — even if null. This makes downstream parsing predictable and prevents schema mismatches in your log aggregator.
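A small helper keeps that guarantee in one place: merge whatever you captured into a template that carries every field. This is a sketch; the field list mirrors the call-specific portion of the schema above (Winston adds timestamp, level, and service itself):
var LLM_LOG_TEMPLATE = {
  correlation_id: null, trace_id: null, span_id: null, event: "llm.completion",
  model: null, provider: null, prompt_tokens: null, completion_tokens: null,
  total_tokens: null, latency_ms: null, estimated_cost_usd: null, temperature: null,
  max_tokens: null, status: null, prompt_hash: null, response_length: null,
  user_id: null, endpoint: null, error: null
};
// Copy only defined values so missing fields come through as null, never undefined
function normalizeLogEntry(partial) {
  var entry = Object.assign({}, LLM_LOG_TEMPLATE);
  Object.keys(partial || {}).forEach(function(key) {
    if (partial[key] !== undefined) entry[key] = partial[key];
  });
  return entry;
}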
Implementing a Logging Wrapper
The core idea is to wrap your LLM client so that every call automatically captures metadata without the caller needing to think about it.
var { createHash } = require("crypto");
var { v4: uuidv4 } = require("uuid");
var winston = require("winston");
var logger = winston.createLogger({
level: process.env.LOG_LEVEL || "info",
format: winston.format.combine(
winston.format.timestamp(),
winston.format.json()
),
defaultMeta: { service: "llm-gateway" },
transports: [
new winston.transports.Console(),
new winston.transports.File({ filename: "logs/llm-calls.log", maxsize: 52428800, maxFiles: 10 })
]
});
// Cost per 1K tokens (update as pricing changes)
var COST_TABLE = {
"gpt-4o": { input: 0.005, output: 0.015 },
"gpt-4o-mini": { input: 0.00015, output: 0.0006 },
"gpt-4-turbo": { input: 0.01, output: 0.03 },
"claude-3-5-sonnet": { input: 0.003, output: 0.015 }
};
function estimateCost(model, promptTokens, completionTokens) {
var pricing = COST_TABLE[model];
if (!pricing) return null;
return ((promptTokens / 1000) * pricing.input) + ((completionTokens / 1000) * pricing.output);
}
function hashPrompt(prompt) {
return "sha256:" + createHash("sha256").update(prompt).digest("hex").substring(0, 16);
}
function createLLMLogger(options) {
var openai = options.client;
var defaultModel = options.model || "gpt-4o";
function chatCompletion(params, context) {
var correlationId = (context && context.correlationId) || uuidv4();
var startTime = Date.now();
var model = params.model || defaultModel;
var promptText = JSON.stringify(params.messages);
var promptHash = hashPrompt(promptText);
logger.debug("llm.request.start", {
correlation_id: correlationId,
model: model,
prompt_hash: promptHash,
prompt_tokens_estimate: Math.ceil(promptText.length / 4),
temperature: params.temperature,
max_tokens: params.max_tokens
});
return openai.chat.completions.create(params)
.then(function(response) {
var latencyMs = Date.now() - startTime;
var usage = response.usage || {};
var cost = estimateCost(model, usage.prompt_tokens, usage.completion_tokens);
var responseText = response.choices[0].message.content || "";
logger.info("llm.completion.success", {
correlation_id: correlationId,
event: "llm.completion",
model: model,
provider: "openai",
prompt_tokens: usage.prompt_tokens,
completion_tokens: usage.completion_tokens,
total_tokens: usage.total_tokens,
latency_ms: latencyMs,
estimated_cost_usd: cost,
temperature: params.temperature,
max_tokens: params.max_tokens,
status: "success",
prompt_hash: promptHash,
response_length: responseText.length,
finish_reason: response.choices[0].finish_reason
});
return response;
})
.catch(function(error) {
var latencyMs = Date.now() - startTime;
var status = "error";
if (error.status === 429) status = "rate_limited";
if (error.code === "ETIMEDOUT" || error.code === "ECONNABORTED") status = "timeout";
logger.error("llm.completion.error", {
correlation_id: correlationId,
event: "llm.completion",
model: model,
provider: "openai",
latency_ms: latencyMs,
status: status,
error_message: error.message,
error_code: error.status || error.code,
prompt_hash: promptHash
});
throw error;
});
}
return {
chatCompletion: chatCompletion
};
}
module.exports = { createLLMLogger: createLLMLogger, logger: logger };
This wrapper gives you consistent, structured logs for every LLM call without polluting your application code.
Log Levels for AI Operations
I use a specific log level strategy for LLM operations that differs from typical web application logging:
- DEBUG: Full prompt and response content. Never enable in production unless actively debugging. These logs are enormous and may contain sensitive data.
- INFO: Every completed LLM call with metadata (tokens, latency, cost, model). This is your primary operational log.
- WARN: Degraded behavior — retries triggered, fallback models used, rate limit approached (80% of quota), responses truncated by max_tokens.
- ERROR: Failed calls — API errors, timeouts, invalid responses, content filter blocks.
// Debug level: full prompt (development only)
logger.debug("llm.prompt.full", {
correlation_id: correlationId,
messages: params.messages // NEVER in production
});
// Info level: standard completion log
logger.info("llm.completion.success", {
correlation_id: correlationId,
model: model,
total_tokens: usage.total_tokens,
latency_ms: latencyMs
});
// Warn level: degradation signals
logger.warn("llm.fallback.triggered", {
correlation_id: correlationId,
original_model: "gpt-4o",
fallback_model: "gpt-4o-mini",
reason: "rate_limit_approaching",
quota_usage_pct: 85
});
// Error level: failures
logger.error("llm.completion.error", {
correlation_id: correlationId,
error_message: "Request too large for gpt-4o-mini",
error_code: 400,
prompt_tokens_estimate: 142000
});
PII Handling in LLM Logs
This is the part most teams get wrong. Logging raw prompts means logging user data — names, emails, addresses, medical information, whatever your users type into your product. You need a redaction layer between the raw input and the log output.
var PII_PATTERNS = [
{ pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, replacement: "[EMAIL_REDACTED]" },
{ pattern: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, replacement: "[PHONE_REDACTED]" },
{ pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: "[SSN_REDACTED]" },
{ pattern: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g, replacement: "[CARD_REDACTED]" },
{ pattern: /\b\d{1,5}\s+\w+\s+(street|st|avenue|ave|road|rd|boulevard|blvd|drive|dr|lane|ln)\b/gi, replacement: "[ADDRESS_REDACTED]" }
];
function redactPII(text) {
if (typeof text !== "string") return text;
var redacted = text;
PII_PATTERNS.forEach(function(rule) {
redacted = redacted.replace(rule.pattern, rule.replacement);
});
return redacted;
}
function redactMessages(messages) {
return messages.map(function(msg) {
return {
role: msg.role,
content: redactPII(msg.content)
};
});
}
// Usage: log redacted prompt at debug level
logger.debug("llm.prompt.redacted", {
correlation_id: correlationId,
messages: redactMessages(params.messages)
});
This is a baseline. For regulated industries (healthcare, finance), you should also consider tokenizing user identifiers and storing the mapping in a separate, access-controlled system.
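As a sketch of that approach, user identifiers can be replaced with a keyed HMAC before they ever reach the logger; the PII_TOKEN_SECRET environment variable here is an assumption, not part of the schema above:
var { createHmac } = require("crypto");
// Deterministic pseudonym: the same user always maps to the same token,
// but the raw identifier never appears in log output. Keep the
// token-to-identifier mapping in a separate, access-controlled table.
function tokenizeUserId(userId) {
  if (!userId) return null;
  return "usr_" + createHmac("sha256", process.env.PII_TOKEN_SECRET)
    .update(String(userId))
    .digest("hex")
    .substring(0, 20);
}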
Request Correlation IDs
Correlation IDs tie a single user request to every downstream operation — database queries, cache lookups, LLM calls, and external API calls. Without them, debugging a production issue involving an LLM call is nearly impossible.
var { v4: uuidv4 } = require("uuid");
function correlationMiddleware(req, res, next) {
req.correlationId = req.headers["x-correlation-id"] || uuidv4();
res.setHeader("x-correlation-id", req.correlationId);
next();
}
// Pass correlation ID to every LLM call
app.post("/api/summarize", function(req, res) {
var context = { correlationId: req.correlationId };
llmClient.chatCompletion({
model: "gpt-4o",
messages: [
{ role: "system", content: "Summarize the following text." },
{ role: "user", content: req.body.text }
],
temperature: 0.3,
max_tokens: 512
}, context)
.then(function(response) {
res.json({ summary: response.choices[0].message.content });
})
.catch(function(error) {
res.status(500).json({ error: "Summarization failed" });
});
});
When a user reports an issue, you look up the correlation ID from the response headers, and you can trace the entire lifecycle of that request across every service and log entry.
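Once the logs land in PostgreSQL (covered in the next section), that lookup is a single query, for example using the correlation ID from the schema sample earlier:
SELECT timestamp, event, model, status, latency_ms, total_tokens, estimated_cost_usd
FROM llm_logs
WHERE correlation_id = 'req-abc123'
ORDER BY timestamp;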
Log Storage Strategies
You have two main options, and I recommend using both:
PostgreSQL for Queryable Logs
Store structured LLM logs in PostgreSQL when you need to query, aggregate, and report on them. This is essential for cost tracking, model performance analysis, and compliance audits.
CREATE TABLE llm_logs (
id SERIAL PRIMARY KEY,
correlation_id VARCHAR(64) NOT NULL,
trace_id VARCHAR(64),
timestamp TIMESTAMPTZ DEFAULT NOW(),
level VARCHAR(10) NOT NULL,
event VARCHAR(64) NOT NULL,
model VARCHAR(64),
provider VARCHAR(32),
prompt_tokens INTEGER,
completion_tokens INTEGER,
total_tokens INTEGER,
latency_ms INTEGER,
estimated_cost_usd NUMERIC(10, 6),
temperature NUMERIC(3, 2),
max_tokens INTEGER,
status VARCHAR(20),
error_message TEXT,
prompt_hash VARCHAR(80),
response_length INTEGER,
finish_reason VARCHAR(20),
user_id VARCHAR(64),
endpoint VARCHAR(128),
metadata JSONB DEFAULT '{}'
);
CREATE INDEX idx_llm_logs_correlation ON llm_logs(correlation_id);
CREATE INDEX idx_llm_logs_timestamp ON llm_logs(timestamp);
CREATE INDEX idx_llm_logs_model ON llm_logs(model);
CREATE INDEX idx_llm_logs_status ON llm_logs(status);
CREATE INDEX idx_llm_logs_user ON llm_logs(user_id);
CREATE INDEX idx_llm_logs_cost ON llm_logs(estimated_cost_usd);
var { Pool } = require("pg");
var pool = new Pool({
connectionString: process.env.POSTGRES_CONNECTION_STRING
});
function storeLLMLog(entry) {
var sql = "INSERT INTO llm_logs (correlation_id, trace_id, level, event, model, provider, " +
"prompt_tokens, completion_tokens, total_tokens, latency_ms, estimated_cost_usd, " +
"temperature, max_tokens, status, error_message, prompt_hash, response_length, " +
"finish_reason, user_id, endpoint, metadata) " +
"VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21)";
var values = [
entry.correlation_id, entry.trace_id, entry.level, entry.event,
entry.model, entry.provider, entry.prompt_tokens, entry.completion_tokens,
entry.total_tokens, entry.latency_ms, entry.estimated_cost_usd,
entry.temperature, entry.max_tokens, entry.status, entry.error_message,
entry.prompt_hash, entry.response_length, entry.finish_reason,
entry.user_id, entry.endpoint, JSON.stringify(entry.metadata || {})
];
return pool.query(sql, values).catch(function(err) {
// Log storage should never crash the application
console.error("Failed to store LLM log:", err.message);
});
}
File Rotation for Volume
For high-volume environments, also write to rotating log files. Winston handles this natively:
var winston = require("winston");
require("winston-daily-rotate-file");
var rotatingTransport = new winston.transports.DailyRotateFile({
filename: "logs/llm-%DATE%.log",
datePattern: "YYYY-MM-DD",
maxSize: "100m",
maxFiles: "30d",
zippedArchive: true
});
Use file logs as your safety net. If PostgreSQL goes down, you still have the file logs. If a query is too expensive to run on the database, you can grep the files.
Distributed Tracing with OpenTelemetry
Structured logs tell you what happened. Traces tell you how long each step took and how they relate to each other. For LLM calls that involve prompt construction, retrieval-augmented generation, and post-processing, tracing is invaluable.
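The tracer calls below assume the OpenTelemetry SDK was initialized once at process startup. Here is a minimal bootstrap sketch using the packages from the install step; the option names reflect recent SDK versions and the OTLP endpoint is an assumption, so verify both against the versions you install:
// tracing.js: require this before the rest of the app loads
var { NodeSDK } = require("@opentelemetry/sdk-node");
var { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-http");
var sdk = new NodeSDK({
  serviceName: "llm-gateway",
  traceExporter: new OTLPTraceExporter({
    // Assumed collector endpoint; point this at your OTLP receiver
    url: process.env.OTLP_TRACES_URL || "http://localhost:4318/v1/traces"
  })
});
sdk.start();
// Flush any buffered spans on shutdown
process.on("SIGTERM", function() {
  sdk.shutdown().finally(function() { process.exit(0); });
});
With the SDK running, wrap each LLM call in a span: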
var opentelemetry = require("@opentelemetry/api");
var tracer = opentelemetry.trace.getTracer("llm-gateway");
function tracedChatCompletion(llmClient, params, context) {
return tracer.startActiveSpan("llm.chat.completion", function(span) {
var correlationId = (context && context.correlationId) || "unknown";
span.setAttribute("llm.model", params.model);
span.setAttribute("llm.temperature", params.temperature || 1.0);
span.setAttribute("llm.max_tokens", params.max_tokens || 0);
span.setAttribute("correlation.id", correlationId);
return llmClient.chatCompletion(params, context)
.then(function(response) {
var usage = response.usage || {};
span.setAttribute("llm.prompt_tokens", usage.prompt_tokens);
span.setAttribute("llm.completion_tokens", usage.completion_tokens);
span.setAttribute("llm.total_tokens", usage.total_tokens);
span.setAttribute("llm.finish_reason", response.choices[0].finish_reason);
span.setStatus({ code: opentelemetry.SpanStatusCode.OK });
span.end();
return response;
})
.catch(function(error) {
span.setStatus({
code: opentelemetry.SpanStatusCode.ERROR,
message: error.message
});
span.recordException(error);
span.end();
throw error;
});
});
}
For a RAG pipeline, create child spans for each stage:
function ragPipeline(query, context) {
return tracer.startActiveSpan("rag.pipeline", function(parentSpan) {
return tracer.startActiveSpan("rag.embed_query", function(embedSpan) {
return embedQuery(query).then(function(embedding) {
embedSpan.end();
return tracer.startActiveSpan("rag.vector_search", function(searchSpan) {
return vectorSearch(embedding).then(function(documents) {
searchSpan.setAttribute("rag.documents_found", documents.length);
searchSpan.end();
return tracer.startActiveSpan("rag.llm_completion", function(llmSpan) {
var prompt = buildPromptWithContext(query, documents);
return tracedChatCompletion(llmClient, {
model: "gpt-4o",
messages: prompt,
temperature: 0.3
}, context).then(function(response) {
llmSpan.end();
parentSpan.end();
return response;
});
});
});
});
});
});
});
}
This gives you a waterfall view in Jaeger or your tracing backend: you can see that the embedding took 120ms, the vector search took 45ms, and the LLM call took 2.8 seconds — and the LLM call is the bottleneck you need to optimize.
Building Log-Based Dashboards
With structured logs in PostgreSQL, building operational dashboards is straightforward SQL:
-- Daily cost by model
SELECT
DATE(timestamp) AS day,
model,
COUNT(*) AS call_count,
SUM(estimated_cost_usd) AS total_cost,
AVG(latency_ms) AS avg_latency,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95_latency,
SUM(total_tokens) AS total_tokens
FROM llm_logs
WHERE timestamp > NOW() - INTERVAL '30 days'
GROUP BY DATE(timestamp), model
ORDER BY day DESC, total_cost DESC;
-- Error rate by endpoint over the last 24 hours
SELECT
endpoint,
COUNT(*) AS total_calls,
COUNT(*) FILTER (WHERE status = 'error') AS errors,
ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'error') / COUNT(*), 2) AS error_rate_pct
FROM llm_logs
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY endpoint
ORDER BY error_rate_pct DESC;
-- Hourly token consumption
SELECT
DATE_TRUNC('hour', timestamp) AS hour,
SUM(prompt_tokens) AS total_prompt_tokens,
SUM(completion_tokens) AS total_completion_tokens,
SUM(estimated_cost_usd) AS hourly_cost
FROM llm_logs
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY DATE_TRUNC('hour', timestamp)
ORDER BY hour DESC;
Feed these queries into Grafana, Metabase, or even a simple Express endpoint that returns the data as JSON. The important thing is that someone on your team is looking at these numbers daily.
Log-Based Alerting
Set up alerts for the conditions that actually matter in LLM operations:
// Check for anomalies on a schedule (e.g., every 5 minutes via node-cron)
var cron = require("node-cron");
function checkAlerts() {
// Error rate alert
var errorQuery = "SELECT COUNT(*) FILTER (WHERE status = 'error') AS errors, " +
"COUNT(*) AS total FROM llm_logs WHERE timestamp > NOW() - INTERVAL '5 minutes'";
pool.query(errorQuery).then(function(result) {
var row = result.rows[0];
if (row.total > 10 && (row.errors / row.total) > 0.1) {
sendAlert("LLM Error Rate Alert", "Error rate is " +
Math.round((row.errors / row.total) * 100) + "% in the last 5 minutes. " +
row.errors + " errors out of " + row.total + " calls.");
}
});
// Latency spike alert
var latencyQuery = "SELECT AVG(latency_ms) AS avg_latency, " +
"PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95 " +
"FROM llm_logs WHERE timestamp > NOW() - INTERVAL '5 minutes'";
pool.query(latencyQuery).then(function(result) {
var row = result.rows[0];
if (row.p95 > 10000) {
sendAlert("LLM Latency Alert", "P95 latency is " + row.p95 + "ms in the last 5 minutes.");
}
});
// Cost anomaly alert
var costQuery = "SELECT SUM(estimated_cost_usd) AS cost_5min FROM llm_logs " +
"WHERE timestamp > NOW() - INTERVAL '5 minutes'";
pool.query(costQuery).then(function(result) {
var cost = parseFloat(result.rows[0].cost_5min) || 0; // SUM() comes back as a string from node-postgres
var threshold = parseFloat(process.env.LLM_COST_ALERT_THRESHOLD || "5.00");
if (cost > threshold) {
sendAlert("LLM Cost Alert", "Spent $" + cost.toFixed(2) + " in the last 5 minutes. " +
"Threshold: $" + threshold.toFixed(2));
}
});
}
function sendAlert(subject, message) {
// Replace with your alerting mechanism: Slack webhook, PagerDuty, email, etc.
logger.error("ALERT: " + subject, { alert: true, message: message });
}
cron.schedule("*/5 * * * *", checkAlerts);
The three alerts I never skip: error rate above 10%, P95 latency above 10 seconds, and cost exceeding the 5-minute budget. Everything else is secondary.
Searching and Filtering LLM Logs
Build a search API so your team can debug LLM issues without direct database access:
function searchLogs(filters) {
var conditions = ["1=1"];
var values = [];
var paramIndex = 1;
if (filters.correlation_id) {
conditions.push("correlation_id = $" + paramIndex++);
values.push(filters.correlation_id);
}
if (filters.model) {
conditions.push("model = $" + paramIndex++);
values.push(filters.model);
}
if (filters.status) {
conditions.push("status = $" + paramIndex++);
values.push(filters.status);
}
if (filters.min_latency) {
conditions.push("latency_ms >= $" + paramIndex++);
values.push(parseInt(filters.min_latency));
}
if (filters.min_cost) {
conditions.push("estimated_cost_usd >= $" + paramIndex++);
values.push(parseFloat(filters.min_cost));
}
if (filters.start_date) {
conditions.push("timestamp >= $" + paramIndex++);
values.push(filters.start_date);
}
if (filters.end_date) {
conditions.push("timestamp <= $" + paramIndex++);
values.push(filters.end_date);
}
if (filters.endpoint) {
conditions.push("endpoint = $" + paramIndex++);
values.push(filters.endpoint);
}
var sql = "SELECT * FROM llm_logs WHERE " + conditions.join(" AND ") +
" ORDER BY timestamp DESC LIMIT " + (parseInt(filters.limit) || 100);
return pool.query(sql, values);
}
Log Retention Policies and Cost Management
LLM logs are large. A single entry with full metadata runs 500-1000 bytes. At 10,000 LLM calls per day, that is 5-10 MB daily. Manageable. At 1 million calls per day, it is 500 MB to 1 GB daily. You need a retention strategy.
-- Partition by month for easy retention management.
-- Note: a primary key on a partitioned table must include the partition
-- column, so the key becomes (id, timestamp) instead of id alone.
CREATE TABLE llm_logs_partitioned (
    LIKE llm_logs INCLUDING DEFAULTS,
    PRIMARY KEY (id, timestamp)
) PARTITION BY RANGE (timestamp);
CREATE TABLE llm_logs_2026_01 PARTITION OF llm_logs_partitioned
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE llm_logs_2026_02 PARTITION OF llm_logs_partitioned
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
-- Drop old partitions instead of DELETE (instant, no vacuum)
-- DROP TABLE llm_logs_2025_10;
My retention tiers:
- Hot (0-7 days): Full detail in PostgreSQL. Used for active debugging and real-time dashboards.
- Warm (7-90 days): Aggregated daily summaries in PostgreSQL. Individual logs moved to compressed files.
- Cold (90-365 days): Compressed log files on object storage (S3/DigitalOcean Spaces). Required for compliance.
- Archive (1+ year): Monthly cost summaries only. Delete raw logs per your data retention policy.
// Automated aggregation job — run daily
function aggregateDailyLogs() {
var sql = "INSERT INTO llm_logs_daily_summary " +
"(day, model, endpoint, call_count, error_count, total_tokens, total_cost, " +
"avg_latency, p95_latency, p99_latency) " +
"SELECT DATE(timestamp), model, endpoint, COUNT(*), " +
"COUNT(*) FILTER (WHERE status = 'error'), " +
"SUM(total_tokens), SUM(estimated_cost_usd), " +
"AVG(latency_ms), " +
"PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms), " +
"PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms) " +
"FROM llm_logs WHERE DATE(timestamp) = CURRENT_DATE - INTERVAL '1 day' " +
"GROUP BY DATE(timestamp), model, endpoint " +
"ON CONFLICT (day, model, endpoint) DO NOTHING";
return pool.query(sql);
}
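The job above assumes a summary table roughly like this; it is a sketch with types inferred from the SELECT list, and the unique constraint is what makes ON CONFLICT work:
CREATE TABLE llm_logs_daily_summary (
    day DATE NOT NULL,
    model VARCHAR(64),
    endpoint VARCHAR(128),
    call_count INTEGER,
    error_count INTEGER,
    total_tokens BIGINT,
    total_cost NUMERIC(12, 6),
    avg_latency NUMERIC(10, 2),
    p95_latency NUMERIC(10, 2),
    p99_latency NUMERIC(10, 2),
    UNIQUE (day, model, endpoint)
);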
Compliance Logging and Audit Trails
If your LLM makes decisions that affect users — content moderation, loan approvals, hiring recommendations — you need an audit trail that proves what the model saw, what it decided, and why.
function logAuditableDecision(params) {
var auditEntry = {
correlation_id: params.correlationId,
decision_type: params.decisionType, // "content_moderation", "risk_assessment"
model: params.model,
model_version: params.modelVersion, // Snapshot which version was used
input_hash: hashPrompt(JSON.stringify(params.input)),
decision: params.decision, // The actual output/decision
confidence: params.confidence,
prompt_template_version: params.templateVersion, // Track prompt changes
human_override: false,
timestamp: new Date().toISOString(),
retention_days: 2555 // 7 years for financial compliance
};
// Store in a separate, immutable audit table
var sql = "INSERT INTO llm_audit_log (correlation_id, decision_type, model, " +
"model_version, input_hash, decision, confidence, prompt_template_version, " +
"human_override, retention_until) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9, " +
"NOW() + ($10 || ' days')::INTERVAL)";
return pool.query(sql, [
auditEntry.correlation_id, auditEntry.decision_type, auditEntry.model,
auditEntry.model_version, auditEntry.input_hash, JSON.stringify(auditEntry.decision),
auditEntry.confidence, auditEntry.prompt_template_version,
auditEntry.human_override, auditEntry.retention_days
]);
}
The key principle: log enough to reproduce the decision, but not so much that you violate data minimization requirements. Hash the inputs, store the outputs, track the model and prompt versions.
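For completeness, here is a schema sketch for the audit table the insert above targets; the column names mirror the INSERT, and the types are assumptions to adapt to your own compliance regime:
CREATE TABLE llm_audit_log (
    id BIGSERIAL PRIMARY KEY,
    correlation_id VARCHAR(64) NOT NULL,
    decision_type VARCHAR(64) NOT NULL,
    model VARCHAR(64),
    model_version VARCHAR(64),
    input_hash VARCHAR(80),
    decision JSONB,
    confidence NUMERIC(5, 4),
    prompt_template_version VARCHAR(32),
    human_override BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    retention_until TIMESTAMPTZ
);
-- Keep this table append-only: grant INSERT and SELECT to the application role, never UPDATE or DELETE.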
Exporting Logs to External Systems
Not every team runs their own ELK stack. Here are the most common export patterns:
// Export to Elasticsearch / OpenSearch
var { Client } = require("@elastic/elasticsearch");
var esClient = new Client({ node: process.env.ELASTICSEARCH_URL });
function exportToElasticsearch(logEntry) {
return esClient.index({
index: "llm-logs-" + new Date().toISOString().substring(0, 7),
body: logEntry
});
}
// Export to Datadog via HTTP API
var https = require("https");
function exportToDatadog(logEntry) {
var payload = JSON.stringify({
ddsource: "llm-gateway",
ddtags: "model:" + logEntry.model + ",status:" + logEntry.status,
hostname: require("os").hostname(),
message: JSON.stringify(logEntry),
service: "llm-gateway"
});
var options = {
hostname: "http-intake.logs.datadoghq.com",
path: "/api/v2/logs",
method: "POST",
headers: {
"Content-Type": "application/json",
"DD-API-KEY": process.env.DD_API_KEY
}
};
var req = https.request(options);
req.write(payload);
req.end();
}
// Winston transport for CloudWatch (via winston-cloudwatch)
var WinstonCloudWatch = require("winston-cloudwatch");
logger.add(new WinstonCloudWatch({
logGroupName: "llm-gateway",
logStreamName: function() {
var date = new Date().toISOString().split("T")[0];
return "llm-calls-" + date;
},
awsRegion: process.env.AWS_REGION || "us-east-1",
jsonMessage: true
}));
Complete Working Example
Here is a self-contained Node.js module that ties together everything discussed above — structured logging, PII redaction, correlation IDs, PostgreSQL storage, and a search API.
// llm-observability.js
var express = require("express");
var { Pool } = require("pg");
var { createHash } = require("crypto");
var { v4: uuidv4 } = require("uuid");
var winston = require("winston");
var OpenAI = require("openai");
// ---- Configuration ----
var pool = new Pool({ connectionString: process.env.POSTGRES_CONNECTION_STRING });
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
var COST_TABLE = {
"gpt-4o": { input: 0.005, output: 0.015 },
"gpt-4o-mini": { input: 0.00015, output: 0.0006 }
};
// ---- Logger Setup ----
var logger = winston.createLogger({
level: process.env.LOG_LEVEL || "info",
format: winston.format.combine(
winston.format.timestamp(),
winston.format.json()
),
defaultMeta: { service: "llm-gateway" },
transports: [
new winston.transports.Console(),
new winston.transports.File({
filename: "logs/llm-calls.log",
maxsize: 52428800,
maxFiles: 10
})
]
});
// ---- PII Redaction ----
var PII_PATTERNS = [
{ pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, replacement: "[EMAIL]" },
{ pattern: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, replacement: "[PHONE]" },
{ pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: "[SSN]" },
{ pattern: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g, replacement: "[CARD]" }
];
function redactPII(text) {
if (typeof text !== "string") return text;
var result = text;
PII_PATTERNS.forEach(function(rule) {
result = result.replace(rule.pattern, rule.replacement);
});
return result;
}
// ---- Utilities ----
function hashPrompt(text) {
return "sha256:" + createHash("sha256").update(text).digest("hex").substring(0, 16);
}
function estimateCost(model, promptTokens, completionTokens) {
var pricing = COST_TABLE[model];
if (!pricing) return null;
return ((promptTokens / 1000) * pricing.input) + ((completionTokens / 1000) * pricing.output);
}
// ---- PostgreSQL Storage ----
function storeLLMLog(entry) {
var sql = "INSERT INTO llm_logs (correlation_id, trace_id, level, event, model, " +
"provider, prompt_tokens, completion_tokens, total_tokens, latency_ms, " +
"estimated_cost_usd, temperature, max_tokens, status, error_message, " +
"prompt_hash, response_length, finish_reason, user_id, endpoint, metadata) " +
"VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21)";
var values = [
entry.correlation_id, entry.trace_id, entry.level, entry.event,
entry.model, entry.provider, entry.prompt_tokens, entry.completion_tokens,
entry.total_tokens, entry.latency_ms, entry.estimated_cost_usd,
entry.temperature, entry.max_tokens, entry.status, entry.error_message,
entry.prompt_hash, entry.response_length, entry.finish_reason,
entry.user_id, entry.endpoint, JSON.stringify(entry.metadata || {})
];
return pool.query(sql, values).catch(function(err) {
logger.error("log_storage_failure", { error: err.message });
});
}
// ---- LLM Wrapper ----
function createLLMClient(options) {
var defaultModel = (options && options.model) || "gpt-4o";
function chatCompletion(params, context) {
var correlationId = (context && context.correlationId) || uuidv4();
var endpoint = (context && context.endpoint) || "unknown";
var userId = (context && context.userId) || null;
var model = params.model || defaultModel;
var startTime = Date.now();
var promptText = JSON.stringify(params.messages);
var pHash = hashPrompt(promptText);
logger.debug("llm.request.start", {
correlation_id: correlationId,
model: model,
prompt_hash: pHash,
messages_redacted: params.messages.map(function(m) {
return { role: m.role, content: redactPII(m.content) };
})
});
return openai.chat.completions.create(params)
.then(function(response) {
var latencyMs = Date.now() - startTime;
var usage = response.usage || {};
var cost = estimateCost(model, usage.prompt_tokens, usage.completion_tokens);
var responseText = response.choices[0].message.content || "";
var logEntry = {
correlation_id: correlationId,
trace_id: (context && context.traceId) || null,
level: "info",
event: "llm.completion",
model: model,
provider: "openai",
prompt_tokens: usage.prompt_tokens,
completion_tokens: usage.completion_tokens,
total_tokens: usage.total_tokens,
latency_ms: latencyMs,
estimated_cost_usd: cost,
temperature: params.temperature || null,
max_tokens: params.max_tokens || null,
status: "success",
error_message: null,
prompt_hash: pHash,
response_length: responseText.length,
finish_reason: response.choices[0].finish_reason,
user_id: userId,
endpoint: endpoint,
metadata: {}
};
logger.info("llm.completion.success", logEntry);
storeLLMLog(logEntry);
return response;
})
.catch(function(error) {
var latencyMs = Date.now() - startTime;
var status = "error";
if (error.status === 429) status = "rate_limited";
if (error.code === "ETIMEDOUT") status = "timeout";
var logEntry = {
correlation_id: correlationId,
trace_id: (context && context.traceId) || null,
level: "error",
event: "llm.completion",
model: model,
provider: "openai",
prompt_tokens: null,
completion_tokens: null,
total_tokens: null,
latency_ms: latencyMs,
estimated_cost_usd: null,
temperature: params.temperature || null,
max_tokens: params.max_tokens || null,
status: status,
error_message: error.message,
prompt_hash: pHash,
response_length: null,
finish_reason: null,
user_id: userId,
endpoint: endpoint,
metadata: { error_code: error.status || error.code }
};
logger.error("llm.completion.error", logEntry);
storeLLMLog(logEntry);
throw error;
});
}
return { chatCompletion: chatCompletion };
}
// ---- Search API ----
function searchLogs(filters) {
var conditions = ["1=1"];
var values = [];
var idx = 1;
if (filters.correlation_id) { conditions.push("correlation_id = $" + idx++); values.push(filters.correlation_id); }
if (filters.model) { conditions.push("model = $" + idx++); values.push(filters.model); }
if (filters.status) { conditions.push("status = $" + idx++); values.push(filters.status); }
if (filters.min_latency) { conditions.push("latency_ms >= $" + idx++); values.push(parseInt(filters.min_latency)); }
if (filters.min_cost) { conditions.push("estimated_cost_usd >= $" + idx++); values.push(parseFloat(filters.min_cost)); }
if (filters.start_date) { conditions.push("timestamp >= $" + idx++); values.push(filters.start_date); }
if (filters.end_date) { conditions.push("timestamp <= $" + idx++); values.push(filters.end_date); }
if (filters.endpoint) { conditions.push("endpoint = $" + idx++); values.push(filters.endpoint); }
if (filters.user_id) { conditions.push("user_id = $" + idx++); values.push(filters.user_id); }
var limit = Math.min(parseInt(filters.limit) || 100, 1000);
var sql = "SELECT * FROM llm_logs WHERE " + conditions.join(" AND ") +
" ORDER BY timestamp DESC LIMIT " + limit;
return pool.query(sql, values);
}
// ---- Express Routes ----
var router = express.Router();
router.use(function(req, res, next) {
req.correlationId = req.headers["x-correlation-id"] || uuidv4();
res.setHeader("x-correlation-id", req.correlationId);
next();
});
// Log search endpoint
router.get("/logs/search", function(req, res) {
searchLogs(req.query)
.then(function(result) {
res.json({ count: result.rows.length, logs: result.rows });
})
.catch(function(err) {
res.status(500).json({ error: "Search failed", message: err.message });
});
});
// Log summary/dashboard endpoint
router.get("/logs/summary", function(req, res) {
var days = parseInt(req.query.days) || 7;
var sql = "SELECT DATE(timestamp) AS day, model, COUNT(*) AS calls, " +
"COUNT(*) FILTER (WHERE status = 'error') AS errors, " +
"SUM(total_tokens) AS tokens, " +
"ROUND(SUM(estimated_cost_usd)::numeric, 4) AS cost, " +
"ROUND(AVG(latency_ms)::numeric, 0) AS avg_latency_ms " +
"FROM llm_logs WHERE timestamp > NOW() - ($1 || ' days')::INTERVAL " +
"GROUP BY DATE(timestamp), model ORDER BY day DESC, cost DESC";
pool.query(sql, [days.toString()])
.then(function(result) {
res.json({ days: days, summary: result.rows });
})
.catch(function(err) {
res.status(500).json({ error: "Summary failed", message: err.message });
});
});
module.exports = {
createLLMClient: createLLMClient,
searchLogs: searchLogs,
storeLLMLog: storeLLMLog,
redactPII: redactPII,
router: router,
logger: logger
};
Usage in your application:
// app.js
var express = require("express");
var { v4: uuidv4 } = require("uuid");
var observability = require("./llm-observability");
var app = express();
app.use(express.json());
// Attach a correlation ID to every request. The middleware inside
// observability.router only covers routes mounted under /llm, but
// /api/summarize below relies on req.correlationId as well.
app.use(function(req, res, next) {
  req.correlationId = req.headers["x-correlation-id"] || uuidv4();
  res.setHeader("x-correlation-id", req.correlationId);
  next();
});
// Mount the log search/dashboard API
app.use("/llm", observability.router);
// Create an instrumented LLM client
var llm = observability.createLLMClient({ model: "gpt-4o" });
// Use it in your routes
app.post("/api/summarize", function(req, res) {
llm.chatCompletion({
model: "gpt-4o-mini",
messages: [
{ role: "system", content: "Summarize the following text concisely." },
{ role: "user", content: req.body.text }
],
temperature: 0.3,
max_tokens: 512
}, {
correlationId: req.correlationId,
endpoint: "/api/summarize",
userId: req.user && req.user.id
})
.then(function(response) {
res.json({ summary: response.choices[0].message.content });
})
.catch(function(error) {
res.status(500).json({ error: "Summarization failed" });
});
});
app.listen(process.env.PORT || 3000);
Every call to llm.chatCompletion now automatically logs structured JSON, stores to PostgreSQL, redacts PII in debug logs, and includes the correlation ID for end-to-end tracing.
Common Issues and Troubleshooting
1. Log Storage Fails Silently
Error: connect ECONNREFUSED 127.0.0.1:5432
If your PostgreSQL connection drops, the storeLLMLog function catches the error and logs it to the console, but the LLM call still succeeds. This is by design — log storage should never break your application. However, you will lose log data during the outage. Solution: add a memory buffer that retries failed inserts when the connection recovers, or write to a local file as a fallback.
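A minimal sketch of that buffer, assuming an insertLog(entry) helper that performs the INSERT and rejects on failure (that is, storeLLMLog without its internal catch); the cap and interval are arbitrary:
var pendingLogs = [];
var MAX_BUFFERED_LOGS = 5000; // bound memory during long outages
function storeLLMLogSafe(entry) {
  return insertLog(entry).catch(function(err) {
    logger.warn("llm.log.buffered", { error: err.message, buffered: pendingLogs.length });
    if (pendingLogs.length >= MAX_BUFFERED_LOGS) pendingLogs.shift(); // drop the oldest
    pendingLogs.push(entry);
  });
}
// Drain the buffer until it is empty or the database fails again
function drainLogBuffer() {
  if (pendingLogs.length === 0) return Promise.resolve();
  return insertLog(pendingLogs[0]).then(function() {
    pendingLogs.shift();
    return drainLogBuffer();
  }).catch(function() {
    // Still down; leave the buffer alone and try again on the next tick
  });
}
setInterval(drainLogBuffer, 30000);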
2. Winston JSON Circular Reference Error
TypeError: Converting circular structure to JSON
This happens when you accidentally log the full OpenAI response object, which contains circular references. Always extract the specific fields you need before logging. Never pass response directly to logger.info().
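For example, copy just the fields you need into a plain object first:
// Safe: a flat object with only the fields you care about
var usage = response.usage || {};
logger.info("llm.completion.success", {
  model: response.model,
  total_tokens: usage.total_tokens,
  finish_reason: response.choices[0].finish_reason
});
// Unsafe: the raw SDK response can carry non-serializable internals
// logger.info("llm.completion.success", response);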
3. PII Redaction Misses in Unstructured Input
// User input: "My name is John Smith and my email is john.smith@example.com"
// After redaction: "My name is John Smith and my email is [EMAIL]"
Regex-based PII redaction catches structured patterns (emails, phone numbers, SSNs) but misses names, free-form addresses, and context-dependent sensitive data. For higher-confidence PII detection, integrate a dedicated PII detection service like AWS Comprehend or Microsoft Presidio as an additional redaction layer.
4. Log Volume Causes Disk Pressure
Error: ENOSPC: no space left on device, write
LLM logs at high volume can fill disks fast, especially if you are logging at DEBUG level with full prompt content. Set maxsize and maxFiles on your Winston file transport, use log rotation, and never run DEBUG level in production for more than a few hours during active debugging sessions.
5. Correlation ID Missing in Async Chains
{ "correlation_id": "unknown", "model": "gpt-4o", "status": "success" }
If your correlation ID shows up as "unknown", the context object is not being threaded through async calls properly. Always pass the context explicitly rather than relying on thread-local or global state. In Node.js, use AsyncLocalStorage if you want automatic propagation without passing context manually.
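A minimal sketch of that pattern with the built-in AsyncLocalStorage; the store shape and helper names are illustrative:
var { AsyncLocalStorage } = require("async_hooks");
var { v4: uuidv4 } = require("uuid");
var requestContext = new AsyncLocalStorage();
// Express middleware: everything downstream of next() shares this store,
// including promise chains and timers started inside the request.
function contextMiddleware(req, res, next) {
  var store = { correlationId: req.headers["x-correlation-id"] || uuidv4() };
  res.setHeader("x-correlation-id", store.correlationId);
  requestContext.run(store, next);
}
// Callable anywhere in the request's call chain, no context parameter needed
function currentCorrelationId() {
  var store = requestContext.getStore();
  return store ? store.correlationId : "unknown";
}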
Best Practices
Log every LLM call, no exceptions. Even cache hits should be logged so you can track cache effectiveness and total request volume accurately.
Never log raw prompts in production. Use prompt hashing for deduplication and redacted versions for debugging. Store raw prompts only in development environments with test data.
Include cost estimates on every log entry. When your monthly LLM bill spikes, you need to identify the endpoint, user, or feature responsible within minutes, not days.
Set up P95 latency alerts, not just average latency. LLM call latency has a long tail. Average latency can look fine while 5% of your users experience 15-second waits.
Use separate tables for operational logs and audit logs. Operational logs have short retention and high volume. Audit logs have long retention and legal requirements. Mixing them makes retention policies impossible to enforce cleanly.
Version your prompt templates and log the version. When a prompt change causes a regression in output quality, you need to know which version of the prompt was active for each logged call.
Buffer log writes and batch insert to PostgreSQL. Individual INSERT statements for every LLM call add unnecessary database load. Buffer entries and flush every few seconds or every N entries.
Test your PII redaction regularly. Add unit tests that verify redaction patterns against real-world examples. PII patterns evolve — international phone numbers, new email TLDs, and local address formats all require pattern updates.
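A starting point with Node's built-in test runner (node:test, available from Node 18), assuming redactPII is split into its own small module (here called redact.js) so tests need no database or API credentials; the placeholders match the patterns from the complete example:
// test/redaction.test.js (run with: node --test)
var test = require("node:test");
var assert = require("node:assert");
var { redactPII } = require("../redact");
test("redacts email addresses", function() {
  assert.strictEqual(
    redactPII("Contact me at jane.doe@example.com please"),
    "Contact me at [EMAIL] please"
  );
});
test("redacts US phone numbers", function() {
  assert.strictEqual(redactPII("Call 555-867-5309 today"), "Call [PHONE] today");
});
test("leaves ordinary text untouched", function() {
  assert.strictEqual(redactPII("Summarize this meeting"), "Summarize this meeting");
});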
Make your log search API read-only and authenticated. LLM logs contain business-critical information about your AI operations. Restrict access to your engineering and ops teams.
References
- Winston Logging Library — Structured logging for Node.js
- OpenTelemetry JavaScript SDK — Distributed tracing instrumentation
- OpenAI Node.js SDK — Official OpenAI client
- PostgreSQL JSONB Documentation — Flexible metadata storage
- OWASP Logging Cheat Sheet — Security considerations for logging
- node-cron — Scheduled task execution for alert checks
- winston-daily-rotate-file — Log rotation transport