Anthropic Claude API: Complete Developer Guide

Comprehensive guide to the Anthropic Claude API covering messages, streaming, tool use, vision, and production patterns for Node.js.

The Anthropic Claude API gives you programmatic access to one of the most capable LLM families available today. Whether you are building a conversational assistant, a document analysis pipeline, or an agentic tool-calling system, Claude's Messages API is the foundation you will work against. This guide covers everything you need to go from first API call to production-grade Node.js integration, including streaming, tool use, vision, prompt caching, and the batch API.

Prerequisites

  • Node.js 18 or later installed
  • An Anthropic API key (sign up at console.anthropic.com)
  • Basic familiarity with async/await patterns in JavaScript
  • npm or yarn for package management

Install the official SDK:

npm install @anthropic-ai/sdk

Getting Started with the Anthropic SDK

The Anthropic Node.js SDK is the officially supported client. It handles authentication, request construction, streaming, retries, and typed responses out of the box. At the time of writing, version 0.39.x is current. Pin your major version in production since the SDK follows semver.

var Anthropic = require("@anthropic-ai/sdk");

var client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY
});

async function main() {
  var message = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [
      { role: "user", content: "What is the capital of France?" }
    ]
  });

  console.log(message.content[0].text);
  console.log("Input tokens:", message.usage.input_tokens);
  console.log("Output tokens:", message.usage.output_tokens);
}

main();

Output:

The capital of France is Paris.
Input tokens: 14
Output tokens: 8

The SDK constructor accepts several useful options beyond apiKey:

var client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  timeout: 60000,        // 60-second request timeout
  maxRetries: 3,         // automatic retries on transient failures
  defaultHeaders: {
    "anthropic-beta": "prompt-caching-2024-07-31"
  }
});

Authentication and API Key Management

Anthropic authenticates requests with an API key sent in the x-api-key header, along with an anthropic-version header that pins the API version. The SDK sets both for you, but understanding the mechanism matters for debugging.

Never hardcode your API key. The SDK automatically reads ANTHROPIC_API_KEY from the environment, so you can omit the apiKey parameter entirely:

// Reads ANTHROPIC_API_KEY from environment automatically
var client = new Anthropic();
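
If you ever need to rule the SDK out while debugging authentication, you can reproduce the same request with curl. The endpoint and headers below are the standard Messages API surface; the model and prompt are just placeholders:

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}]
  }'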

For applications that manage multiple API keys or need runtime configuration:

var Anthropic = require("@anthropic-ai/sdk");

function createClient(apiKey) {
  if (!apiKey) {
    throw new Error("Anthropic API key is required");
  }
  return new Anthropic({ apiKey: apiKey });
}

// Per-tenant key management
var tenantKeys = {
  "tenant-a": process.env.ANTHROPIC_KEY_TENANT_A,
  "tenant-b": process.env.ANTHROPIC_KEY_TENANT_B
};

var clientA = createClient(tenantKeys["tenant-a"]);
var clientB = createClient(tenantKeys["tenant-b"]);

In production, store your API key in your cloud provider's secrets manager:

# DigitalOcean App Platform
doctl apps update $APP_ID --spec app.yaml
# where app.yaml contains encrypted env vars

# AWS
aws secretsmanager get-secret-value --secret-id anthropic-api-key

# Docker
docker run -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY myapp

API keys are scoped to your organization. You can create multiple keys in the Anthropic Console and rotate them without downtime by supporting two active keys during rotation windows. A .env file with dotenv is fine for local development but not for production.

Messages API Fundamentals

Every interaction with Claude goes through the Messages API. You send an array of messages with alternating user and assistant roles, and Claude returns a response.

System Prompts

System prompts set the behavior, personality, and constraints for Claude. They go in a dedicated system parameter, not inside the messages array:

var response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 2048,
  system: "You are a senior Node.js engineer. Answer questions with practical, production-ready code examples. Be concise and opinionated. Prefer CommonJS require() syntax.",
  messages: [
    { role: "user", content: "How should I structure error handling in Express?" }
  ]
});

System prompts can also be an array of content blocks, which becomes important when using prompt caching (covered later):

system: [
  {
    type: "text",
    text: "You are a senior Node.js engineer...",
    cache_control: { type: "ephemeral" }
  }
]

System prompts consume input tokens on every request but are extremely effective for controlling output quality. Use them to define persona, output format, constraints, and domain context.

Multi-Turn Conversations

Claude is stateless. Every request must include the full conversation history. The SDK does not maintain state for you -- this is by design, because it gives you full control over context management.

var Anthropic = require("@anthropic-ai/sdk");

var client = new Anthropic();
var conversationHistory = [];

async function chat(userMessage) {
  conversationHistory.push({
    role: "user",
    content: userMessage
  });

  var response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    system: "You are a helpful coding assistant.",
    messages: conversationHistory
  });

  var assistantMessage = response.content[0].text;

  conversationHistory.push({
    role: "assistant",
    content: assistantMessage
  });

  return assistantMessage;
}

async function main() {
  var reply1 = await chat("My name is Shane.");
  console.log("Claude:", reply1);

  var reply2 = await chat("What is my name?");
  console.log("Claude:", reply2);
  // Claude: Your name is Shane.
}

main();

Messages must strictly alternate between user and assistant roles. The first message must always be from the user. If you need to prepopulate an assistant response (for example, to force a specific output format), you can include an assistant message followed by another user message.
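For example, here is a minimal prefill sketch that nudges Claude to emit raw JSON. The prompt and the opening brace are illustrative, not a required pattern:

// Prefilling the assistant turn: the conversation ends on an assistant message,
// and Claude continues from exactly where that message leaves off.
var response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 512,
  messages: [
    { role: "user", content: "Return a JSON object with fields city and country for the capital of France." },
    { role: "assistant", content: "{" }  // prefill: forces the reply to start as JSON
  ]
});

// The response continues the prefill, so re-attach the opening brace before parsing
var parsed = JSON.parse("{" + response.content[0].text);
console.log(parsed);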

Response Structure

Every response includes metadata you should inspect:

var response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }]
});

console.log("Stop reason:", response.stop_reason);   // "end_turn" | "max_tokens" | "tool_use"
console.log("Input tokens:", response.usage.input_tokens);
console.log("Output tokens:", response.usage.output_tokens);
console.log("Content:", response.content);

The stop_reason field tells you why Claude stopped generating. If it is max_tokens, the response was truncated and you may need to continue. If it is tool_use, Claude wants you to execute a tool and send back results. Only end_turn means Claude is genuinely done.

Streaming Responses

For user-facing applications, streaming delivers a dramatically better experience. Instead of waiting seconds for a complete response, tokens appear as they are generated -- typically within 200-300ms of the request.

var Anthropic = require("@anthropic-ai/sdk");

var client = new Anthropic();

async function streamResponse(prompt) {
  var stream = await client.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    messages: [{ role: "user", content: prompt }]
  });

  var fullText = "";

  stream.on("text", function(text) {
    process.stdout.write(text);
    fullText += text;
  });

  var finalMessage = await stream.finalMessage();

  console.log("\n\n--- Stream Complete ---");
  console.log("Total tokens:", finalMessage.usage.input_tokens + finalMessage.usage.output_tokens);

  return fullText;
}

streamResponse("Explain the event loop in Node.js in 3 paragraphs.");

The .stream() method returns a MessageStream object that emits several event types:

var stream = await client.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }]
});

stream.on("message", function(message) {
  // Fired once when the message starts
  console.log("Message started, id:", message.id);
});

stream.on("contentBlock", function(block) {
  // Fired for each content block (text, tool_use, etc.)
  console.log("Content block type:", block.type);
});

stream.on("text", function(text) {
  // Fired for each text chunk -- this is what you want for UI streaming
  process.stdout.write(text);
});

stream.on("error", function(err) {
  console.error("Stream error:", err.message);
});

stream.on("end", function() {
  console.log("Stream complete");
});

For Express/HTTP streaming to a browser with Server-Sent Events:

var express = require("express");
var Anthropic = require("@anthropic-ai/sdk");
var app = express();
var client = new Anthropic();

app.use(express.json());

app.post("/api/chat", async function(req, res) {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  try {
    var stream = await client.messages.stream({
      model: "claude-sonnet-4-20250514",
      max_tokens: 2048,
      messages: req.body.messages
    });

    stream.on("text", function(text) {
      res.write("data: " + JSON.stringify({ text: text }) + "\n\n");
    });

    stream.on("end", function() {
      res.write("data: [DONE]\n\n");
      res.end();
    });

    stream.on("error", function(err) {
      res.write("data: " + JSON.stringify({ error: err.message }) + "\n\n");
      res.end();
    });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});
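
On the client side, EventSource only supports GET, so a common approach is to POST with fetch and read the stream manually. A minimal sketch (it naively assumes each chunk contains whole "data:" events, which is usually true for short events but not guaranteed):

// Browser-side consumer for the /api/chat endpoint above (sketch)
async function streamChat(messages, onText) {
  var res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: messages })
  });

  var reader = res.body.getReader();
  var decoder = new TextDecoder();

  while (true) {
    var chunk = await reader.read();
    if (chunk.done) break;

    decoder.decode(chunk.value, { stream: true }).split("\n\n").forEach(function(event) {
      if (event.indexOf("data: ") !== 0) return;
      var payload = event.slice(6);
      if (payload === "[DONE]") return;
      var parsed = JSON.parse(payload);
      if (parsed.text) onText(parsed.text);
      if (parsed.error) console.error("Stream error:", parsed.error);
    });
  }
}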

Tool Use (Function Calling)

Tool use is where Claude becomes genuinely powerful for building agents. You define tools with JSON Schema descriptions, and Claude decides when to call them and what arguments to pass. Your application executes the tool and returns results, and Claude incorporates those results into its response.

Defining Tools

var tools = [
  {
    name: "get_weather",
    description: "Get the current weather for a location. Use this when the user asks about weather conditions.",
    input_schema: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "City and state/country, e.g. 'San Francisco, CA'"
        },
        units: {
          type: "string",
          enum: ["fahrenheit", "celsius"],
          description: "Temperature units"
        }
      },
      required: ["location"]
    }
  },
  {
    name: "search_database",
    description: "Search the product database by query string. Returns matching products with prices.",
    input_schema: {
      type: "object",
      properties: {
        query: { type: "string", description: "Search query" },
        limit: { type: "integer", description: "Max results to return", default: 10 }
      },
      required: ["query"]
    }
  }
];

Handling Tool Calls

When Claude wants to use a tool, it returns a response with stop_reason: "tool_use" and a tool_use content block. You execute the tool, then send the result back:

var Anthropic = require("@anthropic-ai/sdk");
var client = new Anthropic();

// Your actual tool implementations
var toolHandlers = {
  get_weather: function(input) {
    // In production, call a real weather API
    return {
      temperature: 68,
      condition: "partly cloudy",
      humidity: 55,
      location: input.location
    };
  },
  search_database: function(input) {
    // In production, query your database
    return {
      results: [
        { name: "Widget Pro", price: 29.99 },
        { name: "Widget Basic", price: 9.99 }
      ],
      total: 2
    };
  }
};

async function chatWithTools(userMessage) {
  var messages = [{ role: "user", content: userMessage }];

  while (true) {
    var response = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 4096,
      tools: tools,
      messages: messages
    });

    // If Claude is not asking for a tool, we are done
    // (covers end_turn as well as truncation via max_tokens)
    if (response.stop_reason !== "tool_use") {
      var textBlock = response.content.find(function(block) {
        return block.type === "text";
      });
      return textBlock ? textBlock.text : "";
    }

    // If Claude wants to use tools, execute them
    if (response.stop_reason === "tool_use") {
      // Add Claude's response (which includes tool_use blocks) to history
      messages.push({ role: "assistant", content: response.content });

      // Execute each tool call and collect results
      var toolResults = [];
      response.content.forEach(function(block) {
        if (block.type === "tool_use") {
          console.log("Calling tool:", block.name, "with:", JSON.stringify(block.input));
          var handler = toolHandlers[block.name];
          var result = handler ? handler(block.input) : { error: "Unknown tool" };
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: JSON.stringify(result)
          });
        }
      });

      // Send tool results back to Claude
      messages.push({ role: "user", content: toolResults });
    }
  }
}

chatWithTools("What is the weather in Denver, CO?").then(function(answer) {
  console.log(answer);
});

Output:

Calling tool: get_weather with: {"location":"Denver, CO","units":"fahrenheit"}
The current weather in Denver, CO is 68°F and partly cloudy with 55% humidity.

Claude can call multiple tools in a single response and chain tool results across turns. The loop pattern is: send messages, check for tool_use blocks, execute tools, send results back, repeat until stop_reason is end_turn. This is the foundation for building sophisticated agents.

Vision Capabilities

Claude can process images passed as base64-encoded data or as URLs. This is useful for analyzing screenshots, reading documents, processing charts, interpreting diagrams, or any visual task.

Sending a Base64 Image

var Anthropic = require("@anthropic-ai/sdk");
var fs = require("fs");
var path = require("path");

var client = new Anthropic();

async function analyzeImage(imagePath, question) {
  var imageBuffer = fs.readFileSync(imagePath);
  var base64Image = imageBuffer.toString("base64");

  var ext = path.extname(imagePath).toLowerCase();
  var mediaTypes = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp"
  };
  var mediaType = mediaTypes[ext] || "image/jpeg";

  var response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: {
              type: "base64",
              media_type: mediaType,
              data: base64Image
            }
          },
          {
            type: "text",
            text: question
          }
        ]
      }
    ]
  });

  return response.content[0].text;
}

analyzeImage("./screenshot.png", "What errors do you see in this terminal output?")
  .then(function(analysis) {
    console.log(analysis);
  });

Sending an Image URL

var response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 2048,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: {
            type: "url",
            url: "https://example.com/architecture-diagram.png"
          }
        },
        {
          type: "text",
          text: "Describe this system architecture diagram. What are the potential bottlenecks?"
        }
      ]
    }
  ]
});

You can send up to 20 images in a single request. Each image counts toward your input token usage based on its dimensions -- a typical 1024x1024 image costs roughly 1,600 tokens. Resize images before sending them if you are processing many. It saves real money at scale.
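
If you process images at volume, downscaling before upload is worth the extra step. Here is one way to do it, assuming the sharp package is installed (npm install sharp); the 1024px cap and JPEG re-encoding are arbitrary choices, not API requirements:

var sharp = require("sharp");

// Downscale so the longest edge is at most 1024px before base64-encoding.
// Smaller images mean fewer image tokens per request.
async function prepareImage(imagePath) {
  var resized = await sharp(imagePath)
    .resize(1024, 1024, { fit: "inside", withoutEnlargement: true })
    .jpeg({ quality: 85 })
    .toBuffer();

  return {
    type: "image",
    source: {
      type: "base64",
      media_type: "image/jpeg",
      data: resized.toString("base64")
    }
  };
}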

Model Selection

Anthropic offers three model tiers. Choosing the right one per task is the single most impactful cost optimization you can make.

Model                     | Best For                                                        | Speed   | Cost (Input / Output per MTok)
claude-opus-4-20250514    | Complex reasoning, research, code generation, nuanced analysis | Slowest | $15 / $75
claude-sonnet-4-20250514  | General purpose, balanced quality and speed                     | Medium  | $3 / $15
claude-3-5-haiku-20241022 | Classification, extraction, routing, high-volume simple tasks   | Fastest | $0.80 / $4

My recommendations:

  • Use Haiku for classification, entity extraction, content moderation, routing decisions, and any task where the output is short and structured. At roughly $1/MTok input, you can process enormous volumes affordably.
  • Use Sonnet as your default for user-facing features, code assistance, content generation, and multi-step reasoning. It is the best balance of quality and cost.
  • Use Opus when quality is paramount -- complex code generation, detailed research, nuanced writing, or tasks where errors are expensive. Opus is roughly 5x the cost of Sonnet but meaningfully better on hard tasks.

// Route requests to the appropriate model based on task complexity
function selectModel(taskType) {
  var modelMap = {
    "classify": "claude-3-5-haiku-20241022",
    "extract": "claude-3-5-haiku-20241022",
    "moderate": "claude-3-5-haiku-20241022",
    "chat": "claude-sonnet-4-20250514",
    "summarize": "claude-sonnet-4-20250514",
    "generate_code": "claude-sonnet-4-20250514",
    "complex_analysis": "claude-opus-4-20250514",
    "research": "claude-opus-4-20250514"
  };
  return modelMap[taskType] || "claude-sonnet-4-20250514";
}

Managing Token Usage and Costs

Token counts directly determine your costs. The SDK returns usage data on every response:

var response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }]
});

console.log("Input tokens:", response.usage.input_tokens);
console.log("Output tokens:", response.usage.output_tokens);
// Cache-aware fields (when using prompt caching):
// response.usage.cache_creation_input_tokens
// response.usage.cache_read_input_tokens

Build a simple token tracking layer:

var totalInputTokens = 0;
var totalOutputTokens = 0;
var requestCount = 0;

async function trackedRequest(params) {
  var response = await client.messages.create(params);
  totalInputTokens += response.usage.input_tokens;
  totalOutputTokens += response.usage.output_tokens;
  requestCount++;

  if (requestCount % 100 === 0) {
    var inputCost = (totalInputTokens / 1000000) * 3; // Sonnet pricing
    var outputCost = (totalOutputTokens / 1000000) * 15;
    console.log("Cost estimate after " + requestCount + " requests: $" +
      (inputCost + outputCost).toFixed(4));
  }

  return response;
}

Practical tips for reducing token usage:

  • Keep system prompts concise. Every token in the system prompt is charged on every request.
  • Trim conversation history. For long chats, summarize older messages rather than sending the full history. A simple sliding-window pruner is sketched after this list.
  • Set max_tokens appropriately. Do not default it to the model's maximum output limit -- the 200K figure you see for Opus/Sonnet is the context window, not a sensible per-response cap. If you expect a 200-word response, set max_tokens: 512. This does not directly save money (you pay for actual tokens generated, not the limit), but it bounds response length and keeps latency predictable.
  • Use prompt caching for repeated system prompts or large context documents.
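
Here is the sliding-window pruner referenced above. It is a minimal sketch: it assumes history is a plain array of { role, content } messages and does not account for tool_use/tool_result pairs, which need to be kept together:

// Keep only the most recent messages; the system prompt lives outside the array
function pruneHistory(history, maxMessages) {
  if (history.length <= maxMessages) {
    return history;
  }
  var pruned = history.slice(history.length - maxMessages);
  // The first message sent to the API must be from the user,
  // so drop a leading assistant message if the cut landed on one
  if (pruned[0].role === "assistant") {
    pruned = pruned.slice(1);
  }
  return pruned;
}

var response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  system: "You are a helpful coding assistant.",
  messages: pruneHistory(conversationHistory, 20) // last ~10 turns
});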

Error Handling and Retry Patterns

The SDK throws typed errors that you can catch and handle specifically:

var Anthropic = require("@anthropic-ai/sdk");
var client = new Anthropic();

async function robustRequest(params, maxRetries) {
  maxRetries = maxRetries || 3;
  var attempt = 0;

  while (attempt < maxRetries) {
    try {
      var response = await client.messages.create(params);
      return response;
    } catch (err) {
      attempt++;

      if (err instanceof Anthropic.RateLimitError) {
        // 429 - Rate limited. Respect the retry-after header
        var retryAfter = err.headers && err.headers["retry-after"];
        var waitMs = retryAfter ? parseInt(retryAfter, 10) * 1000 : Math.pow(2, attempt) * 1000;
        console.log("Rate limited. Waiting " + waitMs + "ms before retry " + attempt);
        await sleep(waitMs);
        continue;
      }

      if (err instanceof Anthropic.APIConnectionError) {
        // Network error -- retry with backoff
        var backoff = Math.pow(2, attempt) * 1000;
        console.log("Connection error. Retrying in " + backoff + "ms");
        await sleep(backoff);
        continue;
      }

      if (err instanceof Anthropic.InternalServerError) {
        // 500 - Anthropic server error -- retry with backoff
        var backoff = Math.pow(2, attempt) * 1000;
        console.log("Server error. Retrying in " + backoff + "ms");
        await sleep(backoff);
        continue;
      }

      if (err instanceof Anthropic.AuthenticationError) {
        // 401 - Bad API key. Do not retry
        throw new Error("Invalid API key. Check your ANTHROPIC_API_KEY environment variable.");
      }

      if (err instanceof Anthropic.BadRequestError) {
        // 400 - Malformed request. Do not retry
        throw err;
      }

      // Unknown error -- do not retry
      throw err;
    }
  }

  throw new Error("Max retries (" + maxRetries + ") exceeded");
}

function sleep(ms) {
  return new Promise(function(resolve) {
    setTimeout(resolve, ms);
  });
}

The SDK has built-in retries for transient errors (configurable via maxRetries on the client), but I prefer explicit retry logic in production because it gives you control over logging, metrics, and backoff strategies.

// SDK built-in retries (defaults to 2)
var client = new Anthropic({
  maxRetries: 3,
  timeout: 60000 // 60-second timeout
});

Rate Limiting and Backoff Strategies

Anthropic enforces rate limits on both requests per minute (RPM) and tokens per minute (TPM). Limits depend on your usage tier and are published in the API documentation. When you hit a limit, you get a 429 response.

For high-throughput applications, implement a sliding window rate limiter:

function RateLimiter(maxRequestsPerMinute) {
  this.maxRPM = maxRequestsPerMinute;
  this.timestamps = [];
}

RateLimiter.prototype.waitForSlot = function() {
  var self = this;
  return new Promise(function(resolve) {
    function check() {
      var now = Date.now();
      // Remove timestamps older than 1 minute
      self.timestamps = self.timestamps.filter(function(ts) {
        return now - ts < 60000;
      });

      if (self.timestamps.length < self.maxRPM) {
        self.timestamps.push(now);
        resolve();
      } else {
        var oldestInWindow = self.timestamps[0];
        var waitTime = 60000 - (now - oldestInWindow) + 100; // +100ms buffer
        setTimeout(check, waitTime);
      }
    }
    check();
  });
};

// Usage
var limiter = new RateLimiter(50); // 50 RPM

async function rateLimitedRequest(params) {
  await limiter.waitForSlot();
  return client.messages.create(params);
}

For exponential backoff on 429 responses, double the delay on each retry up to a maximum:

async function withExponentialBackoff(fn, maxRetries) {
  var retries = 0;
  var baseDelay = 1000;
  maxRetries = maxRetries || 5;

  while (true) {
    try {
      return await fn();
    } catch (error) {
      if (error instanceof Anthropic.RateLimitError && retries < maxRetries) {
        var delay = baseDelay * Math.pow(2, retries) + Math.random() * 1000;
        console.log("Rate limited. Retrying in " + Math.round(delay) + "ms (attempt " + (retries + 1) + "/" + maxRetries + ")");
        await sleep(delay);
        retries++;
      } else {
        throw error;
      }
    }
  }
}

Prompt Caching for Cost Reduction

Prompt caching lets you mark portions of your prompt as cacheable. When the same content appears in subsequent requests, Anthropic serves it from cache at a 90% discount on input tokens. This is transformative for applications with large system prompts, few-shot examples, or document analysis.

Cache reads cost 10% of base input token pricing. Cache writes cost 25% more than base on the first request, but every subsequent hit saves 90%.
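
A quick back-of-envelope calculation shows how fast the write premium is recovered. The 2,000-token system prompt is an illustrative size; the $3/MTok figure is Sonnet's base input price:

// Break-even math for caching a 2,000-token system prompt on Sonnet input pricing
var tokens = 2000;
var basePerToken = 3 / 1e6;                      // $3 per million input tokens

var uncached = tokens * basePerToken;            // $0.0060 per request
var cacheWrite = tokens * basePerToken * 1.25;   // $0.0075 on the first request
var cacheRead = tokens * basePerToken * 0.10;    // $0.0006 on every cache hit

// The first request costs $0.0015 extra; every hit after that saves $0.0054,
// so caching is already ahead by the second request.
console.log({ uncached: uncached, cacheWrite: cacheWrite, cacheRead: cacheRead });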

var Anthropic = require("@anthropic-ai/sdk");

var client = new Anthropic();

// A large system prompt or reference document you reuse across many requests
var codingStandards = "You are an expert code reviewer for a Node.js codebase. " +
  "Here are the coding standards you must enforce:\n\n" +
  "1. All functions must have JSDoc comments...\n" +
  "2. Error handling is required for all async operations...\n" +
  // ... imagine 2000+ tokens of coding standards here
  "50. All database queries must use parameterized statements.";

async function cachedReview(code) {
  var response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2048,
    system: [
      {
        type: "text",
        text: codingStandards,
        cache_control: { type: "ephemeral" }
      }
    ],
    messages: [
      { role: "user", content: "Review this code:\n\n```javascript\n" + code + "\n```" }
    ]
  });

  console.log("Cache write tokens:", response.usage.cache_creation_input_tokens || 0);
  console.log("Cache read tokens:", response.usage.cache_read_input_tokens || 0);
  console.log("Regular input tokens:", response.usage.input_tokens);

  return response.content[0].text;
}

On the first call, you will see cache_creation_input_tokens populated. On subsequent calls (within the cache TTL, which is around 5 minutes), you will see cache_read_input_tokens instead, and your costs drop dramatically.

You can also cache large user-provided documents by placing them in a user message content block with cache_control:

messages: [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: largeDocumentText,
        cache_control: { type: "ephemeral" }
      },
      {
        type: "text",
        text: "Summarize the key findings from this document."
      }
    ]
  }
]

When caching pays off:

  • System prompts over 1,024 tokens that repeat across requests
  • Few-shot examples included in every request
  • Large documents being queried multiple times
  • RAG contexts where the retrieved documents are reused within a 5-minute window

Prompt caching has a minimum of 1,024 tokens for Sonnet/Opus and 2,048 for Haiku. Content below these thresholds will not be cached.

Batch API for High-Volume Processing

The Message Batches API lets you submit up to 100,000 requests in a single batch. Batches are processed asynchronously with a 24-hour SLA and cost 50% less than standard API calls. This is ideal for content processing pipelines, bulk classification, data enrichment, and evaluation workloads.

var Anthropic = require("@anthropic-ai/sdk");

var client = new Anthropic();

async function processBatch(items) {
  // Create batch requests
  var requests = items.map(function(item, index) {
    return {
      custom_id: "item-" + index,
      params: {
        model: "claude-haiku-3-5-20241022",
        max_tokens: 256,
        messages: [
          {
            role: "user",
            content: "Classify this support ticket into one of: billing, technical, account, other.\n\nTicket: " + item.text
          }
        ]
      }
    };
  });

  // Submit the batch
  var batch = await client.messages.batches.create({
    requests: requests
  });

  console.log("Batch created:", batch.id);
  console.log("Status:", batch.processing_status);

  return batch;
}

async function pollBatch(batchId) {
  while (true) {
    var batch = await client.messages.batches.retrieve(batchId);
    console.log("Status:", batch.processing_status,
      "| Succeeded:", batch.request_counts.succeeded,
      "| Failed:", batch.request_counts.errored);

    if (batch.processing_status === "ended") {
      return batch;
    }

    // Poll every 30 seconds
    await sleep(30000);
  }
}

async function getBatchResults(batchId) {
  var results = [];
  for await (var result of await client.messages.batches.results(batchId)) {
    results.push({
      id: result.custom_id,
      type: result.result.type,
      text: result.result.type === "succeeded"
        ? result.result.message.content[0].text
        : null
    });
  }
  return results;
}

// Full batch workflow
async function classifyTickets(tickets) {
  var batch = await processBatch(tickets);
  var completed = await pollBatch(batch.id);
  var results = await getBatchResults(batch.id);

  results.forEach(function(r) {
    console.log(r.id + ": " + r.text);
  });

  return results;
}

function sleep(ms) {
  return new Promise(function(resolve) { setTimeout(resolve, ms); });
}

The batch API is underutilized. If you are processing more than a few hundred items and latency is not critical, use batches. The 50% discount adds up fast: at Haiku batch pricing, input runs roughly $0.40 per million tokens and output roughly $2.00 per million tokens, so even million-item classification jobs stay remarkably cheap.

Complete Working Example

Here is a full Node.js application that ties together multi-turn conversation, tool use, streaming, and proper error handling. This is a command-line customer support assistant that can look up orders, check inventory, and process returns.

var Anthropic = require("@anthropic-ai/sdk");
var readline = require("readline");

var client = new Anthropic({ maxRetries: 2 });

// Define available tools
var tools = [
  {
    name: "lookup_order",
    description: "Look up a customer order by order ID. Returns order status, items, and shipping info.",
    input_schema: {
      type: "object",
      properties: {
        order_id: { type: "string", description: "The order ID, e.g. ORD-12345" }
      },
      required: ["order_id"]
    }
  },
  {
    name: "check_inventory",
    description: "Check current inventory level for a product by SKU.",
    input_schema: {
      type: "object",
      properties: {
        sku: { type: "string", description: "Product SKU" }
      },
      required: ["sku"]
    }
  },
  {
    name: "create_return",
    description: "Initiate a return for an order. Returns a return authorization number.",
    input_schema: {
      type: "object",
      properties: {
        order_id: { type: "string", description: "The order ID" },
        reason: { type: "string", description: "Reason for return" },
        items: {
          type: "array",
          items: { type: "string" },
          description: "SKUs of items to return"
        }
      },
      required: ["order_id", "reason", "items"]
    }
  }
];

// Simulated tool implementations (replace with real database/API calls)
var mockOrders = {
  "ORD-12345": {
    order_id: "ORD-12345",
    status: "delivered",
    delivered_date: "2026-02-08",
    items: [
      { sku: "WDG-001", name: "Widget Pro", quantity: 2, price: 29.99 },
      { sku: "WDG-002", name: "Widget Stand", quantity: 1, price: 14.99 }
    ],
    shipping: { carrier: "UPS", tracking: "1Z999AA10123456784" },
    total: 74.97
  }
};

var mockInventory = { "WDG-001": 142, "WDG-002": 0, "WDG-003": 89 };
var ticketCounter = 1000;

function executeTool(name, input) {
  if (name === "lookup_order") {
    var order = mockOrders[input.order_id];
    if (order) return order;
    return { error: "Order not found: " + input.order_id };
  }

  if (name === "check_inventory") {
    var qty = mockInventory[input.sku];
    return {
      sku: input.sku,
      in_stock: (qty || 0) > 0,
      quantity: qty || 0
    };
  }

  if (name === "create_return") {
    ticketCounter++;
    return {
      return_id: "RET-" + ticketCounter,
      order_id: input.order_id,
      status: "authorized",
      label_url: "https://example.com/return-label/RET-" + ticketCounter,
      refund_estimate: "$" + (mockOrders[input.order_id] ? mockOrders[input.order_id].total : "0.00")
    };
  }

  return { error: "Unknown tool: " + name };
}

// Main conversation handler with streaming and tool use
async function handleMessage(conversationHistory, userInput) {
  conversationHistory.push({ role: "user", content: userInput });

  var maxToolRounds = 5;
  var round = 0;

  while (round < maxToolRounds) {
    round++;

    try {
      // Use streaming for the response
      var stream = await client.messages.stream({
        model: "claude-sonnet-4-20250514",
        max_tokens: 4096,
        system: "You are a customer support agent for WidgetCo. Be helpful, concise, and professional. Use the available tools to look up orders, check inventory, and process returns. Always confirm details with the customer before taking actions like creating returns.",
        tools: tools,
        messages: conversationHistory
      });

      // Collect streamed text
      var fullText = "";
      var isTextResponse = false;

      stream.on("text", function(text) {
        process.stdout.write(text);
        fullText += text;
        isTextResponse = true;
      });

      var finalMessage = await stream.finalMessage();

      if (isTextResponse) {
        process.stdout.write("\n");
      }

      // Add assistant response to history
      conversationHistory.push({ role: "assistant", content: finalMessage.content });

      // Check for tool calls
      var toolUseBlocks = finalMessage.content.filter(function(block) {
        return block.type === "tool_use";
      });

      if (toolUseBlocks.length === 0) {
        // No tool calls -- conversation turn complete
        console.log("[Tokens: " + finalMessage.usage.input_tokens + " in / " + finalMessage.usage.output_tokens + " out]");
        return;
      }

      // Execute tool calls and send results back
      var toolResults = toolUseBlocks.map(function(toolUse) {
        console.log("[Calling tool: " + toolUse.name + "]");
        var result = executeTool(toolUse.name, toolUse.input);
        return {
          type: "tool_result",
          tool_use_id: toolUse.id,
          content: JSON.stringify(result)
        };
      });

      conversationHistory.push({ role: "user", content: toolResults });
      // Loop continues -- Claude will process tool results and respond

    } catch (err) {
      if (err instanceof Anthropic.RateLimitError) {
        console.error("\n[Rate limited. Please wait a moment and try again.]");
      } else if (err instanceof Anthropic.APIConnectionError) {
        console.error("\n[Connection error. Check your network.]");
      } else if (err instanceof Anthropic.AuthenticationError) {
        console.error("\n[Authentication failed. Check your API key.]");
      } else {
        console.error("\n[Error: " + err.message + "]");
      }
      return;
    }
  }

  console.log("[Warning: Maximum tool rounds reached]");
}

// Interactive CLI loop
async function main() {
  var rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });

  var history = [];

  console.log("WidgetCo Support Assistant (type 'quit' to exit)");
  console.log("------------------------------------------------");

  function prompt() {
    rl.question("\nYou: ", function(input) {
      input = input.trim();

      if (!input) {
        prompt();
        return;
      }

      if (input.toLowerCase() === "quit") {
        console.log("Goodbye!");
        rl.close();
        return;
      }

      handleMessage(history, input)
        .then(function() {
          prompt();
        })
        .catch(function(err) {
          console.error("Fatal error:", err.message);
          prompt();
        });
    });
  }

  prompt();
}

main();

Example session:

WidgetCo Support Assistant (type 'quit' to exit)
------------------------------------------------

You: Hi, I need to check on my order ORD-12345
[Calling tool: lookup_order]
I found your order ORD-12345. It was delivered on February 8, 2026 via UPS (tracking: 1Z999AA10123456784). The order contained:

- 2x Widget Pro ($29.99 each)
- 1x Widget Stand ($14.99)

Total: $74.97. Is there anything else I can help you with?
[Tokens: 487 in / 82 out]

You: The widget stand arrived damaged. I'd like to return it.
[Calling tool: create_return]
I've processed the return for your Widget Stand. Here are the details:

- Return ID: RET-1001
- Status: Authorized
- A prepaid return shipping label is available at: https://example.com/return-label/RET-1001
- Estimated refund: $74.97

Please ship the damaged Widget Stand back using the provided label. Your refund will be processed once we receive the item. Is there anything else?
[Tokens: 892 in / 94 out]

Common Issues and Troubleshooting

1. "Invalid API Key" (401 Authentication Error)

Error: 401 {"type":"error","error":{"type":"authentication_error","message":"invalid x-api-key"}}

This means your API key is wrong, expired, or not set. Check that ANTHROPIC_API_KEY is set in your environment and does not have leading/trailing whitespace. A common mistake is copying the key with a newline character from a .env file. Verify with:

node -e "console.log('[' + process.env.ANTHROPIC_API_KEY + ']')"

If you see [undefined] or extra whitespace around the key, that is your problem.

2. "Overloaded" (529 Error)

Error: 529 {"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}

Anthropic's servers are temporarily at capacity. This is transient and usually resolves within seconds. The SDK's built-in retry logic handles this automatically, but if you have maxRetries set to 0, you will need to implement your own backoff. Do not panic when you see this error -- just retry.

3. "Messages content must be non-empty" (400 Bad Request)

Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"messages: `messages.0.content` must be a non-empty string or array of content blocks"}}

You are sending an empty string as the message content. This happens when reading user input without trimming, when a variable is undefined, or when accidentally passing null. Always validate input before sending:

if (!userMessage || userMessage.trim().length === 0) {
  console.log("Please enter a message.");
  return;
}

4. Truncated Responses (stop_reason is "max_tokens")

When stop_reason is "max_tokens" instead of "end_turn", your response was cut off mid-sentence because max_tokens was too low. Either increase max_tokens or implement continuation logic:

async function getFullResponse(params) {
  var allContent = "";
  var messages = params.messages.slice();

  while (true) {
    var response = await client.messages.create(
      Object.assign({}, params, { messages: messages })
    );
    var text = response.content[0].text;
    allContent += text;

    if (response.stop_reason !== "max_tokens") {
      break;
    }

    // Continue the conversation
    messages.push({ role: "assistant", content: text });
    messages.push({ role: "user", content: "Continue exactly where you left off." });
  }

  return allContent;
}

5. Tool Use "tool_use_id not found" Error

Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"tool_use id `toolu_abc123` not found in prior assistant message"}}

This happens when your tool_result references a tool_use_id that does not match any tool_use block in the preceding assistant message. The most common cause is failing to add the full assistant response (including the tool_use blocks) to the conversation history before appending tool results. Make sure you push the entire response.content array as the assistant message, not just the text.

6. Context Window Exceeded

Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"prompt is too long: 204521 tokens > 200000 maximum"}}

Your conversation history plus system prompt exceeds the model's context window. Implement history pruning -- summarize older messages, drop early turns, or use a sliding window. For Sonnet, the limit is 200K tokens. For long-running chat sessions, this is inevitable without pruning.

Best Practices

  • Always check stop_reason on every response. A stop_reason of max_tokens means truncation. A stop_reason of tool_use means you need to execute tools and continue the loop. Only end_turn means Claude is genuinely done. Ignoring this field is the number one source of subtle bugs in Claude integrations.

  • Use system prompts for behavior, not for data. System prompts should define how Claude behaves. Large reference documents belong in user messages with cache_control for prompt caching benefits.

  • Validate tool inputs before execution. Claude generally produces well-formed JSON matching your schema, but edge cases happen. Validate inputs before executing tools, especially if they interact with databases or external systems. A bad DELETE query from malformed tool input is not something you want to debug in production.

  • Implement conversation history pruning. For long-running chat sessions, do not send the entire history on every request. Summarize older turns, or use a sliding window of the most recent N turns. Unbounded history leads to exploding costs and eventually hitting context window limits.

  • Use structured output for programmatic consumption. When you need Claude to return structured data, describe the exact JSON format in the system prompt. For maximum reliability, use tool use with a single tool that has your desired output schema -- Claude will return data matching the schema as tool input, which is far more reliable than asking for JSON in a text response. A sketch of this pattern follows this list.

  • Route to the right model per task. Use Haiku for routing, classification, and extraction. Use Sonnet as your default. Use Opus only for genuinely hard tasks. The cost difference between Haiku and Opus is roughly 19x on input and 19x on output. Model routing is the highest-leverage cost optimization.

  • Implement request-level timeouts. The SDK's timeout option prevents requests from hanging indefinitely. Set it based on your expected response size: 30 seconds for short responses, 120 seconds for long-form generation. Long streaming responses need longer timeouts.

  • Cache client instances. Create one Anthropic client and reuse it across your application. The client manages connection pooling internally. Creating a new client per request wastes resources and connection setup time.

  • Log token usage for cost monitoring. Track input_tokens, output_tokens, cache_read_input_tokens, and cache_creation_input_tokens on every response. Aggregate these per user, per feature, or per model to understand your cost structure and spot anomalies early.

  • Use the batch API for offline workloads. If you are processing more than a few hundred items and do not need real-time responses, batches give you a 50% discount. There is no reason to pay full price for bulk classification, data enrichment, or evaluation workloads.
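
As referenced in the structured-output point above, here is a sketch of that pattern. The tool name and schema are illustrative; tool_choice with type "tool" forces Claude to call the named tool:

var productTool = {
  name: "record_product",
  description: "Record the extracted product details.",
  input_schema: {
    type: "object",
    properties: {
      name: { type: "string" },
      price: { type: "number" },
      in_stock: { type: "boolean" }
    },
    required: ["name", "price", "in_stock"]
  }
};

var response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  tools: [productTool],
  tool_choice: { type: "tool", name: "record_product" }, // force this tool
  messages: [
    { role: "user", content: "Extract the product from: 'Widget Pro is $29.99 and in stock.'" }
  ]
});

var toolUse = response.content.find(function(block) {
  return block.type === "tool_use";
});
console.log(toolUse.input); // { name: "Widget Pro", price: 29.99, in_stock: true }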
