State Management in Serverless Applications
Master state management in serverless architectures using DynamoDB, Redis, Step Functions, and event sourcing patterns
Overview
Serverless functions are stateless by design, which means every piece of application state must be explicitly managed through external services. This constraint forces you to think carefully about where state lives, how it flows between invocations, and how to maintain consistency across distributed components. Understanding state management patterns in serverless architectures is the difference between building systems that scale gracefully and building systems that collapse under real-world load.
Prerequisites
- Working knowledge of AWS Lambda and Node.js
- Familiarity with DynamoDB, Redis, and AWS Step Functions concepts
- AWS CLI configured with appropriate IAM permissions
- Node.js 18+ runtime
- Basic understanding of distributed systems challenges (eventual consistency, race conditions)
The Stateless Execution Model
Lambda functions spin up, execute, and shut down. Between invocations, there is no guarantee that any in-memory state will survive. Even when AWS reuses a warm container, you cannot rely on it. This is not a limitation to work around — it is a constraint to embrace.
The stateless model gives you automatic scaling, zero-downtime deployments, and simplified failure recovery. But it shifts responsibility for state management entirely onto you. Every variable, every session, every workflow checkpoint must be persisted somewhere external to the function itself.
There are five fundamental categories of state in serverless applications:
- Persistent state — long-lived data like user profiles, orders, inventory (DynamoDB, RDS)
- Session state — short-lived data tied to a user session (Redis, DynamoDB with TTL)
- Workflow state — multi-step process coordination (Step Functions)
- Transient state — ephemeral computation results (Lambda global scope, S3)
- Event state — immutable log of state changes (EventBridge, Kinesis, DynamoDB Streams)
Each category demands different storage, consistency, and durability tradeoffs.
DynamoDB for Persistent State
DynamoDB is the default state store for most serverless applications, and for good reason. It offers single-digit millisecond latency, scales horizontally without intervention, and integrates natively with Lambda through IAM roles. No connection pooling headaches. No cold start penalty from establishing TCP sessions to a relational database.
The key to effective DynamoDB state management is designing your access patterns first. You need to know how you will query your data before you model it.
var AWS = require("aws-sdk");
var dynamodb = new AWS.DynamoDB.DocumentClient();
function saveCartState(userId, cartItems, metadata) {
var params = {
TableName: "ShoppingCarts",
Item: {
PK: "USER#" + userId,
SK: "CART#active",
items: cartItems,
itemCount: cartItems.length,
totalAmount: calculateTotal(cartItems),
updatedAt: new Date().toISOString(),
ttl: Math.floor(Date.now() / 1000) + (7 * 24 * 60 * 60),
// every successful write bumps the version (optimistic locking)
version: (metadata.version || 0) + 1
},
ConditionExpression: "attribute_not_exists(version) OR version = :expectedVersion",
ExpressionAttributeValues: {
":expectedVersion": metadata.version || 0
}
};
return dynamodb.put(params).promise();
}
function getCartState(userId) {
var params = {
TableName: "ShoppingCarts",
Key: {
PK: "USER#" + userId,
SK: "CART#active"
}
};
return dynamodb.get(params).promise().then(function(result) {
return result.Item || null;
});
}
The ConditionExpression on the put operation implements optimistic locking. Without it, two concurrent Lambda invocations could silently overwrite each other's cart updates. The version field acts as a concurrency control token — every successful write increments it, and every write requires the current version to match.
DynamoDB TTL handles state cleanup automatically. Shopping carts that sit untouched for seven days are garbage collected by DynamoDB without any scheduled Lambda invocations.
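TTL is opt-in: it must be enabled once per table, pointing at the attribute that holds the expiry epoch. A minimal setup sketch using the low-level SDK client (table and attribute names follow the example above); note that TTL deletion is asynchronous and best-effort, so filter on the ttl attribute at read time if expired items must never be returned.
var AWS = require("aws-sdk");
var dynamodbAdmin = new AWS.DynamoDB();
// One-time table configuration: tell DynamoDB which attribute holds the expiry
// timestamp (epoch seconds). Items past that time become eligible for deletion.
function enableCartTtl() {
  return dynamodbAdmin.updateTimeToLive({
    TableName: "ShoppingCarts",
    TimeToLiveSpecification: {
      Enabled: true,
      AttributeName: "ttl"
    }
  }).promise();
}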
Single-Table Design for Related State
Rather than creating separate tables for each entity, use a single-table design with composite keys to co-locate related state:
function saveOrderWithItems(order, items) {
var transactItems = [
{
Put: {
TableName: "AppState",
Item: {
PK: "USER#" + order.userId,
SK: "ORDER#" + order.orderId,
status: "pending",
total: order.total,
createdAt: new Date().toISOString()
}
}
}
];
items.forEach(function(item) {
transactItems.push({
Put: {
TableName: "AppState",
Item: {
PK: "ORDER#" + order.orderId,
SK: "ITEM#" + item.productId,
quantity: item.quantity,
price: item.price,
productName: item.name
}
}
});
});
return dynamodb.transactWrite({ TransactItems: transactItems }).promise();
}
DynamoDB transactions let you atomically write up to 100 items across multiple partition keys. This is how you maintain state consistency without a relational database.
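The same key layout pays off at read time. Because every order lives under the user's partition key, a single Query with a sort-key prefix returns all of a user's orders without a scan, which is exactly the access pattern you design for up front. A sketch against the same AppState table (the function name is illustrative):
function getOrdersForUser(userId) {
  // One partition, one query: all ORDER# items for this user come back together.
  return dynamodb.query({
    TableName: "AppState",
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :skPrefix)",
    ExpressionAttributeValues: {
      ":pk": "USER#" + userId,
      ":skPrefix": "ORDER#"
    }
  }).promise().then(function(result) {
    return result.Items;
  });
}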
ElastiCache/Redis for Session State
Redis provides sub-millisecond reads and is ideal for session state, rate limiting, and frequently accessed data that does not need durable persistence. The tradeoff is that Redis requires a VPC, which adds cold start latency to Lambda functions.
var Redis = require("ioredis");
var redis;
function getRedisClient() {
if (!redis) {
redis = new Redis({
host: process.env.REDIS_ENDPOINT,
port: 6379,
connectTimeout: 5000,
maxRetriesPerRequest: 2,
retryStrategy: function(times) {
if (times > 3) return null;
return Math.min(times * 200, 1000);
}
});
}
return redis;
}
function setSessionState(sessionId, state, ttlSeconds) {
var client = getRedisClient();
var key = "session:" + sessionId;
var serialized = JSON.stringify(state);
return client.setex(key, ttlSeconds || 3600, serialized);
}
function getSessionState(sessionId) {
var client = getRedisClient();
var key = "session:" + sessionId;
return client.get(key).then(function(data) {
if (!data) return null;
return JSON.parse(data);
});
}
function updateSessionField(sessionId, field, value) {
  var client = getRedisClient();
  var key = "session:" + sessionId;
  // The session is stored as a JSON string, so update a single field by
  // reading, patching, and rewriting it with a refreshed TTL.
  return client.get(key).then(function(data) {
    var state = data ? JSON.parse(data) : {};
    state[field] = value;
    return client.setex(key, 3600, JSON.stringify(state));
  });
}
Notice that the Redis client is created outside the handler function and stored in a module-level variable. This takes advantage of Lambda container reuse — the connection persists across warm invocations, avoiding the overhead of reconnecting every time.
Step Functions for Workflow State
When state needs to flow through a multi-step process — checkout workflows, approval chains, data pipelines — Step Functions is the right tool. It manages the state machine, handles retries, tracks progress, and coordinates parallel execution.
{
"Comment": "Checkout workflow managing order state transitions",
"StartAt": "ValidateCart",
"States": {
"ValidateCart": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:validateCart",
"ResultPath": "$.validation",
"Next": "ReserveInventory",
"Catch": [
{
"ErrorEquals": ["CartValidationError"],
"Next": "NotifyCartError",
"ResultPath": "$.error"
}
]
},
"ReserveInventory": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:reserveInventory",
"ResultPath": "$.reservation",
"TimeoutSeconds": 30,
"Retry": [
{
"ErrorEquals": ["States.Timeout", "InventoryServiceError"],
"IntervalSeconds": 2,
"MaxAttempts": 3,
"BackoffRate": 2.0
}
],
"Next": "ProcessPayment"
},
"ProcessPayment": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:processPayment",
"ResultPath": "$.payment",
"Next": "ConfirmOrder",
"Catch": [
{
"ErrorEquals": ["PaymentDeclinedError"],
"Next": "ReleaseInventory",
"ResultPath": "$.error"
}
]
},
"ConfirmOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:confirmOrder",
"End": true
},
"ReleaseInventory": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:releaseInventory",
"Next": "NotifyPaymentFailure"
},
"NotifyCartError": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:notifyUser",
"End": true
},
"NotifyPaymentFailure": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:notifyUser",
"End": true
}
}
}
Step Functions stores the entire execution state. Each task's output is merged back into the workflow data at the location named by ResultPath, and that combined document becomes the input to the next state. If the process fails halfway through, Step Functions knows exactly where it stopped and can compensate (releasing inventory when payment fails).
Starting a workflow from a Lambda function:
var AWS = require("aws-sdk");
var stepfunctions = new AWS.StepFunctions();
function startCheckout(userId, cartData) {
var executionId = userId + "-" + Date.now();
var params = {
stateMachineArn: process.env.CHECKOUT_STATE_MACHINE_ARN,
name: executionId,
input: JSON.stringify({
userId: userId,
cartId: cartData.cartId,
items: cartData.items,
totalAmount: cartData.totalAmount,
idempotencyKey: cartData.idempotencyKey
})
};
return stepfunctions.startExecution(params).promise();
}
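The execution ARN returned by startExecution doubles as a handle for status checks, which is how an API layer can report workflow progress back to the client while Step Functions holds the state. A small sketch (the function name is illustrative):
function getCheckoutStatus(executionArn) {
  return stepfunctions.describeExecution({
    executionArn: executionArn
  }).promise().then(function(execution) {
    return {
      status: execution.status, // RUNNING, SUCCEEDED, FAILED, TIMED_OUT, or ABORTED
      startDate: execution.startDate,
      output: execution.output ? JSON.parse(execution.output) : null
    };
  });
}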
S3 for Large State Objects
DynamoDB items are limited to 400KB. When state objects exceed that — large configuration files, batch processing results, serialized machine learning feature sets — use S3 as the state store and DynamoDB as the index.
var AWS = require("aws-sdk");
var s3 = new AWS.S3();
var dynamodb = new AWS.DynamoDB.DocumentClient();
function saveLargeState(entityId, stateType, stateData) {
var s3Key = "state/" + entityId + "/" + stateType + "/" + Date.now() + ".json";
var serialized = JSON.stringify(stateData);
return s3.putObject({
Bucket: process.env.STATE_BUCKET,
Key: s3Key,
Body: serialized,
ContentType: "application/json",
ServerSideEncryption: "aws:kms"
}).promise().then(function(s3Result) {
return dynamodb.put({
TableName: "StateIndex",
Item: {
PK: "ENTITY#" + entityId,
SK: "STATE#" + stateType,
s3Bucket: process.env.STATE_BUCKET,
s3Key: s3Key,
sizeBytes: Buffer.byteLength(serialized),
updatedAt: new Date().toISOString(),
etag: s3Result.ETag
}
}).promise();
});
}
function loadLargeState(entityId, stateType) {
return dynamodb.get({
TableName: "StateIndex",
Key: {
PK: "ENTITY#" + entityId,
SK: "STATE#" + stateType
}
}).promise().then(function(indexResult) {
if (!indexResult.Item) return null;
return s3.getObject({
Bucket: indexResult.Item.s3Bucket,
Key: indexResult.Item.s3Key
}).promise().then(function(s3Result) {
return JSON.parse(s3Result.Body.toString());
});
});
}
This pattern keeps DynamoDB fast for lookups while offloading payload storage to S3. The ETag in the index provides a cheap way to detect whether state has changed without downloading the full object.
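One way to use that ETag is to keep the last downloaded copy in the container's global scope and skip the S3 read whenever the index ETag has not moved. A sketch that builds on the functions above, assuming warm-container reuse as described in the caching section:
var stateCache = {}; // "entityId/stateType" -> { etag, state }, survives warm invocations
function loadLargeStateCached(entityId, stateType) {
  var cacheKey = entityId + "/" + stateType;
  return dynamodb.get({
    TableName: "StateIndex",
    Key: { PK: "ENTITY#" + entityId, SK: "STATE#" + stateType }
  }).promise().then(function(indexResult) {
    if (!indexResult.Item) return null;
    var cached = stateCache[cacheKey];
    if (cached && cached.etag === indexResult.Item.etag) {
      return cached.state; // unchanged since last download, skip the S3 round trip
    }
    return s3.getObject({
      Bucket: indexResult.Item.s3Bucket,
      Key: indexResult.Item.s3Key
    }).promise().then(function(s3Result) {
      var state = JSON.parse(s3Result.Body.toString());
      stateCache[cacheKey] = { etag: indexResult.Item.etag, state: state };
      return state;
    });
  });
}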
Distributed Locking in Serverless
When multiple Lambda invocations must coordinate access to shared state, you need distributed locking. DynamoDB conditional writes serve as an effective locking mechanism.
var AWS = require("aws-sdk");
var dynamodb = new AWS.DynamoDB.DocumentClient();
function acquireLock(lockId, ownerId, ttlSeconds) {
var now = Math.floor(Date.now() / 1000);
var expiresAt = now + (ttlSeconds || 30);
var params = {
TableName: "DistributedLocks",
Item: {
lockId: lockId,
ownerId: ownerId,
acquiredAt: new Date().toISOString(),
expiresAt: expiresAt,
ttl: expiresAt
},
ConditionExpression: "attribute_not_exists(lockId) OR expiresAt < :now",
ExpressionAttributeValues: {
":now": now
}
};
return dynamodb.put(params).promise()
.then(function() {
return { acquired: true, ownerId: ownerId };
})
.catch(function(err) {
if (err.code === "ConditionalCheckFailedException") {
return { acquired: false, ownerId: null };
}
throw err;
});
}
function releaseLock(lockId, ownerId) {
var params = {
TableName: "DistributedLocks",
Key: { lockId: lockId },
ConditionExpression: "ownerId = :owner",
ExpressionAttributeValues: {
":owner": ownerId
}
};
return dynamodb.delete(params).promise();
}
function withLock(lockId, ttlSeconds, operation) {
var ownerId = "lambda-" + Date.now() + "-" + Math.random().toString(36).substr(2, 9);
return acquireLock(lockId, ownerId, ttlSeconds)
.then(function(lockResult) {
if (!lockResult.acquired) {
throw new Error("Failed to acquire lock: " + lockId);
}
return operation();
})
.then(function(result) {
return releaseLock(lockId, ownerId).then(function() {
return result;
});
})
.catch(function(err) {
return releaseLock(lockId, ownerId)
.catch(function() { /* ignore release errors */ })
.then(function() { throw err; });
});
}
The withLock wrapper provides a clean API:
exports.handler = function(event, context) {
  var userId = event.pathParameters.userId;
  var newItem = JSON.parse(event.body).newItem;
  return withLock("cart-update-" + userId, 15, function() {
    return getCartState(userId)
      .then(function(cart) {
        cart = cart || { items: [], version: 0 };
        cart.items.push(newItem);
        // saveCartState bumps the version itself; pass the version we just read
        return saveCartState(userId, cart.items, { version: cart.version });
      });
  });
};
The TTL on the lock ensures that if a Lambda crashes before releasing, the lock auto-expires. Without this, a dead invocation would permanently block all other invocations.
Caching Strategies
Lambda Global Scope Caching
Variables declared outside the handler function survive across warm invocations of the same container. This is the fastest cache available — it is in-process memory with zero network overhead.
var cachedConfig = null;
var configLoadedAt = 0;
var CONFIG_TTL_MS = 5 * 60 * 1000;
function loadConfig() {
var now = Date.now();
if (cachedConfig && (now - configLoadedAt) < CONFIG_TTL_MS) {
return Promise.resolve(cachedConfig);
}
return fetchConfigFromDynamoDB().then(function(config) {
cachedConfig = config;
configLoadedAt = now;
return config;
});
}
exports.handler = function(event, context) {
return loadConfig().then(function(config) {
// Use config for request processing
return processRequest(event, config);
});
};
This pattern is ideal for configuration, reference data, and frequently accessed lookups. It is not suitable for state that must be consistent across all concurrent Lambda instances.
Multi-Layer Cache with Redis
For shared caching across Lambda instances, layer in-memory cache with Redis:
var Redis = require("ioredis");
var localCache = {};
var LOCAL_TTL_MS = 30 * 1000;
var redis;
function getRedis() {
if (!redis) {
redis = new Redis(process.env.REDIS_ENDPOINT);
}
return redis;
}
function getCachedState(key) {
var local = localCache[key];
if (local && (Date.now() - local.ts) < LOCAL_TTL_MS) {
return Promise.resolve(local.value);
}
return getRedis().get("cache:" + key).then(function(cached) {
if (cached) {
var parsed = JSON.parse(cached);
localCache[key] = { value: parsed, ts: Date.now() };
return parsed;
}
return fetchFromDynamoDB(key).then(function(data) {
localCache[key] = { value: data, ts: Date.now() };
return getRedis().setex("cache:" + key, 300, JSON.stringify(data))
.then(function() { return data; });
});
});
}
This gives you three tiers: Lambda memory (30 seconds), Redis (5 minutes), DynamoDB (source of truth). Reads hit the fastest available tier.
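Layered caches only stay honest if the write path invalidates every tier it can reach. A sketch of the companion write helper (writeToDynamoDB is a placeholder for your actual persistence call):
function setStateAndInvalidate(key, value) {
  // Write through to the source of truth, then drop both cache tiers so the
  // next read repopulates them with fresh data.
  return writeToDynamoDB(key, value).then(function() {
    delete localCache[key];
    return getRedis().del("cache:" + key);
  }).then(function() {
    return value;
  });
}
Deleting localCache only clears the current container; other warm containers keep serving their copy until the 30-second local TTL lapses, which is exactly why that tier's TTL is kept short.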
Idempotency Tokens and Exactly-Once Processing
In serverless architectures, Lambda functions can be invoked multiple times for the same event. API Gateway retries on timeout. SQS redelivers messages on failure. You must design for at-least-once delivery and ensure exactly-once processing through idempotency.
var AWS = require("aws-sdk");
var dynamodb = new AWS.DynamoDB.DocumentClient();
function ensureIdempotent(idempotencyKey, ttlSeconds, operation) {
return dynamodb.get({
TableName: "IdempotencyStore",
Key: { idempotencyKey: idempotencyKey }
}).promise().then(function(existing) {
if (existing.Item) {
if (existing.Item.status === "COMPLETED") {
return existing.Item.result;
}
if (existing.Item.status === "IN_PROGRESS") {
throw new Error("Operation already in progress");
}
}
var now = Math.floor(Date.now() / 1000);
return dynamodb.put({
TableName: "IdempotencyStore",
Item: {
idempotencyKey: idempotencyKey,
status: "IN_PROGRESS",
createdAt: new Date().toISOString(),
ttl: now + (ttlSeconds || 3600)
},
ConditionExpression: "attribute_not_exists(idempotencyKey)"
}).promise()
.then(function() {
return operation();
})
.then(function(result) {
return dynamodb.update({
TableName: "IdempotencyStore",
Key: { idempotencyKey: idempotencyKey },
UpdateExpression: "SET #status = :completed, #result = :result",
ExpressionAttributeNames: {
"#status": "status",
"#result": "result"
},
ExpressionAttributeValues: {
":completed": "COMPLETED",
":result": result
}
}).promise().then(function() {
return result;
});
})
.catch(function(err) {
  if (err.code === "ConditionalCheckFailedException") {
    // A concurrent invocation created the record first; leave it alone.
    throw new Error("Operation already in progress");
  }
  // Our own operation failed: remove the IN_PROGRESS marker so a retry can run.
  return dynamodb.delete({
    TableName: "IdempotencyStore",
    Key: { idempotencyKey: idempotencyKey }
  }).promise()
    .catch(function() {})
    .then(function() { throw err; });
});
});
}
Usage in a payment handler:
exports.handler = function(event, context) {
var body = JSON.parse(event.body);
var idempotencyKey = body.idempotencyKey || event.requestContext.requestId;
return ensureIdempotent(idempotencyKey, 86400, function() {
return processPayment(body.orderId, body.amount, body.paymentMethod);
}).then(function(result) {
return { statusCode: 200, body: JSON.stringify(result) };
}).catch(function(err) {
return { statusCode: 409, body: JSON.stringify({ error: err.message }) };
});
};
The idempotency key should be client-generated. If the client retries a request with the same key, they get back the cached result instead of double-charging.
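On the client side, the key only needs to be generated once per logical operation and then reused verbatim on every retry. A sketch using Node 18's built-in crypto.randomUUID and global fetch (the endpoint URL is a placeholder):
var crypto = require("crypto");
function submitPayment(orderId, amount, paymentMethod, retriesLeft) {
  // Generate the key once for this logical payment; every retry reuses it,
  // so the server returns the cached result instead of charging twice.
  var idempotencyKey = crypto.randomUUID();
  function attempt(remaining) {
    return fetch("https://api.example.com/payments", { // placeholder endpoint
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        orderId: orderId,
        amount: amount,
        paymentMethod: paymentMethod,
        idempotencyKey: idempotencyKey
      })
    }).then(function(res) { return res.json(); })
      .catch(function(err) {
        if (remaining > 0) return attempt(remaining - 1);
        throw err;
      });
  }
  return attempt(retriesLeft || 2);
}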
State Consistency Across Microservices
In a serverless microservices architecture, state is distributed across multiple services and tables. Maintaining consistency requires the Saga pattern — a sequence of local transactions with compensating actions for rollback.
function executeSaga(steps) {
var completed = [];
function executeStep(index) {
if (index >= steps.length) {
return Promise.resolve({ success: true, results: completed });
}
var step = steps[index];
return step.execute().then(function(result) {
completed.push({ step: step.name, result: result });
return executeStep(index + 1);
}).catch(function(err) {
console.error("Saga step failed: " + step.name, err);
return compensate(completed).then(function() {
throw new Error("Saga failed at step: " + step.name + " - " + err.message);
});
});
}
function compensate(completedSteps) {
  // Undo completed steps in reverse order, one at a time, without mutating the log.
  return completedSteps.slice().reverse().reduce(function(chain, entry) {
    return chain.then(function() {
      var step = steps.find(function(s) { return s.name === entry.step; });
      return step.compensate ? step.compensate(entry.result) : Promise.resolve();
    });
  }, Promise.resolve());
}
return executeStep(0);
}
// Usage
var checkoutSaga = [
{
name: "reserveInventory",
execute: function() { return inventoryService.reserve(items); },
compensate: function(result) { return inventoryService.release(result.reservationId); }
},
{
name: "chargePayment",
execute: function() { return paymentService.charge(amount); },
compensate: function(result) { return paymentService.refund(result.transactionId); }
},
{
name: "createShipment",
execute: function() { return shippingService.create(order); },
compensate: function(result) { return shippingService.cancel(result.shipmentId); }
}
];
Each step knows how to undo itself. If payment fails after inventory is reserved, the saga automatically releases the reservation.
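Running the saga from a handler is a single call; the free variables (items, amount, order and the service clients) are assumed to be in scope:
exports.handler = function(event, context) {
  return executeSaga(checkoutSaga)
    .then(function(outcome) {
      return { statusCode: 200, body: JSON.stringify(outcome) };
    })
    .catch(function(err) {
      // Every completed step has already been compensated by the time we get here.
      return { statusCode: 502, body: JSON.stringify({ error: err.message }) };
    });
};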
Event Sourcing for State Reconstruction
Instead of storing current state, store the events that produced it. This pattern provides a complete audit trail and the ability to reconstruct state at any point in time.
var AWS = require("aws-sdk");
var dynamodb = new AWS.DynamoDB.DocumentClient();
function appendEvent(streamId, eventType, eventData, expectedVersion) {
var eventId = streamId + "#" + Date.now() + "#" + Math.random().toString(36).substr(2, 6);
var params = {
TableName: "EventStore",
Item: {
streamId: streamId,
eventId: eventId,
eventType: eventType,
eventData: eventData,
timestamp: new Date().toISOString(),
version: expectedVersion + 1
},
ConditionExpression: "attribute_not_exists(eventId)"
};
return dynamodb.put(params).promise();
}
function replayEvents(streamId) {
var params = {
TableName: "EventStore",
KeyConditionExpression: "streamId = :sid",
ExpressionAttributeValues: {
":sid": streamId
},
ScanIndexForward: true
};
return dynamodb.query(params).promise().then(function(result) {
return result.Items.reduce(function(state, event) {
return applyEvent(state, event);
}, {});
});
}
function applyEvent(state, event) {
switch (event.eventType) {
case "ITEM_ADDED":
var items = state.items || [];
items.push(event.eventData.item);
return Object.assign({}, state, { items: items });
case "ITEM_REMOVED":
var filtered = (state.items || []).filter(function(item) {
return item.productId !== event.eventData.productId;
});
return Object.assign({}, state, { items: filtered });
case "QUANTITY_UPDATED":
var updated = (state.items || []).map(function(item) {
if (item.productId === event.eventData.productId) {
return Object.assign({}, item, { quantity: event.eventData.quantity });
}
return item;
});
return Object.assign({}, state, { items: updated });
case "CHECKOUT_STARTED":
return Object.assign({}, state, { status: "checking_out" });
case "ORDER_COMPLETED":
return Object.assign({}, state, { status: "completed", orderId: event.eventData.orderId });
default:
return state;
}
}
Event sourcing excels when you need to understand how you arrived at a state, not just what the current state is. It is also the foundation for CQRS (Command Query Responsibility Segregation), where write and read models are separated.
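A common companion is a Lambda subscribed to the event table's DynamoDB Stream that folds each appended event into a separately queryable read model, which is CQRS in miniature. A sketch, assuming the stream is configured with NEW_IMAGE records and a hypothetical CartReadModel table; it reuses the applyEvent reducer above and ignores batching and ordering edge cases:
var AWS = require("aws-sdk");
var dynamodb = new AWS.DynamoDB.DocumentClient();
exports.handler = function(event, context) {
  var writes = event.Records
    .filter(function(record) { return record.eventName === "INSERT"; })
    .map(function(record) {
      // Convert the DynamoDB-JSON stream image back into a plain object
      var newEvent = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);
      return dynamodb.get({
        TableName: "CartReadModel",
        Key: { streamId: newEvent.streamId }
      }).promise().then(function(result) {
        var current = result.Item || { streamId: newEvent.streamId };
        var next = applyEvent(current, newEvent);
        next.streamId = newEvent.streamId;
        return dynamodb.put({ TableName: "CartReadModel", Item: next }).promise();
      });
    });
  return Promise.all(writes);
};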
Connection State Management
Lambda functions must manage connections to external services carefully. Opening a new connection on every invocation is wasteful. Leaving connections open indefinitely causes resource leaks.
var AWS = require("aws-sdk");
var Redis = require("ioredis");
var connections = {};
function getConnection(name, factory) {
if (!connections[name] || !connections[name].isReady) {
connections[name] = factory();
connections[name].isReady = true;
}
return connections[name];
}
var dynamodb = getConnection("dynamodb", function() {
return new AWS.DynamoDB.DocumentClient({
maxRetries: 3,
httpOptions: {
connectTimeout: 2000,
timeout: 5000
}
});
});
function getRedisConnection() {
return getConnection("redis", function() {
var client = new Redis({
host: process.env.REDIS_ENDPOINT,
port: 6379,
connectTimeout: 3000,
lazyConnect: true,
keepAlive: 10000
});
client.on("error", function(err) {
console.error("Redis connection error:", err.message);
connections["redis"] = null;
});
client.connect();
return client;
});
}
exports.handler = function(event, context) {
context.callbackWaitsForEmptyEventLoop = false;
var redis = getRedisConnection();
// Process request...
};
Setting callbackWaitsForEmptyEventLoop = false is critical when using persistent connections with callback-style handlers. By default, Lambda waits for the event loop to drain before returning, and an open Redis or database socket keeps the loop alive, so the function hangs until it times out. That defeats the purpose of connection reuse.
Complete Working Example: Serverless Shopping Cart
This example ties together DynamoDB for persistence, Redis for caching, and Step Functions for the checkout workflow.
Cart Service Lambda
var AWS = require("aws-sdk");
var Redis = require("ioredis");
var dynamodb = new AWS.DynamoDB.DocumentClient();
var stepfunctions = new AWS.StepFunctions();
var redis;
function getRedis() {
if (!redis) {
redis = new Redis({
host: process.env.REDIS_ENDPOINT,
port: 6379,
connectTimeout: 3000,
lazyConnect: true
});
redis.connect();
}
return redis;
}
function addToCart(userId, item) {
var lockId = "cart:" + userId;
return acquireLock(lockId, "add-" + Date.now(), 10)
.then(function(lock) {
if (!lock.acquired) {
throw new Error("Cart is being updated by another request");
}
return getCart(userId).then(function(cart) {
var existingIndex = -1;
cart.items.forEach(function(existing, i) {
if (existing.productId === item.productId) {
existingIndex = i;
}
});
if (existingIndex >= 0) {
cart.items[existingIndex].quantity += item.quantity;
} else {
cart.items.push({
productId: item.productId,
name: item.name,
price: item.price,
quantity: item.quantity,
addedAt: new Date().toISOString()
});
}
cart.version = (cart.version || 0) + 1;
cart.updatedAt = new Date().toISOString();
cart.totalAmount = cart.items.reduce(function(sum, i) {
return sum + (i.price * i.quantity);
}, 0);
return persistCart(userId, cart).then(function() {
return invalidateCartCache(userId).then(function() {
return releaseLock(lockId, lock.ownerId).then(function() {
return cart;
});
});
});
});
});
}
function getCart(userId) {
var cacheKey = "cart:" + userId;
return getRedis().get(cacheKey).then(function(cached) {
if (cached) {
return JSON.parse(cached);
}
return dynamodb.get({
TableName: "ShoppingCarts",
Key: { PK: "USER#" + userId, SK: "CART#active" }
}).promise().then(function(result) {
var cart = result.Item || { items: [], version: 0, totalAmount: 0 };
return getRedis().setex(cacheKey, 300, JSON.stringify(cart))
.then(function() { return cart; });
});
});
}
function persistCart(userId, cart) {
return dynamodb.put({
TableName: "ShoppingCarts",
Item: Object.assign({
PK: "USER#" + userId,
SK: "CART#active",
ttl: Math.floor(Date.now() / 1000) + (7 * 24 * 60 * 60)
}, cart)
}).promise();
}
function invalidateCartCache(userId) {
return getRedis().del("cart:" + userId);
}
function startCheckout(userId) {
return getCart(userId).then(function(cart) {
if (!cart.items || cart.items.length === 0) {
throw new Error("Cannot checkout with an empty cart");
}
var idempotencyKey = userId + "-" + Date.now();
return stepfunctions.startExecution({
stateMachineArn: process.env.CHECKOUT_STATE_MACHINE_ARN,
name: "checkout-" + idempotencyKey,
input: JSON.stringify({
userId: userId,
cart: cart,
idempotencyKey: idempotencyKey,
initiatedAt: new Date().toISOString()
})
}).promise().then(function(execution) {
return {
checkoutId: idempotencyKey,
executionArn: execution.executionArn,
status: "STARTED"
};
});
});
}
exports.handler = function(event, context) {
context.callbackWaitsForEmptyEventLoop = false;
var method = event.httpMethod;
var userId = event.pathParameters.userId;
var action;
if (method === "GET") {
action = getCart(userId);
} else if (method === "POST" && event.resource.indexOf("checkout") >= 0) {
action = startCheckout(userId);
} else if (method === "POST") {
var item = JSON.parse(event.body);
action = addToCart(userId, item);
} else {
var methodError = new Error("Method not allowed");
methodError.statusCode = 405;
action = Promise.reject(methodError);
}
return action.then(function(result) {
return {
statusCode: 200,
headers: { "Content-Type": "application/json" },
body: JSON.stringify(result)
};
}).catch(function(err) {
console.error("Cart error:", err);
return {
statusCode: err.statusCode || 500,
body: JSON.stringify({ error: err.message })
};
});
};
Checkout Step Function Handlers
// validateCart.js
exports.handler = function(event, context) {
  var cart = event.cart;
  if (!cart.items || cart.items.length === 0) {
    // Use a named Error so Step Functions can match it in the Catch clause
    var emptyErr = new Error("Cart is empty");
    emptyErr.name = "CartValidationError";
    throw emptyErr;
  }
  var invalidItems = cart.items.filter(function(item) {
    return !item.productId || item.quantity < 1 || item.price <= 0;
  });
  if (invalidItems.length > 0) {
    var invalidErr = new Error("Invalid items found: " + JSON.stringify(invalidItems));
    invalidErr.name = "CartValidationError";
    throw invalidErr;
  }
  return {
    valid: true,
    itemCount: cart.items.length,
    totalAmount: cart.totalAmount
  };
};
// reserveInventory.js
var AWS = require("aws-sdk");
var dynamodb = new AWS.DynamoDB.DocumentClient();
exports.handler = function(event, context) {
  // Reserve all line items in one transaction so a partial failure never leaves
  // stray reservations behind; any unmet condition cancels the whole batch.
  var transactItems = event.cart.items.map(function(item) {
    return {
      Update: {
        TableName: "Inventory",
        Key: { productId: item.productId },
        UpdateExpression: "SET reserved = reserved + :qty, available = available - :qty",
        ConditionExpression: "available >= :qty",
        ExpressionAttributeValues: { ":qty": item.quantity }
      }
    };
  });
  return dynamodb.transactWrite({ TransactItems: transactItems }).promise().then(function() {
    return {
      reservationId: "res-" + Date.now(),
      reservedItems: transactItems.length,
      expiresAt: new Date(Date.now() + 15 * 60 * 1000).toISOString()
    };
  });
};
// processPayment.js
exports.handler = function(event, context) {
var amount = event.cart.totalAmount;
var userId = event.userId;
// Integrate with Stripe, PayPal, etc.
return processWithPaymentProvider(userId, amount)
.then(function(result) {
if (!result.success) {
var err = new Error("Payment declined: " + result.reason);
err.name = "PaymentDeclinedError";
throw err;
}
return {
transactionId: result.transactionId,
amount: amount,
processedAt: new Date().toISOString()
};
});
};
Common Issues and Troubleshooting
1. DynamoDB ConditionalCheckFailedException on Concurrent Writes
ConditionalCheckFailedException: The conditional request failed
This happens when two Lambda invocations try to update the same item simultaneously. Your ConditionExpression correctly prevented data corruption, but you need to handle the retry. Implement exponential backoff with jitter:
function retryWithBackoff(operation, maxRetries) {
var attempt = 0;
function execute() {
return operation().catch(function(err) {
if (err.code === "ConditionalCheckFailedException" && attempt < maxRetries) {
attempt++;
var delay = Math.pow(2, attempt) * 100 + Math.random() * 100;
return new Promise(function(resolve) {
setTimeout(resolve, delay);
}).then(execute);
}
throw err;
});
}
return execute();
}
2. Redis ETIMEDOUT in VPC-Attached Lambda
Error: connect ETIMEDOUT 10.0.1.42:6379
Lambda functions attached to a VPC need elastic network interfaces for outbound access. The cold start penalty for this is much smaller than it once was thanks to Hyperplane ENIs, but establishing the network path and the Redis connection can still take several seconds on a cold start. If your Redis connection timeout is shorter than that, connections fail. Increase connectTimeout to at least 5000ms and configure lazyConnect: true so the connection is established only when the first command is sent.
3. Step Functions Execution Name Conflict
ExecutionAlreadyExists: Execution Already Exists: 'checkout-user123-1707849600000'
Step Functions execution names must be unique within 90 days. If a user retries a checkout and you reuse the same execution name, you get this error. This is actually a feature — it prevents duplicate workflows. Handle it by catching the error and returning the existing execution status:
function startOrGetExecution(name, input) {
return stepfunctions.startExecution({
stateMachineArn: process.env.STATE_MACHINE_ARN,
name: name,
input: JSON.stringify(input)
}).promise().catch(function(err) {
if (err.code === "ExecutionAlreadyExists") {
return stepfunctions.describeExecution({
executionArn: process.env.STATE_MACHINE_ARN.replace(":stateMachine:", ":execution:") + ":" + name
}).promise();
}
throw err;
});
}
4. Lambda Timeout Waiting for callbackWaitsForEmptyEventLoop
Task timed out after 30.00 seconds
Your function finishes processing but Lambda does not return the response. With callback-style handlers, this is almost always an open database or Redis connection keeping the event loop alive, because Lambda waits for the loop to drain by default. Set context.callbackWaitsForEmptyEventLoop = false at the top of your handler so the response is returned as soon as the callback fires, without waiting for the event loop to empty.
5. DynamoDB ProvisionedThroughputExceededException Under Load
ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded
If you are using provisioned capacity mode, sudden traffic spikes will exceed your configured RCUs/WCUs. Switch to on-demand capacity mode for unpredictable workloads, or enable DynamoDB auto-scaling with appropriate minimum and maximum values. In code, always implement retries with exponential backoff for throttled requests — the AWS SDK does this by default but you may need to increase maxRetries.
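The SDK-side knobs look like this; maxRetries and retryDelayOptions are standard AWS SDK v2 client options, and the values here are illustrative:
var AWS = require("aws-sdk");
// Give throttled DynamoDB calls more room to back off before an error surfaces.
var dynamodb = new AWS.DynamoDB.DocumentClient({
  maxRetries: 10,
  retryDelayOptions: { base: 50 }, // base delay in milliseconds for exponential backoff
  httpOptions: { timeout: 5000 }
});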
Best Practices
Use TTL everywhere. Every piece of temporary state — locks, sessions, idempotency records, cache entries — must have an expiration. Leaked state accumulates cost and creates stale-data bugs that are painful to diagnose.
Design for idempotency from day one. Assume every Lambda will be invoked at least twice. Use idempotency keys for all write operations, especially payments, inventory changes, and notifications. Retrofitting idempotency into an existing system is significantly harder than building it in from the start.
Prefer DynamoDB transactions over distributed locks. DynamoDB TransactWriteItems gives you ACID guarantees across up to 100 items. This is cheaper, faster, and more reliable than managing your own lock table for most use cases.
Keep Step Functions state small. The maximum payload between states is 256KB. Pass references (IDs, S3 keys) instead of full data objects. Large payloads between states slow down execution and hit size limits unexpectedly.
Cache aggressively but invalidate correctly. Lambda global scope caching is free and fast. Use it for anything that changes infrequently. For shared state, use Redis with short TTLs and explicit invalidation on writes. Stale cache reads are a common source of bugs that only manifest under load.
Set callbackWaitsForEmptyEventLoop = false. Do this in every Lambda that opens persistent connections. The default behavior causes mysterious timeouts that waste both compute time and debugging effort.
Monitor state growth. DynamoDB items have a 400KB limit. Redis keys can grow unbounded. S3 buckets accumulate cost. Set up CloudWatch alarms for table size, cache eviction rates, and state store item counts. State management problems often start small and compound exponentially.
Use DynamoDB Streams for state change propagation. Rather than making synchronous calls to update derived state (search indexes, analytics, notifications), let DynamoDB Streams trigger Lambda functions asynchronously. This decouples your write path from downstream consumers and improves latency.
Version your state schemas. When your state structure changes, include a schema version field. This lets you handle migrations gracefully by detecting old versions and transforming them on read rather than running batch updates across millions of items.
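A read-time migration can be as small as a switch on the stored version. A sketch with a hypothetical cart schema whose version 1 stored the total under a different attribute name:
var CURRENT_SCHEMA_VERSION = 2;
// Upgrade old items lazily as they are read instead of batch-migrating the table.
function migrateCartSchema(item) {
  var cart = Object.assign({}, item);
  if (!cart.schemaVersion || cart.schemaVersion === 1) {
    // hypothetical v1 -> v2 change: "total" was renamed to "totalAmount"
    cart.totalAmount = cart.totalAmount || cart.total || 0;
    delete cart.total;
    cart.schemaVersion = CURRENT_SCHEMA_VERSION;
  }
  return cart;
}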
References
- AWS Lambda Execution Environment — official documentation on Lambda container reuse and execution context
- DynamoDB Best Practices — single-table design, partition key strategies, and capacity planning
- AWS Step Functions Developer Guide — state machine definition language and execution semantics
- DynamoDB Transactions — ACID transaction support for multi-item operations
- Lambda and VPC Best Practices — networking configuration for Redis and RDS access
- Idempotency Patterns for Lambda — Amazon Builders Library guide on idempotent API design