Agent Memory Systems: Short-Term and Long-Term

Build agent memory systems with short-term buffers, long-term PostgreSQL storage, episodic/semantic memory, and vector retrieval in Node.js.

AI agents without memory are goldfish. Every conversation starts from zero, every lesson learned evaporates the moment the context window fills up, and every user preference has to be re-stated. Building real memory systems — short-term buffers for the current session and long-term persistent stores for accumulated knowledge — is what separates a toy chatbot from an agent that actually improves over time.

This article walks through the architecture of agent memory systems, from in-memory conversation buffers through PostgreSQL-backed episodic and semantic memory with vector retrieval, and shows you how to wire it all together in Node.js.

Prerequisites

  • Node.js v18+ and npm
  • PostgreSQL 15+ with the pgvector extension installed
  • An OpenAI API key (for generating embeddings)
  • Working knowledge of Express.js and SQL
  • Familiarity with LLM prompt construction and token limits

Why Agents Need Memory Beyond the Context Window

Every LLM has a finite context window. Even models advertising 128k or 200k tokens hit a wall eventually. But the real problem is not just size — it is cost and relevance. Stuffing 100k tokens of raw conversation history into every prompt is expensive, slow, and noisy. Most of that history is irrelevant to the current query.
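
To put rough numbers on the cost argument, here is a back-of-envelope sketch. The per-token price is a hypothetical figure for illustration only; check your provider's current pricing.

```javascript
// Rough daily cost of stuffing history into every prompt.
// The price per million input tokens is a made-up illustrative figure.
function promptCostUSD(tokensPerRequest, requestsPerDay, pricePerMillionTokens) {
  return (tokensPerRequest * requestsPerDay / 1e6) * pricePerMillionTokens;
}

// 100k tokens of raw history on every request, 10k requests/day:
var naive = promptCostUSD(100000, 10000, 2.5);   // 2500 ($/day)
// ~8k tokens of curated memory plus recent history instead:
var curated = promptCostUSD(8000, 10000, 2.5);   // 200 ($/day)
console.log("naive: $" + naive + "/day, curated: $" + curated + "/day");
```

Even at modest volume, curating memory down to a few thousand relevant tokens is an order-of-magnitude difference, before counting latency.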

Memory systems solve this by acting as an external brain. Short-term memory holds what matters right now — the current conversation thread, intermediate reasoning steps, scratchpad notes. Long-term memory persists across sessions and stores distilled knowledge: what happened in past conversations, what facts the agent has learned, what tool-usage patterns work best.

The human brain does this naturally. Working memory holds roughly seven items. Important experiences get consolidated into long-term memory during sleep. Irrelevant details get forgotten. Agent memory systems should mirror this architecture.

Short-Term Memory

Short-term memory lives in-process. It exists for the duration of a session and gets discarded (or selectively consolidated) when the session ends.

Conversation Buffer

The simplest form of short-term memory is a conversation buffer — an ordered list of messages in the current interaction.

var EventEmitter = require("events");

function ConversationBuffer(options) {
  options = options || {};
  this.maxMessages = options.maxMessages || 50;
  this.maxTokens = options.maxTokens || 8000;
  this.messages = [];
  this.emitter = new EventEmitter();
}

ConversationBuffer.prototype.add = function (role, content, metadata) {
  var message = {
    role: role,
    content: content,
    timestamp: Date.now(),
    tokenEstimate: Math.ceil(content.length / 4),
    metadata: metadata || {}
  };

  this.messages.push(message);
  this._enforceLimit();
  this.emitter.emit("message_added", message);
  return message;
};

ConversationBuffer.prototype._enforceLimit = function () {
  // Remove oldest messages if we exceed count limit
  while (this.messages.length > this.maxMessages) {
    var removed = this.messages.shift();
    this.emitter.emit("message_evicted", removed);
  }

  // Remove oldest messages if we exceed token budget
  var totalTokens = this._estimateTokens();
  while (totalTokens > this.maxTokens && this.messages.length > 1) {
    var evicted = this.messages.shift();
    this.emitter.emit("message_evicted", evicted);
    totalTokens -= evicted.tokenEstimate;
  }
};

ConversationBuffer.prototype._estimateTokens = function () {
  var total = 0;
  for (var i = 0; i < this.messages.length; i++) {
    total += this.messages[i].tokenEstimate;
  }
  return total;
};

ConversationBuffer.prototype.getMessages = function () {
  return this.messages.map(function (m) {
    return { role: m.role, content: m.content };
  });
};

ConversationBuffer.prototype.getRecent = function (n) {
  return this.messages.slice(-n);
};

ConversationBuffer.prototype.clear = function () {
  this.messages = [];
};

The key design decisions here: a dual limit on both message count and estimated tokens, and an event emitter so that evicted messages can be intercepted by a consolidation layer (more on that later).
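
The dual-limit rule can also be expressed as a standalone pure function, which makes the oldest-first eviction order easy to test in isolation. This is a sketch mirroring _enforceLimit, not part of the class above.

```javascript
// Pure version of the buffer's eviction policy: trim oldest-first until both
// the message-count limit and the estimated-token budget are satisfied.
function enforceLimits(messages, maxMessages, maxTokens) {
  var kept = messages.slice();
  var evicted = [];

  var tokens = function () {
    return kept.reduce(function (sum, m) {
      return sum + Math.ceil(m.content.length / 4);
    }, 0);
  };

  while (kept.length > maxMessages ||
         (tokens() > maxTokens && kept.length > 1)) {
    evicted.push(kept.shift()); // oldest message goes first
  }
  return { kept: kept, evicted: evicted };
}

var result = enforceLimits(
  [{ content: "a".repeat(400) }, { content: "b".repeat(400) }, { content: "hi" }],
  10,  // generous count limit
  90   // tight token budget: the estimates are 100 + 100 + 1 tokens
);
// The two oldest ~100-token messages are evicted; the newest survives.
console.log(result.kept.length, result.evicted.length); // 1 2
```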

Working Memory (Scratchpad)

Beyond conversation history, agents need a scratchpad for intermediate state — partial results, extracted entities, in-progress reasoning chains. This is working memory.

function WorkingMemory() {
  this.entries = {};
  this.created = Date.now();
}

WorkingMemory.prototype.set = function (key, value, ttlMs) {
  this.entries[key] = {
    value: value,
    createdAt: Date.now(),
    expiresAt: ttlMs ? Date.now() + ttlMs : null
  };
};

WorkingMemory.prototype.get = function (key) {
  var entry = this.entries[key];
  if (!entry) return null;
  if (entry.expiresAt && Date.now() > entry.expiresAt) {
    delete this.entries[key];
    return null;
  }
  return entry.value;
};

WorkingMemory.prototype.getAll = function () {
  var self = this;
  var result = {};
  var keys = Object.keys(this.entries);
  keys.forEach(function (key) {
    var val = self.get(key); // triggers TTL check
    if (val !== null) {
      result[key] = val;
    }
  });
  return result;
};

WorkingMemory.prototype.remove = function (key) {
  delete this.entries[key];
};

WorkingMemory.prototype.toPromptString = function () {
  var all = this.getAll();
  var keys = Object.keys(all);
  if (keys.length === 0) return "";
  var lines = keys.map(function (k) {
    var val = typeof all[k] === "string" ? all[k] : JSON.stringify(all[k]);
    return "- " + k + ": " + val;
  });
  return "## Current Working Memory\n" + lines.join("\n");
};

Working memory entries have optional TTLs. An entity extracted from a user message might be relevant for 5 minutes but not for the rest of the session. This prevents stale intermediate state from cluttering prompts.
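
The expiry rule reduces to a one-line predicate; pulling it out with an injectable clock makes the TTL behavior trivially testable. A sketch, not part of the class above:

```javascript
// TTL check as a pure helper: an entry is live if it has no expiry or the
// clock has not passed it. The clock is injected instead of using Date.now().
function isLive(entry, now) {
  return !entry.expiresAt || now <= entry.expiresAt;
}

// An entity extracted at t=0 with a 5-minute TTL:
var entry = { value: "extracted entity", createdAt: 0, expiresAt: 5 * 60 * 1000 };

console.log(isLive(entry, 4 * 60 * 1000)); // true: 4 minutes in
console.log(isLive(entry, 6 * 60 * 1000)); // false: expired after 5 minutes
```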

Long-Term Memory Types

Long-term memory persists across sessions and serves three distinct purposes, mirroring how human long-term memory is categorized in cognitive science.

Episodic Memory

Episodic memory records specific past interactions — what the user asked, what the agent did, what the outcome was. Think of it as a diary. It answers questions like "What did we talk about last Tuesday?" or "What happened the last time I asked about database migrations?"

Semantic Memory

Semantic memory stores facts, knowledge, and generalizations. It is the agent's accumulated knowledge base, detached from specific episodes. "The user prefers TypeScript" or "The production database runs PostgreSQL 15" are semantic memories.

Procedural Memory

Procedural memory captures learned behaviors — which tools work well for which tasks, what prompt patterns produce good results, what sequences of actions reliably solve particular problem types. It is the agent's muscle memory.

Implementing Long-Term Memory with PostgreSQL

PostgreSQL with the pgvector extension gives us relational storage for structured memory records and vector similarity search for semantic retrieval — all in one database.

Schema

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_memories (
    id SERIAL PRIMARY KEY,
    agent_id VARCHAR(64) NOT NULL,
    user_id VARCHAR(64),
    memory_type VARCHAR(20) NOT NULL CHECK (memory_type IN ('episodic', 'semantic', 'procedural')),
    content TEXT NOT NULL,
    summary TEXT,
    embedding vector(1536),
    importance REAL DEFAULT 0.5,
    access_count INTEGER DEFAULT 0,
    last_accessed TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    expires_at TIMESTAMPTZ,
    metadata JSONB DEFAULT '{}'::jsonb
);

CREATE INDEX idx_memories_agent ON agent_memories(agent_id);
CREATE INDEX idx_memories_type ON agent_memories(agent_id, memory_type);
CREATE INDEX idx_memories_embedding ON agent_memories
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE INDEX idx_memories_importance ON agent_memories(agent_id, importance DESC);
CREATE INDEX idx_memories_metadata ON agent_memories USING gin (metadata);

The schema stores all three memory types in one table with a discriminator column. The embedding column holds 1536-dimensional vectors from OpenAI's text-embedding-3-small model. The importance score (0.0 to 1.0) lets retrieval prioritize high-value memories.

Long-Term Memory Store

var pg = require("pg");
// Node 18+ ships a global fetch, so no extra HTTP dependency is needed.

function LongTermMemory(options) {
  this.pool = new pg.Pool({
    connectionString: options.connectionString
  });
  this.agentId = options.agentId;
  this.openaiKey = options.openaiKey;
  this.embeddingModel = options.embeddingModel || "text-embedding-3-small";
}

LongTermMemory.prototype.generateEmbedding = function (text) {
  var self = this;
  return fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Authorization": "Bearer " + self.openaiKey,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: self.embeddingModel,
      input: text.substring(0, 8000)
    })
  })
    .then(function (res) { return res.json(); })
    .then(function (data) {
      if (data.error) {
        throw new Error("Embedding error: " + data.error.message);
      }
      return data.data[0].embedding;
    });
};

LongTermMemory.prototype.store = function (memoryType, content, options) {
  var self = this;
  options = options || {};

  return self.generateEmbedding(content).then(function (embedding) {
    var query = [
      "INSERT INTO agent_memories",
      "(agent_id, user_id, memory_type, content, summary, embedding,",
      " importance, expires_at, metadata)",
      "VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)",
      "RETURNING id"
    ].join(" ");

    var embeddingStr = "[" + embedding.join(",") + "]";

    var params = [
      self.agentId,
      options.userId || null,
      memoryType,
      content,
      options.summary || null,
      embeddingStr,
      options.importance || 0.5,
      options.expiresAt || null,
      JSON.stringify(options.metadata || {})
    ];

    return self.pool.query(query, params);
  }).then(function (result) {
    return result.rows[0].id;
  });
};

Episodic Memory: Storing Past Interactions

Episodic memories capture the full arc of an interaction — what was asked, what the agent did, what the outcome was.

LongTermMemory.prototype.storeEpisode = function (episode) {
  var content = [
    "User request: " + episode.userMessage,
    "Agent action: " + episode.agentAction,
    "Outcome: " + episode.outcome,
    "Tools used: " + (episode.toolsUsed || []).join(", ")
  ].join("\n");

  var summary = "User asked about " + episode.topic +
    ". Agent " + episode.agentAction.substring(0, 100) +
    ". Result: " + episode.outcome.substring(0, 100);

  return this.store("episodic", content, {
    userId: episode.userId,
    summary: summary,
    importance: episode.importance || 0.5,
    metadata: {
      topic: episode.topic,
      toolsUsed: episode.toolsUsed,
      successful: episode.successful,
      sessionId: episode.sessionId
    }
  });
};

Semantic Memory: Knowledge Base with Vector Search

Semantic memories are facts and preferences extracted from conversations. They get stored with embeddings and retrieved via vector similarity.

LongTermMemory.prototype.storeKnowledge = function (fact, options) {
  options = options || {};
  return this.store("semantic", fact, {
    userId: options.userId,
    summary: fact.substring(0, 200),
    importance: options.importance || 0.6,
    metadata: {
      source: options.source || "conversation",
      category: options.category || "general",
      confidence: options.confidence || 0.8
    }
  });
};

LongTermMemory.prototype.searchSemantic = function (query, options) {
  var self = this;
  options = options || {};
  var limit = options.limit || 5;
  var threshold = options.threshold || 0.7;

  return self.generateEmbedding(query).then(function (queryEmbedding) {
    var embeddingStr = "[" + queryEmbedding.join(",") + "]";
    var sql = [
      "SELECT id, content, summary, importance, metadata,",
      "  1 - (embedding <=> $1::vector) AS similarity,",
      "  created_at, access_count",
      "FROM agent_memories",
      "WHERE agent_id = $2",
      "  AND memory_type = $3",
      "  AND (expires_at IS NULL OR expires_at > NOW())",
      "  AND 1 - (embedding <=> $1::vector) > $4",
      "ORDER BY similarity DESC",
      "LIMIT $5"
    ].join(" ");

    return self.pool.query(sql, [embeddingStr, self.agentId, "semantic", threshold, limit]);
  }).then(function (result) {
    // Update access counts
    var ids = result.rows.map(function (r) { return r.id; });
    if (ids.length > 0) {
      self.pool.query(
        "UPDATE agent_memories SET access_count = access_count + 1, last_accessed = NOW() WHERE id = ANY($1)",
        [ids]
      ).catch(function (err) {
        // Fire-and-forget bookkeeping: log, but do not fail the retrieval
        console.error("Access-count update error:", err);
      });
    }
    return result.rows;
  });
};

The <=> operator is pgvector's cosine distance. We convert it to similarity with 1 - distance. The threshold parameter (default 0.7) filters out weakly related memories so the agent does not get distracted by noise.
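
To make the conversion concrete, here is the same arithmetic in plain JavaScript on toy 2-D vectors. A sanity-check sketch only; pgvector does this server-side over the full 1536 dimensions.

```javascript
// Cosine distance as pgvector's <=> computes it: 1 - cos(theta).
function cosineDistance(a, b) {
  var dot = 0, na = 0, nb = 0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

var q = [1, 0];
console.log(1 - cosineDistance(q, [1, 0]));  // 1: identical direction
console.log(1 - cosineDistance(q, [0, 1]));  // 0: orthogonal
console.log(1 - cosineDistance(q, [-1, 0])); // -1: opposite direction
```

A similarity of 1 means identical direction, 0 means unrelated, and negative values mean opposed, which is why a 0.7 threshold filters to clearly related memories.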

Procedural Memory: Learned Tool Patterns

Procedural memory records what the agent has learned about how to accomplish tasks effectively.

LongTermMemory.prototype.storeToolPattern = function (pattern) {
  var content = [
    "Task type: " + pattern.taskType,
    "Tool sequence: " + pattern.toolSequence.join(" -> "),
    "Success rate: " + pattern.successRate,
    "Avg duration: " + pattern.avgDuration + "ms",
    "Notes: " + (pattern.notes || "none")
  ].join("\n");

  return this.store("procedural", content, {
    importance: Math.min(pattern.successRate, 1.0),
    metadata: {
      taskType: pattern.taskType,
      toolSequence: pattern.toolSequence,
      successRate: pattern.successRate,
      sampleSize: pattern.sampleSize
    }
  });
};

LongTermMemory.prototype.getToolPatterns = function (taskDescription) {
  var self = this;
  return self.generateEmbedding(taskDescription).then(function (embedding) {
    var embeddingStr = "[" + embedding.join(",") + "]";
    var sql = [
      "SELECT content, metadata,",
      "  1 - (embedding <=> $1::vector) AS similarity",
      "FROM agent_memories",
      "WHERE agent_id = $2 AND memory_type = 'procedural'",
      "  AND (metadata->>'successRate')::float > 0.6",
      "ORDER BY similarity DESC, importance DESC",
      "LIMIT 3"
    ].join(" ");

    return self.pool.query(sql, [embeddingStr, self.agentId]);
  }).then(function (result) {
    return result.rows;
  });
};

This is where agents get genuinely useful over time. After the agent discovers that a particular sequence of API calls reliably solves a class of problems, that pattern gets recorded. Future encounters with similar tasks surface the proven approach.

Memory Retrieval Strategies

Retrieval is the critical bottleneck. Storing memories is easy. Getting the right ones back at the right time is the hard part.

Hybrid Scoring

Pure vector similarity is not enough. A memory from five minutes ago about the current topic should outrank a slightly more similar memory from six months ago about something tangential. Hybrid scoring combines multiple signals.

function scoreMemory(memory, queryEmbedding, now) {
  var similarity = memory.similarity || 0;
  var recencyMs = now - new Date(memory.created_at).getTime();
  var recencyDays = recencyMs / (1000 * 60 * 60 * 24);
  var recencyScore = Math.exp(-0.05 * recencyDays); // exponential decay

  var importance = memory.importance || 0.5;
  var accessFrequency = Math.min(memory.access_count / 20, 1.0);

  // Weighted combination
  var score = (
    similarity * 0.40 +
    recencyScore * 0.25 +
    importance * 0.25 +
    accessFrequency * 0.10
  );

  return score;
}

LongTermMemory.prototype.retrieve = function (query, options) {
  var self = this;
  options = options || {};
  var limit = options.limit || 10;
  var finalLimit = options.finalLimit || 5;

  return self.generateEmbedding(query).then(function (queryEmbedding) {
    var embeddingStr = "[" + queryEmbedding.join(",") + "]";
    // Fetch more candidates than needed, then re-rank
    var sql = [
      "SELECT id, content, summary, importance, metadata,",
      "  1 - (embedding <=> $1::vector) AS similarity,",
      "  created_at, access_count, memory_type",
      "FROM agent_memories",
      "WHERE agent_id = $2",
      "  AND (expires_at IS NULL OR expires_at > NOW())",
      "ORDER BY embedding <=> $1::vector",
      "LIMIT $3"
    ].join(" ");

    return self.pool.query(sql, [embeddingStr, self.agentId, limit])
      .then(function (result) {
        var now = Date.now();
        var scored = result.rows.map(function (row) {
          row.compositeScore = scoreMemory(row, queryEmbedding, now);
          return row;
        });

        scored.sort(function (a, b) {
          return b.compositeScore - a.compositeScore;
        });

        return scored.slice(0, finalLimit);
      });
  });
};

The weights here are a starting point. In production, you would tune them based on your agent's domain. A customer support agent might weight recency higher. A research agent might weight importance and similarity higher.
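
To see why the weights matter, lift them into a parameter and score the same two candidate memories under two profiles. The numbers are hypothetical, sketching the tuning described above.

```javascript
// Same four signals as scoreMemory, but with the blend made configurable.
function scoreWithWeights(signals, weights) {
  return signals.similarity * weights.similarity +
         signals.recency * weights.recency +
         signals.importance * weights.importance +
         signals.frequency * weights.frequency;
}

// An old but highly similar memory vs. a recent, moderately similar one.
var oldMatch = { similarity: 0.9, recency: 0.1, importance: 0.5, frequency: 0.2 };
var fresh    = { similarity: 0.7, recency: 1.0, importance: 0.5, frequency: 0.0 };

// Hypothetical profiles: research leans on similarity, support on recency.
var research = { similarity: 0.6, recency: 0.1, importance: 0.25, frequency: 0.05 };
var support  = { similarity: 0.3, recency: 0.5, importance: 0.15, frequency: 0.05 };

// The research-style weighting surfaces the strong semantic match...
console.log(scoreWithWeights(oldMatch, research) > scoreWithWeights(fresh, research)); // true
// ...while the support-style weighting surfaces the recent memory.
console.log(scoreWithWeights(fresh, support) > scoreWithWeights(oldMatch, support));   // true
```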

Memory Consolidation

Memory consolidation is the bridge between short-term and long-term memory. When a conversation ends (or a buffer evicts messages), the system decides what is worth persisting.

function MemoryConsolidator(options) {
  this.longTermMemory = options.longTermMemory;
  this.llmClient = options.llmClient;
  this.minImportance = options.minImportance || 0.4;
}

MemoryConsolidator.prototype.consolidate = function (messages, userId) {
  var self = this;

  // Use the LLM to extract memorable content
  var transcript = messages.map(function (m) {
    return m.role + ": " + m.content;
  }).join("\n");

  var extractionPrompt = [
    "Analyze this conversation and extract memories worth preserving.",
    "Return a JSON array of objects with these fields:",
    "- type: 'episodic', 'semantic', or 'procedural'",
    "- content: the memory content",
    "- importance: 0.0 to 1.0",
    "- topic: brief topic label",
    "",
    "Only extract genuinely useful information. Skip small talk and filler.",
    "Episodic: specific interactions with clear outcomes.",
    "Semantic: facts, preferences, or knowledge learned.",
    "Procedural: successful strategies or tool patterns discovered.",
    "",
    "Conversation:",
    transcript
  ].join("\n");

  return self.llmClient.complete(extractionPrompt).then(function (response) {
    var memories;
    try {
      memories = JSON.parse(response);
    } catch (e) {
      console.error("Failed to parse consolidation output:", e.message);
      return [];
    }

    var storePromises = [];
    memories.forEach(function (memory) {
      if (memory.importance < self.minImportance) return;

      var promise = self.longTermMemory.store(memory.type, memory.content, {
        userId: userId,
        importance: memory.importance,
        metadata: { topic: memory.topic, source: "consolidation" }
      });
      storePromises.push(promise);
    });

    return Promise.all(storePromises);
  });
};

The consolidation step uses the LLM itself to decide what is worth remembering. This is expensive but effective. A cheaper alternative is rule-based extraction: always store messages containing entities, always store successful tool invocations, never store greetings.
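
A minimal sketch of that rule-based alternative follows. The regexes are illustrative, not exhaustive; a production extractor would need many more patterns.

```javascript
// Cheap rule-based extraction: no LLM call, just heuristics over each message.
function extractByRules(messages) {
  var memories = [];
  messages.forEach(function (m) {
    if (m.role !== "user") return;
    // Never store greetings or filler.
    if (/^(hi|hello|hey|thanks|thank you)\b/i.test(m.content.trim())) return;
    // Stated preferences become semantic memories.
    if (/\bI (prefer|like|always|never|use)\b/i.test(m.content)) {
      memories.push({ type: "semantic", content: m.content, importance: 0.6 });
    }
  });
  return memories;
}

var extracted = extractByRules([
  { role: "user", content: "Hi there!" },
  { role: "user", content: "I prefer TypeScript for new services" },
  { role: "assistant", content: "Noted." }
]);
console.log(extracted.length, extracted[0].type); // 1 "semantic"
```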

Forgetting Mechanisms

Memory systems that never forget eventually drown in noise. Intentional forgetting is as important as intentional remembering.

LongTermMemory.prototype.decay = function () {
  // Remove expired memories
  var expiredQuery = "DELETE FROM agent_memories WHERE expires_at IS NOT NULL AND expires_at < NOW()";

  // Reduce importance of old, infrequently accessed memories
  var decayQuery = [
    "UPDATE agent_memories",
    "SET importance = importance * 0.95",
    "WHERE last_accessed < NOW() - INTERVAL '30 days'",
    "  AND access_count < 3",
    "  AND importance > 0.1"
  ].join(" ");

  // Delete memories that have decayed below threshold
  var pruneQuery = "DELETE FROM agent_memories WHERE importance < 0.1 AND access_count < 2";

  var self = this;
  return self.pool.query(expiredQuery)
    .then(function (r) {
      console.log("Expired memories removed:", r.rowCount);
      return self.pool.query(decayQuery);
    })
    .then(function (r) {
      console.log("Memories decayed:", r.rowCount);
      return self.pool.query(pruneQuery);
    })
    .then(function (r) {
      console.log("Low-value memories pruned:", r.rowCount);
    });
};

Run this on a cron schedule — daily or weekly depending on volume. The three-step process mirrors how human memory works: explicit expiration (TTL), gradual fading (decay), and pruning of weak memories.
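
It is worth sanity-checking how fast 5%-per-run decay actually prunes. A quick calculation:

```javascript
// Number of decay runs before an importance score falls below the pruning
// threshold, at the given per-run decay factor.
function runsUntilPruned(startImportance, threshold, factor) {
  var runs = 0;
  var importance = startImportance;
  while (importance >= threshold) {
    importance *= factor;
    runs++;
  }
  return runs;
}

// A mid-importance (0.5), never-accessed memory needs 32 decay applications
// to drop below the 0.1 pruning threshold. Note that each memory only decays
// once it has gone 30 days without access, so in practice this takes months.
console.log(runsUntilPruned(0.5, 0.1, 0.95)); // 32
```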

Memory-Augmented Prompt Construction

The final piece is assembling all memory sources into a coherent prompt that gives the LLM the context it needs without blowing the token budget.

function MemoryAugmentedPromptBuilder(options) {
  this.conversationBuffer = options.conversationBuffer;
  this.workingMemory = options.workingMemory;
  this.longTermMemory = options.longTermMemory;
  this.maxMemoryTokens = options.maxMemoryTokens || 2000;
}

MemoryAugmentedPromptBuilder.prototype.build = function (systemPrompt, userMessage) {
  var self = this;

  return self.longTermMemory.retrieve(userMessage, { finalLimit: 5 })
    .then(function (memories) {
      var parts = [];

      // System prompt
      parts.push({ role: "system", content: systemPrompt });

      // Long-term memory context
      if (memories.length > 0) {
        var memoryBlock = "## Relevant Memories\n";
        memories.forEach(function (mem) {
          var typeLabel = "[" + mem.memory_type.toUpperCase() + "]";
          var score = mem.compositeScore.toFixed(2);
          memoryBlock += typeLabel + " (relevance: " + score + ") " +
            (mem.summary || mem.content.substring(0, 200)) + "\n\n";
        });

        var tokenEstimate = Math.ceil(memoryBlock.length / 4);
        if (tokenEstimate <= self.maxMemoryTokens) {
          parts.push({
            role: "system",
            content: memoryBlock
          });
        }
      }

      // Working memory
      var workingStr = self.workingMemory.toPromptString();
      if (workingStr) {
        parts.push({ role: "system", content: workingStr });
      }

      // Conversation history
      var history = self.conversationBuffer.getMessages();
      history.forEach(function (msg) {
        parts.push(msg);
      });

      // Current user message
      parts.push({ role: "user", content: userMessage });

      return parts;
    });
};

The ordering matters. System prompt first, then long-term memory context, then working memory, then conversation history, then the current message. This puts the most stable context at the top (where the LLM is least likely to lose track of it) and the most dynamic context at the bottom (where recency bias helps).

Complete Working Example

Here is the full agent memory system wired together in a single Express endpoint.

var express = require("express");
var pg = require("pg");

// -- Initialize components --

var pool = new pg.Pool({
  connectionString: process.env.POSTGRES_CONNECTION_STRING
});

var conversationBuffers = {}; // sessionId -> ConversationBuffer
var workingMemories = {};     // sessionId -> WorkingMemory

var longTermMemory = new LongTermMemory({
  connectionString: process.env.POSTGRES_CONNECTION_STRING,
  agentId: "support-agent-v1",
  openaiKey: process.env.OPENAI_API_KEY
});

var consolidator = new MemoryConsolidator({
  longTermMemory: longTermMemory,
  llmClient: { complete: callLLM }, // your LLM wrapper
  minImportance: 0.4
});

function getSessionMemory(sessionId) {
  if (!conversationBuffers[sessionId]) {
    conversationBuffers[sessionId] = new ConversationBuffer({
      maxMessages: 40,
      maxTokens: 6000
    });

    // Wire up consolidation on eviction
    conversationBuffers[sessionId].emitter.on("message_evicted", function (msg) {
      // Batch evicted messages for consolidation
      if (!conversationBuffers[sessionId]._evicted) {
        conversationBuffers[sessionId]._evicted = [];
      }
      conversationBuffers[sessionId]._evicted.push(msg);
    });
  }
  if (!workingMemories[sessionId]) {
    workingMemories[sessionId] = new WorkingMemory();
  }

  return {
    buffer: conversationBuffers[sessionId],
    working: workingMemories[sessionId]
  };
}

var app = express();
app.use(express.json());

app.post("/agent/chat", function (req, res) {
  var sessionId = req.body.sessionId;
  var userId = req.body.userId;
  var userMessage = req.body.message;

  if (!sessionId || !userMessage) {
    return res.status(400).json({ error: "sessionId and message required" });
  }

  var session = getSessionMemory(sessionId);

  // Add user message to buffer
  session.buffer.add("user", userMessage, { userId: userId });

  // Build memory-augmented prompt
  var builder = new MemoryAugmentedPromptBuilder({
    conversationBuffer: session.buffer,
    workingMemory: session.working,
    longTermMemory: longTermMemory,
    maxMemoryTokens: 2000
  });

  var systemPrompt = [
    "You are a helpful support agent.",
    "Use the provided memories to personalize your responses.",
    "If you learn new facts about the user, note them explicitly."
  ].join(" ");

  builder.build(systemPrompt, userMessage)
    .then(function (messages) {
      return callLLM(messages);
    })
    .then(function (response) {
      // Add assistant response to buffer
      session.buffer.add("assistant", response);

      // Extract any new knowledge the agent identified
      extractAndStoreKnowledge(response, userId, session);

      res.json({
        response: response,
        sessionId: sessionId
      });
    })
    .catch(function (err) {
      console.error("Agent chat error:", err);
      res.status(500).json({ error: "Internal agent error" });
    });
});

// Session end -> consolidate memories
app.post("/agent/end-session", function (req, res) {
  var sessionId = req.body.sessionId;
  var userId = req.body.userId;
  var session = getSessionMemory(sessionId);

  // Include any messages evicted mid-session, then the surviving buffer
  var evicted = session.buffer._evicted || [];
  consolidator.consolidate(evicted.concat(session.buffer.messages), userId)
    .then(function (stored) {
      // Cleanup
      delete conversationBuffers[sessionId];
      delete workingMemories[sessionId];
      res.json({ consolidated: stored.length });
    })
    .catch(function (err) {
      console.error("Consolidation error:", err);
      res.status(500).json({ error: "Consolidation failed" });
    });
});

function extractAndStoreKnowledge(response, userId, session) {
  // Simple pattern: if the agent says "I'll remember that" or "noted",
  // extract the preceding user message as semantic memory
  if (/i('ll| will) remember|noted|got it/i.test(response)) {
    // Use the caller's session buffer; fall back to the first buffer if none
    var buffer = session ? session.buffer : Object.values(conversationBuffers)[0];
    if (buffer && buffer.messages.length >= 2) {
      var lastUserMsg = buffer.messages[buffer.messages.length - 2];
      if (lastUserMsg.role === "user") {
        longTermMemory.storeKnowledge(lastUserMsg.content, {
          userId: userId,
          importance: 0.7,
          source: "explicit_acknowledgment"
        }).catch(function (err) {
          console.error("Knowledge storage error:", err);
        });
      }
    }
  }
}

function callLLM(messagesOrPrompt) {
  // Implement your LLM call here (OpenAI, Anthropic, etc.)
  // Return a promise that resolves with the response text
  // Uses the global fetch built into Node 18+.

  var messages = Array.isArray(messagesOrPrompt)
    ? messagesOrPrompt
    : [{ role: "user", content: messagesOrPrompt }];

  return fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": "Bearer " + process.env.OPENAI_API_KEY,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: messages,
      temperature: 0.7
    })
  })
    .then(function (r) { return r.json(); })
    .then(function (data) {
      if (data.error) {
        throw new Error("LLM error: " + data.error.message);
      }
      return data.choices[0].message.content;
    });
}

// Run memory decay daily
var cron = require("node-cron");
cron.schedule("0 3 * * *", function () {
  console.log("Running memory decay...");
  longTermMemory.decay().catch(function (err) {
    console.error("Memory decay error:", err);
  });
});

var port = process.env.PORT || 3000;
app.listen(port, function () {
  console.log("Agent with memory running on port " + port);
});

This gives you a fully functional agent with conversation buffering, working memory, cross-session episodic and semantic storage, vector-based retrieval with hybrid scoring, LLM-powered consolidation at session end, and automatic memory decay.

Common Issues and Troubleshooting

1. pgvector Extension Not Found

ERROR: could not open extension control file
  "/usr/share/postgresql/15/extension/vector.control": No such file or directory

This means pgvector is not installed on your PostgreSQL server. On Ubuntu: sudo apt install postgresql-15-pgvector. On macOS with Homebrew: brew install pgvector. On managed databases (AWS RDS, DigitalOcean), enable it through the provider's dashboard — not all tiers support it.

2. Embedding Dimension Mismatch

ERROR: expected 1536 dimensions, not 3072

This happens when you switch embedding models without updating the schema. The text-embedding-3-small model produces 1536-dimensional vectors; text-embedding-3-large produces 3072. If you change models, you must migrate the column. Existing 1536-dimensional values cannot be cast to the new size, so clear them as part of the change (ALTER TABLE agent_memories ALTER COLUMN embedding TYPE vector(3072) USING NULL) and then regenerate every embedding with the new model. Do not mix dimensions.
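
A cheap write-time guard catches the mismatch in application code before PostgreSQL rejects the row. A sketch; 1536 matches the vector(1536) column in the schema above.

```javascript
// Fail fast if an embedding does not match the schema's declared dimension.
var EXPECTED_DIMENSIONS = 1536; // must match vector(1536) in the schema

function assertEmbeddingDimension(embedding) {
  if (!Array.isArray(embedding) || embedding.length !== EXPECTED_DIMENSIONS) {
    throw new Error(
      "Embedding dimension mismatch: expected " + EXPECTED_DIMENSIONS +
      ", got " + (embedding ? embedding.length : "nothing")
    );
  }
  return embedding;
}

// Hypothetical usage inside store(), before building the INSERT parameters:
// assertEmbeddingDimension(embedding);
```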

3. IVFFlat Index Requires Training Data

ERROR: index "idx_memories_embedding" is not valid

IVFFlat builds its index by clustering existing rows, so it needs data before it is useful. pgvector's guidance is to set lists to roughly rows / 1000 for up to a million rows (and sqrt(rows) beyond that); with only a handful of rows, recall degrades badly. Either reduce lists (e.g., to 10) for small datasets, or switch to HNSW indexing: USING hnsw (embedding vector_cosine_ops), which needs no training data and works well at any scale.

4. Memory Consolidation Produces Invalid JSON

SyntaxError: Unexpected token < in JSON at position 0

The LLM sometimes returns markdown-wrapped JSON (a ```json ... ``` code fence) or adds explanatory text before the array. Wrap the parse in a more resilient extractor:

function parseJsonFromLLM(text) {
  // Strip markdown code fences (``` or ```json)
  var cleaned = text.replace(/```(?:json)?/gi, "").trim();

  // Try to find a JSON array
  var match = cleaned.match(/\[[\s\S]*\]/);
  if (match) {
    return JSON.parse(match[0]);
  }
  throw new Error("No JSON array found in LLM output");
}

5. Token Budget Overflows on Memory-Heavy Prompts

Error: This model's maximum context length is 128000 tokens.
  However, your messages resulted in 135421 tokens.

This happens when too many memories get injected. Always enforce a hard token cap in the prompt builder. Count estimated tokens for each memory block and stop adding memories once you hit the budget. The maxMemoryTokens parameter in MemoryAugmentedPromptBuilder handles this, but you should also add per-memory truncation:

var truncatedContent = mem.content.length > 500
  ? mem.content.substring(0, 500) + "..."
  : mem.content;
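
Combining the per-memory truncation above with a hard running cap, here is a sketch using the same 4-characters-per-token estimate from earlier:

```javascript
// Add memories to the prompt block until the token budget is exhausted.
function buildMemoryBlock(memories, maxTokens) {
  var lines = [];
  var usedTokens = 0;

  for (var i = 0; i < memories.length; i++) {
    var content = memories[i].content;
    // Per-memory truncation first.
    if (content.length > 500) content = content.substring(0, 500) + "...";

    var tokens = Math.ceil(content.length / 4);
    if (usedTokens + tokens > maxTokens) break; // hard cap: stop adding
    lines.push("- " + content);
    usedTokens += tokens;
  }
  return lines.join("\n");
}

var block = buildMemoryBlock([
  { content: "a".repeat(600) }, // truncated to 503 chars, ~126 tokens
  { content: "b".repeat(600) }  // would overflow a 200-token budget
], 200);
console.log(block.indexOf("b") === -1); // true: second memory was dropped
```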

Best Practices

  • Separate memory types rigorously. Episodic, semantic, and procedural memories have different lifecycles, retrieval patterns, and decay rates. Storing them in one table with a discriminator column is fine for queries, but do not let the boundaries blur in your application logic.

  • Embed summaries, not raw content. Generating embeddings from a 200-word summary produces better retrieval than embedding a 5000-word transcript. The embedding captures the gist, which is what you want for similarity search. Store the full content separately for when you need the details.

  • Budget tokens aggressively for memory context. Reserve no more than 15-20% of your total context window for memory injection. If your model supports 128k tokens, cap memory at 20k. The rest needs to go to the system prompt, conversation history, and leaving room for the response.

  • Run memory decay on a schedule, not inline. Decay and pruning queries touch a lot of rows. Running them inside a request handler adds latency. Use a cron job or a background worker. Daily is a good starting frequency.

  • Use importance scoring as a first-class signal. Not all memories are created equal. A user's stated preference ("I always deploy to us-east-1") is more important than a casual remark ("nice weather today"). Score importance at write time and use it as a retrieval weight.

  • Test retrieval quality, not just retrieval speed. Build a small evaluation set: 20-30 queries where you know which memories should be returned. Run retrieval and score precision/recall. Tune your hybrid scoring weights based on actual results, not intuition.

  • Implement explicit memory commands. Let users say "remember that I prefer dark mode" or "forget what I told you about my old API key." Explicit memory operations build trust and give users control. Parse these commands before they hit the LLM.

  • Keep embedding model versions pinned. Changing from text-embedding-3-small to a different model changes the vector space. Old embeddings become incompatible with new queries. If you must migrate, re-embed everything in a batch job and swap atomically.

  • Log consolidation decisions. When the LLM extracts memories during consolidation, log what it extracted and what it ignored. This is your audit trail for debugging weird agent behavior ("why does it think I like TypeScript?"). Store the raw consolidation output alongside the individual memories.
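
As a sketch of the explicit-command idea above, a small parser can route remember/forget requests before they reach the LLM. The patterns are illustrative only.

```javascript
// Route explicit "remember"/"forget" commands before the LLM sees them.
function parseMemoryCommand(message) {
  var remember = message.match(/^remember that (.+)$/i);
  if (remember) return { op: "remember", content: remember[1] };

  var forget = message.match(/^forget (?:what I told you about |about )?(.+)$/i);
  if (forget) return { op: "forget", content: forget[1] };

  return null; // not a memory command: pass through to the LLM
}

console.log(parseMemoryCommand("remember that I prefer dark mode"));
// { op: "remember", content: "I prefer dark mode" }
console.log(parseMemoryCommand("forget what I told you about my old API key"));
// { op: "forget", content: "my old API key" }
console.log(parseMemoryCommand("what's the weather?")); // null
```

A "remember" result maps naturally onto storeKnowledge, and a "forget" result onto a DELETE against matching memories.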
