Meeting Summarization with AI
Build meeting summarization with transcription, LLM-powered summaries, action item extraction, and distribution in Node.js.
Overview
Meeting summarization is one of the highest-ROI applications of AI in any organization. By some estimates, the average professional spends 31 hours per month in unproductive meetings, and even productive ones generate action items that get lost in hastily scrawled notes. By building a pipeline that takes raw audio recordings, transcribes them accurately, and produces structured summaries with extracted action items, you can recover an enormous amount of lost context and restore accountability for follow-ups. This article walks through building that pipeline end-to-end in Node.js, from audio ingestion through distribution of polished summaries.
Prerequisites
- Node.js v18 or later installed
- An API key for a transcription service (OpenAI Whisper, AssemblyAI, or Deepgram)
- An Anthropic API key for Claude (summary generation)
- FFmpeg installed on your system for audio format conversion
- PostgreSQL database for storing summaries
- Basic familiarity with Express.js and REST APIs
Install the core dependencies:
npm install express multer fluent-ffmpeg @anthropic-ai/sdk axios form-data pg nodemailer uuid ws googleapis
The Meeting Summarization Pipeline
The pipeline follows a clear sequence: Audio Input -> Format Conversion -> Transcription -> Transcript Cleaning -> Summary Generation -> Action Item Extraction -> Storage -> Distribution. Each stage is distinct and can be swapped out independently. This modularity is important because transcription providers change pricing constantly, LLM capabilities evolve, and distribution channels vary by organization.
Here is the high-level architecture:
[Audio Upload] -> [FFmpeg Conversion] -> [Transcription API]
                                                  |
                                          [Raw Transcript]
                                                  |
                                      [Cleaning & Speaker ID]
                                                  |
                                        [Chunking if needed]
                                                  |
                                        [LLM Summarization]
                                        /         |         \
                              [Executive]    [Detailed]    [Action Items]
                                        \         |         /
                                            [PostgreSQL]
                                                  |
                                     [Email / Slack Distribution]
Handling Audio Input
Meetings arrive as audio in a bewildering array of formats. Zoom exports M4A audio, Teams and Google Meet deliver MP4 recordings, browser-based recorders emit WebM or OGG, and someone will inevitably hand you a WAV file that is 2 GB. Your first job is to normalize everything into a format your transcription API can handle.
var express = require("express");
var multer = require("multer");
var ffmpeg = require("fluent-ffmpeg");
var path = require("path");
var fs = require("fs");
var uuid = require("uuid");
var upload = multer({
dest: "uploads/raw/",
limits: { fileSize: 500 * 1024 * 1024 }, // 500MB max
fileFilter: function (req, file, cb) {
var allowedTypes = [
"audio/mpeg",
"audio/wav",
"audio/ogg",
"audio/webm",
"audio/mp4",
"audio/x-m4a",
"video/webm",
"video/mp4"
];
if (allowedTypes.indexOf(file.mimetype) === -1) {
return cb(new Error("Unsupported audio format: " + file.mimetype));
}
cb(null, true);
}
});
function convertToWav(inputPath, outputDir) {
return new Promise(function (resolve, reject) {
var outputFile = path.join(outputDir, uuid.v4() + ".wav");
ffmpeg(inputPath)
.audioCodec("pcm_s16le")
.audioFrequency(16000)
.audioChannels(1)
.format("wav")
.on("end", function () {
// Clean up the original upload
fs.unlink(inputPath, function () {});
resolve(outputFile);
})
.on("error", function (err) {
reject(new Error("FFmpeg conversion failed: " + err.message));
})
.save(outputFile);
});
}
The conversion to 16kHz mono WAV is deliberate. Every major transcription API handles WAV well, mono audio avoids channel confusion during speaker identification, and 16kHz is the sweet spot between file size and transcription accuracy. Sending 44.1kHz stereo instead buys you nothing except a file roughly five times larger to upload and store.
For very large files, you may want to split the audio into chunks before transcription:
function splitAudio(inputPath, chunkDurationSeconds) {
return new Promise(function (resolve, reject) {
var outputDir = path.join("uploads", "chunks", uuid.v4());
fs.mkdirSync(outputDir, { recursive: true });
ffmpeg.ffprobe(inputPath, function (err, metadata) {
if (err) return reject(err);
var duration = metadata.format.duration;
var chunks = [];
var chunkCount = Math.ceil(duration / chunkDurationSeconds);
var processed = 0;
for (var i = 0; i < chunkCount; i++) {
(function (index) {
var startTime = index * chunkDurationSeconds;
var chunkPath = path.join(outputDir, "chunk_" + index + ".wav");
chunks.push({ index: index, path: chunkPath, startTime: startTime });
ffmpeg(inputPath)
.setStartTime(startTime)
.setDuration(chunkDurationSeconds)
.audioCodec("pcm_s16le")
.audioFrequency(16000)
.audioChannels(1)
.on("end", function () {
processed++;
if (processed === chunkCount) {
resolve(chunks.sort(function (a, b) { return a.index - b.index; }));
}
})
.on("error", function (err) {
reject(err);
})
.save(chunkPath);
})(i);
}
});
});
}
Transcription Options
You have three serious options for transcription, each with distinct trade-offs.
OpenAI Whisper API
Whisper is the most widely known option. It is simple to use, accepts uploads up to 25MB (roughly 13 minutes of 16kHz mono WAV, so longer recordings need chunking or compression), and costs $0.006 per minute. The accuracy is excellent for English and very good for most major languages.
var axios = require("axios");
var FormData = require("form-data");
function transcribeWithWhisper(audioPath, apiKey) {
var form = new FormData();
form.append("file", fs.createReadStream(audioPath));
form.append("model", "whisper-1");
form.append("response_format", "verbose_json");
form.append("timestamp_granularities[]", "segment");
return axios.post("https://api.openai.com/v1/audio/transcriptions", form, {
headers: Object.assign(
{ Authorization: "Bearer " + apiKey },
form.getHeaders()
),
maxContentLength: Infinity,
timeout: 300000
}).then(function (response) {
return {
text: response.data.text,
segments: response.data.segments,
language: response.data.language,
duration: response.data.duration
};
});
}
AssemblyAI
AssemblyAI offers native speaker diarization (identifying who said what), which is critical for meeting summarization. It handles files of any size through its async processing model and provides paragraph-level formatting out of the box.
function transcribeWithAssemblyAI(audioPath, apiKey) {
var uploadUrl;
// Step 1: Upload the audio file
return axios.post("https://api.assemblyai.com/v2/upload",
fs.createReadStream(audioPath),
{
headers: {
authorization: apiKey,
"content-type": "application/octet-stream"
}
}
).then(function (uploadResponse) {
uploadUrl = uploadResponse.data.upload_url;
// Step 2: Request transcription with speaker diarization
return axios.post("https://api.assemblyai.com/v2/transcript", {
audio_url: uploadUrl,
speaker_labels: true,
auto_chapters: true,
entity_detection: true
}, {
headers: { authorization: apiKey }
});
}).then(function (transcriptResponse) {
var transcriptId = transcriptResponse.data.id;
return pollTranscription(transcriptId, apiKey);
});
}
function pollTranscription(transcriptId, apiKey) {
return new Promise(function (resolve, reject) {
var interval = setInterval(function () {
axios.get("https://api.assemblyai.com/v2/transcript/" + transcriptId, {
headers: { authorization: apiKey }
}).then(function (response) {
if (response.data.status === "completed") {
clearInterval(interval);
resolve({
text: response.data.text,
utterances: response.data.utterances,
chapters: response.data.chapters,
entities: response.data.entities
});
} else if (response.data.status === "error") {
clearInterval(interval);
reject(new Error("Transcription failed: " + response.data.error));
}
}).catch(function (err) {
clearInterval(interval);
reject(err);
});
}, 5000);
});
}
Deepgram
Deepgram is the fastest option and supports real-time streaming transcription, which matters if you want live meeting summarization. Its pricing is competitive and its speaker diarization is strong.
function transcribeWithDeepgram(audioPath, apiKey) {
var audioData = fs.readFileSync(audioPath);
return axios.post(
"https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true&diarize=true¶graphs=true&utterances=true",
audioData,
{
headers: {
Authorization: "Token " + apiKey,
"Content-Type": "audio/wav"
},
timeout: 300000
}
).then(function (response) {
var result = response.data.results;
return {
text: result.channels[0].alternatives[0].transcript,
paragraphs: result.channels[0].alternatives[0].paragraphs,
utterances: result.utterances
};
});
}
My recommendation: use AssemblyAI for production meeting summarization. The built-in speaker diarization is worth the slightly higher cost, and the async model handles arbitrarily long recordings without chunking gymnastics. Use Deepgram if you need real-time streaming. Use Whisper only if you are already deeply invested in the OpenAI ecosystem and do not need speaker identification.
Transcript Cleaning and Speaker Identification
Raw transcripts are messy. They contain filler words, false starts, repeated phrases, and speaker labels like "Speaker A" instead of actual names. Cleaning them up before summarization dramatically improves output quality.
function cleanTranscript(utterances, speakerMap) {
var cleaned = utterances.map(function (utterance) {
var speakerName = speakerMap[utterance.speaker] || "Speaker " + utterance.speaker;
var text = utterance.text
.replace(/\b(um|uh|like|you know|basically|actually|literally)\b/gi, "")
.replace(/\s{2,}/g, " ")
.trim();
return {
speaker: speakerName,
text: text,
start: utterance.start,
end: utterance.end
};
}).filter(function (utterance) {
return utterance.text.length > 5; // Drop near-empty utterances
});
return cleaned;
}
function formatTranscriptForLLM(cleanedUtterances) {
var formatted = "";
var currentSpeaker = "";
cleanedUtterances.forEach(function (utterance) {
if (utterance.speaker !== currentSpeaker) {
currentSpeaker = utterance.speaker;
formatted += "\n**" + currentSpeaker + ":** ";
}
formatted += utterance.text + " ";
});
return formatted.trim();
}
For speaker identification, you can either accept a manual mapping (participants enter their names before uploading) or use a heuristic approach. One effective pattern is to ask the LLM to identify speakers from context during the summarization step, since people often introduce themselves or are addressed by name.
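One way to implement that LLM-based approach is a short pre-pass that asks the model to map diarization labels to real names before cleaning. This is an illustrative sketch, not part of the provider APIs above; it assumes the Anthropic client configured in the next section and reuses the same defensive JSON extraction as the summarizer.
// Assumes: var anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }); (configured in the next section)
// Hypothetical helper: infer real names for diarization labels from introductions in the transcript.
function inferSpeakerNames(formattedTranscript) {
  var prompt = "The meeting transcript below labels participants as Speaker A, Speaker B, and so on. " +
    "Based on introductions and how people address each other, return a JSON object mapping each label " +
    'to a real name, for example {"A": "Dana", "B": "Unknown"}. Use "Unknown" when a name is never mentioned.' +
    "\n\n" + formattedTranscript;
  return anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 512,
    messages: [{ role: "user", content: prompt }]
  }).then(function (response) {
    var raw = response.content[0].text;
    var match = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
    return JSON.parse((match ? match[1] : raw).trim());
  });
}
The result can be passed straight into cleanTranscript as the speakerMap argument.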
Generating Structured Meeting Summaries
This is where the real value lives. You feed the cleaned transcript to an LLM and get back structured, actionable output. The prompt engineering here matters enormously. A vague prompt produces vague summaries.
var Anthropic = require("@anthropic-ai/sdk");
var anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
function generateMeetingSummary(transcript, meetingContext) {
var systemPrompt = [
"You are an expert meeting summarizer. You produce precise, actionable meeting summaries.",
"You never invent information that is not in the transcript.",
"You identify action items with specific owners and deadlines when mentioned.",
"You distinguish between decisions made and topics merely discussed.",
"Format all output as valid JSON matching the schema provided."
].join(" ");
var userPrompt = "Here is the meeting transcript:\n\n" + transcript + "\n\n";
if (meetingContext) {
userPrompt += "Meeting context:\n";
userPrompt += "- Title: " + (meetingContext.title || "Unknown") + "\n";
userPrompt += "- Date: " + (meetingContext.date || "Unknown") + "\n";
userPrompt += "- Attendees: " + (meetingContext.attendees || "Unknown") + "\n\n";
}
userPrompt += [
"Produce a JSON object with exactly these fields:",
"{",
' "executiveSummary": "2-3 sentence high-level summary",',
' "keyDecisions": [{"decision": "...", "madeBy": "...", "context": "..."}],',
' "actionItems": [{"task": "...", "owner": "...", "deadline": "...", "priority": "high|medium|low"}],',
' "discussionTopics": [{"topic": "...", "summary": "...", "outcome": "decided|tabled|needs-followup"}],',
' "attendeeSummary": [{"name": "...", "contributions": "brief summary of their key points"}],',
' "followUpMeetingNeeded": true/false,',
' "followUpTopics": ["..."]',
"}"
].join("\n");
return anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
system: systemPrompt,
messages: [{ role: "user", content: userPrompt }]
}).then(function (response) {
var content = response.content[0].text;
// Extract JSON from potential markdown code blocks
var jsonMatch = content.match(/```(?:json)?\s*([\s\S]*?)```/);
var jsonStr = jsonMatch ? jsonMatch[1] : content;
return JSON.parse(jsonStr.trim());
});
}
Handling Long Meetings
A one-hour meeting produces roughly 7,000-9,000 words of transcript at typical speaking rates, and multi-hour meetings proportionally more. Large-context models can ingest that directly, but summary quality tends to degrade as the transcript grows: details from the middle get glossed over and action items slip through. The fix is hierarchical summarization: chunk the transcript, summarize each chunk, then produce a final summary from the chunk summaries.
function chunkTranscript(transcript, maxChunkWords) {
maxChunkWords = maxChunkWords || 3000;
var words = transcript.split(/\s+/);
var chunks = [];
var currentChunk = [];
var currentCount = 0;
words.forEach(function (word) {
currentChunk.push(word);
currentCount++;
if (currentCount >= maxChunkWords) {
chunks.push(currentChunk.join(" "));
currentChunk = [];
currentCount = 0;
}
});
if (currentChunk.length > 0) {
chunks.push(currentChunk.join(" "));
}
return chunks;
}
function hierarchicalSummarize(transcript, meetingContext) {
var wordCount = transcript.split(/\s+/).length;
// If short enough, summarize directly
if (wordCount <= 4000) {
return generateMeetingSummary(transcript, meetingContext);
}
// Otherwise, chunk and summarize hierarchically
var chunks = chunkTranscript(transcript, 3000);
var chunkPromises = chunks.map(function (chunk, index) {
return anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 2048,
messages: [{
role: "user",
content: "Summarize this section (part " + (index + 1) + " of " + chunks.length +
") of a meeting transcript. Include any action items, decisions, and key discussion points.\n\n" + chunk
}]
}).then(function (response) {
return {
index: index,
summary: response.content[0].text
};
});
});
return Promise.all(chunkPromises).then(function (chunkSummaries) {
chunkSummaries.sort(function (a, b) { return a.index - b.index; });
var combinedSummaries = chunkSummaries.map(function (cs) {
return "--- Section " + (cs.index + 1) + " ---\n" + cs.summary;
}).join("\n\n");
return generateMeetingSummary(combinedSummaries, meetingContext);
});
}
This two-pass approach works reliably for meetings up to about three hours. Beyond that, you should consider whether anyone actually needs a summary of a three-hour meeting, or whether the meeting itself was the problem.
Generating Different Summary Formats
Different audiences need different views of the same meeting. Executives want two sentences and the key decisions. Project managers want action items with deadlines. Engineers want the technical details. Generate all three from the same structured data.
function formatExecutiveBrief(summary) {
var brief = "# Executive Brief\n\n";
brief += summary.executiveSummary + "\n\n";
if (summary.keyDecisions.length > 0) {
brief += "## Key Decisions\n\n";
summary.keyDecisions.forEach(function (decision) {
brief += "- **" + decision.decision + "** (by " + decision.madeBy + ")\n";
});
brief += "\n";
}
if (summary.followUpMeetingNeeded) {
brief += "*Follow-up meeting recommended to discuss: " +
summary.followUpTopics.join(", ") + "*\n";
}
return brief;
}
function formatActionItemList(summary) {
var list = "# Action Items\n\n";
list += "| Task | Owner | Deadline | Priority |\n";
list += "|------|-------|----------|----------|\n";
summary.actionItems.forEach(function (item) {
list += "| " + item.task + " | " + item.owner +
" | " + (item.deadline || "TBD") +
" | " + item.priority + " |\n";
});
return list;
}
function formatDetailedNotes(summary) {
var notes = "# Detailed Meeting Notes\n\n";
notes += "## Summary\n\n" + summary.executiveSummary + "\n\n";
notes += "## Discussion Topics\n\n";
summary.discussionTopics.forEach(function (topic) {
notes += "### " + topic.topic + "\n\n";
notes += topic.summary + "\n\n";
notes += "*Outcome: " + topic.outcome + "*\n\n";
});
notes += "## Decisions Made\n\n";
summary.keyDecisions.forEach(function (decision) {
notes += "- **" + decision.decision + "** (" + decision.madeBy + "): " + decision.context + "\n";
});
notes += "\n## Action Items\n\n";
summary.actionItems.forEach(function (item) {
notes += "- [ ] " + item.task + " - *" + item.owner + "*";
if (item.deadline) notes += " (due: " + item.deadline + ")";
notes += "\n";
});
notes += "\n## Attendee Contributions\n\n";
summary.attendeeSummary.forEach(function (attendee) {
notes += "- **" + attendee.name + ":** " + attendee.contributions + "\n";
});
return notes;
}
Integrating with Calendar APIs
Meeting context improves summary quality dramatically. If you know the meeting title, agenda, and attendee list beforehand, the LLM can produce more targeted summaries. The Google Calendar API is the most common integration point.
var google = require("googleapis");
function getMeetingContext(calendarClient, eventId) {
return new Promise(function (resolve, reject) {
calendarClient.events.get({
calendarId: "primary",
eventId: eventId
}, function (err, response) {
if (err) return reject(err);
var event = response.data;
resolve({
title: event.summary,
date: event.start.dateTime || event.start.date,
attendees: (event.attendees || []).map(function (a) {
return a.displayName || a.email;
}).join(", "),
description: event.description || "",
organizer: event.organizer.displayName || event.organizer.email
});
});
});
}
Even without calendar integration, you can extract meeting context from the first few minutes of a transcript where participants typically state the purpose of the meeting and introduce themselves.
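A minimal sketch of that fallback, reusing the anthropic client from earlier and assuming the cleaned utterances carry millisecond timestamps (as AssemblyAI returns); the field names in the returned object are illustrative.
function inferContextFromOpening(cleanedUtterances, windowMs) {
  windowMs = windowMs || 5 * 60 * 1000; // look at the first five minutes by default
  var opening = cleanedUtterances
    .filter(function (u) { return u.start <= windowMs; })
    .map(function (u) { return u.speaker + ": " + u.text; })
    .join("\n");
  var prompt = "This is the opening of a meeting. Return a JSON object with " +
    '{"title": "...", "attendees": "comma-separated names", "purpose": "..."}. ' +
    'Use "Unknown" for anything not stated.\n\n' + opening;
  return anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 512,
    messages: [{ role: "user", content: prompt }]
  }).then(function (response) {
    var raw = response.content[0].text;
    var match = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
    return JSON.parse((match ? match[1] : raw).trim());
  });
}
The result can be passed as the meetingContext argument to generateMeetingSummary.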
Storing Summaries in PostgreSQL
Every summary should be persisted and searchable. PostgreSQL's full-text search capabilities make it an excellent choice for this without requiring a separate search engine.
var Pool = require("pg").Pool;
var pool = new Pool({
connectionString: process.env.POSTGRES_CONNECTION_STRING
});
function createSummaryTable() {
var query = [
"CREATE TABLE IF NOT EXISTS meeting_summaries (",
" id UUID PRIMARY KEY DEFAULT gen_random_uuid(),",
" meeting_title VARCHAR(500),",
" meeting_date TIMESTAMPTZ,",
" attendees TEXT[],",
" executive_summary TEXT,",
" key_decisions JSONB DEFAULT '[]',",
" action_items JSONB DEFAULT '[]',",
" discussion_topics JSONB DEFAULT '[]',",
" attendee_summary JSONB DEFAULT '[]',",
" full_transcript TEXT,",
" duration_seconds INTEGER,",
" search_vector TSVECTOR,",
" created_at TIMESTAMPTZ DEFAULT NOW()",
");",
"",
"CREATE INDEX IF NOT EXISTS idx_summary_search ON meeting_summaries USING GIN(search_vector);",
"CREATE INDEX IF NOT EXISTS idx_summary_date ON meeting_summaries(meeting_date DESC);"
].join("\n");
return pool.query(query);
}
function storeSummary(meetingData, summary, transcript) {
var query = [
"INSERT INTO meeting_summaries",
"(meeting_title, meeting_date, attendees, executive_summary,",
" key_decisions, action_items, discussion_topics, attendee_summary,",
" full_transcript, duration_seconds, search_vector)",
"VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10,",
" to_tsvector('english', $4 || ' ' || $1 || ' ' || $9))",
"RETURNING id"
].join("\n");
var values = [
meetingData.title,
meetingData.date,
meetingData.attendees ? meetingData.attendees.split(", ") : [],
summary.executiveSummary,
JSON.stringify(summary.keyDecisions),
JSON.stringify(summary.actionItems),
JSON.stringify(summary.discussionTopics),
JSON.stringify(summary.attendeeSummary),
transcript,
meetingData.durationSeconds || null
];
return pool.query(query, values).then(function (result) {
return result.rows[0].id;
});
}
function searchSummaries(searchTerm, limit) {
limit = limit || 20;
var query = [
"SELECT id, meeting_title, meeting_date, executive_summary,",
" ts_rank(search_vector, plainto_tsquery('english', $1)) AS relevance",
"FROM meeting_summaries",
"WHERE search_vector @@ plainto_tsquery('english', $1)",
"ORDER BY relevance DESC, meeting_date DESC",
"LIMIT $2"
].join("\n");
return pool.query(query, [searchTerm, limit]).then(function (result) {
return result.rows;
});
}
Distributing Summaries
Summaries are useless if nobody reads them. Push them to where people already are: email and Slack.
var nodemailer = require("nodemailer");
var transporter = nodemailer.createTransport({
host: process.env.SMTP_HOST,
port: parseInt(process.env.SMTP_PORT) || 587,
secure: false,
auth: {
user: process.env.SMTP_USER,
pass: process.env.SMTP_PASS
}
});
function distributeSummaryByEmail(summary, recipients, meetingTitle) {
var actionItemsHtml = summary.actionItems.map(function (item) {
return "<li><strong>" + item.task + "</strong> - " + item.owner +
(item.deadline ? " (due: " + item.deadline + ")" : "") + "</li>";
}).join("\n");
var htmlBody = [
"<h2>Meeting Summary: " + meetingTitle + "</h2>",
"<p>" + summary.executiveSummary + "</p>",
"<h3>Key Decisions</h3>",
"<ul>" + summary.keyDecisions.map(function (d) {
return "<li>" + d.decision + "</li>";
}).join("") + "</ul>",
"<h3>Action Items</h3>",
"<ul>" + actionItemsHtml + "</ul>"
].join("\n");
return transporter.sendMail({
from: process.env.SMTP_FROM || "meetings@example.com",
to: recipients.join(", "),
subject: "Meeting Summary: " + meetingTitle,
html: htmlBody
});
}
function distributeSummaryToSlack(summary, webhookUrl, meetingTitle) {
var blocks = [
{
type: "header",
text: { type: "plain_text", text: "Meeting Summary: " + meetingTitle }
},
{
type: "section",
text: { type: "mrkdwn", text: summary.executiveSummary }
},
{ type: "divider" },
{
type: "section",
text: {
type: "mrkdwn",
text: "*Action Items:*\n" + summary.actionItems.map(function (item) {
return "- " + item.task + " (" + item.owner + ")";
}).join("\n")
}
}
];
return axios.post(webhookUrl, { blocks: blocks });
}
Real-Time Summarization During Meetings
For live meetings, you can build a streaming pipeline using Deepgram's WebSocket API. The transcript accumulates in real time, and you generate interim summaries at regular intervals.
var WebSocket = require("ws");
function startRealtimeTranscription(audioStream, apiKey, onTranscript) {
var ws = new WebSocket("wss://api.deepgram.com/v1/listen?model=nova-2&diarize=true&punctuate=true&interim_results=false", {
headers: { Authorization: "Token " + apiKey }
});
var fullTranscript = "";
var lastSummaryLength = 0;
var summaryInterval = 5000; // words between summaries
ws.on("open", function () {
console.log("Deepgram WebSocket connected");
audioStream.on("data", function (chunk) {
if (ws.readyState === WebSocket.OPEN) {
ws.send(chunk);
}
});
});
ws.on("message", function (data) {
var response = JSON.parse(data);
if (response.channel && response.channel.alternatives[0]) {
var transcript = response.channel.alternatives[0].transcript;
if (transcript) {
fullTranscript += " " + transcript;
onTranscript(transcript, fullTranscript);
// Generate interim summary every N words
var wordCount = fullTranscript.split(/\s+/).length;
if (wordCount - lastSummaryLength > summaryInterval) {
lastSummaryLength = wordCount;
generateMeetingSummary(fullTranscript, null).then(function (summary) {
console.log("Interim summary generated at", wordCount, "words");
});
}
}
}
});
ws.on("close", function () {
console.log("Deepgram WebSocket closed");
});
return {
stop: function () {
ws.close();
return fullTranscript;
}
};
}
Privacy Considerations
Meeting recordings contain sensitive information. People discuss personnel issues, salary figures, competitive strategy, and occasionally say things they regret. You need to take privacy seriously.
First, always get consent. Many jurisdictions require all-party consent for recording. Your application should enforce this by requiring explicit confirmation before processing any audio.
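One lightweight way to enforce that confirmation is a middleware that rejects uploads unless the client asserts consent; the consentConfirmed field name is an assumption, not part of the upload API shown later.
// Hypothetical consent gate: the client must send consentConfirmed=true alongside the upload.
function requireRecordingConsent(req, res, next) {
  if (req.body.consentConfirmed !== "true" && req.body.consentConfirmed !== true) {
    return res.status(403).json({
      error: "Recording consent not confirmed. All participants must agree before processing."
    });
  }
  next();
}

// Usage: multer must run first so the multipart text fields are available on req.body.
// app.post("/api/meetings/summarize", upload.single("audio"), requireRecordingConsent, handler);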
Second, do not store audio longer than necessary. Once the transcript is generated, delete the audio file. Transcripts are searchable, auditable, and vastly smaller than audio files. If your organization requires audio retention, store it encrypted with access controls separate from the summaries.
Third, consider redacting sensitive information from summaries. You can instruct the LLM to identify and mask PII, salary figures, and other sensitive data during summarization.
function redactSensitiveContent(summary) {
var sensitivePatterns = [
{ pattern: /\$[\d,]+(?:\.\d{2})?/g, replacement: "[AMOUNT REDACTED]" },
{ pattern: /\b\d{3}[-.]?\d{2}[-.]?\d{4}\b/g, replacement: "[SSN REDACTED]" },
{ pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z]{2,}\b/gi, replacement: "[EMAIL REDACTED]" }
];
var redacted = JSON.stringify(summary);
sensitivePatterns.forEach(function (p) {
redacted = redacted.replace(p.pattern, p.replacement);
});
return JSON.parse(redacted);
}
Fourth, implement role-based access to summaries. Not everyone who attended a meeting should necessarily see the full summary, especially for HR or leadership meetings.
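A sketch of what that check might look like, assuming a visibility column on the summaries table and an authenticated user with a role and email attached to the request; neither exists in the schema above.
// Assumes a hypothetical visibility column ('all', 'attendees', 'leadership') on meeting_summaries
// and that authentication middleware has already populated req.user.
function canViewSummary(user, summaryRow) {
  if (summaryRow.visibility === "all") return true;
  if (summaryRow.visibility === "attendees") {
    return (summaryRow.attendees || []).indexOf(user.email) !== -1;
  }
  if (summaryRow.visibility === "leadership") {
    return user.role === "admin" || user.role === "executive";
  }
  return false; // default deny
}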
Complete Working Example
Here is the complete Express application that ties everything together:
var express = require("express");
var multer = require("multer");
var ffmpeg = require("fluent-ffmpeg");
var path = require("path");
var fs = require("fs");
var uuid = require("uuid");
var axios = require("axios");
var Anthropic = require("@anthropic-ai/sdk");
var Pool = require("pg").Pool;
var nodemailer = require("nodemailer");
var app = express();
app.use(express.json());
var anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
var pool = new Pool({ connectionString: process.env.POSTGRES_CONNECTION_STRING });
var upload = multer({
dest: "uploads/raw/",
limits: { fileSize: 500 * 1024 * 1024 }
});
// Initialize database table
pool.query([
"CREATE TABLE IF NOT EXISTS meeting_summaries (",
" id UUID PRIMARY KEY DEFAULT gen_random_uuid(),",
" meeting_title VARCHAR(500),",
" meeting_date TIMESTAMPTZ,",
" executive_summary TEXT,",
" action_items JSONB DEFAULT '[]',",
" key_decisions JSONB DEFAULT '[]',",
" discussion_topics JSONB DEFAULT '[]',",
" full_transcript TEXT,",
" created_at TIMESTAMPTZ DEFAULT NOW()",
")"
].join("\n")).catch(function (err) {
console.error("Failed to create table:", err.message);
});
// Upload and summarize endpoint
app.post("/api/meetings/summarize", upload.single("audio"), function (req, res) {
if (!req.file) {
return res.status(400).json({ error: "No audio file provided" });
}
var meetingTitle = req.body.title || "Untitled Meeting";
var meetingDate = req.body.date || new Date().toISOString();
var recipients = req.body.recipients ? req.body.recipients.split(",") : [];
var slackWebhook = req.body.slackWebhook || null;
var convertedPath;
var transcriptText;
var summaryResult;
// Step 1: Convert audio
convertAudio(req.file.path)
.then(function (wavPath) {
convertedPath = wavPath;
console.log("Audio converted:", wavPath);
// Step 2: Transcribe
return transcribe(wavPath, process.env.ASSEMBLYAI_API_KEY);
})
.then(function (transcript) {
transcriptText = transcript.text;
console.log("Transcription complete:", transcriptText.length, "characters");
// Clean up audio file
fs.unlink(convertedPath, function () {});
// Step 3: Generate summary
return hierarchicalSummarize(transcriptText, {
title: meetingTitle,
date: meetingDate
});
})
.then(function (summary) {
summaryResult = summary;
console.log("Summary generated with", summary.actionItems.length, "action items");
// Step 4: Store in database
return pool.query(
"INSERT INTO meeting_summaries (meeting_title, meeting_date, executive_summary, action_items, key_decisions, discussion_topics, full_transcript) VALUES ($1, $2, $3, $4, $5, $6, $7) RETURNING id",
[meetingTitle, meetingDate, summary.executiveSummary, JSON.stringify(summary.actionItems), JSON.stringify(summary.keyDecisions), JSON.stringify(summary.discussionTopics), transcriptText]
);
})
.then(function (dbResult) {
var summaryId = dbResult.rows[0].id;
// Step 5: Distribute
var distributionPromises = [];
if (recipients.length > 0) {
distributionPromises.push(
sendEmailSummary(summaryResult, recipients, meetingTitle)
);
}
if (slackWebhook) {
distributionPromises.push(
sendSlackSummary(summaryResult, slackWebhook, meetingTitle)
);
}
return Promise.all(distributionPromises).then(function () {
return summaryId;
});
})
.then(function (summaryId) {
res.json({
id: summaryId,
summary: summaryResult,
formats: {
executive: formatExecutiveBrief(summaryResult),
actionItems: formatActionItemList(summaryResult),
detailed: formatDetailedNotes(summaryResult)
}
});
})
.catch(function (err) {
console.error("Summarization pipeline failed:", err);
// Clean up files on error
if (req.file && req.file.path) fs.unlink(req.file.path, function () {});
if (convertedPath) fs.unlink(convertedPath, function () {});
res.status(500).json({ error: "Failed to process meeting: " + err.message });
});
});
// Search summaries endpoint
app.get("/api/meetings/search", function (req, res) {
var query = req.query.q;
if (!query) {
return res.status(400).json({ error: "Search query required" });
}
pool.query(
"SELECT id, meeting_title, meeting_date, executive_summary FROM meeting_summaries WHERE to_tsvector('english', executive_summary || ' ' || meeting_title) @@ plainto_tsquery('english', $1) ORDER BY meeting_date DESC LIMIT 20",
[query]
).then(function (result) {
res.json({ results: result.rows });
}).catch(function (err) {
res.status(500).json({ error: err.message });
});
});
// Get summary by ID
app.get("/api/meetings/:id", function (req, res) {
pool.query("SELECT * FROM meeting_summaries WHERE id = $1", [req.params.id])
.then(function (result) {
if (result.rows.length === 0) {
return res.status(404).json({ error: "Summary not found" });
}
res.json(result.rows[0]);
})
.catch(function (err) {
res.status(500).json({ error: err.message });
});
});
function convertAudio(inputPath) {
return new Promise(function (resolve, reject) {
var outputPath = path.join("uploads", "converted", uuid.v4() + ".wav");
fs.mkdirSync(path.dirname(outputPath), { recursive: true });
ffmpeg(inputPath)
.audioCodec("pcm_s16le")
.audioFrequency(16000)
.audioChannels(1)
.format("wav")
.on("end", function () {
fs.unlink(inputPath, function () {});
resolve(outputPath);
})
.on("error", function (err) {
reject(new Error("Audio conversion failed: " + err.message));
})
.save(outputPath);
});
}
function transcribe(audioPath, apiKey) {
return axios.post("https://api.assemblyai.com/v2/upload",
fs.createReadStream(audioPath),
{ headers: { authorization: apiKey, "content-type": "application/octet-stream" } }
).then(function (uploadRes) {
return axios.post("https://api.assemblyai.com/v2/transcript",
{ audio_url: uploadRes.data.upload_url, speaker_labels: true },
{ headers: { authorization: apiKey } }
);
}).then(function (transcriptRes) {
return waitForTranscript(transcriptRes.data.id, apiKey);
});
}
function waitForTranscript(id, apiKey) {
return new Promise(function (resolve, reject) {
var poll = setInterval(function () {
axios.get("https://api.assemblyai.com/v2/transcript/" + id, {
headers: { authorization: apiKey }
}).then(function (res) {
if (res.data.status === "completed") {
clearInterval(poll);
resolve({ text: res.data.text, utterances: res.data.utterances });
} else if (res.data.status === "error") {
clearInterval(poll);
reject(new Error("Transcription error: " + res.data.error));
}
}).catch(function (err) { clearInterval(poll); reject(err); });
}, 5000);
});
}
function hierarchicalSummarize(transcript, context) {
var words = transcript.split(/\s+/).length;
if (words <= 4000) {
return generateSummary(transcript, context);
}
var chunks = [];
var wordsArr = transcript.split(/\s+/);
for (var i = 0; i < wordsArr.length; i += 3000) {
chunks.push(wordsArr.slice(i, i + 3000).join(" "));
}
return Promise.all(chunks.map(function (chunk, idx) {
return anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 2048,
messages: [{ role: "user", content: "Summarize this meeting transcript section (" + (idx + 1) + "/" + chunks.length + "). Include action items and decisions.\n\n" + chunk }]
}).then(function (r) { return r.content[0].text; });
})).then(function (summaries) {
return generateSummary(summaries.join("\n\n---\n\n"), context);
});
}
function generateSummary(text, context) {
var prompt = "Meeting transcript:\n\n" + text + "\n\n";
if (context) {
prompt += "Title: " + (context.title || "Unknown") + "\nDate: " + (context.date || "Unknown") + "\n\n";
}
prompt += 'Return a JSON object: {"executiveSummary":"...","keyDecisions":[{"decision":"...","madeBy":"...","context":"..."}],"actionItems":[{"task":"...","owner":"...","deadline":"...","priority":"high|medium|low"}],"discussionTopics":[{"topic":"...","summary":"...","outcome":"decided|tabled|needs-followup"}],"attendeeSummary":[{"name":"...","contributions":"..."}],"followUpMeetingNeeded":bool,"followUpTopics":["..."]}';
return anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
system: "You are an expert meeting summarizer. Only include information from the transcript. Output valid JSON only.",
messages: [{ role: "user", content: prompt }]
}).then(function (r) {
var raw = r.content[0].text;
var match = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
return JSON.parse((match ? match[1] : raw).trim());
});
}
function formatExecutiveBrief(s) {
var out = "# Executive Brief\n\n" + s.executiveSummary + "\n\n## Decisions\n\n";
s.keyDecisions.forEach(function (d) { out += "- " + d.decision + "\n"; });
return out;
}
function formatActionItemList(s) {
var out = "# Action Items\n\n| Task | Owner | Deadline | Priority |\n|------|-------|----------|----------|\n";
s.actionItems.forEach(function (a) {
out += "| " + a.task + " | " + a.owner + " | " + (a.deadline || "TBD") + " | " + a.priority + " |\n";
});
return out;
}
function formatDetailedNotes(s) {
var out = "# Detailed Notes\n\n" + s.executiveSummary + "\n\n## Topics\n\n";
s.discussionTopics.forEach(function (t) {
out += "### " + t.topic + "\n" + t.summary + "\n*" + t.outcome + "*\n\n";
});
return out;
}
function sendEmailSummary(summary, recipients, title) {
var transporter = nodemailer.createTransport({
host: process.env.SMTP_HOST,
port: parseInt(process.env.SMTP_PORT) || 587,
auth: { user: process.env.SMTP_USER, pass: process.env.SMTP_PASS }
});
return transporter.sendMail({
from: process.env.SMTP_FROM,
to: recipients.join(", "),
subject: "Meeting Summary: " + title,
html: "<h2>" + title + "</h2><p>" + summary.executiveSummary + "</p>"
});
}
function sendSlackSummary(summary, webhookUrl, title) {
return axios.post(webhookUrl, {
blocks: [
{ type: "header", text: { type: "plain_text", text: title } },
{ type: "section", text: { type: "mrkdwn", text: summary.executiveSummary } }
]
});
}
var PORT = process.env.PORT || 3000;
app.listen(PORT, function () {
console.log("Meeting summarization service running on port " + PORT);
});
Test it with curl:
curl -X POST http://localhost:3000/api/meetings/summarize \
  -F "audio=@recording.m4a" \
  -F "title=Sprint Planning - Feb 12" \
  -F "recipients=team@example.com" \
  -F "slackWebhook=https://hooks.slack.com/services/xxx/yyy/zzz"
Common Issues and Troubleshooting
1. FFmpeg not found or conversion fails
Error: Cannot find ffmpeg
Error: spawn ffmpeg ENOENT
FFmpeg must be installed system-wide and available on your PATH. On Ubuntu, run sudo apt install ffmpeg. On macOS, use brew install ffmpeg. On Windows, download from ffmpeg.org and add the bin directory to your PATH. Verify with ffmpeg -version. If you are running in Docker, add RUN apt-get update && apt-get install -y ffmpeg to your Dockerfile.
2. Transcription API returns empty or garbled text
{ "text": "", "status": "completed" }
This usually means the audio quality is too poor or the format is wrong. Verify your audio contains actual speech by playing it back. Check that your FFmpeg conversion is producing valid output by running ffplay uploads/converted/your-file.wav. If the audio is very quiet, add an audio filter to normalize volume: .audioFilters('volume=2.0') in your FFmpeg chain. Also ensure the audio is not DRM-protected, which will cause silent conversion failures.
3. LLM returns invalid JSON in summary
SyntaxError: Unexpected token 'H' in JSON at position 0
This happens when the LLM returns a preamble like "Here is the JSON..." before the actual object. The regex extraction in the code handles the markdown code block case, but sometimes the model returns bare JSON with a text prefix. Add more robust parsing:
function parseJSONResponse(text) {
// Try direct parse first
try { return JSON.parse(text); } catch (e) {}
// Try extracting from code block
var match = text.match(/```(?:json)?\s*([\s\S]*?)```/);
if (match) {
try { return JSON.parse(match[1].trim()); } catch (e) {}
}
// Try finding JSON object boundaries
var start = text.indexOf("{");
var end = text.lastIndexOf("}");
if (start !== -1 && end !== -1) {
try { return JSON.parse(text.substring(start, end + 1)); } catch (e) {}
}
throw new Error("Could not parse JSON from LLM response: " + text.substring(0, 100));
}
4. Large file uploads timing out
Error: Request timeout of 120000ms exceeded
Error: PayloadTooLargeError: request entity too large
For files over 100MB, increase the multer fileSize limit and the server's request timeout, and make sure your reverse proxy allows large bodies: in nginx, add client_max_body_size 500M; to the server block. Note that app.use(express.json({ limit: '500mb' })) only affects JSON bodies; multipart uploads are governed by multer's limits. For the transcription API call itself, raise the axios timeout to at least five minutes and set maxBodyLength: Infinity when streaming large files.
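On the Node side, the server-level socket timeout is often the missing piece; a sketch, with values you should size to your largest expected upload:
// Raise the HTTP server's socket timeout so large uploads are not cut off mid-transfer.
var server = app.listen(PORT, function () {
  console.log("Meeting summarization service running on port " + PORT);
});
server.setTimeout(10 * 60 * 1000); // 10 minutes

// When streaming a large file to a transcription API with axios, also lift its limits:
// { maxBodyLength: Infinity, maxContentLength: Infinity, timeout: 600000 }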
5. Speaker diarization produces too many or too few speakers
{ "utterances": [{ "speaker": "A", ... }, { "speaker": "B", ... }, { "speaker": "C", ... }, { "speaker": "D", ... }, { "speaker": "E", ... }] }
// Expected 3 speakers, got 5
Speaker diarization is inherently imperfect. Background noise, overlapping speech, and similar voices cause false speaker splits. If you know the expected speaker count, pass it as a hint where the provider supports it (AssemblyAI accepts speakers_expected). Otherwise, post-process the utterances yourself, for example by merging labels that barely speak or that read like the same person.
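One crude post-processing heuristic, purely illustrative: fold labels that account for a negligible share of the total speaking time into the previous speaker's label.
function mergeRareSpeakers(utterances, minShare) {
  minShare = minShare || 0.05; // labels under 5% of total speech are treated as diarization noise
  var totals = {};
  var overall = 0;
  utterances.forEach(function (u) {
    var duration = u.end - u.start;
    totals[u.speaker] = (totals[u.speaker] || 0) + duration;
    overall += duration;
  });
  var lastKept = null;
  return utterances.map(function (u) {
    if (lastKept && totals[u.speaker] / overall < minShare) {
      // Reassign the rare label to whoever spoke last
      return { speaker: lastKept, text: u.text, start: u.start, end: u.end };
    }
    lastKept = u.speaker;
    return u;
  });
}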
Best Practices
Always delete audio files after transcription. They are large, contain sensitive information, and you do not need them once you have the transcript. Set up a cron job to clean the uploads directory as a safety net.
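A simple safety-net sweep: anything in the uploads tree older than a day gets removed. Run it from cron or, as here, a timer inside the service.
var fs = require("fs");
var path = require("path");

// Delete upload artifacts older than maxAgeMs; recurses into chunk directories.
function cleanUploads(dir, maxAgeMs) {
  if (!fs.existsSync(dir)) return;
  fs.readdirSync(dir).forEach(function (name) {
    var fullPath = path.join(dir, name);
    var stats = fs.statSync(fullPath);
    if (stats.isDirectory()) {
      cleanUploads(fullPath, maxAgeMs);
    } else if (Date.now() - stats.mtimeMs > maxAgeMs) {
      fs.unlinkSync(fullPath);
    }
  });
}

// Sweep once an hour, removing anything older than 24 hours.
setInterval(function () { cleanUploads("uploads", 24 * 60 * 60 * 1000); }, 60 * 60 * 1000);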
Use speaker diarization from the transcription provider, not the LLM. Asking the LLM to identify speakers from undifferentiated text is unreliable. The transcription API has access to audio features (pitch, cadence, spatial positioning) that text-only models cannot use.
Set a maximum meeting duration and reject recordings that exceed it. A four-hour meeting recording will produce a poor summary regardless of your pipeline. Set a reasonable limit like 90 minutes and encourage users to split longer sessions.
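Enforcing that limit is a single ffprobe call before the pipeline starts; the 90-minute cap below just mirrors the example above.
function assertDurationWithinLimit(audioPath, maxSeconds) {
  return new Promise(function (resolve, reject) {
    ffmpeg.ffprobe(audioPath, function (err, metadata) {
      if (err) return reject(err);
      var duration = metadata.format.duration;
      if (duration > maxSeconds) {
        return reject(new Error("Recording is " + Math.round(duration / 60) +
          " minutes; the limit is " + Math.round(maxSeconds / 60) + " minutes. Split it and re-upload."));
      }
      resolve(duration);
    });
  });
}

// e.g. assertDurationWithinLimit(req.file.path, 90 * 60) before calling convertAudio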
Cache transcription results aggressively. Transcription is the most expensive step in both time and money. Hash the audio file and check for existing transcripts before sending to the API. This also protects against accidental duplicate processing.
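A sketch of that cache keyed on a SHA-256 of the audio bytes; the transcript_cache table is an assumption, not part of the schema shown earlier.
var crypto = require("crypto");

function hashAudioFile(audioPath) {
  return new Promise(function (resolve, reject) {
    var hash = crypto.createHash("sha256");
    fs.createReadStream(audioPath)
      .on("data", function (chunk) { hash.update(chunk); })
      .on("end", function () { resolve(hash.digest("hex")); })
      .on("error", reject);
  });
}

// Hypothetical cache table: transcript_cache(audio_hash TEXT PRIMARY KEY, transcript JSONB)
function transcribeWithCache(audioPath, apiKey) {
  var audioHash;
  return hashAudioFile(audioPath).then(function (hash) {
    audioHash = hash;
    return pool.query("SELECT transcript FROM transcript_cache WHERE audio_hash = $1", [hash]);
  }).then(function (result) {
    if (result.rows.length > 0) return result.rows[0].transcript; // cache hit, skip the API
    return transcribe(audioPath, apiKey).then(function (transcript) {
      return pool.query(
        "INSERT INTO transcript_cache (audio_hash, transcript) VALUES ($1, $2) ON CONFLICT DO NOTHING",
        [audioHash, JSON.stringify(transcript)]
      ).then(function () { return transcript; });
    });
  });
}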
Generate summaries asynchronously and notify users when complete. A 60-minute meeting takes 3-5 minutes to transcribe and another 30-60 seconds to summarize. Do not make users wait on a synchronous HTTP request. Accept the upload, return a job ID, and push the summary via webhook, email, or Slack when it is ready.
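A minimal version of that pattern, with jobs held in memory purely for illustration; a real deployment would persist them in PostgreSQL or a queue such as BullMQ and notify via email or Slack on completion.
var jobs = {}; // jobId -> { status, result, error }

app.post("/api/meetings/summarize-async", upload.single("audio"), function (req, res) {
  var jobId = uuid.v4();
  jobs[jobId] = { status: "processing" };
  res.status(202).json({ jobId: jobId }); // respond immediately with the job handle

  // Run the existing pipeline in the background and record the outcome.
  convertAudio(req.file.path)
    .then(function (wavPath) { return transcribe(wavPath, process.env.ASSEMBLYAI_API_KEY); })
    .then(function (transcript) { return hierarchicalSummarize(transcript.text, { title: req.body.title }); })
    .then(function (summary) { jobs[jobId] = { status: "completed", result: summary }; })
    .catch(function (err) { jobs[jobId] = { status: "failed", error: err.message }; });
});

app.get("/api/meetings/jobs/:id", function (req, res) {
  var job = jobs[req.params.id];
  if (!job) return res.status(404).json({ error: "Job not found" });
  res.json(job);
});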
Version your prompt templates. The quality of your summaries depends entirely on your prompts. Store them in a database or configuration file with version numbers so you can A/B test improvements and roll back if a change degrades quality.
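The lowest-effort version is a keyed template map with the active version recorded alongside each summary; a database table works the same way.
var promptTemplates = {
  "summary-v1": "Summarize the transcript into an executive summary, decisions, and action items.",
  "summary-v2": "You are an expert meeting summarizer. Return JSON with executiveSummary, keyDecisions, actionItems, discussionTopics."
};

var ACTIVE_PROMPT_VERSION = process.env.PROMPT_VERSION || "summary-v2";

function getActivePrompt() {
  return { version: ACTIVE_PROMPT_VERSION, text: promptTemplates[ACTIVE_PROMPT_VERSION] };
}
// Store getActivePrompt().version with each summary row so output quality can be compared per version.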
Include timestamps in action items when possible. If a speaker says "let's revisit this next Tuesday," the system should surface that deadline. The transcript timestamps from diarization help anchor these references to actual calendar dates when combined with the meeting date.
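One way to anchor those phrases is a date-parsing pass over each action item's deadline text, using the meeting date as the reference point. The chrono-node library used below is an added dependency, not part of the install list above.
var chrono = require("chrono-node"); // npm install chrono-node

// Resolve relative phrases like "next Tuesday" or "end of the month" against the meeting date.
function resolveDeadline(deadlineText, meetingDate) {
  if (!deadlineText) return null;
  var parsed = chrono.parseDate(deadlineText, new Date(meetingDate));
  return parsed ? parsed.toISOString().slice(0, 10) : deadlineText; // fall back to the raw text
}

// summary.actionItems.forEach(function (item) { item.deadline = resolveDeadline(item.deadline, meetingDate); });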
Test with real meetings, not synthetic audio. Synthetic test audio is clean, well-paced, and has perfect pronunciation. Real meetings have crosstalk, background noise, people eating lunch while talking, and speakers who mumble. Your pipeline needs to handle all of this gracefully.
References
- AssemblyAI Documentation - Transcription API with speaker diarization
- OpenAI Whisper API - Speech-to-text transcription
- Deepgram Documentation - Real-time and batch transcription
- Anthropic Claude API - LLM for summary generation
- FFmpeg Documentation - Audio format conversion
- PostgreSQL Full Text Search - Searchable summary storage
- Nodemailer - Email distribution from Node.js