How to Build Your Own SEO Crawler Tools with Agentic AI in 2026

Build production-grade SEO crawler tools for free using Claude Code and Open Claw. Five real projects with runnable JavaScript code. No subscriptions.

Stop paying $259/year for Screaming Frog. Build something better with Claude Code and Open Claw.

I've been building developer tools for years, and I'll tell you something that might make the SEO tool industry uncomfortable: the core technology behind most SEO crawlers isn't complex. What is complex is packaging it up nicely and charging you a recurring subscription for it. In 2026, agentic AI tools like Claude Code and Open Claw have changed the equation so dramatically that a solo developer can build production-grade SEO tooling in a weekend.

This isn't a listicle. This isn't a "top 10 SEO tools" roundup written by someone who's never opened a terminal. This is a practitioner's guide to actually building SEO crawler tools from scratch, using modern agentic development workflows that compress weeks of work into hours.

Let's get into it.


Why Build Your Own SEO Crawler?

Before we start writing code, let's talk about why you'd want to do this in the first place.

The commercial SEO crawler market is dominated by a handful of players. Screaming Frog runs £199/year and caps the free version at 500 URLs. Sitebulb, Lumar, and DeepCrawl all operate on similar subscription models. These are solid tools — I'm not going to pretend otherwise. But they come with real limitations that matter when you're a developer or technical SEO who wants to go deeper.

First, you can't customize them. When Screaming Frog decides what constitutes an "SEO issue," you're stuck with their definition. Your e-commerce site might have completely different priorities than a SaaS marketing site, but the tool doesn't care. Second, the data stays locked in their ecosystem. Sure, you can export CSVs, but try building a real-time dashboard that pulls from Screaming Frog data programmatically. Third — and this is the big one — you learn nothing by clicking buttons in someone else's GUI. Building your own crawler teaches you how search engines actually work at a mechanical level, which makes you a better SEO practitioner.

The open-source ecosystem has started catching up. Projects like LibreCrawl (a Python/Flask-based crawler positioning itself as a free Screaming Frog alternative), Crowl (built on Scrapy), and Greenflare are all actively maintained. But here's what none of those projects have: an agentic AI backbone that can analyze results, generate recommendations, and iterate on the crawl strategy autonomously.

That's what we're going to build.


The Agentic AI Toolkit: Claude Code and Open Claw

If you're not familiar with the agentic AI landscape in 2026, here's the short version: we've moved past the era of "chat with an AI and copy-paste the output." Agentic tools live in your development environment, execute multi-step tasks autonomously, and can interact with external services through protocols like MCP (Model Context Protocol).

Claude Code: Your Terminal-Native AI Engineer

Claude Code is Anthropic's command-line tool that brings Claude directly into your terminal. Unlike the web chat interface, Claude Code can read and write files, execute shell commands, manage git repos, and orchestrate complex multi-step workflows. For building SEO tools, this is transformative.

Here's why: instead of asking an AI to "write me a web crawler" and then spending three hours debugging the output, you can tell Claude Code to build a crawler, test it against a real URL, analyze the results, fix any bugs it finds, and iterate — all in a single session. It's pair programming where your partner never gets tired and has encyclopedic knowledge of every HTTP specification ever written.

The key features that matter for SEO tool development:

MCP Integration — Claude Code can connect to external data sources and services through MCP servers. This means your SEO tools can pull live data from Google Search Console, analytics platforms, or even your CMS directly. SE Ranking recently launched an MCP server specifically for connecting Claude Code to their SEO data, which gives you a sense of where the industry is heading.

Custom Skills and Slash Commands — You can package repeatable workflows as "skills" that Claude Code executes on command. Built an SEO audit workflow? Save it as a skill. Next time, you just type /seo-audit https://yoursite.com and the entire pipeline runs autonomously.

Sub-Agents — For complex crawls, you can spin up specialized sub-agents that handle different aspects of the analysis in parallel. One agent handles technical SEO checks, another analyzes content quality, a third examines link structure. The orchestrating agent synthesizes everything into a cohesive report.
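
Claude Code manages this orchestration itself, but the underlying fan-out/synthesize pattern is worth seeing in plain Node. Here's a minimal sketch, where the three analyzer functions and the `page` shape are illustrative assumptions for this example, not any real API:

```javascript
// Illustrative fan-out/synthesize pattern: each "sub-agent" is a function
// analyzing one aspect of a page; results are merged into a single report.
// The analyzers and the `page` shape are assumptions for this sketch.
async function technicalChecks(page) {
  return { technical: page.statusCode === 200 ? [] : [`HTTP ${page.statusCode}`] };
}

async function contentChecks(page) {
  return { content: page.wordCount >= 300 ? [] : ['thin content'] };
}

async function linkChecks(page) {
  return { links: page.internalLinks > 0 ? [] : ['no internal links'] };
}

async function auditPage(page) {
  // Fan out: the analyses run concurrently
  const partials = await Promise.all([
    technicalChecks(page),
    contentChecks(page),
    linkChecks(page)
  ]);
  // Synthesize: merge the partial reports into one object
  return Object.assign({ url: page.url }, ...partials);
}
```

In the real workflow each "analyzer" would be a full Claude Code sub-agent with its own context window; the synthesis step is where the orchestrating agent earns its keep.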

Open Claw: The 24/7 Autonomous Agent

Open Claw (formerly Clawdbot, then briefly Moltbot — it's had a turbulent naming history thanks to a cease-and-desist from Anthropic) is a different beast entirely. It's an open-source system that runs Claude Code indefinitely on a server or VPS, turning it into a persistent AI assistant that operates 24/7.

For SEO, this means you can set up an Open Claw instance that:

  • Crawls your site on a schedule and alerts you via Telegram when new issues appear
  • Monitors competitor sites for structural changes, new pages, or shifted keyword targeting
  • Watches Google Search Console for traffic drops and automatically investigates potential causes
  • Runs continuous A/B analysis on title tags and meta descriptions based on click-through rate data

The setup involves running Open Claw on a VPS or an always-on machine at home (Mac Minis have become a popular choice), connecting it to your communication channel of choice (Telegram and WhatsApp are the most popular), and configuring the tasks you want it to handle. The Claude Code subscription ($20–$200/month depending on usage tier) powers the underlying AI.

The security implications are worth thinking about carefully — you're giving an autonomous agent access to your infrastructure and data. Follow the principle of least privilege: give it only the access it needs for each specific task.
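
For the Telegram alerting piece specifically, the Bot API is just an HTTPS call. A minimal sketch using Node 18+'s built-in fetch — the token and chat ID are placeholders you'd get from @BotFather, and the message format is an arbitrary choice:

```javascript
// Format a batch of alerts into a readable Telegram message.
// Returns null when there's nothing worth interrupting you for.
function formatAlerts(domain, alerts) {
  if (alerts.length === 0) return null;
  const lines = alerts.map(a => `[${a.severity}] ${a.message}`);
  return `SEO alert for ${domain}:\n${lines.join('\n')}`;
}

// Send via the Telegram Bot API's sendMessage endpoint.
// `token` and `chatId` are placeholders for your bot's credentials.
async function sendTelegramAlert(token, chatId, text) {
  const res = await fetch(`https://api.telegram.org/bot${token}/sendMessage`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ chat_id: chatId, text })
  });
  if (!res.ok) throw new Error(`Telegram API error: ${res.status}`);
}
```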


Project 1: Building a Technical SEO Crawler from Scratch

Let's build something real. We're going to create a Node.js-based SEO crawler that fetches pages, extracts key SEO elements, identifies issues, and generates an actionable report. Then we'll supercharge it with Claude Code integration for AI-powered analysis.

The Foundation: A Concurrent HTTP Crawler

// crawler.js
const https = require('https');
const http = require('http');
const { URL } = require('url');
const { JSDOM } = require('jsdom');

class SEOCrawler {
  constructor(startUrl, options = {}) {
    this.startUrl = new URL(startUrl);
    this.domain = this.startUrl.hostname;
    this.maxPages = options.maxPages || 500;
    this.concurrency = options.concurrency || 5;
    this.timeout = options.timeout || 10000;
    this.respectRobotsTxt = options.respectRobotsTxt !== false;

    this.visited = new Set();
    this.queue = [startUrl];
    this.results = [];
    this.errors = [];
    this.activeRequests = 0;
    this.robotsRules = null;
  }

  async crawl() {
    if (this.respectRobotsTxt) {
      await this.fetchRobotsTxt();
    }

    return new Promise((resolve) => {
      const interval = setInterval(() => {
        while (
          this.activeRequests < this.concurrency &&
          this.queue.length > 0 &&
          this.visited.size < this.maxPages
        ) {
          const url = this.queue.shift();
          if (!this.visited.has(url)) {
            this.visited.add(url);
            this.activeRequests++;
            this.fetchAndAnalyze(url).then(() => {
              this.activeRequests--;
            });
          }
        }

        if (this.activeRequests === 0 && this.queue.length === 0) {
          clearInterval(interval);
          resolve({
            pages: this.results,
            errors: this.errors,
            stats: this.generateStats()
          });
        }
      }, 100);
    });
  }

  async fetchRobotsTxt() {
    try {
      const robotsUrl = `${this.startUrl.protocol}//${this.domain}/robots.txt`;
      const content = await this.fetch(robotsUrl);
      this.robotsRules = this.parseRobotsTxt(content.body);
    } catch (e) {
      this.robotsRules = { disallowed: [] };
    }
  }

  parseRobotsTxt(content) {
    const lines = content.split('\n');
    const disallowed = [];
    let isRelevantAgent = false;

    for (const line of lines) {
      const trimmed = line.trim().toLowerCase();
      if (trimmed.startsWith('user-agent:')) {
        const agent = trimmed.split(':')[1].trim();
        // Match the wildcard agent or this crawler's own UA token
        isRelevantAgent = agent === '*' || agent === 'grizzlypeakseobot';
      } else if (isRelevantAgent && trimmed.startsWith('disallow:')) {
        const path = line.split(':').slice(1).join(':').trim();
        if (path) disallowed.push(path);
      }
    }

    return { disallowed };
  }

  isAllowedByRobots(url) {
    if (!this.robotsRules) return true;
    const path = new URL(url).pathname;
    return !this.robotsRules.disallowed.some(rule =>
      path.startsWith(rule)
    );
  }

  async fetchAndAnalyze(url) {
    try {
      if (!this.isAllowedByRobots(url)) {
        this.results.push({
          url,
          blocked: true,
          blockedBy: 'robots.txt'
        });
        return;
      }

      const response = await this.fetch(url);
      const analysis = this.analyzePage(url, response);
      this.results.push(analysis);

      // Extract and queue internal links
      if (response.contentType?.includes('text/html')) {
        const links = this.extractLinks(url, response.body);
        for (const link of links) {
          if (!this.visited.has(link) && this.visited.size < this.maxPages) {
            this.queue.push(link);
          }
        }
      }
    } catch (error) {
      this.errors.push({ url, error: error.message });
    }
  }

  fetch(url) {
    return new Promise((resolve, reject) => {
      const protocol = url.startsWith('https') ? https : http;
      const startTime = Date.now();

      const req = protocol.get(url, {
        timeout: this.timeout,
        headers: {
          'User-Agent': 'GrizzlyPeakSEOBot/1.0 (+https://grizzlypeaksoftware.com/bot)',
          'Accept': 'text/html,application/xhtml+xml',
          'Accept-Encoding': 'identity'
        }
      }, (res) => {
        // Handle redirects (report them; we don't follow automatically)
        if ([301, 302, 303, 307, 308].includes(res.statusCode)) {
          res.resume(); // Drain the response so the socket is released
          const redirectUrl = new URL(res.headers.location, url).href;
          resolve({
            statusCode: res.statusCode,
            redirectUrl,
            headers: res.headers,
            responseTime: Date.now() - startTime,
            body: '',
            contentType: res.headers['content-type']
          });
          return;
        }

        let body = '';
        res.on('data', chunk => body += chunk);
        res.on('end', () => {
          resolve({
            statusCode: res.statusCode,
            headers: res.headers,
            responseTime: Date.now() - startTime,
            body,
            contentType: res.headers['content-type']
          });
        });
      });

      req.on('error', reject);
      req.on('timeout', () => {
        req.destroy();
        reject(new Error('Request timeout'));
      });
    });
  }

  analyzePage(url, response) {
    const result = {
      url,
      statusCode: response.statusCode,
      responseTime: response.responseTime,
      contentType: response.contentType,
      redirectUrl: response.redirectUrl || null,
      issues: []
    };

    if (response.redirectUrl) {
      result.issues.push({
        type: 'redirect',
        severity: response.statusCode === 301 ? 'info' : 'warning',
        message: `${response.statusCode} redirect to ${response.redirectUrl}`
      });
      return result;
    }

    if (response.statusCode !== 200) {
      result.issues.push({
        type: 'status_code',
        severity: response.statusCode >= 500 ? 'critical' : 'error',
        message: `HTTP ${response.statusCode}`
      });
      return result;
    }

    if (!response.contentType?.includes('text/html')) {
      return result;
    }

    // Parse the HTML
    const dom = new JSDOM(response.body);
    const doc = dom.window.document;

    // Title analysis
    const title = doc.querySelector('title')?.textContent?.trim();
    result.title = title;
    if (!title) {
      result.issues.push({
        type: 'missing_title',
        severity: 'critical',
        message: 'Page has no title tag'
      });
    } else if (title.length > 60) {
      result.issues.push({
        type: 'title_too_long',
        severity: 'warning',
        message: `Title is ${title.length} characters (recommended: under 60)`
      });
    } else if (title.length < 30) {
      result.issues.push({
        type: 'title_too_short',
        severity: 'warning',
        message: `Title is only ${title.length} characters (recommended: 30-60)`
      });
    }

    // Meta description analysis
    const metaDesc = doc.querySelector('meta[name="description"]')?.getAttribute('content')?.trim();
    result.metaDescription = metaDesc;
    if (!metaDesc) {
      result.issues.push({
        type: 'missing_meta_description',
        severity: 'error',
        message: 'Page has no meta description'
      });
    } else if (metaDesc.length > 160) {
      result.issues.push({
        type: 'meta_description_too_long',
        severity: 'warning',
        message: `Meta description is ${metaDesc.length} characters (recommended: under 160)`
      });
    }

    // Heading analysis
    const h1s = doc.querySelectorAll('h1');
    result.h1Count = h1s.length;
    result.h1Text = Array.from(h1s).map(h => h.textContent.trim());
    if (h1s.length === 0) {
      result.issues.push({
        type: 'missing_h1',
        severity: 'error',
        message: 'Page has no H1 tag'
      });
    } else if (h1s.length > 1) {
      result.issues.push({
        type: 'multiple_h1',
        severity: 'warning',
        message: `Page has ${h1s.length} H1 tags (recommended: exactly 1)`
      });
    }

    // Check heading hierarchy
    const headings = doc.querySelectorAll('h1, h2, h3, h4, h5, h6');
    let lastLevel = 0;
    for (const heading of headings) {
      const level = parseInt(heading.tagName[1]);
      // Only flag skips once the first heading has set a baseline
      if (lastLevel > 0 && level > lastLevel + 1) {
        result.issues.push({
          type: 'heading_hierarchy_skip',
          severity: 'warning',
          message: `Heading jumps from H${lastLevel} to H${level}`
        });
        break;
      }
      lastLevel = level;
    }

    // Image analysis
    const images = doc.querySelectorAll('img');
    let missingAlt = 0;
    for (const img of images) {
      // An absent alt attribute is a problem; alt="" is valid for decorative images
      if (!img.hasAttribute('alt')) {
        missingAlt++;
      }
    }
    result.imageCount = images.length;
    if (missingAlt > 0) {
      result.issues.push({
        type: 'images_missing_alt',
        severity: 'warning',
        message: `${missingAlt} of ${images.length} images missing alt text`
      });
    }

    // Canonical tag
    const canonical = doc.querySelector('link[rel="canonical"]')?.getAttribute('href');
    result.canonical = canonical;
    if (!canonical) {
      result.issues.push({
        type: 'missing_canonical',
        severity: 'warning',
        message: 'Page has no canonical tag'
      });
    }

    // Meta robots
    const metaRobots = doc.querySelector('meta[name="robots"]')?.getAttribute('content');
    result.metaRobots = metaRobots;
    if (metaRobots && (metaRobots.includes('noindex') || metaRobots.includes('nofollow'))) {
      result.issues.push({
        type: 'noindex_nofollow',
        severity: 'critical',
        message: `Meta robots contains: ${metaRobots}`
      });
    }

    // Open Graph tags
    const ogTitle = doc.querySelector('meta[property="og:title"]')?.getAttribute('content');
    const ogDesc = doc.querySelector('meta[property="og:description"]')?.getAttribute('content');
    const ogImage = doc.querySelector('meta[property="og:image"]')?.getAttribute('content');
    if (!ogTitle || !ogDesc || !ogImage) {
      const missing = [];
      if (!ogTitle) missing.push('og:title');
      if (!ogDesc) missing.push('og:description');
      if (!ogImage) missing.push('og:image');
      result.issues.push({
        type: 'missing_og_tags',
        severity: 'info',
        message: `Missing Open Graph tags: ${missing.join(', ')}`
      });
    }

    // Content length
    const bodyText = doc.body?.textContent?.replace(/\s+/g, ' ').trim();
    result.wordCount = bodyText ? bodyText.split(/\s+/).length : 0;
    if (result.wordCount < 300) {
      result.issues.push({
        type: 'thin_content',
        severity: 'warning',
        message: `Page has only ${result.wordCount} words (thin content)`
      });
    }

    // Response time
    if (response.responseTime > 3000) {
      result.issues.push({
        type: 'slow_response',
        severity: 'warning',
        message: `Response time: ${response.responseTime}ms (over 3 seconds)`
      });
    }

    // Schema/structured data
    const jsonLdScripts = doc.querySelectorAll('script[type="application/ld+json"]');
    result.hasStructuredData = jsonLdScripts.length > 0;
    if (jsonLdScripts.length === 0) {
      result.issues.push({
        type: 'no_structured_data',
        severity: 'info',
        message: 'No JSON-LD structured data found'
      });
    }

    // Internal and external link counts
    const links = doc.querySelectorAll('a[href]');
    let internalLinks = 0;
    let externalLinks = 0;
    let fragmentLinks = 0;
    for (const link of links) {
      const href = link.getAttribute('href');
      if (href.startsWith('#')) {
        fragmentLinks++; // Same-page fragment links: skip for internal/external counts
        continue;
      }
      try {
        const linkUrl = new URL(href, url);
        if (linkUrl.hostname === this.domain) {
          internalLinks++;
        } else {
          externalLinks++;
        }
      } catch (e) {
        // Invalid URL
      }
    }
    result.internalLinks = internalLinks;
    result.externalLinks = externalLinks;

    return result;
  }

  extractLinks(currentUrl, html) {
    const links = [];
    try {
      const dom = new JSDOM(html);
      const anchors = dom.window.document.querySelectorAll('a[href]');
      for (const anchor of anchors) {
        try {
          const href = anchor.getAttribute('href');
          if (href.startsWith('#') || href.startsWith('mailto:') || href.startsWith('tel:')) {
            continue;
          }
          const absoluteUrl = new URL(href, currentUrl).href;
          const parsed = new URL(absoluteUrl);
          if (parsed.hostname === this.domain && parsed.protocol.startsWith('http')) {
            // Normalize: remove fragment, trailing slash
            parsed.hash = '';
            const normalized = parsed.href.replace(/\/$/, '');
            links.push(normalized);
          }
        } catch (e) {
          // Invalid URL, skip
        }
      }
    } catch (e) {
      // Parse error, skip
    }
    return [...new Set(links)];
  }

  generateStats() {
    const statusCodes = {};
    let totalResponseTime = 0;
    let responseTimeCount = 0;

    for (const page of this.results) {
      const code = page.statusCode || 'unknown';
      statusCodes[code] = (statusCodes[code] || 0) + 1;
      if (page.responseTime) {
        totalResponseTime += page.responseTime;
        responseTimeCount++;
      }
    }

    const allIssues = this.results.flatMap(p => p.issues || []);
    const issueCounts = {};
    for (const issue of allIssues) {
      issueCounts[issue.type] = (issueCounts[issue.type] || 0) + 1;
    }

    return {
      totalPages: this.results.length,
      totalErrors: this.errors.length,
      statusCodes,
      avgResponseTime: responseTimeCount > 0
        ? Math.round(totalResponseTime / responseTimeCount)
        : 0,
      issueSummary: issueCounts
    };
  }
}

module.exports = SEOCrawler;

Running It

// run-crawl.js
const SEOCrawler = require('./crawler');

async function main() {
  const url = process.argv[2] || 'https://example.com';
  console.log(`Starting crawl of ${url}...`);

  const crawler = new SEOCrawler(url, {
    maxPages: 100,
    concurrency: 3,
    respectRobotsTxt: true
  });

  const results = await crawler.crawl();

  console.log('\n=== CRAWL COMPLETE ===');
  console.log(`Pages crawled: ${results.stats.totalPages}`);
  console.log(`Errors: ${results.stats.totalErrors}`);
  console.log(`Avg response time: ${results.stats.avgResponseTime}ms`);
  console.log('\nStatus codes:', results.stats.statusCodes);
  console.log('\nIssue summary:', results.stats.issueSummary);

  // Write full results to JSON
  const fs = require('fs');
  fs.writeFileSync(
    'crawl-results.json',
    JSON.stringify(results, null, 2)
  );
  console.log('\nFull results written to crawl-results.json');
}

main().catch(console.error);

That's your foundation. A few hundred lines of JavaScript, and you've got a concurrent web crawler that respects robots.txt, extracts every major SEO element, identifies 15+ categories of issues, and generates a statistical summary. No subscription required.


Project 2: Supercharging the Crawler with Claude Code

Here's where things get interesting. The crawler above does what Screaming Frog does at a basic level — it fetches pages and checks boxes. But what if the crawler could think about what it finds?

Setting Up the Claude Code Integration

Using Claude Code's MCP integration, you can pipe your crawl results directly into Claude for AI-powered analysis. Here's how to set up a slash command that runs a full audit:

First, create a CLAUDE.md file in your project root:

# SEO Crawler Project

## Commands
- /crawl [url] - Crawl a website and generate an SEO audit
- /analyze [file] - Analyze existing crawl results with AI
- /compare [file1] [file2] - Compare two crawls to track changes

## Workflow
1. Run the crawler against the target URL
2. Parse the JSON results
3. Identify patterns across all pages (not just individual issues)
4. Generate a prioritized action plan based on impact and effort
5. Save the report as markdown

## Analysis Priorities
- Critical issues first (noindex on important pages, 5xx errors)
- Quick wins (missing meta descriptions, duplicate titles)
- Strategic improvements (content gaps, internal linking opportunities)
- Performance optimizations (slow pages, large images)

Then create a skill file at .claude/skills/seo-audit.md:

# SEO Audit Skill

When running an SEO audit:

1. Execute the crawler: `node run-crawl.js [url]`
2. Read the crawl-results.json file
3. Analyze patterns across all pages:
   - Are there duplicate titles? Which ones?
   - Is there a consistent internal linking structure?
   - Are there orphan pages (no internal links pointing to them)?
   - What's the crawl depth distribution?
   - Are there redirect chains?
4. Generate a markdown report with:
   - Executive summary (3-5 sentences)
   - Critical issues (fix immediately)
   - Quick wins (low effort, high impact)
   - Strategic recommendations (longer-term)
   - Technical debt (nice to fix but not urgent)
5. Save as `audit-report-[domain]-[date].md`

Now when you're in Claude Code, you can type:

/crawl https://yoursite.com

And Claude Code will run the crawler, analyze the results, identify patterns that a simple checker would miss (like "your product pages all have nearly identical meta descriptions" or "your blog posts have great internal linking but your landing pages are orphaned"), and produce an actionable report.
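
"Nearly identical" is the interesting case here: exact-duplicate detection is trivial, but catching templated descriptions that differ by a single product name takes a similarity measure. A minimal sketch using token-set Jaccard similarity — the 0.8 default threshold is an arbitrary starting point, not a standard:

```javascript
// Token-set Jaccard similarity: |intersection| / |union| of the word sets.
function jaccard(a, b) {
  const ta = new Set(a.toLowerCase().split(/\s+/));
  const tb = new Set(b.toLowerCase().split(/\s+/));
  const intersection = [...ta].filter(t => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : intersection / union;
}

// Flag page pairs whose meta descriptions are near-duplicates.
// O(n²) pairwise comparison: fine for a few hundred pages; use
// shingling/MinHash if you're crawling tens of thousands.
function findNearDuplicateDescriptions(pages, threshold = 0.8) {
  const withDesc = pages.filter(p => p.metaDescription);
  const pairs = [];
  for (let i = 0; i < withDesc.length; i++) {
    for (let j = i + 1; j < withDesc.length; j++) {
      const score = jaccard(withDesc[i].metaDescription, withDesc[j].metaDescription);
      if (score >= threshold) {
        pairs.push({ urls: [withDesc[i].url, withDesc[j].url], score });
      }
    }
  }
  return pairs;
}
```

Feed the flagged pairs into your audit report and the "templated meta descriptions" pattern jumps out immediately.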

The AI Analysis Layer

Here's the key code that bridges your crawler output to Claude's analysis:

// ai-analyzer.js
const fs = require('fs');

function prepareAnalysisPrompt(crawlResults) {
  const { pages, errors, stats } = crawlResults;

  // Build a structured summary for Claude
  const criticalPages = pages.filter(p =>
    p.issues?.some(i => i.severity === 'critical')
  );
  const pagesWithIssues = pages.filter(p => p.issues?.length > 0);

  // Find duplicate titles
  const titleMap = {};
  for (const page of pages) {
    if (page.title) {
      if (!titleMap[page.title]) titleMap[page.title] = [];
      titleMap[page.title].push(page.url);
    }
  }
  const duplicateTitles = Object.entries(titleMap)
    .filter(([, urls]) => urls.length > 1);

  // Find redirect chains
  const redirectMap = {};
  for (const page of pages) {
    if (page.redirectUrl) {
      redirectMap[page.url] = page.redirectUrl;
    }
  }
  const chains = findRedirectChains(redirectMap);

  // Calculate crawl depth
  const depthMap = calculateCrawlDepth(pages);

  // Orphan-page detection (pages with no internal links pointing to them)
  // would need link-source tracking in the crawler, so it's skipped here.

  return {
    summary: {
      totalPages: stats.totalPages,
      totalErrors: stats.totalErrors,
      avgResponseTime: stats.avgResponseTime,
      statusCodes: stats.statusCodes,
      issueSummary: stats.issueSummary
    },
    criticalPages: criticalPages.map(p => ({
      url: p.url,
      issues: p.issues.filter(i => i.severity === 'critical')
    })),
    duplicateTitles,
    redirectChains: chains,
    thinContentPages: pages
      .filter(p => p.wordCount && p.wordCount < 300)
      .map(p => ({ url: p.url, wordCount: p.wordCount })),
    slowPages: pages
      .filter(p => p.responseTime > 2000)
      .map(p => ({ url: p.url, responseTime: p.responseTime })),
    missingStructuredData: pages
      .filter(p => p.hasStructuredData === false)
      .map(p => p.url),
    depthDistribution: depthMap
  };
}

function findRedirectChains(redirectMap) {
  const chains = [];
  for (const [start, target] of Object.entries(redirectMap)) {
    const chain = [start];
    let current = target;
    while (redirectMap[current]) {
      chain.push(current);
      current = redirectMap[current];
      if (chain.includes(current)) {
        chain.push(current + ' (LOOP)');
        break;
      }
    }
    chain.push(current);
    if (chain.length > 2) {
      chains.push(chain);
    }
  }
  return chains;
}

function calculateCrawlDepth(pages) {
  // Simplified depth calculation based on URL path segments
  const depths = {};
  for (const page of pages) {
    try {
      const path = new URL(page.url).pathname;
      const depth = path.split('/').filter(Boolean).length;
      depths[depth] = (depths[depth] || 0) + 1;
    } catch (e) {
      // skip
    }
  }
  return depths;
}

module.exports = { prepareAnalysisPrompt };

The prepareAnalysisPrompt function doesn't just dump raw data at the AI — it pre-processes the crawl results to surface patterns that matter. Duplicate titles, redirect chains, thin content pages, and crawl depth distribution are all calculated before the AI ever sees the data. This makes the AI's analysis dramatically better because it's working with structured insights rather than raw page-by-page data.
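
The analyzer skips orphan-page detection because it needs link-source tracking the Project 1 crawler doesn't do yet. If you extend the crawler so every result carries a `links` array of outgoing internal URLs (an assumption for this sketch), orphan detection reduces to a set difference:

```javascript
// Find crawled pages that no other crawled page links to.
// Assumes each page object has `url` and `links` (outgoing internal URLs),
// which requires a small extension to the Project 1 crawler.
function findOrphanPages(pages, startUrl) {
  const linkedTo = new Set();
  for (const page of pages) {
    for (const link of page.links || []) {
      linkedTo.add(link);
    }
  }
  // The start URL is reachable by definition, so exclude it
  return pages
    .map(p => p.url)
    .filter(url => url !== startUrl && !linkedTo.has(url));
}
```

In practice orphans surface when you seed the crawl from both the homepage and the XML sitemap: anything in the sitemap that the link graph never reaches is an orphan.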


Project 3: Building a Keyword Rank Tracker

SEO crawling is only half the picture. You also need to know where you rank. Let's build a lightweight rank tracker that runs on a schedule and tracks position changes over time.

// rank-tracker.js
const https = require('https');

class RankTracker {
  constructor(options = {}) {
    this.dataFile = options.dataFile || 'rankings.json';
    this.history = this.loadHistory();
  }

  loadHistory() {
    const fs = require('fs');
    try {
      return JSON.parse(fs.readFileSync(this.dataFile, 'utf8'));
    } catch (e) {
      return { keywords: {}, snapshots: [] };
    }
  }

  saveHistory() {
    const fs = require('fs');
    fs.writeFileSync(this.dataFile, JSON.stringify(this.history, null, 2));
  }

  // Uses Google Search Console API via MCP
  // In practice, you'd connect this to GSC or a SERP API
  async checkRankings(keywords, domain) {
    const snapshot = {
      date: new Date().toISOString(),
      domain,
      rankings: {}
    };

    for (const keyword of keywords) {
      // In a real implementation, you'd hit a SERP API here.
      // For Claude Code integration, you'd use an MCP server
      // connected to Google Search Console.
      //
      // Example with a SERP checking approach:
      const result = await this.checkKeywordPosition(keyword, domain);
      snapshot.rankings[keyword] = result;

      // Track historical data
      if (!this.history.keywords[keyword]) {
        this.history.keywords[keyword] = [];
      }
      this.history.keywords[keyword].push({
        date: snapshot.date,
        position: result.position,
        url: result.url
      });
    }

    this.history.snapshots.push(snapshot);
    this.saveHistory();
    return snapshot;
  }

  async checkKeywordPosition(keyword, domain) {
    // Placeholder for actual SERP checking logic.
    // In production, use:
    // 1. Google Search Console API (free, your own data)
    // 2. SerpAPI or similar (paid, competitor data too)
    // 3. Custom scraping (risky, against Google TOS)
    return {
      keyword,
      position: null,
      url: null,
      note: 'Connect to GSC MCP server for real data'
    };
  }

  generateReport() {
    const report = [];

    for (const [keyword, history] of Object.entries(this.history.keywords)) {
      if (history.length < 2) continue;

      const latest = history[history.length - 1];
      const previous = history[history.length - 2];
      const change = previous.position && latest.position
        ? previous.position - latest.position
        : null;

      report.push({
        keyword,
        currentPosition: latest.position,
        previousPosition: previous.position,
        change,
        trend: change === null ? 'unknown' : change > 0 ? 'improving' : change < 0 ? 'declining' : 'stable',
        url: latest.url,
        dataPoints: history.length
      });
    }

    // Sort by biggest drops first (most urgent)
    return report.sort((a, b) => (a.change || 0) - (b.change || 0));
  }
}

module.exports = RankTracker;
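
The change arithmetic in generateReport is easy to get backwards, because in rankings a lower number is better: position 3 beats position 8. A standalone mirror of the same comparison logic makes the convention explicit:

```javascript
// Standalone mirror of RankTracker.generateReport's trend logic.
// change = previous - latest, so a positive change means the keyword
// moved UP the rankings (lower position number is better).
function rankTrend(previousPosition, latestPosition) {
  if (previousPosition == null || latestPosition == null) {
    return { change: null, trend: 'unknown' };
  }
  const change = previousPosition - latestPosition;
  return {
    change,
    trend: change > 0 ? 'improving' : change < 0 ? 'declining' : 'stable'
  };
}
```

So a move from position 8 to position 3 yields a change of +5 and a trend of 'improving', which is why the report's sort puts the most negative changes (biggest drops) first.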

Connecting to Google Search Console via MCP

The real power comes when you connect this to live data. With Claude Code's MCP integration, you can set up a Google Search Console connection:

// claude-code-gsc-config example
// Add to your .claude/mcp_servers.json:
{
  "google-search-console": {
    "command": "node",
    "args": ["./mcp-servers/gsc-server.js"],
    "env": {
      "GSC_CREDENTIALS": "./credentials.json",
      "GSC_SITE_URL": "https://yoursite.com"
    }
  }
}

Now Claude Code can query your Search Console data directly:

> What keywords am I ranking for that have high impressions but low CTR?

And it'll pull the data, analyze it, and suggest title tag improvements for the specific URLs that are underperforming. This is the kind of workflow that would take a human SEO analyst an hour of exporting CSVs, building pivot tables, and cross-referencing data. Claude Code does it in seconds.
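
The "high impressions, low CTR" query is worth understanding mechanically, since it's the single highest-leverage GSC analysis. Given rows shaped like Search Console's query data (the field names and thresholds here are assumptions for the sketch; tune them to your site's scale):

```javascript
// Flag queries with plenty of impressions but weak CTR; these are the
// pages where a better title tag and meta description pay off fastest.
// Thresholds are starting points, not standards.
function findCtrOpportunities(rows, { minImpressions = 500, maxCtr = 0.02 } = {}) {
  return rows
    .filter(r => r.impressions >= minImpressions)
    .map(r => ({ ...r, ctr: r.clicks / r.impressions }))
    .filter(r => r.ctr <= maxCtr)
    .sort((a, b) => b.impressions - a.impressions); // biggest opportunity first
}
```

This is exactly the pivot-table exercise the paragraph above describes, collapsed into a dozen lines; Claude Code just wires the GSC data into it and writes the title-tag suggestions.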


Project 4: An Open Claw-Powered SEO Monitoring Agent

This is the most powerful use case. We're going to set up an autonomous agent that monitors your site's SEO health continuously and alerts you when something goes wrong.

The Architecture

The idea is simple: Open Claw runs on your VPS, executes scheduled crawls, compares results to previous baselines, and messages you on Telegram when it finds something worth your attention.

// monitor.js - The SEO Monitoring Agent
const SEOCrawler = require('./crawler');
const fs = require('fs');
const path = require('path');

class SEOMonitor {
  constructor(config) {
    this.sites = config.sites; // Array of URLs to monitor
    this.dataDir = config.dataDir || './monitoring-data';
    this.alertThresholds = config.alertThresholds || {
      newBrokenLinks: 1,       // Alert on any new broken link
      positionDrop: 3,         // Alert if any keyword drops 3+ positions
      responseTimeSpike: 2000, // Alert if avg response time exceeds 2s
      newNoindex: 1,           // Alert on any new noindex page
      uptimeFailure: 1         // Alert on any unreachable page
    };

    if (!fs.existsSync(this.dataDir)) {
      fs.mkdirSync(this.dataDir, { recursive: true });
    }
  }

  async runCheck(siteUrl) {
    const domain = new URL(siteUrl).hostname;
    const timestamp = new Date().toISOString().split('T')[0];
    const resultFile = path.join(
      this.dataDir,
      `${domain}-${timestamp}.json`
    );

    console.log(`[${new Date().toISOString()}] Crawling ${siteUrl}...`);

    const crawler = new SEOCrawler(siteUrl, {
      maxPages: 200,
      concurrency: 3
    });

    const results = await crawler.crawl();

    // Save current results
    fs.writeFileSync(resultFile, JSON.stringify(results, null, 2));

    // Compare with previous crawl
    const previousFile = this.findPreviousCrawl(domain, timestamp);
    if (previousFile) {
      const previous = JSON.parse(fs.readFileSync(previousFile, 'utf8'));
      return this.compareResults(domain, previous, results);
    }

    return { domain, alerts: [], message: 'First crawl — baseline established.' };
  }

  findPreviousCrawl(domain, currentDate) {
    const files = fs.readdirSync(this.dataDir)
      .filter(f => f.startsWith(domain) && !f.includes(currentDate))
      .sort()
      .reverse();
    return files.length > 0
      ? path.join(this.dataDir, files[0])
      : null;
  }

  compareResults(domain, previous, current) {
    const alerts = [];

    // Check for new broken links
    const prevBroken = new Set(
      previous.pages
        .filter(p => p.statusCode >= 400)
        .map(p => p.url)
    );
    const newBroken = current.pages
      .filter(p => p.statusCode >= 400 && !prevBroken.has(p.url));

    if (newBroken.length >= this.alertThresholds.newBrokenLinks) {
      alerts.push({
        severity: 'critical',
        type: 'new_broken_links',
        message: `${newBroken.length} new broken links detected`,
        details: newBroken.map(p => `${p.url} (${p.statusCode})`)
      });
    }

    // Check for new noindex pages
    const prevNoindex = new Set(
      previous.pages
        .filter(p => p.metaRobots?.includes('noindex'))
        .map(p => p.url)
    );
    const newNoindex = current.pages
      .filter(p => p.metaRobots?.includes('noindex') && !prevNoindex.has(p.url));

    if (newNoindex.length >= this.alertThresholds.newNoindex) {
      alerts.push({
        severity: 'critical',
        type: 'new_noindex_pages',
        message: `${newNoindex.length} pages newly set to noindex`,
        details: newNoindex.map(p => p.url)
      });
    }

    // Check response time changes
    const prevAvg = previous.stats.avgResponseTime;
    const currentAvg = current.stats.avgResponseTime;
    if (currentAvg > this.alertThresholds.responseTimeSpike &&
        currentAvg > prevAvg * 1.5) {
      alerts.push({
        severity: 'warning',
        type: 'response_time_spike',
        message: `Average response time spiked from ${prevAvg}ms to ${currentAvg}ms`,
        details: current.pages
          .filter(p => p.responseTime > 3000)
          .map(p => `${p.url} (${p.responseTime}ms)`)
      });
    }

    // Check for disappeared pages
    const currentUrls = new Set(current.pages.map(p => p.url));
    const disappeared = previous.pages
      .filter(p => p.statusCode === 200 && !currentUrls.has(p.url))
      .map(p => p.url);

    if (disappeared.length > 0) {
      alerts.push({
        severity: 'warning',
        type: 'disappeared_pages',
        message: `${disappeared.length} previously accessible pages are no longer found`,
        details: disappeared
      });
    }

    // Check for new pages (informational)
    const prevUrls = new Set(previous.pages.map(p => p.url));
    const newPages = current.pages
      .filter(p => p.statusCode === 200 && !prevUrls.has(p.url));

    if (newPages.length > 0) {
      alerts.push({
        severity: 'info',
        type: 'new_pages_detected',
        message: `${newPages.length} new pages detected`,
        details: newPages.map(p => p.url)
      });
    }

    return { domain, alerts };
  }

  formatAlertMessage(checkResult) {
    if (checkResult.alerts.length === 0) {
      return `✅ ${checkResult.domain}: All clear. No new issues detected.`;
    }

    let message = `🔍 SEO Alert for ${checkResult.domain}\n\n`;

    const criticals = checkResult.alerts.filter(a => a.severity === 'critical');
    const warnings = checkResult.alerts.filter(a => a.severity === 'warning');
    const infos = checkResult.alerts.filter(a => a.severity === 'info');

    if (criticals.length > 0) {
      message += '🔴 CRITICAL:\n';
      for (const alert of criticals) {
        message += `  ${alert.message}\n`;
        for (const detail of alert.details.slice(0, 5)) {
          message += `    → ${detail}\n`;
        }
        if (alert.details.length > 5) {
          message += `    ... and ${alert.details.length - 5} more\n`;
        }
      }
      message += '\n';
    }

    if (warnings.length > 0) {
      message += '🟡 WARNINGS:\n';
      for (const alert of warnings) {
        message += `  ${alert.message}\n`;
        for (const detail of alert.details.slice(0, 3)) {
          message += `    → ${detail}\n`;
        }
      }
      message += '\n';
    }

    if (infos.length > 0) {
      message += 'ℹ️ INFO:\n';
      for (const alert of infos) {
        message += `  ${alert.message}\n`;
      }
    }

    return message;
  }
}

module.exports = SEOMonitor;

Connecting to Telegram via Open Claw

With Open Claw running on your VPS, you configure it to run this monitor on a schedule and send results via Telegram. The beauty of Open Claw is that you can literally tell it in natural language:

"Every morning at 6 AM Alaska time, run the SEO monitor against grizzlypeaksoftware.com. If there are any critical or warning alerts, send them to me immediately on Telegram. If everything is clean, just send a brief all-clear message. Also, on Fridays, include a summary of all changes from the past week."

Open Claw handles the scheduling, the execution, and the notification delivery. You don't have to write a cron job or set up a notification service — the agent does it all.
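If you'd rather not route delivery through the agent — or want a fallback when it's down — posting to the Telegram Bot API directly is a few lines of Node. The sketch below assumes you've created a bot and have its token and your chat ID; the chunking helper exists because Telegram caps messages at 4,096 characters, and a big alert report can easily exceed that:

```javascript
// Fallback delivery without Open Claw: post the formatted alert report
// straight to the Telegram Bot API. botToken and chatId come from your
// own bot setup (via @BotFather) — placeholders here.
const TELEGRAM_LIMIT = 4096; // Telegram's per-message character cap

// Split a long report into chunks under the limit, breaking on newlines
// so individual alert lines stay intact. Assumes no single line exceeds
// the limit on its own.
function chunkMessage(text, limit = TELEGRAM_LIMIT) {
  const chunks = [];
  let current = '';
  for (const line of text.split('\n')) {
    // +1 accounts for the newline re-inserted between lines
    if (current && current.length + line.length + 1 > limit) {
      chunks.push(current);
      current = line;
    } else {
      current = current ? `${current}\n${line}` : line;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

async function sendTelegramAlert(botToken, chatId, text) {
  for (const chunk of chunkMessage(text)) {
    await fetch(`https://api.telegram.org/bot${botToken}/sendMessage`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ chat_id: chatId, text: chunk })
    });
  }
}
```

Wire it up by calling `sendTelegramAlert` with the string returned by `formatAlertMessage`, and you have end-to-end alerting with or without an agent in the loop.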


Project 5: Content Gap Analyzer

Here's a tool that combines crawling with AI analysis to find content opportunities your competitors have that you don't.

// content-gap.js
const SEOCrawler = require('./crawler');

class ContentGapAnalyzer {
  constructor() {
    this.siteData = {};
  }

  async analyzeSite(url) {
    const crawler = new SEOCrawler(url, {
      maxPages: 300,
      concurrency: 3
    });
    const results = await crawler.crawl();

    const pages = results.pages
      .filter(p => p.statusCode === 200 && p.title)
      .map(p => ({
        url: p.url,
        title: p.title,
        h1: p.h1Text,
        wordCount: p.wordCount,
        hasStructuredData: p.hasStructuredData,
        path: new URL(p.url).pathname
      }));

    return {
      domain: new URL(url).hostname,
      pages,
      topicClusters: this.identifyTopicClusters(pages),
      contentDepth: this.analyzeContentDepth(pages)
    };
  }

  identifyTopicClusters(pages) {
    // Group pages by URL path segments
    const clusters = {};
    for (const page of pages) {
      const segments = page.path.split('/').filter(Boolean);
      const category = segments[0] || 'root';
      if (!clusters[category]) {
        clusters[category] = [];
      }
      clusters[category].push({
        url: page.url,
        title: page.title,
        wordCount: page.wordCount
      });
    }
    return clusters;
  }

  analyzeContentDepth(pages) {
    const depths = {};
    for (const page of pages) {
      const segments = page.path.split('/').filter(Boolean);
      const depth = segments.length;
      if (!depths[depth]) depths[depth] = 0;
      depths[depth]++;
    }
    return depths;
  }

  async findGaps(yourSiteUrl, competitorUrls) {
    console.log(`Analyzing your site: ${yourSiteUrl}`);
    const yourData = await this.analyzeSite(yourSiteUrl);

    const competitorData = [];
    for (const url of competitorUrls) {
      console.log(`Analyzing competitor: ${url}`);
      competitorData.push(await this.analyzeSite(url));
    }

    // Compare topic clusters
    const yourTopics = new Set(Object.keys(yourData.topicClusters));
    const competitorTopics = new Set();
    for (const comp of competitorData) {
      for (const topic of Object.keys(comp.topicClusters)) {
        competitorTopics.add(topic);
      }
    }

    const missingTopics = [...competitorTopics].filter(
      t => !yourTopics.has(t)
    );

    // Compare content volume per topic
    const thinTopics = [];
    for (const topic of yourTopics) {
      const yourCount = yourData.topicClusters[topic].length;
      for (const comp of competitorData) {
        const compCount = comp.topicClusters[topic]?.length || 0;
        if (compCount > yourCount * 2) {
          thinTopics.push({
            topic,
            yourPages: yourCount,
            competitorPages: compCount,
            competitor: comp.domain
          });
        }
      }
    }

    return {
      yourSite: yourData,
      competitors: competitorData,
      gaps: {
        missingTopics,
        thinTopics,
        recommendations: this.generateRecommendations(
          missingTopics,
          thinTopics
        )
      }
    };
  }

  generateRecommendations(missingTopics, thinTopics) {
    const recs = [];

    for (const topic of missingTopics) {
      recs.push({
        priority: 'high',
        type: 'missing_topic',
        message: `Create content for the "${topic}" topic area. Competitors have coverage here but you don't.`
      });
    }

    for (const thin of thinTopics) {
      recs.push({
        priority: 'medium',
        type: 'thin_coverage',
        message: `Expand "${thin.topic}" content. You have ${thin.yourPages} pages but ${thin.competitor} has ${thin.competitorPages}.`
      });
    }

    return recs.sort((a, b) => {
      const priority = { high: 0, medium: 1, low: 2 };
      return priority[a.priority] - priority[b.priority];
    });
  }
}

module.exports = ContentGapAnalyzer;

When you run this through Claude Code, you can take it further:

> Run a content gap analysis between grizzlypeaksoftware.com and 
> the top 3 competitors you find in search results for 
> "software engineer career change." Then suggest 10 article 
> titles I should write to fill the gaps, prioritized by 
> keyword difficulty and search volume.

Claude Code will execute the crawler against all four sites, run the gap analysis, cross-reference with search volume data (via an MCP-connected keyword tool), and produce a prioritized content roadmap. That's a task that would take a content strategist a full day, done in minutes.


Pulling It All Together: The Agentic SEO Workflow

Here's how all these pieces fit together in a real-world workflow:

Daily (automated via Open Claw):

  • Morning crawl of your site to check for new issues
  • Rank tracking check for your target keywords
  • Competitor monitoring for structural changes
  • Telegram alert with any critical findings

Weekly (triggered via Claude Code):

  • Full technical audit with AI-powered analysis
  • Content gap analysis against top competitors
  • Performance trend analysis (are response times improving?)
  • Weekly SEO health report generated as markdown

Monthly (manual with Claude Code assistance):

  • Deep-dive content strategy review
  • Link structure optimization analysis
  • Schema markup audit and generation
  • Crawl comparison: this month vs. last month

On-demand (Claude Code slash commands):

  • /crawl [url] — Quick crawl with AI analysis
  • /audit [url] — Full technical SEO audit
  • /gaps [url] — Content gap analysis
  • /keywords — Keyword research and prioritization

The total cost of this setup: your Claude Code subscription ($20–$200/month), a $5/month VPS for Open Claw, and zero dollars in SEO tool subscriptions. You get more customization, deeper analysis, and AI-powered insights than any commercial tool provides.


Practical Tips from the Trenches

I've been building and using these tools across multiple sites, and here are the lessons that don't make it into tutorials:

Respect rate limits. Your crawler isn't Google. If you hammer a site with 50 concurrent requests, you'll get IP-banned and potentially cause real problems for smaller sites. Keep concurrency at 3–5 and add a polite delay between requests. The respectRobotsTxt flag in our crawler isn't optional — it's an ethical requirement.

Store everything. Crawl data is surprisingly useful in aggregate. That crawl you did three months ago becomes invaluable when you're trying to figure out when a particular issue was introduced. Use timestamped JSON files, or better yet, pipe results into a PostgreSQL database where you can run historical queries.

Don't trust the AI blindly. Claude Code is remarkably good at analyzing crawl data and generating recommendations, but it can hallucinate patterns that don't exist or miss context that a human SEO would immediately recognize. Use AI as a force multiplier, not a replacement for your judgment. Review the recommendations before acting on them.

Test against your own sites first. Before you crawl a competitor's site, make sure your tools work correctly on your own. You'll catch bugs faster and you won't accidentally hammer someone else's server with a broken crawler.

Build incrementally. You don't need all five projects on day one. Start with the basic crawler, get comfortable with it, then layer on the AI analysis. Add monitoring once you have a stable crawl pipeline. The content gap analyzer is a nice-to-have, not a must-have.


The Future: Where Agentic SEO Is Heading

We're at the very beginning of agentic SEO tooling. Right now, Open Claw and Claude Code are powerful but still require meaningful setup and configuration. Within a year, I expect we'll see:

One-click SEO agent deployments. Someone will package the monitoring agent, the crawler, and the analysis pipeline into a single deployable unit that non-developers can spin up.

Real-time SERP integration. As MCP servers mature, we'll see direct connections to Google Search Console, Bing Webmaster Tools, and third-party SERP APIs that let agents react to ranking changes in real-time.

Multi-agent SEO teams. Instead of one agent doing everything, you'll have specialized agents — a crawler agent, a content agent, a link building agent, a technical audit agent — coordinated by an orchestrator. The GitHub project wshobson/agents already has 112 specialized agents including an SEO specialist, and this is just the beginning.

Autonomous content optimization. Agents that can not only identify content gaps but draft optimized content, test it against the existing top-ranking pages, and iteratively improve it based on performance data. We're not there yet in a way that preserves authentic voice, but the trajectory is clear.

The bottom line: if you're a developer who does SEO, there has never been a better time to build your own tools. The combination of modern JavaScript, open-source crawling libraries, and agentic AI gives you capabilities that were enterprise-only just two years ago. And unlike commercial tools, your custom-built stack grows with you, adapts to your specific needs, and costs a fraction of the subscription treadmill.

Stop renting your SEO tools. Start building them.


Shane builds developer tools and writes about the intersection of software engineering and AI at Grizzly Peak Software. He's based in Alaska, where the Wi-Fi is questionable but the mountain views make up for it.
