From Zero to MVP: 12-Hour Build Using Only Free-Tier LLMs in 2026

I wanted to see if it was possible to build something genuinely useful — not a toy, not a demo, an actual tool someone would pay for — using nothing but free-tier AI services. No credit card charges. No API billing surprises. Just whatever the major providers give away for free in 2026.

So I set a timer for 12 hours, picked a project idea, and started building. Here's exactly what happened.


The Idea

I'd been annoyed for weeks by a specific problem: whenever I write a new article for Grizzly Peak Software, I need to generate meta descriptions, Open Graph tags, structured data, and social media post text. It's tedious, repetitive work that I keep doing manually because I never bothered to automate it.

The MVP: a tool where you paste in a blog article (title + markdown content), and it generates all the SEO metadata, social media posts for three platforms, and JSON-LD structured data. Simple input, useful output, clear value.

I called it "MetaForge" because naming things is hard and I had eleven hours and fifty-eight minutes of real work to do.


The Free-Tier Landscape in 2026

Before I started coding, I spent about twenty minutes surveying what's actually available for free. This landscape changes constantly, but here's what I was working with in early 2026:

Google Gemini (free tier):

  • Gemini 1.5 Flash: 15 requests per minute, 1 million tokens per day
  • Gemini 2.0 Flash: generous free tier through AI Studio
  • Best free-tier offering by a wide margin in terms of volume

Mistral (free tier):

  • Mistral Small and Codestral available through La Plateforme
  • Limited but usable — around 500K tokens per month on the free plan
  • Good for code generation tasks specifically

Claude (free tier):

  • Limited free usage through claude.ai for manual work
  • No free API tier for automated usage (Anthropic charges from the first API call)
  • Useful for planning and manual testing, not for the production pipeline

OpenAI (free tier):

  • GPT-3.5-Turbo was available on free tier for a while but has been deprecated
  • GPT-4o mini has limited free access through the API with prepaid credits for new accounts
  • Less generous than Gemini for sustained free usage

Groq (free tier):

  • Extremely fast inference on open-source models (Llama, Mixtral)
  • Free tier with rate limits: 30 requests per minute on most models
  • Great for tasks that need speed over maximum intelligence

HuggingFace Inference API (free tier):

  • Free inference on thousands of open-source models
  • Rate-limited but functional for low-traffic tools
  • Good fallback option

The clear winner for a free-tier build is Gemini, with Groq as a strong secondary option. Mistral fills a niche for code-specific tasks. The rest are useful for development but not practical for a production free-tier pipeline.


Hour 0-1: Architecture Decisions Under Pressure

With 12 hours, I can't afford to make wrong architectural choices and backtrack. Here's what I decided quickly:

Backend: Node.js with Express. I know it cold, I have boilerplate ready, and it plays well with every AI SDK.

Frontend: Server-rendered Pug templates with Bootstrap. No React, no build step, no client-side complexity. The tool has two pages: an input form and a results page. A single-page app would be overengineering.

AI Provider: Gemini 2.0 Flash as the primary model. It's fast, it's free, and it's smart enough for text generation tasks. Groq with Llama 3 as a fallback if Gemini's rate limits become a problem.

Database: None. For an MVP, I'm storing nothing. The tool takes input, processes it, returns output. State is for version 2.

Hosting: My existing DigitalOcean droplet. It already runs Nginx and PM2. Adding another Express app takes five minutes.

mkdir metaforge
cd metaforge
npm init -y
npm install express pug @google/generative-ai groq-sdk dotenv
mkdir views routes utils
touch app.js .env

Scaffold time: eight minutes. I already had my Express boilerplate template on disk.


Hour 1-3: The Core Generation Engine

The heart of MetaForge is a function that takes article content and produces structured metadata. This is where the free-tier constraints start mattering.

var { GoogleGenerativeAI } = require("@google/generative-ai");

var genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

function generateMetadata(title, content) {
  var model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });

  var prompt = [
    "You are an SEO and content marketing expert.",
    "Given the following blog article, generate metadata in JSON format.",
    "",
    "Article Title: " + title,
    "Article Content:",
    content.substring(0, 8000),
    "",
    "Generate the following in valid JSON:",
    "{",
    '  "metaDescription": "150-160 char SEO meta description",',
    '  "ogTitle": "Open Graph title (under 60 chars)",',
    '  "ogDescription": "Open Graph description (under 200 chars)",',
    '  "twitterText": "Tweet-length post with relevant hashtags",',
    '  "linkedinPost": "Professional LinkedIn post (2-3 paragraphs)",',
    '  "redditTitle": "Reddit-style title (conversational, not clickbait)",',
    '  "keywords": ["array", "of", "5-8", "seo", "keywords"],',
    '  "jsonLd": { structured data object for BlogPosting schema }',
    "}",
    "",
    "Rules:",
    "- Meta description must be 150-160 characters exactly",
    "- Include actual content insights, not generic descriptions",
    "- Twitter post should feel natural, not promotional",
    "- LinkedIn post should add professional context",
    "- JSON-LD should follow schema.org BlogPosting specification",
    "- Return ONLY valid JSON, no markdown fencing"
  ].join("\n");

  return model.generateContent(prompt).then(function(result) {
    var text = result.response.text();
    // Strip markdown code fences if the model adds them
    text = text.replace(/```json\n?/g, "").replace(/```\n?/g, "").trim();
    return JSON.parse(text);
  });
}

module.exports = { generateMetadata: generateMetadata };

This worked on the first try for simple articles. It broke on the third test — the model returned JSON with trailing commas, which JavaScript's JSON.parse doesn't accept. Classic LLM issue.

The JSON Reliability Problem

This is the first real lesson of building with free-tier models: you can't trust the output format. Paid API tiers from OpenAI and Anthropic offer JSON modes and structured-output features that constrain responses to valid JSON. Free-tier Gemini through AI Studio gives you no such guarantee.

My fix was a cleanup function:

function cleanJsonResponse(text) {
  // Remove markdown fences
  text = text.replace(/```json\n?/g, "").replace(/```\n?/g, "").trim();

  // Remove trailing commas before closing braces/brackets
  text = text.replace(/,\s*([}\]])/g, "$1");

  // Try parsing
  try {
    return JSON.parse(text);
  } catch (e) {
    // Try extracting JSON from surrounding text
    var match = text.match(/\{[\s\S]*\}/);
    if (match) {
      try {
        var cleaned = match[0].replace(/,\s*([}\]])/g, "$1");
        return JSON.parse(cleaned);
      } catch (e2) {
        return null;
      }
    }
    return null;
  }
}

Ugly? Yes. Necessary? Absolutely. When you're working with free-tier models that don't support structured output modes, you need defensive parsing. This function handled about 95% of the malformed responses I encountered during testing.


Hour 3-5: Building the Interface

With the generation engine working, I built the web interface. Two pages: input and results.

The input page is a form with two fields — title and content. The results page displays all the generated metadata in copyable sections.

var express = require("express");
var router = express.Router();
var generator = require("../utils/generator");

router.get("/", function(req, res) {
  res.render("input", { title: "MetaForge - Article Metadata Generator" });
});

router.post("/generate", function(req, res) {
  var articleTitle = req.body.title;
  var articleContent = req.body.content;

  if (!articleTitle || !articleContent) {
    return res.render("input", {
      title: "MetaForge",
      error: "Both title and content are required."
    });
  }

  generator.generateMetadata(articleTitle, articleContent)
    .then(function(metadata) {
      if (!metadata) {
        return res.render("input", {
          title: "MetaForge",
          error: "Failed to generate metadata. Please try again."
        });
      }
      res.render("results", {
        title: "Results - MetaForge",
        articleTitle: articleTitle,
        metadata: metadata
      });
    })
    .catch(function(err) {
      console.error("Generation error:", err.message);
      res.render("input", {
        title: "MetaForge",
        error: "An error occurred. The AI service may be rate-limited. Try again in a minute."
      });
    });
});

module.exports = router;

I spent probably too long on making the results page look decent — copy-to-clipboard buttons for each field, syntax-highlighted JSON-LD preview, character counts next to the meta description. But these are the details that make a tool feel usable versus feeling like a prototype.
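The character counts are worth the few minutes they take. A minimal sketch of the helper behind that display — the function name and exact wording here are illustrative, not lifted from the MetaForge source:

```javascript
// describeLength: renders the character-count hint shown next to a
// generated field, flagging values outside the target range.
function describeLength(text, min, max) {
  var len = text.length;
  if (len < min) return len + " chars (too short, aim for " + min + "-" + max + ")";
  if (len > max) return len + " chars (too long, aim for " + min + "-" + max + ")";
  return len + " chars (good)";
}

module.exports = { describeLength: describeLength };
```

The results template calls this for the meta description (150-160) and the Open Graph fields, so an out-of-range value is obvious before anyone pastes it into production.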


Hour 5-7: The Groq Fallback and Rate Limit Dance

Around hour 5, I started hitting Gemini's rate limits during testing. Fifteen requests per minute sounds like a lot until you're rapidly iterating on prompts and testing edge cases.

This is where Groq saved the build:

var Groq = require("groq-sdk");

var groqClient = new Groq({ apiKey: process.env.GROQ_API_KEY });

function generateMetadataGroq(title, content) {
  return groqClient.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages: [
      {
        role: "system",
        content: "You are an SEO expert. Return only valid JSON, no explanation."
      },
      {
        role: "user",
        content: "Generate SEO metadata for this article.\n\n" +
          "Title: " + title + "\n\n" +
          "Content: " + content.substring(0, 6000) + "\n\n" +
          "Return JSON with: metaDescription (150-160 chars), ogTitle, " +
          "ogDescription, twitterText, linkedinPost, redditTitle, " +
          "keywords (array), jsonLd (BlogPosting schema)"
      }
    ],
    temperature: 0.7,
    max_tokens: 2000
  }).then(function(completion) {
    var text = completion.choices[0].message.content;
    return cleanJsonResponse(text);
  });
}

Then a wrapper that tries Gemini first and falls back to Groq:

function generateWithFallback(title, content) {
  return generateMetadata(title, content)
    .then(function(result) {
      if (result) return { data: result, provider: "gemini" };
      // Gemini returned bad JSON, try Groq
      return generateMetadataGroq(title, content).then(function(groqResult) {
        return { data: groqResult, provider: "groq" };
      });
    })
    .catch(function(err) {
      console.log("Gemini failed (" + err.message + "), falling back to Groq");
      return generateMetadataGroq(title, content).then(function(groqResult) {
        return { data: groqResult, provider: "groq" };
      });
    });
}

The quality difference between Gemini Flash and Llama 3.3 70B on Groq is noticeable but not dramatic for this use case. Gemini produces slightly better social media copy. Llama is more reliable about JSON formatting. Both produce usable meta descriptions and structured data.


Hour 7-9: Quality Problems and Prompt Engineering

Here's where I hit the wall that every free-tier builder hits: quality isn't quite good enough out of the box.

Problem 1: Meta descriptions were generic. The model kept producing descriptions like "Learn about API development and best practices in this comprehensive guide." That's SEO filler, not a useful meta description.

Fix: I added three example input/output pairs to the prompt. Few-shot examples dramatically improve output quality on free-tier models. This is the single most effective prompt engineering technique I know.

var fewShotExamples = [
  {
    title: "Why I Stopped Using Docker for Local Development",
    metaDescription: "After 5 years of Docker Compose for local dev, " +
      "I switched to native tooling. Here's what broke, what improved, " +
      "and why the tradeoffs surprised me."
  },
  {
    title: "Building a Job Board That Costs $4/Month to Run",
    metaDescription: "A technical breakdown of building a job aggregator " +
      "with AI classification on a $4/month server, processing 500+ " +
      "listings daily with Claude Haiku."
  }
];

Adding these examples to the prompt increased the meta description quality from "usable after editing" to "usable as-is" about 70% of the time.
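Mechanically, folding the examples in is just string assembly. A sketch of how the pairs get formatted into a prompt block (the exact wording I used shifted between iterations, so treat this as illustrative):

```javascript
// buildExampleBlock: formats few-shot title/description pairs as a text
// block that gets prepended to the generation prompt.
function buildExampleBlock(examples) {
  return examples.map(function(ex, i) {
    return "Example " + (i + 1) + ":\n" +
      "Title: " + ex.title + "\n" +
      "Good meta description: " + ex.metaDescription;
  }).join("\n\n");
}
```

The resulting block goes into the prompt ahead of the article content, before the output-format instructions, so the model sees the target style before it sees the task.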

Problem 2: JSON-LD was often invalid. The models would generate plausible-looking JSON-LD that failed Google's structured data testing tool — wrong property names, missing required fields, incorrect nesting.

Fix: I included the exact schema template in the prompt and told the model to fill in the blanks rather than generate from scratch:

var jsonLdTemplate = {
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "FILL_IN",
  "description": "FILL_IN",
  "author": {
    "@type": "Person",
    "name": "FILL_IN"
  },
  "keywords": "FILL_IN",
  "datePublished": "FILL_IN_ISO8601",
  "publisher": {
    "@type": "Organization",
    "name": "FILL_IN"
  }
};

Template-filling is more reliable than open-ended generation, especially on smaller/free-tier models. This is a pattern I keep coming back to.
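A nice side effect of template-filling is that you can verify the output mechanically: if any placeholder survives, or the model renamed a field, the generation failed. These two checks are my own additions for illustration, not part of the schema.org spec:

```javascript
// templateFilled: true only when the model replaced every FILL_IN
// placeholder in the returned JSON-LD object.
function templateFilled(jsonLd) {
  return JSON.stringify(jsonLd).indexOf("FILL_IN") === -1;
}

// sameShape: true when the output kept every top-level key from the
// template (catches renamed or dropped properties).
function sameShape(template, output) {
  return Object.keys(template).every(function(key) {
    return output.hasOwnProperty(key);
  });
}
```

If either check fails, the result falls through to the fallback provider rather than shipping broken structured data.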

Problem 3: LinkedIn posts sounded robotic. Gemini Flash in particular has a tendency toward formal, buzzword-heavy language in professional contexts.

Fix: Added "Write like a real engineer sharing something interesting, not like a LinkedIn influencer" to the prompt. Crude, but effective. Sometimes the best prompt engineering is just being blunt about what you don't want.


Hour 9-11: Polish, Error Handling, and Deployment

The last productive hours went to making the tool resilient:

Rate limit handling: Added exponential backoff on 429 responses and a user-facing message explaining why generation might be slow. Free-tier users understand rate limits; they don't understand mysterious failures.

function retryWithBackoff(fn, maxRetries, delay) {
  return fn().catch(function(err) {
    if (maxRetries <= 0) throw err;
    // Guard against errors that arrive without a message property
    var message = err.message || "";
    if (err.status === 429 || message.indexOf("rate") !== -1) {
      console.log("Rate limited, retrying in " + delay + "ms");
      return new Promise(function(resolve) {
        setTimeout(function() {
          resolve(retryWithBackoff(fn, maxRetries - 1, delay * 2));
        }, delay);
      });
    }
    throw err;
  });
}

Input validation: Max content length of 10,000 characters (free-tier context windows are limited), basic XSS prevention, trim whitespace.
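All of that fits in one small function. A sketch of roughly what mine looks like — the constant and names are illustrative, and since Pug escapes interpolated values by default, the bracket-stripping is belt-and-suspenders rather than the real XSS defense:

```javascript
// validateInput: trims both fields, caps content length for free-tier
// context windows, and strips angle brackets as a crude extra guard
// before values are echoed back into templates.
var MAX_CONTENT_CHARS = 10000;

function validateInput(title, content) {
  var errors = [];
  title = (title || "").trim().replace(/[<>]/g, "");
  content = (content || "").trim();
  if (!title) errors.push("Title is required.");
  if (!content) errors.push("Content is required.");
  if (content.length > MAX_CONTENT_CHARS) {
    content = content.substring(0, MAX_CONTENT_CHARS);
  }
  return { title: title, content: content, errors: errors };
}
```

The route handler checks `errors.length` and re-renders the input form with the messages if anything failed.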

Deployment: Added a PM2 process config, updated Nginx to proxy the new app on a subdomain, and verified SSL was working. This took about twenty minutes because I've done it dozens of times.
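For reference, the PM2 side is a few lines of config. Something along these lines, where the path and port are placeholders rather than my actual values:

```javascript
// ecosystem.config.js — minimal PM2 process definition for the new app.
module.exports = {
  apps: [{
    name: "metaforge",
    script: "./app.js",
    cwd: "/var/www/metaforge",    // placeholder path
    env: {
      NODE_ENV: "production",
      PORT: 3100                  // placeholder; Nginx proxies the subdomain here
    },
    instances: 1,
    autorestart: true
  }]
};
```

Then it's `pm2 start ecosystem.config.js` and an Nginx server block proxying the subdomain to the same port.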


Hour 11-12: Testing with Real Articles

I fed MetaForge ten real articles from Grizzly Peak Software. Results:

  • Meta descriptions: 7 out of 10 usable without editing. 3 needed minor tweaks (one was too long, two were too vague).
  • Twitter posts: 8 out of 10 good. The model sometimes adds too many hashtags.
  • LinkedIn posts: 6 out of 10 usable. Still slightly robotic despite the prompt fix.
  • JSON-LD: 9 out of 10 valid per Google's testing tool. One had an incorrect date format.
  • Reddit titles: 9 out of 10 good. The model is surprisingly good at sounding like a real Reddit post.

Overall, the tool saves me about 15-20 minutes per article. Across 500 articles, that adds up to well over a hundred hours.


What Free Tier Can and Can't Do

Let me be direct about this, because there's too much hype about building entire businesses on free AI tiers.

What free tier handles well:

  • Text generation tasks that don't require maximum quality
  • Internal tools where "good enough" output gets human-reviewed
  • Low-traffic applications (under a few hundred requests per day)
  • Prototyping and validating ideas before committing to paid tiers
  • Batch processing during off-peak hours

What free tier can't do:

  • High-reliability production services (rate limits will bite you at the worst time)
  • Tasks requiring the best available model quality (GPT-4, Claude Opus)
  • Real-time applications where latency matters (free tiers are deprioritized)
  • Anything requiring guaranteed uptime or SLAs
  • High-volume processing (you'll hit daily token limits)

The honest math: MetaForge on free tier can handle maybe 50-100 articles per day before hitting rate limits across Gemini and Groq combined. If I needed to process 1,000 articles in a day, I'd need paid tiers. For my personal use of 2-3 articles per week? Free tier is more than enough.

The free tier is best understood as a development and validation tool, not a production foundation. Build your MVP on free tier to prove the concept works. Then budget for paid API access when you have users.


The Build Diary Summary

| Time | Activity | Status |
|------|----------|--------|
| 0:00-0:20 | Survey free-tier landscape | Done |
| 0:20-1:00 | Architecture decisions, scaffold | Done |
| 1:00-3:00 | Core generation engine | Done (with JSON parsing hacks) |
| 3:00-5:00 | Web interface | Done |
| 5:00-7:00 | Groq fallback, rate limit handling | Done |
| 7:00-9:00 | Prompt engineering, quality fixes | Done |
| 9:00-11:00 | Polish, error handling, deploy | Done |
| 11:00-12:00 | Testing with real articles | Done |

Total cost: $0.00 in AI API charges. $0.00 in additional hosting (used existing server). About $4 worth of coffee.


What I'd Do Differently

If I ran this experiment again, I'd change three things.

Start with Groq instead of Gemini. Groq's inference speed on Llama 3 is exceptional, and the JSON output is more reliable. Gemini has the larger free tier, but Groq's speed advantage matters more during rapid development.

Build the prompt engineering test harness first. I spent too much time testing prompts through the web interface. A simple script that runs the same article through the prompt ten times and compares outputs would have saved at least an hour.

Skip LinkedIn post generation. It was the lowest-quality output and the hardest to get right. For an MVP, focus on the outputs that work well (meta descriptions, structured data, Twitter/Reddit posts) and add the marginal features later.


The Takeaway

You can absolutely build a useful MVP on free-tier AI in 2026. The models are good enough for text generation tasks, the free tiers are generous enough for low-volume use, and the multi-provider fallback pattern handles rate limits gracefully.

But don't confuse "possible" with "optimal." Free tier forces you into workarounds — JSON parsing hacks, fallback chains, defensive error handling — that you wouldn't need with paid APIs. The code is uglier. The reliability is lower. The output quality is a step below what paid tiers deliver.

The right play is exactly what I did: build the MVP on free tier to validate the idea, then upgrade to paid APIs for the features and reliability that matter. The 12 hours proved the concept works. The next investment is $20/month in API costs to make it production-grade.

That's a pretty good deal.


Shane Larson is a software engineer and the founder of Grizzly Peak Software. He writes from a cabin in Alaska about building useful things with AI, usually while his free-tier rate limits reset. More at grizzlypeaksoftware.com.
