Wildlife Photography Meets Computer Vision: Experiments from an Alaskan Cabin

I have a trail camera problem. Specifically, I have fourteen trail cameras scattered across the woods around my cabin in Caswell Lakes, Alaska, and every week they produce somewhere between two thousand and eight thousand images. Most of those images are trees. Wind-blown branches triggering the motion sensor. Shadows moving across snow. The occasional squirrel doing something unremarkable.

Buried in that mountain of noise are the images I actually care about — a moose and her calf at the creek, a lynx crossing the property at dusk, the black bear that keeps testing my garbage setup. Finding those images manually takes hours. So naturally, I built a computer vision pipeline to do it for me.


The Trail Camera Data Problem

If you've ever used trail cameras for wildlife monitoring, you know the signal-to-noise ratio is brutal. Motion-activated cameras are not smart. They trigger on anything that moves — branches, rain, shifting light, your own dog wandering past for the fortieth time that day.

Commercial trail camera software exists, but it's mostly designed for hunters tracking deer. I'm not hunting. I'm photographing. And the species I care about in southcentral Alaska — moose, black bear, lynx, wolverine, fox, eagles — don't map neatly to the whitetail-focused classifiers that come bundled with hunting camera apps.

I needed something that could:

  1. Sort through thousands of images and discard the blanks
  2. Identify which species was in each image
  3. Rate image quality for photography purposes (composition, lighting, clarity)
  4. Tag images with metadata for a searchable archive

That's a computer vision pipeline. And in 2026, building one from scratch is surprisingly accessible.


Architecture: Keeping It Simple

I built this as a Node.js application that runs on my local machine. The images stay on my hard drive — I'm not sending thousands of wildlife photos to the cloud. The only external API call is to an LLM for the quality assessment step, and even that's optional.

var fs = require("fs");
var path = require("path");
var sharp = require("sharp");
var tf = require("@tensorflow/tfjs-node");
var cocoSsd = require("@tensorflow-models/coco-ssd");

var IMAGE_DIR = process.env.TRAIL_CAM_DIR || "./trail_images";
var OUTPUT_DIR = process.env.OUTPUT_DIR || "./classified";
var CONFIDENCE_THRESHOLD = 0.45;

var model = null;

function loadModel() {
  return cocoSsd.load().then(function (loadedModel) {
    model = loadedModel;
    console.log("COCO-SSD model loaded");
    return model;
  });
}

function preprocessImage(imagePath) {
  return sharp(imagePath)
    .resize(640, 480, { fit: "inside" })
    .removeAlpha()
    .toColourspace("srgb") // force 3 channels so the tensor shape is always [h, w, 3]
    .raw()
    .toBuffer({ resolveWithObject: true })
    .then(function (result) {
      var tensor = tf.tensor3d(
        new Uint8Array(result.data),
        [result.info.height, result.info.width, 3]
      );
      return { tensor: tensor, width: result.info.width, height: result.info.height };
    });
}

function detectAnimals(imagePath) {
  // Lazy-load the model so a direct call works even before loadModel() has run
  var ready = model ? Promise.resolve(model) : loadModel();
  return ready.then(function () {
    return preprocessImage(imagePath);
  }).then(function (imageData) {
    return model.detect(imageData.tensor).then(function (predictions) {
      imageData.tensor.dispose();

      var animalClasses = [
        "bird", "cat", "dog", "horse", "sheep", "cow",
        "elephant", "bear", "zebra", "giraffe",
      ];

      var animals = predictions.filter(function (pred) {
        return (
          animalClasses.indexOf(pred.class) !== -1 &&
          pred.score >= CONFIDENCE_THRESHOLD
        );
      });

      return {
        imagePath: imagePath,
        detections: animals,
        hasAnimal: animals.length > 0,
        topPrediction: animals.length > 0 ? animals[0] : null,
      };
    });
  });
}

Right away you'll notice the obvious limitation: COCO-SSD doesn't know what a moose is. Its animal vocabulary is limited to common categories that are heavily represented in the COCO training dataset. A bear might be detected as "bear" (if you're lucky) or might be missed entirely. A moose will never be labeled "moose" — at best it gets mislabeled as the nearest COCO category, usually "horse" or "cow".
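So before anything else happens downstream, I remap COCO's labels into coarse local "review buckets." A minimal sketch of that remapping, where the class names on the left are real COCO labels but the bucket names and groupings are illustrative, not the exact ones in my pipeline:

```javascript
// Map COCO-SSD's limited vocabulary onto coarse local review buckets.
// The left-hand keys are genuine COCO classes; the buckets are my own invention.
var COCO_TO_BUCKET = {
  bear: "bear candidate",
  horse: "large ungulate (possible moose)",
  cow: "large ungulate (possible moose)",
  sheep: "large ungulate (possible moose)",
  dog: "small carnivore (possible fox, lynx, or wolverine)",
  cat: "small carnivore (possible fox, lynx, or wolverine)",
  bird: "bird (possible eagle)",
};

function mapCocoToBucket(cocoClass) {
  return COCO_TO_BUCKET[cocoClass] || "unknown animal-shaped object";
}
```

The buckets only need to be good enough to route the crop to the right second-stage classifier; species-level precision comes later.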

This is where the project got interesting.


The Classification Gap: From COCO to Alaska

The pretrained COCO-SSD model is a good first-pass filter. It answers the question "is there an animal-shaped thing in this image?" with reasonable accuracy. But for species-level identification, I needed something more.

I tried three approaches, and I'll tell you honestly which ones worked and which ones didn't.

Approach 1: Fine-tuning on local wildlife data. This is the theoretically correct answer. Collect labeled images of Alaskan wildlife, fine-tune a model, deploy it. In practice, I didn't have enough labeled data to make this work well. I had maybe 200 decent moose images, 80 bear images, 30 lynx images. That's nowhere near enough for reliable fine-tuning, especially with the variable lighting and image quality of trail cameras.

Approach 2: Using a vision-language model for classification. This is what actually worked. After the COCO-SSD pass identifies that an image contains something animal-like, I send a cropped region to GPT-4o for species identification.

var OpenAI = require("openai");
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

function classifySpecies(imagePath, boundingBox) {
  return sharp(imagePath)
    .extract({
      left: Math.round(boundingBox[0]),
      top: Math.round(boundingBox[1]),
      width: Math.round(boundingBox[2]),
      height: Math.round(boundingBox[3]),
    })
    .resize(512, 512, { fit: "inside" })
    .toBuffer()
    .then(function (croppedBuffer) {
      var base64Image = croppedBuffer.toString("base64");

      return openai.chat.completions.create({
        model: "gpt-4o",
        messages: [
          {
            role: "user",
            content: [
              {
                type: "text",
                text:
                  "Identify the wildlife species in this trail camera image from " +
                  "southcentral Alaska. Respond with JSON only: " +
                  '{"species": "common name", "confidence": 0.0-1.0, ' +
                  '"behavior": "brief description of what the animal is doing", ' +
                  '"count": number_of_individuals}',
              },
              {
                type: "image_url",
                image_url: {
                  url: "data:image/jpeg;base64," + base64Image,
                },
              },
            ],
          },
        ],
        max_tokens: 200,
        // Ask for a bare JSON object so the parse below never has to strip markdown fences
        response_format: { type: "json_object" },
      });
    })
    .then(function (response) {
      try {
        return JSON.parse(response.choices[0].message.content);
      } catch (e) {
        return { species: "unknown", confidence: 0, behavior: "unclassified", count: 0 };
      }
    });
}

Approach 3: A hybrid local classifier. I trained a lightweight MobileNet-based classifier on a combination of my own images and publicly available wildlife datasets from iNaturalist. This handles the common cases (moose, bear, eagle) well enough to avoid API calls for 70% of detections, falling back to GPT-4o only for ambiguous or rare species.
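The routing decision itself is simple: trust the local model only when it is confident about a species it was actually trained on, and send everything else to the API. A sketch of that logic, with an illustrative label list and threshold (my real values differ):

```javascript
// Hybrid routing: species list and confidence floor are illustrative.
var LOCAL_SPECIES = ["moose", "black bear", "bald eagle", "red fox"];
var LOCAL_CONFIDENCE_FLOOR = 0.8;

function routeClassification(localResult) {
  // localResult: { species, confidence } from the lightweight MobileNet head
  var isCommon = LOCAL_SPECIES.indexOf(localResult.species) !== -1;
  if (isCommon && localResult.confidence >= LOCAL_CONFIDENCE_FLOOR) {
    return { source: "local", species: localResult.species };
  }
  // Ambiguous or rare: fall back to the vision-language model
  return { source: "api", species: null };
}
```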

The hybrid approach ended up being the winner for cost and speed. Running every detection through GPT-4o gets expensive fast when you're processing thousands of images a week. At roughly $0.01 per image with the vision model, two thousand animal detections per week would cost $80/month. The hybrid approach cuts that to about $20/month by only sending the difficult cases to the API.
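The arithmetic behind those numbers, as a back-of-envelope helper (assuming four weeks per month and roughly $0.01 per vision call, both of which are rough):

```javascript
// Rough monthly API cost: detections/week * 4 weeks * fraction sent to API * $/call
function estimateMonthlyApiCost(detectionsPerWeek, apiFraction, costPerCall) {
  return detectionsPerWeek * 4 * apiFraction * costPerCall;
}

// Everything through the API:      2000 * 4 * 1.0 * 0.01 = $80/month
// Hybrid (~30% reach the API):     2000 * 4 * 0.3 * 0.01 = $24/month
```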


Image Quality Scoring for Photography

Here's where this project diverges from a typical wildlife monitoring system. I don't just want to know what's in the image — I want to know if the image is worth keeping from a photography perspective.

Trail camera photos are, objectively, not great photography. Low resolution, fixed focal length, harsh infrared flash, terrible composition. But occasionally you get something magical — a lynx caught mid-stride in soft evening light, a moose silhouetted against dawn fog. Those images are worth finding.

function scoreImageQuality(imagePath) {
  return sharp(imagePath)
    .stats()
    .then(function (stats) {
      var metadata;
      return sharp(imagePath)
        .metadata()
        .then(function (meta) {
          metadata = meta;

          var brightnessScore = 0;
          var avgBrightness =
            (stats.channels[0].mean +
              stats.channels[1].mean +
              stats.channels[2].mean) /
            3;

          if (avgBrightness > 30 && avgBrightness < 220) {
            brightnessScore = 1 - Math.abs(avgBrightness - 128) / 128;
          }

          var contrastScore = 0;
          var avgStdDev =
            (stats.channels[0].stdev +
              stats.channels[1].stdev +
              stats.channels[2].stdev) /
            3;
          contrastScore = Math.min(avgStdDev / 80, 1);

          var sharpnessScore = 0;
          return sharp(imagePath)
            .greyscale()
            .convolve({
              width: 3,
              height: 3,
              kernel: [0, -1, 0, -1, 4, -1, 0, -1, 0],
            })
            .stats()
            .then(function (edgeStats) {
              sharpnessScore = Math.min(edgeStats.channels[0].stdev / 40, 1);

              var resolution = metadata.width * metadata.height;
              var resolutionScore = Math.min(resolution / 4000000, 1);

              var totalScore =
                brightnessScore * 0.25 +
                contrastScore * 0.25 +
                sharpnessScore * 0.30 +
                resolutionScore * 0.20;

              return {
                overall: Math.round(totalScore * 100),
                brightness: Math.round(brightnessScore * 100),
                contrast: Math.round(contrastScore * 100),
                sharpness: Math.round(sharpnessScore * 100),
                resolution: Math.round(resolutionScore * 100),
                isNightShot: avgBrightness < 50,
              };
            });
        });
    });
}

The quality scoring uses basic image statistics — brightness distribution, contrast (standard deviation of pixel values), edge detection for sharpness, and resolution. It's crude compared to what a real computational photography system would do, but it's effective at separating the completely unusable images (pitch black, motion blur, blown-out flash) from the potentially interesting ones.

I weight sharpness highest at 30% because motion blur is the number one killer of trail camera images. An animal walking past triggers the camera, but the shutter speed isn't fast enough to freeze the motion. The Laplacian edge detection catches this — blurry images have low edge variance.
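To see why this works, here is the same Laplacian kernel (the [0, -1, 0, -1, 4, -1, 0, -1, 0] matrix from scoreImageQuality) applied by hand to two toy grayscale patches, one featureless and one containing a hard edge:

```javascript
// Mean absolute Laplacian response over interior pixels of a 2D grayscale grid.
// Border pixels are skipped for simplicity.
function laplacianResponse(grid) {
  var total = 0;
  var n = 0;
  for (var y = 1; y < grid.length - 1; y++) {
    for (var x = 1; x < grid[0].length - 1; x++) {
      // The 3x3 kernel [0,-1,0,-1,4,-1,0,-1,0] written out explicitly
      var v =
        4 * grid[y][x] -
        grid[y - 1][x] - grid[y + 1][x] -
        grid[y][x - 1] - grid[y][x + 1];
      total += Math.abs(v);
      n++;
    }
  }
  return n ? total / n : 0;
}

var flat = [[100, 100, 100], [100, 100, 100], [100, 100, 100]]; // featureless patch
var edge = [[0, 0, 255], [0, 0, 255], [0, 0, 255]];             // hard vertical edge
// laplacianResponse(flat) is 0; laplacianResponse(edge) is large
```

A blurry frame looks like the flat patch almost everywhere, so the standard deviation of the convolved image stays low and the sharpness score follows it down.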


The Pipeline in Action

The full pipeline runs as a batch process that I kick off after pulling SD cards from the cameras.

function processImageBatch(directory) {
  var imageFiles = fs.readdirSync(directory).filter(function (f) {
    return /\.(jpg|jpeg|png)$/i.test(f);
  });

  console.log("Processing " + imageFiles.length + " images...");

  var results = {
    total: imageFiles.length,
    withAnimals: 0,
    species: {},
    highQuality: [],
    errors: 0,
  };

  var processQueue = imageFiles.map(function (file) {
    return path.join(directory, file);
  });

  function processNext(index) {
    if (index >= processQueue.length) {
      return Promise.resolve(results);
    }

    var imagePath = processQueue[index];

    return detectAnimals(imagePath)
      .then(function (detection) {
        if (!detection.hasAnimal) {
          return processNext(index + 1);
        }

        results.withAnimals += 1;

        return classifySpecies(imagePath, detection.topPrediction.bbox)
          .then(function (classification) {
            if (!results.species[classification.species]) {
              results.species[classification.species] = 0;
            }
            results.species[classification.species] += 1;

            return scoreImageQuality(imagePath).then(function (quality) {
              if (quality.overall > 60) {
                results.highQuality.push({
                  path: imagePath,
                  species: classification.species,
                  behavior: classification.behavior,
                  quality: quality.overall,
                  count: classification.count,
                  isNight: quality.isNightShot,
                });
              }

              var destDir = path.join(
                OUTPUT_DIR,
                classification.species.toLowerCase().replace(/\s+/g, "_")
              );
              if (!fs.existsSync(destDir)) {
                fs.mkdirSync(destDir, { recursive: true });
              }
              fs.copyFileSync(
                imagePath,
                path.join(destDir, path.basename(imagePath))
              );

              if (index % 100 === 0) {
                console.log(
                  "Processed " + index + "/" + processQueue.length +
                  " (" + results.withAnimals + " animals found)"
                );
              }

              return processNext(index + 1);
            });
          });
      })
      .catch(function (err) {
        console.error("Error processing " + imagePath + ": " + err.message);
        results.errors += 1;
        return processNext(index + 1);
      });
  }

  return processNext(0);
}

A typical weekly run looks like this: 4,000 images in, 300-500 animal detections, 15-30 high-quality photographs. The processing takes about 45 minutes on my local machine. Most of that time is the COCO-SSD inference — the API calls to GPT-4o for the ambiguous cases are actually the faster part.


What the Cameras Have Shown Me

Six months of running this system has produced a wildlife dataset I never could have built manually. Some findings that genuinely surprised me:

Moose are on my property far more than I realized. I see moose maybe once or twice a week in person. The cameras show they're crossing through almost every night, usually between 2 and 4 AM. There's a cow with twins who follows the exact same path along the creek bed every single night. I never would have known without the cameras.

The lynx is nocturnal and consistent. I have one lynx (possibly two — the classifier has trouble distinguishing individuals) that appears every three to four days. Always at night, always traveling east to west. The image quality is terrible because of the infrared flash, but the consistency of the sighting data is remarkable.

Bears trigger more false positives than any other species. When the classifier isn't sure, it often guesses "bear." Dark shadows, stumps, even a garbage bag that blew across the yard once — all classified as bears. I had to add a secondary validation step specifically for bear detections with confidence below 0.7.
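The gate for that secondary check is trivial. A sketch, using the 0.7 threshold mentioned above (the predicate is the real shape of the check; what happens after it fires, a second cropped API call in my case, is pipeline-specific):

```javascript
// Low-confidence bear detections get a second look before being archived as bears.
var BEAR_RECHECK_THRESHOLD = 0.7;

function needsBearRevalidation(classification) {
  // classification: { species, confidence } from the species classifier
  return (
    classification.species.toLowerCase() === "black bear" &&
    classification.confidence < BEAR_RECHECK_THRESHOLD
  );
}
```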

Eagles are almost impossible to capture on trail cameras. They move too fast and are usually too far from the camera. I have hundreds of "bird" detections that the quality scorer correctly flags as useless — blurry shapes against sky. For raptor photography, trail cameras are the wrong tool entirely.


The Failed Experiments

Not everything worked, and I think the failures are more instructive than the successes.

Real-time alerting was a bust. I tried setting up a cellular trail camera that would send images to my pipeline in real time, with push notifications when something interesting showed up. The cellular connection in Caswell Lakes is too unreliable. Images would queue up for hours and then dump all at once, and by then the animal was long gone. Plus the cellular data costs were absurd for image transmission.

Individual animal identification doesn't work at trail camera resolution. I tried training a model to distinguish between "the big bull moose" and "the smaller bull moose" based on antler shape and body proportions. Trail camera image quality is simply too low for this kind of fine-grained recognition. You need high-resolution photographs from consistent angles, which is the opposite of what trail cameras provide.

Automated composition scoring is mostly meaningless for trail cameras. I built a rule-of-thirds analyzer and a background complexity scorer. They consistently rated images as "poor composition" because, of course, a fixed camera bolted to a tree doesn't compose shots. I stripped these features out and kept only the technical quality metrics (sharpness, exposure, resolution) that are actually informative.


Building a Searchable Archive

The classified images get stored in a directory structure organized by species, but I also built a simple web interface for browsing and searching the archive.

var express = require("express");
var app = express();

var db = [];

function addToArchive(entry) {
  db.push({
    id: db.length + 1,
    path: entry.path,
    species: entry.species,
    behavior: entry.behavior,
    quality: entry.quality,
    count: entry.count,
    date: extractDateFromFilename(entry.path),
    camera: extractCameraId(entry.path),
    isNight: entry.isNight || false,
  });
}

app.get("/api/search", function (req, res) {
  var species = req.query.species;
  var minQuality = parseInt(req.query.minQuality, 10) || 0;
  var camera = req.query.camera;
  var startDate = req.query.startDate;
  var endDate = req.query.endDate;

  var results = db.filter(function (entry) {
    if (species && entry.species.toLowerCase() !== species.toLowerCase()) {
      return false;
    }
    if (entry.quality < minQuality) return false;
    if (camera && entry.camera !== camera) return false;
    if (startDate && entry.date < startDate) return false;
    if (endDate && entry.date > endDate) return false;
    return true;
  });

  results.sort(function (a, b) {
    return b.quality - a.quality;
  });

  res.json({
    total: results.length,
    results: results.slice(0, 50),
  });
});

It's nothing fancy — an in-memory array with filtering, no database required. The archive currently holds about 8,000 classified images across six months. I can search for "lynx photos with quality above 50 from camera 7 in January" and get results in milliseconds. Compared to scrolling through an SD card, it's transformative.
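The two helpers referenced in addToArchive, extractDateFromFilename and extractCameraId, just parse metadata out of the filename. Here is one hypothetical implementation, assuming a naming convention like CAM07_20240113_023455.jpg; the pattern will need adjusting for whatever your cameras actually produce:

```javascript
// Hypothetical filename parsers, assuming names like "CAM07_20240113_023455.jpg"
// (camera id, then date, then time). Adjust the regex to your cameras' scheme.
var FILENAME_PATTERN = /CAM(\d+)_(\d{4})(\d{2})(\d{2})_\d+/;

function extractCameraId(imagePath) {
  var match = imagePath.match(FILENAME_PATTERN);
  return match ? "camera_" + match[1] : "unknown";
}

function extractDateFromFilename(imagePath) {
  var match = imagePath.match(FILENAME_PATTERN);
  // Return an ISO-style date string so the plain string comparisons
  // in the /api/search date filter sort correctly
  return match ? match[2] + "-" + match[3] + "-" + match[4] : null;
}
```

Emitting ISO dates is the one design decision that matters here: the search endpoint compares entry.date to the query strings lexicographically, and YYYY-MM-DD is the format where lexicographic and chronological order agree.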


Cost and Practicality

Let me break down what this actually costs to run, because I think cost transparency matters in AI projects.

Hardware: Fourteen trail cameras at roughly $80 each, so about $1,120 total. These are Browning Strike Force cameras — nothing fancy, but reliable in extreme cold.

Compute: All inference runs locally on my desktop machine (RTX 3070). No cloud GPU costs. The TensorFlow.js inference is CPU-bound but fast enough for batch processing.

API costs: Roughly $20/month for GPT-4o vision calls on the ambiguous detections. This could be reduced further by improving the local classifier, but honestly twenty bucks a month for species identification that I'd otherwise do manually feels like a bargain.

Storage: The images consume about 15GB per month. A 2TB external drive handles a year with room to spare. Total cost: $60.

My time: About 30 minutes per week swapping SD cards and kicking off the pipeline. Down from 4-6 hours of manual sorting. This is the real savings.


What I'd Do Differently

If I were starting this project from scratch today, I'd change a few things.

I'd invest more time in building a proper labeled dataset early on. The iNaturalist data was helpful but the domain gap between professional wildlife photography and trail camera images is significant. A purpose-built training set of 500+ labeled trail camera images per species would make the local classifier good enough to eliminate the API dependency entirely.

I'd use a more modern detection model. COCO-SSD was the path of least resistance, but YOLOv8 or RT-DETR would give better detection accuracy with comparable speed. The COCO-SSD model's limited animal vocabulary is its biggest weakness.

I'd build the archive on SQLite instead of an in-memory array. It works fine now, but as the dataset grows past 20,000 images, I'll want proper indexing and persistent storage.


The Intersection of Technology and Place

There's something I want to say about this project that goes beyond the technical details. I built it because I live in a place where wildlife is part of daily life, not a novelty. The moose that walk through my yard aren't nature documentary subjects — they're neighbors. Occasionally annoying neighbors who eat my landscaping, but neighbors nonetheless.

Computer vision didn't change my relationship with the wildlife around me. What it did was make the invisible visible. The nighttime movements I never saw, the patterns I never noticed, the species passing through that I had no idea were there. Fourteen cameras running for six months revealed a world happening on my own property that I was completely unaware of.

That's what I find most compelling about these kinds of projects. Not the technology itself, but what the technology lets you see. The moose have been walking that creek bed at 3 AM for years. The lynx has been crossing my property every four days probably since before I moved here. The data was always there. I just didn't have the tools to collect and process it.

Now I do. And that feels worth the twenty bucks a month.


Shane Larson is a software engineer, author, and the founder of Grizzly Peak Software. He writes code and classifies wildlife from a cabin in Caswell Lakes, Alaska, where the moose outnumber the humans and nobody complains about the Wi-Fi.
