AI Coding in Teams: Why Source Control, CI/CD, and Shared Design Matter More Now, Not Less

AI made individuals faster and team delivery flat. Here are the engineering disciplines that separate teams that ship from teams that drown.

There is a story going around that AI coding tools have made traditional software engineering disciplines optional. That source control hygiene is fussy overhead, that CI/CD is for organizations that don't trust their developers, that architecture documents are bureaucratic theater, and that the new way to ship software is to point an agent at a problem and let it cook.

The story is half right. AI tools have genuinely changed how individual engineers work. They have not changed how teams of engineers ship software, and the data on what happens when teams pretend otherwise is now in.

The DORA 2025 State of AI-assisted Software Development Report surveyed nearly 5,000 developers and pulled Faros AI telemetry from over 10,000 more across 4,000+ teams. Adoption is essentially universal at 95%. Individual productivity gains are large and real: 21% more tasks completed, 98% more pull requests merged, 80%+ self-reported gains. And the team-level delivery metrics? Flat. Or worse. In some segments of the Faros telemetry, incidents per pull request rose by as much as 242%. Stalled work climbed 26%. Only about 24% of developers report a "great deal" of trust in the output of the tools they use every day.

The DORA authors call this the AI Productivity Paradox. Their explanation is uncomfortable and correct: AI is not a productivity tool, it is an amplifier. Healthy teams ship more good software. Unhealthy teams ship more bad software faster.

This piece is about which engineering practices stop being negotiable when AI is in the mix.


The single-player game is not the team game

A single engineer with Claude Code or Cursor on a Saturday afternoon is in a single-player game. The whole codebase fits in their head. There is nobody else committing. There is no review queue, no deployment process, no on-call rotation. Velocity is bounded only by typing speed and judgment.

Twelve engineers with the same tools, against the same codebase, are in a team game. The bottleneck is not typing speed. It never was. The bottleneck is coordination, coherence, and the cost of misunderstanding. AI does not make any of those bottlenecks smaller. In most teams, it makes them larger, because the throughput at the typing step has gone up while the throughput at every other step has stayed flat.

The widely circulated post by Rishabh Kumar titled "Copilot Made Me Faster — And Quietly Broke My Team" documents the pattern in the cleanest form I have seen. Kumar championed the rollout. He wrote the leadership proposal that got the licenses approved. Several months in, he found himself in a room with four engineers who could not explain functions they had merged into production. The functions passed tests. The features worked. Nobody on the team could tell him, when asked, what the code actually did. He wrote:

I reverted it because four engineers, sitting with me in a room, could not tell me what their own functions actually did.

That sentence is the whole problem in one line. The AI did not generate bad code. It generated code that worked. The team simply stopped being able to reason about its own system. You cannot operate a system you do not understand. You can only react to it.

The disciplines below are how teams avoid arriving in that room.


Source control discipline becomes load-bearing

Pre-AI, sloppy git habits were a mild annoyance. Everyone knew which engineers wrote 3,000-line commits with the message "wip" and quietly worked around them in code review. The blast radius was bounded by how fast a human could write that much code.

Post-AI, the same engineer can produce a 3,000-line commit in twenty minutes and a 12,000-line one before lunch. The blast radius is now whatever the team's review capacity is, which is the same as it was last year, which means the queue depth grows until somebody starts approving things they have not actually read.

The discipline is not new. The stakes are.

Concretely:

  • Atomic commits. Each commit does one thing and the commit message says what. If the AI produced fifteen unrelated changes in one pass, that is fifteen commits, not one. The reviewer needs to be able to bisect this in three months when something breaks.
  • Small pull requests. Hard cap somewhere between 200 and 400 lines for normal review. Above that the reviewer's attention drops measurably and they start trusting the diff instead of reading it. AI makes it easy to blow through this; the team has to push back deliberately.
  • Branch hygiene. One branch, one purpose. Long-lived feature branches with AI-generated experiments piled on top become unmergeable in a way that is genuinely worse than the human equivalent, because nobody can reconstruct intent from the commits.
  • Repository trust boundaries. This one is new and most teams have not absorbed it yet. Check Point Research disclosed in February 2026 that malicious .claude/settings.json hooks could trigger remote code execution simply by opening a cloned repository (CVE-2025-59536 and related). MCP server configs could bypass trust dialogs. ANTHROPIC_BASE_URL could be hijacked to proxy API keys. The Check Point researchers wrote: "Developers inherently trust project configuration files — they're viewed as metadata rather than executable code, so they rarely undergo the same security scrutiny as application code during code reviews." That trust is no longer safe. Treat AI tool config files as executable code in review. CI should scan for them on every PR; a sketch of such a check follows this list.
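
A minimal sketch of that check, in the same GitHub Actions style as the pipeline in the next section. The file patterns are assumptions; extend them to whatever AI tooling the team actually runs. A team that wants a softer gate can emit a warning instead of failing the build.

# Illustrative PR gate: treat AI tool config changes as executable code.
# Assumes actions/checkout with fetch-depth: 0 so the base ref is reachable.
- name: Flag AI tool config changes
  run: |
    if git diff --name-only origin/${{ github.base_ref }}...HEAD \
        | grep -E '^(\.claude/|\.cursorrules|\.cursor/|\.mcp\.json)'; then
      echo "::error::AI tool config changed. Review these files as executable code."
      exit 1
    fi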

CI/CD becomes the immune system

The single most important shift is that CI/CD has stopped being a quality enhancement and become the primary defense against confidently wrong code.

GitClear's 2025 analysis of 211 million changed lines of code across Google, Microsoft, Meta, and enterprise repos found that duplicated code blocks grew roughly 4× between 2021 and 2024, with copy-pasted lines rising from 8.3% to 12.3% of all changed lines. Copy-pasted code overtook refactored (moved) code for the first time in the dataset's history. Refactoring as a share of all changes dropped from about 25% to under 10%.

Translation: more code is shipping, more of it is duplicated, less of it is being cleaned up, and review velocity has not kept pace. The only thing standing between this output and production is the automated pipeline.

Minimum viable immune system for a Node.js team:

# .github/workflows/ci.yml — illustrative
name: CI
on: [pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22', cache: 'npm' }
      - run: npm ci
      - run: npm run lint        # eslint + prettier
      - run: npm run typecheck   # tsc --noEmit, even in JS projects via JSDoc
      - run: npm test -- --coverage
      - run: npm audit --audit-level=high
      - uses: gitleaks/gitleaks-action@v2   # gitleaks is a Go binary, not an npm package
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - run: npx better-npm-audit audit

That pipeline is unremarkable. The point is that every line of it is now non-optional. Lint catches the small drifts in style that signal generated code nobody read. Type checking catches the made-up function signatures the AI confidently invents. Tests catch the happy-path-only logic the AI is biased to produce. Secret scanning catches the API keys the AI helpfully hardcoded into the example config. Dependency scanning catches the typo-squatted package the AI plausibly suggested.
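
One caveat on the test step: --coverage only reports, it does not gate. If the team runs Jest (an assumption here; every mainstream runner has an equivalent), a threshold turns the report into a failing check:

// jest.config.js (illustrative thresholds; tune them to the codebase)
module.exports = {
  collectCoverage: true,
  coverageThreshold: {
    global: { branches: 70, functions: 80, lines: 80, statements: 80 },
  },
};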

If a team does not have all of these gates, AI tooling is going to find the gaps for them, at scale, in production.
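
Given the GitClear numbers, one more gate is worth considering: duplication itself is mechanically detectable. A sketch, assuming jscpd, a copy-paste detector available on npm; the threshold is a starting point, not a recommendation:

# Illustrative clone gate: fail the build when duplication exceeds 10%
- run: npx jscpd src --threshold 10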


Scope discipline, or the AI builds the wrong thing, beautifully

AI assistants are exceptionally good at executing on a stated request. They are exceptionally bad at noticing when the request is wrong. They will produce a perfectly competent implementation of a feature that should not exist, gold-plate a configuration system that solves a problem the team does not have, or refactor a module the ticket did not ask them to touch.

The fix is not better prompts. The fix is that the team agrees on scope before any prompt is written.

A scope brief that an AI agent and a human can both work from looks roughly like this:

GOAL: Add cursor-based pagination to /api/articles list endpoint.

IN SCOPE:
- New optional `cursor` and `limit` query params on GET /api/articles
- Encode cursor as opaque base64 of (published_at, id)
- Update OpenAPI spec
- Unit tests for cursor encode/decode and the SQL query

OUT OF SCOPE:
- Changing existing offset pagination on other endpoints
- Caching layer changes
- Frontend consumer updates (separate ticket)

CONSTRAINTS:
- Must not introduce new dependencies
- Query must remain index-friendly on (published_at DESC, id DESC)
- Backwards compatible: requests without cursor still work

That brief is the same thing a senior engineer would have written for a junior in 2018. It is not new. What is new is that without it, the agent will quietly expand the work, and the resulting PR will be a 1,400-line sprawl that touches three modules and changes the cache invalidation behavior because it "looked related." The team will then spend the review cycle relitigating the scope they failed to define up front.
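
To make the brief concrete, here is a minimal sketch of the in-scope cursor logic in TypeScript. The helper names and the exact query shape are illustrative assumptions, not a reference implementation:

// Illustrative cursor helpers for the brief above (Node.js Buffer assumed).
type Cursor = { publishedAt: string; id: number };

// Opaque base64 of (published_at, id), per the brief
function encodeCursor(c: Cursor): string {
  return Buffer.from(JSON.stringify([c.publishedAt, c.id])).toString("base64url");
}

function decodeCursor(raw: string): Cursor {
  const [publishedAt, id] = JSON.parse(
    Buffer.from(raw, "base64url").toString("utf8")
  );
  return { publishedAt, id };
}

// Keyset predicate that stays friendly to the (published_at DESC, id DESC) index:
//   WHERE (published_at, id) < ($1, $2)
//   ORDER BY published_at DESC, id DESC LIMIT $3
// Requests without a cursor skip the WHERE clause, keeping the endpoint
// backwards compatible per the constraints.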

The minimum coherent change principle: the smallest change that delivers the intended user value, and nothing else, no matter how much the AI volunteers.


The blueprint problem

The hardest failure mode in AI-assisted teams is not bad code. It is decisions made in a vacuum.

When twelve engineers each have an AI assistant and there is no shared architectural blueprint, every engineer is making local design decisions independently. The AI is biased toward producing whatever pattern is most common in its training data, which is rarely the pattern the team has standardized on. Within three months the codebase contains four authentication patterns, three logging conventions, two competing data access layers, and a quiet drift toward whichever framework idioms the AI has the most exposure to.

The fix is the blueprint. A shared, written, version-controlled design document that says: this is the system, these are the boundaries, these are the patterns we use, these are the patterns we have rejected and why. Not a 200-page tome. A working artifact. ADRs (Architecture Decision Records) are the cleanest format I have seen for this:

docs/adr/
  0001-use-postgres-not-mongo.md
  0002-cursor-pagination-pattern.md
  0003-auth-jwt-hs256-pinned.md
  0004-no-orm-direct-pg-driver.md

Each one is a page. Context, decision, consequences. Cheap to write, expensive to skip.
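
A skeleton for one of the files above; the content here is invented for illustration, the shape is the point:

# 0002: Cursor pagination pattern
Status: accepted

## Context
Offset pagination degrades as tables grow and skips or repeats rows
under concurrent writes.

## Decision
List endpoints paginate by opaque cursor: base64 of (published_at, id),
backed by a composite index on those columns.

## Consequences
Clients cannot jump to an arbitrary page. Cursors stay stable across
inserts; offsets do not.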

The critical point for the AI era: these documents are now machine context, not just human documentation. Reference them in your CLAUDE.md or equivalent. Make them part of the prompt context for every non-trivial task. The AI will happily follow conventions if it can see them. It will happily invent new ones if it cannot.
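
What that wiring can look like in practice, as a hypothetical CLAUDE.md excerpt; the wording is illustrative, the mechanism is what matters:

# CLAUDE.md (excerpt)
Before any non-trivial change, read docs/adr/. In particular:
- ADR 0002 defines the cursor pagination pattern; do not add offset pagination.
- ADR 0004: no ORM; use the pg driver directly, with parameterized queries.
If a task appears to require deviating from an ADR, stop and flag it
rather than improvising a new pattern.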

When a team is making a real architectural decision, the deliverable is an ADR before any code is written. The AI gets to help draft it, like any other team member, but the decision is human and the artifact is durable.


Software engineering, EA, and SA are still alive

The current discourse pushes hard on the idea that AI is collapsing the distinctions between engineering, enterprise architecture, and solution architecture. That the disciplines were always overhead, that the AI subsumes them, that the future is direct intent-to-code with no intermediate roles.

This reads as plausible to people who have never worked in an enterprise. It does not survive contact with one.

Software engineering is the discipline of decomposing problems, designing systems, and producing code that is correct, maintainable, and intelligible to other humans. The AI does not do this. It produces code. The discipline of deciding what code to produce, why, and how it will be maintained over five years still requires engineers who think.

Enterprise architecture is the discipline of mapping organizational concerns onto technical systems. Vendor strategy. Build-vs-buy. Data residency. Compliance posture. Coherence across dozens of systems owned by dozens of teams. The AI cannot do this work because the inputs are not in the codebase. They live in legal contracts, executive priorities, three-year-old M&A integration decisions, and the quiet understanding that the data team is going to migrate platforms next year. EA is the work of making those constraints visible to every team, including the teams whose AI assistants are confidently generating code that violates them.

Solution architecture is the discipline of taking a specific business problem and choosing the right combination of services, integration patterns, data flows, and operational tradeoffs to solve it within the existing landscape. The AI will produce a perfectly reasonable architecture for a problem it has seen a thousand times. It will produce a confidently stated, plausible architecture that is wrong for your specific situation, and it will not flag the difference because it cannot see the difference.

These disciplines have not been commoditized. They have been leveraged. A senior engineer with a clear design in their head and a strong sense of the codebase now produces in a day what used to take a week. An architect who can document and disseminate decisions across the organization can keep more teams aligned. A domain expert can prototype against the real edge cases faster than ever.

The teams cutting these roles in the belief that AI replaces them are, in many cases, currently discovering that the AI was making the seniors faster, not making them unnecessary. The architectural debt accumulating in the meantime is not visible in this quarter's metrics. It will be visible in the rewrite that nobody planned for, eighteen months from now.


The short version

If the team is going to use AI coding tools at scale, and most teams are, the practices below are no longer optional.

  • Source control hygiene. Atomic commits, small PRs, branch discipline, and treat AI tool config files (.claude/, .cursorrules, MCP configs) as executable code in review.
  • CI/CD as the immune system. Lint, type check, tests with coverage, secret scanning, dependency scanning, security audits. Every gate. Every PR. Non-negotiable.
  • Scope discipline. Write the brief before the prompt. Define what is out of scope as carefully as what is in. Reject sprawling PRs even when the work looks done.
  • Shared blueprint. ADRs in the repo, referenced from your AI tool config, treated as machine context as much as human documentation. Decide architecture as a team; let the AI execute.
  • Defend the senior roles. Engineering, enterprise architecture, solution architecture. The AI amplifies these disciplines. It does not replace them. Cut them and the leverage cuts the wrong way.

AI did not kill software engineering. It raised the cost of doing it badly. The teams that ship in 2026 and beyond will be the ones that understood the difference early enough to invest in the disciplines while the rest of the industry was busy declaring them obsolete.