Most conversations about AI and software engineering still get stuck in the wrong place.

One side asks whether developers are dead. The other side answers by showing how fast an agent can generate code. Both miss the more interesting shift.

AI does not only change how code is written. It changes where engineering leverage sits.

When implementation gets cheaper, the bottleneck moves. The hard part becomes deciding what should be built, giving the system enough context to build it correctly, and proving the result is safe enough to ship or accept.

That is the new AI-native engineering model:

AI-native engineering =
  product loop
  + context loop
  + harness loop

Product Engineering chooses the right problem, Context Engineering gives agents the right information, and Harness Engineering proves the result is safe enough to accept.

These are not always three separate jobs. In a small team, one senior engineer may own all three. In a larger company, they may become different roles, platforms, or operating models. The important point is that all three loops need to exist.

If one is missing, AI systems become expensive demos.

1. Product Engineering: Choosing the Right Problem

The product loop asks:

Are we solving the right problem?

This is where AI makes product judgment more valuable, not less.

In the old ticket-taking model, a request moves from stakeholder to product manager to engineer. The engineer receives a ticket, clarifies a few details, implements the change, and ships it. That model already had problems, but AI makes the weakness more visible.

If an agent can generate the implementation quickly, then building the wrong thing also becomes faster.

The product loop is the work before and around the code:

  • talking to customers or operators;
  • understanding the workflow;
  • asking why this matters now;
  • finding the actual pain behind the request;
  • defining a hypothesis;
  • choosing the smallest useful slice;
  • deciding what metric or signal will prove the work helped;
  • writing acceptance criteria that connect behavior to value.

The Product Engineering loop turns a vague request into a validated problem slice with a clear success signal.
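One way to make that output concrete is to record the slice as a small data structure and check that nothing is still blank before implementation starts. This is a minimal sketch, not an existing tool: `ProblemSlice`, its field names, and the example request are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemSlice:
    """Hypothetical record of what the product loop should produce."""
    request: str                  # the request as it first arrived
    pain: str = ""                # the actual pain behind the request
    hypothesis: str = ""          # what we expect to change, and why
    smallest_slice: str = ""      # the smallest useful thing to build
    success_signal: str = ""      # metric or signal that proves it helped
    acceptance_criteria: list = field(default_factory=list)

    def missing_fields(self) -> list:
        """Return the names of the fields that are still empty."""
        names = ("pain", "hypothesis", "smallest_slice", "success_signal")
        missing = [name for name in names if not getattr(self, name).strip()]
        if not self.acceptance_criteria:
            missing.append("acceptance_criteria")
        return missing


# A raw request with nothing else filled in (invented example).
raw = ProblemSlice(request="Make the reports page faster")
print(raw.missing_fields())
```

An empty list from `missing_fields()` does not prove the problem is the right one. It only shows that someone has written down an answer to each question, which is the minimum before an agent starts generating code.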

This is why product engineering can be both a loop and a role.

Some engineers naturally like this work. They talk to customers. They ask what the team is building, why it matters, when it should be built, and how success will be measured. They are close to product managers, product owners, designers, support teams, and operational users. They are not “full-stack engineers with better soft skills.” They are engineers who treat workflow understanding as part of the system.

Consider a support-heavy internal tool.

The incoming request says:

Add an AI assistant that answers customer questions.

A ticket-taking implementation might start with a chat UI, a vector database, and a model call. It may look impressive in a demo. But the product loop asks different questions:

  • Which customer questions create the most operational load?
  • Are operators spending time searching, reconciling, drafting, approving, or escalating?
  • Which answers are safe to automate and which require human review?
  • What is the first workflow where better support would measurably reduce cycle time?
  • What should remain deterministic code instead of an LLM decision?
  • What does “good enough” mean for the first release?

The first useful version may not be a full assistant. It may be a narrow triage tool that classifies tickets, attaches relevant account context, suggests the next action, and routes risky cases to a human. It may use disposable prototype code at first. That is fine if the prototype is used to validate the hypothesis, not to pretend the system is production-ready.
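As a rough illustration of how small that first version can be, the sketch below classifies a ticket, attaches account context, suggests a next action, and routes risky or unrecognized cases to a human. Everything in it is an assumption made for the example: the categories, keywords, and actions are invented, and the keyword match stands in for whatever classifier a real team would choose.

```python
from dataclasses import dataclass

# Hypothetical categories and keywords; a real system might use a model here.
KEYWORDS = {
    "billing": ("invoice", "refund", "charge"),
    "export": ("export", "download", "csv"),
}
# Categories that always require human review (an assumption for this sketch).
RISKY_CATEGORIES = {"billing"}
NEXT_ACTIONS = {
    "billing": "Check the latest invoice and payment status",
    "export": "Check the status of the most recent export job",
    "unknown": "Ask the customer for more detail",
}


@dataclass
class TriageResult:
    category: str
    account_context: dict
    suggested_action: str
    needs_human: bool


def classify(text: str) -> str:
    """Deterministic keyword match; returns 'unknown' when nothing matches."""
    lowered = text.lower()
    for category, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return category
    return "unknown"


def triage(ticket_text: str, account: dict) -> TriageResult:
    category = classify(ticket_text)
    return TriageResult(
        category=category,
        account_context=account,
        suggested_action=NEXT_ACTIONS[category],
        # Unknown tickets go to a human instead of being guessed at.
        needs_human=category in RISKY_CATEGORIES or category == "unknown",
    )


result = triage("My CSV export never finishes", {"plan": "team"})
print(result.category, result.suggested_action, result.needs_human)
```

The routing rule is deliberately plain code. Which cases require a human is a product decision, so it is kept out of the classifier.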

The product loop turns AI from “we can generate something” into “we know what outcome we are trying to change.”

2. Context Engineering: Giving Agents the Right Information

The context loop asks:

What does the agent need to know to do this work correctly?

Context engineering is often reduced to prompt engineering or retrieval. That is too small.

Useful agents need maintained context layers. Not just a bigger context window. Not just a pile of documents. Not just “read the repo and figure it out.”

In “Project Memory for AI Agents,” I wrote about one concrete version of this: a maintained product memory layer around pages, workflows, API behavior, business rules, incidents, and decisions. The broader point is that context engineering is not a one-time prompt-writing activity. It is the design of the memory an agent depends on.

In real systems, the agent needs different kinds of context for different tasks:

  • product context: what the feature is supposed to do and why it exists;
  • workflow context: how users, operators, and systems move through the process;
  • business-rule context: permissions, exceptions, thresholds, state transitions;
  • code context: architecture, patterns, contracts, APIs, test strategy;
  • evidence context: tickets, logs, incidents, metrics, screenshots, traces;
  • team context: owners, rollout rules, review expectations, operational constraints.

The hard part is not collecting all of this once. The hard part is maintaining it.

A serious context loop needs a lifecycle:

ingest -> structure -> retrieve -> use -> update -> prune

Context Engineering keeps product memory usable by structuring, retrieving, updating, and pruning it continuously.

Ingestion brings in raw material: support tickets, call notes, product decisions, incidents, observability evidence, docs, PRs, and user feedback.

Structure turns that material into usable memory. The context should mirror the product: pages, workflows, features, API boundaries, business rules, recurring incident types. A random dump of documents is not memory. It is storage.

Retrieval makes the right slice available for the task. The billing agent should not read the whole company wiki. The export-debugging agent should see export behavior, export incidents, relevant endpoints, recent deploys, and the acceptance criteria for the fix.
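A minimal sketch of task-scoped retrieval, assuming memory entries have already been tagged with a kind and a product area during structuring. The `MemoryEntry` type, the tags, and the entries themselves are invented for illustration. A real memory layer would likely add search or ranking on top of this filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    kind: str   # e.g. "business-rule", "incident", "code"
    area: str   # product area the entry belongs to, e.g. "export"
    text: str


# Invented entries, standing in for a structured memory layer.
MEMORY = [
    MemoryEntry("business-rule", "billing", "Refunds above a limit need approval."),
    MemoryEntry("incident", "export", "Large exports timed out after a deploy."),
    MemoryEntry("code", "export", "Export jobs are created by the export endpoint."),
    MemoryEntry("business-rule", "export", "Only admins can export all accounts."),
]


def retrieve(area: str, kinds: set, limit: int = 5) -> list:
    """Return at most `limit` entries for one product area and the given kinds."""
    matches = [e for e in MEMORY if e.area == area and e.kind in kinds]
    return matches[:limit]


# The export-debugging task sees export incidents and code notes, nothing else.
export_debug_context = retrieve("export", {"incident", "code"})
for entry in export_debug_context:
    print(f"[{entry.kind}] {entry.text}")
```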

Use means the agent actually works with the context instead of treating it as background decoration. It should cite assumptions, apply rules, and produce output that a human can review.

Update closes the loop. If a support ticket reveals a recurring product behavior, that durable lesson should be written back into the memory layer.

Pruning is the part teams underestimate.

Bad pruning deletes context mechanically because the file is old or the token budget is high. Good pruning compresses raw evidence into durable knowledge. Ten similar incidents may become one rule, one known failure mode, and one link to representative evidence. A stale decision should be marked as superseded with the reason, not silently removed.
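Both halves of good pruning can be sketched in a few lines. The example assumes each incident was given a failure-mode label when it was structured. The type names, fields, and incident IDs are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    id: str
    failure_mode: str   # hypothetical label assigned during structuring


@dataclass
class FailureModeSummary:
    failure_mode: str
    occurrences: int
    representative_id: str   # link back to one piece of raw evidence


@dataclass
class Decision:
    text: str
    superseded_by: Optional[str] = None
    superseded_reason: Optional[str] = None


def compress(incidents: list) -> list:
    """Collapse incidents that share a failure mode into one summary each."""
    grouped = defaultdict(list)
    for incident in incidents:
        grouped[incident.failure_mode].append(incident)
    return [
        FailureModeSummary(mode, len(group), group[0].id)
        for mode, group in grouped.items()
    ]


def supersede(old: Decision, new_text: str, reason: str) -> Decision:
    """Keep the old decision, but record what replaced it and why."""
    old.superseded_by = new_text
    old.superseded_reason = reason
    return Decision(new_text)


incidents = [Incident(f"INC-{i}", "timeout") for i in range(10)]
incidents.append(Incident("INC-10", "permission"))
summaries = compress(incidents)
for summary in summaries:
    print(summary)
```

Ten timeout incidents become one summary that keeps a count and a pointer to representative evidence. Nothing in `supersede` deletes the old decision, so an agent reading it later can still see that it was replaced and why.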

This matters even more when agents start replacing pieces of operational work.

If an agent handles support triage, onboarding, compliance review, reconciliation, sales research, or internal reporting, then context is no longer “documentation.” It becomes part of the operating system of the team.

The agent’s quality depends on whether the team maintains the memory it runs on.

3. Harness Engineering: Proving the Work Is Safe Enough

The harness loop asks:

How do we know this AI-generated result is correct enough to accept?

This is where many AI workflows are still weak.

Teams let an agent generate code, skim the diff, maybe run the app, and then rely on human review to catch everything else. That does not scale. It also puts the human in the worst possible position: reviewing a large change with limited confidence about what was tested.

Harness engineering is the work of building rails around agents.

For a codebase, the harness includes:

  • clear AGENTS.md instructions;
  • setup scripts and reproducible environments;
  • fast local verification commands;
  • strict types and contracts;
  • runtime validation at trust boundaries;
  • unit, integration, contract, and E2E tests;
  • affected-test selection;
  • CI gates;
  • review artifacts like screenshots, logs, and before/after evidence.
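Of the items above, runtime validation at trust boundaries is one of the easiest to sketch. The example treats an agent's JSON output as untrusted input and rejects it unless it matches the expected shape. The action names and the `confidence` field are assumptions for illustration, not a real schema.

```python
import json

ALLOWED_ACTIONS = {"retry_export", "escalate", "request_info"}  # hypothetical


class InvalidAgentOutput(ValueError):
    """Raised when agent output does not match the expected shape."""


def parse_suggestion(raw: str) -> dict:
    """Validate untrusted JSON before any other code relies on its shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidAgentOutput(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidAgentOutput("expected a JSON object")

    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise InvalidAgentOutput(f"unknown action: {action!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise InvalidAgentOutput("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise InvalidAgentOutput("confidence must be between 0 and 1")

    return {"action": action, "confidence": float(confidence)}


print(parse_suggestion('{"action": "escalate", "confidence": 0.9}'))
```

The same idea applies in the other direction: anything the agent reads from users or external systems deserves the same check before it becomes part of a prompt or a tool call.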

For agent platforms, the harness also includes:

  • custom agents for specific workflows;
  • skills and MCP servers that expose the right tools;
  • eval datasets;
  • observability for agent runs;
  • approval gates;
  • rollback paths;
  • audit logs;
  • dashboards that show where automation is helping or failing.

The goal is to move from a slow feedback loop to a fast one.

Harness Engineering turns agent work from runtime discovery into cheap, local, repeatable signals before human review.

Without a harness, the agent guesses which command matters, starts the app to discover basic failures, uses runtime behavior as the first real signal, and leaves review to catch project assumptions late.

With a harness, the project tells the agent what to run. Types catch impossible shapes. Linters catch mechanical mistakes. Contract tests catch boundary issues. Focused tests catch behavior regressions. CI remains the final judge.
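The "project tells the agent what to run" part can be as small as one ordered list of checks behind a single entry point. In the sketch below the commands are placeholders that only invoke the Python interpreter, so the example runs anywhere. A real project would list its own type checker, linter, and test commands. The check names and the stop-on-first-failure policy are assumptions.

```python
import subprocess
import sys

# Ordered from cheapest to most expensive. These are placeholder commands;
# replace them with the project's real verification commands.
CHECKS = [
    ("types", [sys.executable, "-c", "print('types ok')"]),
    ("lint", [sys.executable, "-c", "print('lint ok')"]),
    ("focused tests", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("full tests", [sys.executable, "-c", "print('not reached')"]),
]


def run_checks(checks):
    """Run checks in order and stop at the first failure.

    Returns a list of (name, passed) for the checks that actually ran.
    """
    results = []
    for name, command in checks:
        completed = subprocess.run(command, capture_output=True, text=True)
        passed = completed.returncode == 0
        results.append((name, passed))
        if not passed:
            break
    return results


results = run_checks(CHECKS)
for name, passed in results:
    print(f"{name}: {'pass' if passed else 'FAIL'}")
```

Here the third placeholder is written to fail, so the expensive final step never runs. Stopping early keeps the signal cheap, and the list of results is something the agent can attach to its work as review evidence.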

Human review does not disappear. It moves up the stack.

Instead of spending most of the review on “does this compile?” or “did it miss an import?”, the human can focus on judgment:

  • Is this the right behavior?
  • Did the agent respect the product constraint?
  • Are the edge cases covered?
  • Is the rollout safe?
  • Does this change make the system easier or harder to operate?

This is closely related to the argument in “Make Your Project Agent-Ready”: AI coding agents expose the real quality of a codebase. The harness is the wider version of that idea. It includes the codebase, but also the agent setup, tools, evals, observability, review gates, and feedback capture around the work.

Harness engineering is why “human-in-the-loop” should not be treated as a fallback. It is an architecture.

How the Three Loops Work Together

The loops are strongest when they feed each other.

Imagine the team wants to reduce support time for failed exports.

The product loop defines the target:

  • failed exports create too many support escalations;
  • the first goal is to reduce diagnosis time, not automate every fix;
  • success means operators can identify the likely cause and next action in minutes;
  • risky account changes still require human approval.

The context loop supplies the memory:

  • known export states;
  • recurring failure modes;
  • permission rules;
  • previous incidents;
  • relevant logs and traces;
  • examples of good support resolutions;
  • product decisions about what should and should not be automated.

The harness loop defines acceptance:

  • test fixtures for common export failures;
  • eval cases for ticket classification;
  • contract checks for API boundaries;
  • observability for suggested actions;
  • a human approval gate for account-impacting changes;
  • review artifacts attached to every automated suggestion.
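To make one of these concrete, eval cases for ticket classification can start as a list of labeled examples and a threshold. In this sketch the cases, the labels, the keyword classifier, and the 0.9 threshold are all invented for illustration. The classifier is a stand-in for whatever the team actually uses, which might be a model call.

```python
from typing import Callable

# Hypothetical eval cases: (ticket text, expected label).
EVAL_CASES = [
    ("Export stuck at 0% for an hour", "export_failure"),
    ("Export fails with permission denied", "export_failure"),
    ("How do I change my password?", "other"),
    ("The downloaded file is empty", "export_failure"),
]


def keyword_classifier(text: str) -> str:
    """Stand-in classifier used only to exercise the eval loop."""
    return "export_failure" if "export" in text.lower() else "other"


def run_evals(classify: Callable[[str], str], cases, min_accuracy: float):
    """Return (accuracy, failures, passed_gate) for a classifier."""
    failures = []
    for text, expected in cases:
        got = classify(text)
        if got != expected:
            failures.append((text, expected, got))
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures, accuracy >= min_accuracy


accuracy, failures, passed = run_evals(
    keyword_classifier, EVAL_CASES, min_accuracy=0.9
)
print(f"accuracy={accuracy:.2f} passed={passed}")
for text, expected, got in failures:
    print(f"  expected {expected!r}, got {got!r}: {text}")
```

The stand-in classifier misses the ticket that never uses the word "export", so the gate fails. A miss like that is useful output: it points at either a classifier fix or a new piece of context the agent was lacking.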

After release, the loop continues.

Operator feedback updates the product hypothesis. New incidents update context. Failed evals become new harness cases. Metrics show whether diagnosis time actually improved.

That is AI engineering as a closed loop, not a one-time code generation event.

Where This Model Stops Applying

Not every script needs a full AI engineering operating model.

If you are building a weekend prototype, a one-off migration, or a small internal utility with low risk, the right move may be simple: prompt the agent, inspect the result, run the obvious checks, and move on.

The three-loop model becomes important when the work is recurring, business-critical, customer-facing, or operationally risky. It matters when multiple people or agents need to work inside the same system. It matters when mistakes create support load, revenue impact, data risk, or broken trust.

There is also a sequencing risk.

Teams can overbuild the harness before proving the product problem. They can create elaborate context infrastructure for workflows that do not matter. They can turn product discovery into endless process instead of shipping a thin slice.

The point is not to add ceremony. The point is to know which loop is currently the bottleneck.

Sometimes the bottleneck is product judgment. The team is automating the wrong workflow.

Sometimes it is context. The agent has tools, but not the product memory needed to use them well.

Sometimes it is harness. The agent can produce plausible work, but the team cannot verify it cheaply enough to trust it.

A Practical Checklist

Before calling a workflow AI-native, ask three sets of questions.

Product loop:

  • Who is the user or operator?
  • What pain are we reducing?
  • What hypothesis are we testing?
  • What is the smallest useful slice?
  • What metric or signal will tell us it worked?
  • What should stay human-owned?

Context loop:

  • What does the agent need to know before touching tools or code?
  • Where does that context live?
  • How is it updated?
  • How is stale context pruned?
  • What raw evidence should be compressed into durable memory?
  • Which context is task-specific and which is system-level?

Harness loop:

  • What checks run before a human reviews the output?
  • What tests, evals, contracts, or approval gates protect the workflow?
  • What evidence does the agent attach to its result?
  • How do we observe failures in production?
  • How do failures become new tests, evals, or context updates?

This is the part of AI engineering that will become more valuable as code generation improves.

The engineer’s job is not only to produce code. It is to shape the problem, maintain the memory, and design the verification loop that makes AI work usable in a real system.

Developers are not dead.

Developers who only translate tickets into code are becoming easier to replace.

The future senior engineer owns the loops around the code.