
The right model for application logic is not always the smartest one


LLMs are becoming part of application logic

More and more, developers are using LLMs not just as assistants but as part of application logic. In some cases, the model is not sitting outside the product but taking on work that would previously have been handled by code, rules, or narrower algorithms.

Think of this as LLM-as-backend. The model is not a tool the user reaches for. It is a component in a system, doing a job that the system depends on.

This represents a shift similar to earlier moves in software abstraction. Developers stopped writing everything at the lowest possible level and started working in higher-level primitives instead. The model becomes one of those primitives. It handles a layer of work that would otherwise require explicit logic.

What makes this shift feel real now is that models have become good enough, cheap enough, and fast enough to absorb narrow product logic that teams would previously have implemented by hand.

The point is not to introduce a catchy label. The point is that once the model is part of backend logic, the criteria for choosing it should also change.

Once a model becomes backend logic, the question changes

If a model is part of your application logic, the question is no longer which model is most generally intelligent. It becomes: which model can do this specific job well enough for the system?

That is how developers already think about the rest of a software stack. They do not always choose the most powerful possible component in the abstract. They choose the component that is appropriate for the role, the constraints, and the operating environment.

A database is chosen for its access patterns, not its theoretical maximum throughput. A queue is chosen for its delivery guarantees, not its feature count. A cache is chosen for its eviction behavior, not its vendor reputation. In hindsight, it seems obvious that the same reasoning should apply to models.

General intelligence is not always the relevant metric. Fit for the job is.

Many product tasks are narrower than general intelligence

Many product tasks are not open-ended research problems. They are closer to bounded tasks: constrained transformations, task-specific classification, narrow domain assistance, structured decision support, repeated implementation help, workflow-specific logic, predictable input-to-output mapping.

If you are building a feature that extracts structured data from a consistent document format, that is a bounded task. If you are building a feature that classifies support tickets into a fixed taxonomy, that is a bounded task. If you are building a coding workflow that needs correct use of a specific library, reliable OAuth setup patterns, or a narrow class of refactors, that is also a bounded task.
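The ticket-classification case can be sketched in a few lines. This is a minimal illustration, not a real API: `call_model` is a hypothetical placeholder for a small task-specific model, with a keyword heuristic standing in so the sketch runs on its own. The point is that because the taxonomy is fixed, the model's output can be validated mechanically before the rest of the system ever sees it.

```python
# Fixed taxonomy: the bounded shape of the task.
TAXONOMY = {"billing", "login", "bug_report", "feature_request", "other"}

def call_model(ticket_text: str) -> str:
    """Hypothetical stand-in for a small task-specific model.

    A keyword heuristic is used here so the sketch is self-contained;
    in a real system this would be an actual model call.
    """
    text = ticket_text.lower()
    if "charge" in text:
        return "billing"
    if "password" in text:
        return "login"
    return "UNRELATED"  # simulates an out-of-taxonomy model answer

def classify_ticket(ticket_text: str) -> str:
    """Classify a ticket, constraining the result to the fixed taxonomy."""
    raw = call_model(ticket_text).strip().lower()
    # Anything outside the taxonomy falls back to a safe default instead
    # of propagating an unexpected label through the system.
    return raw if raw in TAXONOMY else "other"
```

The validation step is what makes the task bounded in practice: the system never depends on the model behaving well, only on the output passing a mechanical check.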

For tasks like these, broad generality may matter less than consistency, cost, latency, controllability, and the ability to perform the same kind of work repeatedly and well. The task has a shape. The model should fit that shape.

Why this changes the case for smaller models

The case for smaller task-specific models is not just cost. That is true but too shallow.

The stronger case is task fit. A smaller model trained or fine-tuned for a specific recurring task may offer lower latency, more predictable cost, easier deployment, better fit for constrained environments, tighter control over behavior, and a more natural path into the product workflow.

When a model is a component in a system, it needs to behave like one. That means it should be reliable, observable, and operable. A model that is slightly less capable in general but significantly more consistent on a specific task may be the better engineering choice.
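What "reliable, observable, and operable" means concretely is that the model call gets the same wrapper any backend dependency would get. A minimal sketch, assuming the model is treated like a flaky upstream service: failures are logged, retried with backoff, and eventually degrade to a predictable fallback. Both model functions here are hypothetical stubs, included only so the sketch runs.

```python
import logging
import time

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("model-component")

def call_with_retries(model_fn, prompt, retries=2, fallback=None):
    """Call a model component with retries, logging, and a fallback."""
    for attempt in range(retries + 1):
        try:
            return model_fn(prompt)
        except Exception as exc:
            # Observability: every failure is logged before retrying.
            log.warning("model call failed (attempt %d): %s", attempt + 1, exc)
            time.sleep(0.05 * (attempt + 1))  # simple linear backoff
    # Operability: exhaust retries, then degrade predictably.
    return fallback

_calls = {"count": 0}

def flaky_model(prompt):
    """Stub model that fails once, then succeeds (simulates a transient error)."""
    _calls["count"] += 1
    if _calls["count"] == 1:
        raise RuntimeError("transient upstream error")
    return f"answer for: {prompt}"

def failing_model(prompt):
    """Stub model that always fails (simulates a hard outage)."""
    raise RuntimeError("service down")
```

None of this is model-specific machinery. That is the point: once the model is a component, ordinary backend discipline applies to it.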

Cost matters, but it is a downstream consequence of fit, not the primary argument. The primary argument is that the model should match the layer of work it is being asked to do.

Frontier models for orchestration, smaller models for narrow execution

Frontier models are still the right choice for many things. Broad planning, ambiguous requirements, open-ended reasoning, multi-step orchestration, general-purpose coding help. These are areas where the breadth and depth of a frontier model genuinely matter. That is not a concession. It is an important part of the real picture.

The contrast is between broad orchestration and narrow execution. A frontier model may be the right choice for the orchestration layer of a coding agent, where it needs to reason about a codebase, plan a sequence of changes, and handle ambiguous requirements. But a smaller task-specific model may be enough for correct use of a specific library, OAuth setup patterns, narrow refactor classes, or constrained implementation tasks.

These are not competing claims. They are different layers of the same system, each with different requirements. The question is not which model wins. The question is which model fits which layer.
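The layering can be made concrete with a small routing sketch. This is an illustration under stated assumptions: the two model callables are hypothetical stand-ins, and the set of narrow task types is taken from the examples above, not from any real system.

```python
# Hypothetical stand-ins for the two model layers.
def frontier_model(task: str) -> str:
    return f"frontier:{task}"  # broad planning, ambiguous requirements

def small_model(task: str) -> str:
    return f"small:{task}"     # narrow, well-shaped recurring work

# Narrow, bounded task types drawn from the examples in the text.
NARROW_TASKS = {"library_usage", "oauth_setup", "refactor"}

def route(task_type: str, task: str) -> str:
    """Pick the model layer by task shape, not by raw capability."""
    model = small_model if task_type in NARROW_TASKS else frontier_model
    return model(task)
```

The router itself is trivial; the design decision is in `NARROW_TASKS` — deciding which parts of the workload have a fixed enough shape that a smaller model can own them.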

Over-specification is a real engineering mistake

Sometimes using a frontier model for a narrow product task is not wrong, but over-specified. The task does not need that much breadth. The product does not benefit enough from that extra generality. The system is paying for capability it does not meaningfully use.

Developers already understand over-specification in other contexts. Not every service needs the most complex infrastructure. Not every component needs the most flexible architecture. Not every feature needs the most powerful tool. Choosing a tool that is more capable than the task requires is a recognizable engineering mistake, not a safe default.

The cost and latency tradeoff may not justify it. The operational complexity may not justify it. And in some cases, the extra generality of a frontier model can actually make behavior less predictable for a narrow, well-defined task.

Fit still matters. It always has.

The right model is the one that fits the layer of work

Instead of asking which model is best, the developer should increasingly ask: what level of capability does this part of the system actually need? What are the real constraints? What quality bar does this task require? What model makes this feature practical to operate?

The right layer of capability is not the highest available layer. It is the layer that matches the work. A model that is fit for the job, not just fit for the leaderboard, is the model that belongs in that part of the system.

This framing also makes it easier to reason about a system as a whole. Different layers have different requirements. Some need broad reasoning. Some need narrow reliability. Matching the model to the layer is not a compromise. It is good system design.

As LLMs become part of normal software, model choice should start to resemble other engineering decisions. Not 'what is the most capable option in the abstract,' but 'what level of capability does this part of the system actually need?' In many cases, that question leads not to the smartest model available, but to the model that fits the job.