Use Celedog Auto-routing for Cost Optimization

Auto-routing is a 1-line change that lets Celedog pick the right concrete model per request. Cheap models handle simple prompts, strong models handle hard ones, fallback handles outages.

Auto-routing replaces a fixed model name with a strategy, and lets Celedog pick the concrete model per request. Done right it trims 20–40% off an inference bill with no measurable quality drop, because most requests are simple enough that a budget model handles them perfectly.

The idea

Instead of hard-coding "gpt-5.5", you send a meta-model such as "celedog/auto-cheapest". The gateway evaluates the request and routes it to the lowest-cost model that clears your quality bar, falling back automatically if an upstream errors. You keep one model string in your code and let the gateway optimise.

Using it

resp = client.chat.completions.create(
    model="celedog/auto-cheapest",
    messages=[{"role": "user", "content": "Summarise this email in one line."}],
)
print(resp.model)        # the concrete model that actually served it
print(resp.choices[0].message.content)

Note resp.model: Celedog tells you which concrete model served each request, so the routing decision is auditable rather than a black box.

Choosing a strategy

  • auto-cheapest — lowest cost that meets the quality bar. Best default for high-volume, simple tasks (classification, extraction, short summaries).
  • auto-fastest — lowest latency. Use for interactive UIs where time-to-first-token matters most.
  • auto-quality — highest-quality model. Reserve for hard reasoning, long-form generation, or agentic coding.

A practical pattern: tier by task

Most apps have a mix of easy and hard calls. Route the easy ones (formatting, tagging, routing user intent) through auto-cheapest and the hard ones (final answer, code generation) through auto-quality. You pay premium prices only where they earn their keep.

Measuring the win

Use the Settlement / usage dashboard to compare cost-per-request before and after. The typical result: a large share of traffic silently moves to a budget model, the bill drops, and user-facing quality is unchanged because those requests never needed a frontier model.

Auto-routing is the single highest-leverage cost lever a gateway offers: one string change, a fifth to two-fifths off the bill, and full visibility into which model served each call.

Written by · Last updated May 28, 2026

Where to go next