Last verified: June 13, 2026
Claude Opus 4 (claude-opus-4-20250514) and Claude Sonnet 4 (claude-sonnet-4-20250514) are deprecated and retire on June 15, 2026, after which requests to them return a 404. The official replacements are claude-opus-4-8 and claude-sonnet-4-6. But swapping the model string alone will break a working integration: depending on which target you choose, several request parameters that were valid on the May 2025 models now return a 400 error, and two changes alter behavior silently. This page maps each removed or changed parameter to the exact failure and the fix.
One distinction governs the whole migration. The Opus path (to claude-opus-4-8) is the strict one: it removes temperature/top_p/top_k and manual thinking budgets entirely. The Sonnet path (to claude-sonnet-4-6) is gentler: it keeps sampling parameters (with the older “one of temperature or top_p, not both” rule) and still accepts budget_tokens as deprecated-but-functional. The one rule both paths share: assistant-turn prefills now return 400.
Each row is a change that breaks on at least one migration target. “Error” means the API rejects the request server-side (HTTP 400) even though the SDK request type still type-checks. “Silent” means no error — the behavior simply differs.
| Change | On Opus 4.8 | On Sonnet 4.6 | Symptom | Fix |
| --- | --- | --- | --- | --- |
| thinking: {type:"enabled", budget_tokens:N} | 400 error (removed) | Deprecated, still works | 400 on Opus; cost/latency drift on Sonnet | thinking: {type:"adaptive"} + output_config.effort |
| temperature / top_p / top_k | 400 error (removed) | Keep only one of temperature or top_p | 400 on Opus if any set; 400 on Sonnet if both set | Remove on Opus; steer via prompt. Keep one on Sonnet |
| Assistant-turn prefill (last message role:"assistant") | 400 error | 400 error | Request rejected on both | output_config.format (structured outputs) or system-prompt instruction |
| thinking.display default | Defaults to "omitted" | Returns summarized text | Reasoning text empty on Opus (silent) | Set display: "summarized" on Opus |
| Tokenizer | New tokenizer (more tokens) | Unchanged tokenizer | Same text counts higher on Opus; max_tokens too tight | Re-baseline with count_tokens; add headroom |
| output_format (top-level) | Deprecated API-wide | Deprecated API-wide | Works, but slated for removal | Move to output_config: {format: {...}} |
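The rows of the table above that reject or deprecate a request parameter can be turned into a pre-flight check that runs before any traffic is switched. The sketch below is one way to do that; the function name `find_breaking_changes` and the message strings are hypothetical, and the rules are transcribed from the table rather than fetched from the API. It does not cover the two silent rows (thinking.display and the tokenizer), which cannot be detected from request parameters alone.

```python
from typing import Any

SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def find_breaking_changes(params: dict[str, Any], target: str) -> list[str]:
    """List the problems a request would hit on `target`.

    `target` is "claude-opus-4-8" or "claude-sonnet-4-6". The rules mirror
    the migration table; nothing here calls the API.
    """
    on_opus = target == "claude-opus-4-8"
    problems: list[str] = []

    # Fixed thinking budgets: rejected on Opus, deprecated on Sonnet.
    thinking = params.get("thinking") or {}
    if thinking.get("type") == "enabled" and "budget_tokens" in thinking:
        problems.append(
            "budget_tokens: 400" if on_opus else "budget_tokens: deprecated"
        )

    # Sampling parameters: none allowed on Opus, at most one of
    # temperature/top_p on Sonnet.
    present = [key for key in SAMPLING_KEYS if key in params]
    if on_opus and present:
        problems.append("sampling params: 400 (" + ", ".join(present) + ")")
    elif not on_opus and "temperature" in params and "top_p" in params:
        problems.append("temperature + top_p together: 400")

    # Assistant-turn prefill: rejected on both targets.
    messages = params.get("messages") or []
    if messages and messages[-1].get("role") == "assistant":
        problems.append("assistant prefill: 400")

    # Top-level output_format: still works, but deprecated API-wide.
    if "output_format" in params:
        problems.append("output_format: deprecated, use output_config.format")

    return problems


legacy_request = {
    "model": "claude-opus-4-20250514",
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_p": 0.9,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [
        {"role": "user", "content": "Extract the name."},
        {"role": "assistant", "content": "{"},
    ],
}
```

Running the check over logged or templated request payloads for both targets gives a quick view of how much work each path involves before any code is changed.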
| Retiring model | Model ID | Retires | Replacement |
| --- | --- | --- | --- |
| Claude Opus 4 | claude-opus-4-20250514 (alias claude-opus-4-0) | June 15, 2026 | claude-opus-4-8 |
| Claude Sonnet 4 | claude-sonnet-4-20250514 (alias claude-sonnet-4-0) | June 15, 2026 | claude-sonnet-4-6 |
These are the original May 2025 models, not the later Opus 4.6 or Sonnet 4.5 releases. Use the exact replacement strings above — do not append a date suffix to claude-opus-4-8 or claude-sonnet-4-6 (they are dateless pinned snapshots).
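If model IDs are scattered through configuration, a small lookup keeps the swap in one place and avoids hand-typed strings. This is a minimal sketch: the names `REPLACEMENTS` and `replacement_model` are hypothetical, and the mapping contains only the IDs and aliases listed in the table above.

```python
# Retiring IDs and aliases mapped to their dateless replacements.
REPLACEMENTS = {
    "claude-opus-4-20250514": "claude-opus-4-8",
    "claude-opus-4-0": "claude-opus-4-8",
    "claude-sonnet-4-20250514": "claude-sonnet-4-6",
    "claude-sonnet-4-0": "claude-sonnet-4-6",
}


def replacement_model(model_id: str) -> str:
    """Return the replacement for a retiring ID; pass any other ID through."""
    return REPLACEMENTS.get(model_id, model_id)
```

IDs that are not in the mapping are returned unchanged, so the helper is safe to apply to every model string in a codebase, including ones that have already been migrated.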
The Opus path removes the fixed thinking budget. thinking: {type:"enabled", budget_tokens:N} returns a 400 on claude-opus-4-8. The replacement is adaptive thinking — the model decides how much to think per request — with overall depth controlled by the effort parameter (low | medium | high | xhigh | max). There is no direct token-count equivalent; effort is an output-level control, not a thinking budget.
    import anthropic

    # The client reads ANTHROPIC_API_KEY from the environment.
    client = anthropic.Anthropic()

    # Before (Claude Opus 4 / Sonnet 4)
    client.messages.create(
        model="claude-opus-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 10000},
        messages=[{"role": "user", "content": "..."}],
    )

    # After (Claude Opus 4.8)
    client.messages.create(
        model="claude-opus-4-8",
        max_tokens=16000,
        thinking={"type": "adaptive"},
        output_config={"effort": "high"},  # or "max", "xhigh", "medium", "low"
        messages=[{"role": "user", "content": "..."}],
    )

On the Sonnet path, budget_tokens is deprecated but still functional on claude-sonnet-4-6, so it will not return a 400, but you should still migrate to adaptive thinking. Note also that Sonnet 4.6 defaults to effort: "high" where Sonnet 4 had no effort parameter at all; if you do not set it explicitly you may see higher latency and token use after the swap.
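For codebases that build request parameters as dictionaries, the thinking change can be applied mechanically. The helper below is a sketch, not an official migration tool: `to_adaptive_thinking` is a hypothetical name, and because there is no token-count equivalent for effort, the caller must pick the effort level explicitly rather than have it derived from the old budget.

```python
import copy
from typing import Any

EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def to_adaptive_thinking(params: dict[str, Any], effort: str) -> dict[str, Any]:
    """Return a copy of `params` with a fixed thinking budget replaced by
    adaptive thinking plus an explicit effort level.

    Requests that do not use type "enabled" are returned as an unchanged copy.
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")

    migrated = copy.deepcopy(params)
    thinking = migrated.get("thinking")
    if not thinking or thinking.get("type") != "enabled":
        return migrated

    # Drop the budget, keep any other thinking keys, switch the type.
    kept = {k: v for k, v in thinking.items() if k != "budget_tokens"}
    kept["type"] = "adaptive"
    migrated["thinking"] = kept

    # Merge so an existing output_config (for example a format) survives.
    migrated["output_config"] = {
        **migrated.get("output_config", {}),
        "effort": effort,
    }
    return migrated


old_params = {
    "model": "claude-opus-4-8",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [{"role": "user", "content": "..."}],
}
new_params = to_adaptive_thinking(old_params, "high")
```

The result can then be passed on as `client.messages.create(**new_params)`. Setting effort explicitly on the Sonnet path as well avoids relying on the "high" default.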
This is where the two paths diverge most. On claude-opus-4-8, setting temperature, top_p, or top_k to any non-default value returns a 400. Remove them entirely and steer behavior through prompting instead. (If you used temperature=0 for determinism, note it never guaranteed identical outputs on prior models either.)
    # Opus path: sampling params return a 400 on claude-opus-4-8
    # Before
    client.messages.create(
        model="claude-opus-4-20250514",
        max_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        messages=[{"role": "user", "content": "..."}],
    )

    # After: remove them
    client.messages.create(
        model="claude-opus-4-8",
        max_tokens=1024,
        messages=[{"role": "user", "content": "..."}],
    )

On claude-sonnet-4-6 the older Claude 4.x rule still applies: you may pass one of temperature or top_p, but passing both returns a 400. So a Sonnet 4 to Sonnet 4.6 move only requires dropping one of the two if you were setting both.
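The two sampling rules can be captured in one target-aware helper. This is a sketch under stated assumptions: `fix_sampling_params` is a hypothetical name, and on the Sonnet path it keeps temperature and drops top_p when both are present, which is an arbitrary choice that should be flipped if an integration was tuned around top_p.

```python
from typing import Any

SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def fix_sampling_params(params: dict[str, Any], target: str) -> dict[str, Any]:
    """Return a copy of `params` whose sampling keys are valid on `target`."""
    fixed = dict(params)  # shallow copy; only top-level keys are removed

    if target == "claude-opus-4-8":
        # Opus path: every sampling parameter is rejected, so drop them all.
        for key in SAMPLING_KEYS:
            fixed.pop(key, None)
    elif target == "claude-sonnet-4-6":
        # Sonnet path: temperature and top_p cannot be sent together.
        # Assumption: temperature is the one worth keeping.
        if "temperature" in fixed and "top_p" in fixed:
            del fixed["top_p"]
    else:
        raise ValueError(f"unsupported migration target: {target!r}")

    return fixed


sampled = {
    "model": "claude-opus-4-20250514",
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_p": 0.9,
    "messages": [{"role": "user", "content": "..."}],
}
```

Because the helper only removes keys, any behavior that depended on a low temperature on the Opus path still has to be recovered through prompting, as described above.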
Prefilling the final assistant turn — ending your messages array with a role: "assistant" message to force a response shape — returns a 400 on both claude-opus-4-8 and claude-sonnet-4-6. This is the one breaking change you cannot dodge by choosing the gentler target. The replacement depends on what the prefill was doing.
| Prefill was used for | Replacement |
| --- | --- |
| Forcing JSON / YAML / schema output | output_config.format with a json_schema |
| Forcing a classification label | A tool with an enum field, or structured outputs |
| Skipping preambles (“Here is…”) | System-prompt instruction: respond directly, no preamble |
| Continuing an interrupted response | Move continuation into the user turn |
| Steering around bad refusals | Usually unnecessary now; plain user-turn prompting suffices |
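The classification row of the table above can be illustrated with a tool whose only field is an enum. This is a hedged sketch: the tool name `record_label`, the helper `classification_request`, and the prompt wording are all hypothetical, and it assumes the tools and tool_choice request fields behave on the replacement models as they do in the existing Messages API.

```python
from typing import Any


def classification_request(text: str, labels: list[str]) -> dict[str, Any]:
    """Build request parameters that constrain the answer to one of `labels`."""
    tool = {
        "name": "record_label",  # hypothetical tool name
        "description": "Record the single best label for the text.",
        "input_schema": {
            "type": "object",
            "properties": {"label": {"type": "string", "enum": labels}},
            "required": ["label"],
        },
    }
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 256,
        "tools": [tool],
        # Force a call to this tool instead of a free-form reply.
        "tool_choice": {"type": "tool", "name": tool["name"]},
        # The conversation ends on a user turn, so there is no prefill.
        "messages": [
            {"role": "user", "content": f"Classify this text:\n\n{text}"}
        ],
    }


request = classification_request("The parcel never arrived.", ["complaint", "praise"])
```

With parameters like these, the label would be read from the input of the tool_use block in the response rather than parsed out of prose.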
    # Before (fails on both targets): prefill forcing JSON shape
    messages=[
        {"role": "user", "content": "Extract the name."},
        {"role": "assistant", "content": "{\"name\": \""},
    ]

    # After: structured outputs replace the prefill
    # Illustrative schema for this example; substitute your own.
    SCHEMA = {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
        "additionalProperties": False,
    }
    client.messages.create(
        model="claude-opus-4-8",
        max_tokens=1024,
        output_config={"format": {"type": "json_schema", "schema": SCHEMA}},
        messages=[{"role": "user", "content": "Extract the name."}],
    )

On claude-opus-4-8, thinking blocks still stream, but their thinking text field is empty unless you opt in, because the default is display: "omitted". There is no error; if your UI rendered the summarized reasoning, it now shows a long pause before output. Restore it by setting the display mode:
    thinking = {
        "type": "adaptive",
        "display": "summarized",  # default is "omitted" on Opus 4.8/4.7
    }

The block-field name is unchanged: it is still block.thinking on a thinking-type block. The fix is the request parameter, not the response-handling code. (Sonnet 4.6 is not affected by this default change.)
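Because the omitted default produces no error, it can help to detect it explicitly in a smoke test. The sketch below assumes only what is stated above, namely that thinking-type blocks expose their text as block.thinking; the helper names are hypothetical, and `SimpleNamespace` objects stand in for real response blocks so the example runs without an API call.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def visible_thinking(content: Iterable[Any]) -> str:
    """Join the text of every thinking-type block in a response's content."""
    return "".join(
        block.thinking
        for block in content
        if getattr(block, "type", None) == "thinking" and block.thinking
    )


def reasoning_is_hidden(content: Iterable[Any]) -> bool:
    """True when thinking blocks are present but none of them carries text."""
    blocks = list(content)
    has_thinking = any(getattr(b, "type", None) == "thinking" for b in blocks)
    return has_thinking and not visible_thinking(blocks)


# Stand-ins for response content under each display mode.
omitted = [
    SimpleNamespace(type="thinking", thinking=""),
    SimpleNamespace(type="text", text="The answer is 4."),
]
summarized = [
    SimpleNamespace(type="thinking", thinking="Adding 2 and 2."),
    SimpleNamespace(type="text", text="The answer is 4."),
]
```

A check like `reasoning_is_hidden(response.content)` in a post-migration test would flag the silent change before a UI that renders reasoning ships with an empty panel.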
This change is Opus-only and easy to miss because it produces no error. claude-opus-4-8 uses the tokenizer introduced with Opus 4.7, under which the same text tokenizes to roughly 1x–1.35x as many tokens — up to about 35% more, around 30% on typical content, varying by workload. Three consequences:
| What to check | Why |
| --- | --- |
| max_tokens ceilings and compaction triggers | The same output now consumes more tokens; tight limits truncate mid-thought |
| Client-side token estimators (e.g. fixed char-to-token ratios) | Calibrated against the old tokenizer; now undercount |
| Cost and rate-limit dashboards | count_tokens returns higher numbers; re-baseline before reacting |
Re-run client.messages.count_tokens(model="claude-opus-4-8", ...) on a representative sample of your prompts. Do not apply a blanket multiplier. Sonnet 4.6 keeps the older tokenizer, so a Sonnet 4 to Sonnet 4.6 move has no tokenizer re-baseline to do.
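Once counts have been collected, a new ceiling can be derived from the measured samples instead of a blanket multiplier. The sketch below is pure arithmetic and makes no API calls: `rebaseline_limit` is a hypothetical helper, the sample pairs are made-up numbers for illustration, and it assumes you recorded each text's token count under the old model before retirement and under claude-opus-4-8 afterwards.

```python
def rebaseline_limit(
    old_limit: int,
    samples: list[tuple[int, int]],
    headroom_pct: int = 10,
) -> int:
    """Scale a token limit by the worst growth seen in measured samples.

    Each sample is (old_count, new_count) for the same text. Integer
    ceiling division keeps the result exact.
    """
    if not samples:
        raise ValueError("need at least one (old_count, new_count) sample")

    # Scale the limit by each sample's own ratio and keep the largest result.
    worst = max(-(-old_limit * new // old) for old, new in samples)

    # Add a safety margin on top of the worst observed case.
    return worst + worst * headroom_pct // 100


# Illustrative counts only; replace with your own measurements.
measured = [(1000, 1300), (500, 600), (2000, 2700)]
new_max_tokens = rebaseline_limit(16000, measured)
```

With these sample pairs the worst growth is 2700/2000, so a 16000-token ceiling becomes 21600 before headroom and 23760 with the default 10% margin. The same function applies to compaction triggers.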
| Step | Opus 4 to 4.8 | Sonnet 4 to 4.6 |
| --- | --- | --- |
| Update model ID string | Required | Required |
| Replace budget_tokens with adaptive thinking | Required (400) | Recommended (deprecated) |
| Sampling params | Remove all (400) | Keep only one (both 400) |
| Remove assistant-turn prefills | Required (400) | Required (400) |
| Set display: "summarized" if showing reasoning | Required for visible thinking | Not applicable |
| Re-baseline max_tokens for new tokenizer | Required | Not applicable |
| Set effort explicitly | Defaults to high | Defaults to high |
| Move output_format to output_config.format | Recommended | Recommended |
| Verify tool inputs parsed with a JSON parser | Recommended | Recommended |
| Spot-check one request, then roll out | Required | Required |
If you run Claude Code, /claude-api migrate applies the model swap, breaking-parameter changes, prefill replacement, and effort calibration across a codebase, then produces a verify-it-yourself checklist. It asks you to confirm scope before editing any files.
Changing only the model ID is not enough. Moving to claude-opus-4-8 also requires removing temperature/top_p/top_k and any budget_tokens (all now return 400), removing assistant-turn prefills (400), opting back into summarized thinking if your UI shows it, and re-baselining max_tokens for the new tokenizer. Only the Sonnet 4 to Sonnet 4.6 move is close to a drop-in, and even that requires removing prefills.
Both models retire on June 15, 2026. After that date, requests to claude-opus-4-20250514 and claude-sonnet-4-20250514 return a 404. These are the original May 2025 models, not Opus 4.6 or Sonnet 4.5.
The replacement for a fixed budget_tokens is adaptive thinking (thinking: {type:"adaptive"}) plus the effort parameter inside output_config. There is no exact token-count equivalent: the model decides how much to think per request, and effort (low through max) tunes overall depth and spend. On Sonnet 4.6, budget_tokens still works but is deprecated.
Opus 4.8 uses the tokenizer introduced with Opus 4.7, under which the same text produces roughly 1x–1.35x as many tokens (about 30% more on typical content, up to ~35%). Re-run the count_tokens endpoint against claude-opus-4-8 and give max_tokens and compaction triggers extra headroom. Sonnet 4.6 keeps the older tokenizer, so it is unaffected.
Thinking text is no longer shown by default. On Opus 4.8 (and 4.7), thinking.display defaults to "omitted", so thinking blocks stream with an empty text field. Set display: "summarized" in your thinking config to restore visible reasoning. The field name is unchanged; only the default flipped.