😺 OpenAI gave GPT-5.4 Mini its own interns


Welcome, humans.
Anthropic just dropped a feature that makes us feel like we're living in the future.
Cowork Dispatch lets you text Claude a task from your phone, go make lunch, and come back to finished work on your desktop. One continuous conversation that picks up where you left off.
Yes, this is what the future of AI should enable!
It's an early research preview, so pair your devices here.
Now, if you want the full backstory on how Cowork went from a 10-day prototype to what might be the closest thing to AGI yet (Swyx’s words, not ours, but uh… we get the sentiment!), Latent Space interviewed Anthropic's Felix Rieseberg on why local-first agent workflows matter, why skills may now matter more than MCPs (except in some cases), and why the real frontier is trusted task execution, not better chat. Worth the watch!
Here’s what happened in AI today:
😼 OpenAI released GPT-5.4 Mini and Nano, purpose-built "subagents" that work as cheap, fast AI workers for your main AI model
📰 The Pentagon is developing its own AI models to replace Anthropic after their $200M contract collapsed; OpenAI clinched an AWS deal to expand its government footprint
📰 OpenAI is cutting back on side projects (Sora, Atlas browser, hardware) to focus on coding after Claude Code dominance created a "code red"
🍪 Unsloth Studio trains and runs 500+ open AI models locally on your computer 2x faster with 70% less memory
📖 Ben Thompson argues the agentic AI wave is fundamentally different from past tech bubbles
P.S: Later today at 4pm PT, our very own Corey is going on NVIDIA's livestream! Check him out as he’s interviewed about some of the wild and wacky projects he’s been working on.

😼 OpenAI Released GPT-5.4 Mini (and Your AI Just Hired Its Own Interns)

So it seems like every AI company has the same problem right now: their smartest model is too slow and expensive to do everything.
How did OpenAI attempt to solve this? They released GPT-5.4 mini and nano today, and the play is less about shrinking the model and more about rethinking how AI systems work altogether.
These are purpose-built "subagents" (think of them as junior associates that a senior partner delegates tasks to). In Codex, the full GPT-5.4 acts as a project manager: it plans, makes decisions, and coordinates. Then it hands off parallel tasks (searching codebases, reviewing files, running tests) to a swarm of GPT-5.4 minis that execute fast and cheap.
It's the McKinsey model, except these consultants actually write code.
You can kind of think of subagents as the organizing system that will replace the model router. The router doesn't totally go away; it just gets smarter, delegating tasks to subagents (running faster, cheaper models). That abstracts the system one layer deeper, so neither you nor I need to worry about picking and assigning the right model for each task.
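If the delegation idea feels abstract, here is a minimal sketch of the pattern in Python. To be clear, this is not how Codex is implemented: `plan`, `run_subagent`, and `orchestrate` are hypothetical stand-ins we made up for illustration, and no real model API is called. The "big model" just splits a goal into independent subtasks, and the "small models" are plain functions run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SubTask:
    kind: str     # e.g. "search", "review", "test"
    payload: str  # what the worker should look at


def plan(goal: str) -> list[SubTask]:
    """Stand-in for the large model: split a goal into independent subtasks."""
    return [
        SubTask("search", f"find code related to: {goal}"),
        SubTask("review", f"review files touched by: {goal}"),
        SubTask("test", f"run tests covering: {goal}"),
    ]


def run_subagent(task: SubTask) -> str:
    """Stand-in for a small, cheap model call (hypothetical; no real API used)."""
    return f"[{task.kind}] done: {task.payload}"


def orchestrate(goal: str) -> str:
    tasks = plan(goal)
    # The subtasks do not depend on each other, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # pool.map returns results in the same order as the input tasks.
        results = list(pool.map(run_subagent, tasks))
    # A real orchestrator would have the large model synthesize these results;
    # this sketch simply joins them into one report.
    return "\n".join(results)


report = orchestrate("fix login bug")
print(report)
```

The design point is that only `plan` and the final synthesis need the expensive model; everything inside the thread pool can go to the cheaper one.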
Now, faster and cheaper means nothing if it also means dumber. So here's what the benchmarks say:
GPT-5.4 mini scores 54.4% on SWE-Bench Pro (coding benchmark, just 3 points behind the full GPT-5.4) and 72.1% on OSWorld computer-use tasks (testing how good the agent is at using your computer), nearly matching the flagship model.
Pricing: Mini runs $0.75 per million input tokens; nano costs just $0.20 (that's $0.05 less than Mercury 2, if I remember correctly). Mini uses 30% of GPT-5.4's Codex quota, so developers get roughly 3x the throughput.
Speed: Over 2x faster than GPT-5 mini, with similar or better quality across coding, tool-calling, and vision tasks.
Can you use it? GPT-5.4 mini = yes. It's live in the API, Codex, and ChatGPT (free users get it through "Thinking" mode). Nano is API-only atm (conspiracy-minded folks would argue this market positioning is meant to compete directly with Mercury 2...).
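For anyone who wants to sanity-check the pricing math, here is a quick back-of-the-envelope sketch in Python. The prices and the 30% quota share are the figures quoted above; the 40-million-token workload is a number we invented purely for illustration, and output-token pricing is left out because it isn't quoted here.

```python
# Figures quoted above (USD per million input tokens).
MINI_INPUT_PRICE = 0.75
NANO_INPUT_PRICE = 0.20
MINI_QUOTA_SHARE = 0.30  # Mini uses 30% of GPT-5.4's Codex quota


def input_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


def throughput_multiplier(quota_share: float) -> float:
    """How many times more work fits in the same quota."""
    return 1 / quota_share


workload = 40_000_000  # hypothetical monthly input tokens (our assumption)
mini_cost = input_cost(workload, MINI_INPUT_PRICE)
nano_cost = input_cost(workload, NANO_INPUT_PRICE)

print(f"mini: ${mini_cost:.2f}, nano: ${nano_cost:.2f}")
print(f"throughput multiplier: {throughput_multiplier(MINI_QUOTA_SHARE):.2f}x")
```

The "roughly 3x" throughput claim is just 1 divided by 0.30, which works out to a bit over 3.3x.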
Why this matters:
If you're a regular ChatGPT user, the speed improvements matter most to you. Responses in Thinking mode get faster and better. And if you're mostly using ChatGPT on your phone, it's worth checking out the Codex desktop app (now on Mac and Windows) for heavier work.
Codex is a great app; the only problem is it's built for coders, not for the rest of us. We've been reading a lot of takes lately that argue OpenAI needs to give regular business users the same Codex-app capabilities in ChatGPT, or an equivalent work tool. Anthropic's doing something similar with Cowork, which brings Claude Code-style agent capabilities to non-developers.
And we also read a tweet that hinted at Anthropic launching a "Codex app killer" sometime next week. Smash that Eyes Looking Emoji Button!
Our take:
The price is the real story here. We don't really care about "mini" models because frankly, we try to use the best quality model possible, whenever possible. That gets cost-prohibitive, of course, so if we're going to use less than the best, it had better be free or close to it.
According to OpenAI, Mini delivers ~95% of GPT-5.4's performance on computer use for a fraction of the cost. But compare it to the broader small model market and it's actually one of the pricier options:
Gemini 3 Flash scores 78% on SWE-bench Verified at $0.50/$3.00 (input/output, per million tokens).
Claude Haiku 4.5 matches Sonnet 4-level quality at $1/$5.
And the wildcard is Mercury 2, a diffusion-based model (generates all tokens in parallel instead of one-by-one) that hits ~1,000 tokens/sec at just $0.25/$0.75 (though Nano has Mercury beat on input price).
GPT-5.4 mini is a great model, but "cheapest" belongs to someone else. Could it be "Pareto frontier" (the highest intelligence for the lowest cost) level quality, though? It might be (just keep refreshing Artificial Analysis until they benchmark it)...

FROM OUR PARTNERS
Are you risk-ready or risk exposed? Breaches are inevitable. What’s far less clear is whether organizations are truly ready to recover.
Based on insights from security and technology leaders worldwide, Cohesity’s Global Cyber Resilience research reveals what the top 6% of resilient organizations do differently. The findings are sobering. Most have faced material attacks, many more than once.
But the real value lies beyond the statistics. It shows why some teams recover fast while others absorb lasting operational and reputational damage. From recovery speed to data resilience and the real impact of AI and automation, this is a clear view of what modern resilience looks like in practice.

🎓 AI Skill of the Day:
Lock Down Your AI Agent in 60 Seconds
As you probably know by now, OpenClaw is an open-source AI agent that runs tasks autonomously on your machine. The problem? No security guardrails by default. Some of the worst risks included potentially exposed keys, unrestricted file access, and uncontrolled network activity. Security folks called it all variations of dumpster fire when it was first released.
But now we have NVIDIA NemoClaw, which wraps OpenClaw in a sandboxed runtime called OpenShell that enforces network, filesystem, and privacy policies so your agent can only touch what you approve.
Step 1: Open your terminal (ask Claude / ChatGPT if you don't know what that is) and paste this one line. It downloads everything, walks you through setup, and creates your sandbox automatically:
curl -fsSL https://nvidia.com/nemoclaw.sh | bash
Step 2: Connect to your sandboxed agent and start chatting:
nemoclaw my-assistant connect
openclaw tui
That's it. Your agent works, and your data stays locked down.
Two commands between "dumpster fire" and "locked vault" is a pretty good trade.
Don't want to deal with the terminal? Deploy NemoClaw on Brev and NVIDIA hosts the whole thing for you; one click, no setup, for only $0.13 per hour.
If you wanna try the claw but you’ve been too afraid, this is the way.


🍪 Treats to Try.
Claude Cowork now dispatches tasks from your phone or desktop in one continuous conversation that picks up where you left off; assign Claude a task, walk away, and come back to finished work (early research preview; pair your devices here). Free to try.
Unsloth Studio trains and runs 500+ open models (Qwen, DeepSeek, Gemma, vision, audio) locally on Mac/Windows/Linux 2x faster with 70% less memory, auto-creates datasets from your PDFs and documents via visual workflows, and exports to all formats (GitHub). Free to try.
Gamma Imagine generates brand-specific charts, social graphics, and infographics from text prompts inside Gamma's 100+ presentation templates (integrates with ChatGPT, Claude, Zapier, Atlassian). Free to try.
Mistral Forge builds custom AI models trained on your company's proprietary data, policies, and workflows so the model actually knows your business instead of giving generic answers; covers everything from data prep to alignment to production deployment (intro video). Enterprise pricing.
Hermes Agent v0.3.0 gives you real-time streaming AI agents across CLI and every platform with a plugin system to share tools and skills, plus live Chrome control, VS Code/Zed/JetBrains integration, and local voice mode. Free to try.
Manus My Computer brings Manus's AI agent directly to your desktop so it works on your actual files, apps, and browser without uploading anything.
Proton Mail Born Private reserves a private email address for your child (from birth to age 15) with zero tracking, no ads, and zero-access encryption; you pick the address, donate $1+ to the Proton Foundation, and unlock it whenever they're ready. $1 minimum donation.

📰 Around the Horn
The Pentagon is developing its own large language models to replace Anthropic after their $200M contract collapsed over surveillance and weapons clauses, while OpenAI clinched an AWS deal to expand its government footprint.
OpenAI is cutting back on side projects (Sora video app, Atlas browser, hardware device) to refocus on coding and enterprise after Claude Code's dominance created a "code red"; Codex now has 2M weekly active users. Here's a good recap on all their sidequests.
Amazon CEO Andy Jassy told employees AI will push AWS to a $600B annual run-rate by 2036, double his prior projection.
Microsoft shook up its Copilot AI leadership team, freeing up Mustafa Suleyman from day-to-day management.

FROM OUR PARTNERS
The 2026 B2C ecommerce AI trends
Discover how AI-powered search, agentic AI, and personalization are transforming ecommerce. With 61% planning agentic AI adoption and 63% seeing higher purchase likelihood with AI tools, this report reveals what’s driving revenue, loyalty, and competitive advantage in 2026.

📖 Midweek Wisdom:
Your reading list for the middle of the week:
Agents Over Bubbles
— Ben Thompson makes the case that the agentic AI wave is fundamentally different from prior tech bubbles, and why that matters for how you think about what's coming.
Post-Apocalyptic Education
— Ethan Mollick argues the Homework Apocalypse is already here (82% of undergrads use AI for schoolwork), teachers can't reliably detect it, and students overestimate what they're learning when AI does the work.
The Next Phase of Open Models
— Nathan Lambert argues open models will win by specialization, not by chasing closed-model performance, splitting into three distinct classes with different strengths.
The Karpathy Loop
— Fortune profiles how Andrej Karpathy ran 700 autonomous AI experiments in 2 days, and what that tells us about where agents are heading.
How to Buy an AI 'Grassroots' Movement
— Veronica Irwin investigates the manufactured grassroots campaigns behind AI lobbying and who's really funding them.
How to Survive the AI Age: A Concrete Guide
— A practical framework for navigating career and life decisions as AI reshapes the economy.

A Cat’s Commentary


P.P.S: Love the newsletter, but only want to get it once per week? Don’t unsubscribe; update your preferences here.








