😺 OpenAI gave GPT-5.4 Mini its own interns

Source: The Neuron


Welcome, humans.

Anthropic just dropped a feature that makes us feel like we're living in the future.

Cowork Dispatch lets you text Claude a task from your phone, go make lunch, and come back to finished work on your desktop. One continuous conversation that picks up where you left off.

Yes; this is what the future of AI should enable!

It's an early research preview, so pair your devices here.

Now, if you want the full backstory on how Cowork went from a 10-day prototype to what might be the closest thing to AGI yet (Swyx’s words, not ours, but uh… we get the sentiment!), Latent Space interviewed Anthropic's Felix Rieseberg on why local-first agent workflows matter, why skills may now matter more than MCPs (except in some cases), and why the real frontier is trusted task execution, not better chat. Worth the watch!

Here’s what happened in AI today:

😼 OpenAI released GPT-5.4 Mini and Nano, purpose-built "subagents" that work as cheap, fast AI workers for your main AI model

📰 The Pentagon is developing its own AI models to replace Anthropic after their $200M contract collapsed; OpenAI clinched an AWS deal to expand its government footprint

📰 OpenAI is cutting back on side projects (Sora, Atlas browser, hardware) to focus on coding after Claude Code dominance created a "code red"

🍪 Unsloth Studio trains and runs 500+ open AI models locally on your computer 2x faster with 70% less memory

📖 Ben Thompson argues the agentic AI wave is fundamentally different from past tech bubbles

P.S. Later today at 4pm PT, our very own Corey is going on NVIDIA's livestream! Check him out as he’s interviewed about some of the wild and wacky projects he’s been working on.


😼 OpenAI Released GPT-5.4 Mini (and Your AI Just Hired Its Own Interns)

DEEP DIVE:

OpenAI Built a Team of AI Interns for Your AI Boss

So it seems like every AI company has the same problem right now: their smartest model is too slow and expensive to do everything.

How did OpenAI attempt to solve this? They released GPT-5.4 mini and nano today, and the play is less about shrinking the model and more about rethinking how AI systems work altogether.

These are purpose-built "subagents" (think of them as junior associates that a senior partner delegates tasks to). In Codex, the full GPT-5.4 acts as a project manager: it plans, makes decisions, and coordinates. Then it hands off parallel tasks (searching codebases, reviewing files, running tests) to a swarm of GPT-5.4 minis that execute fast and cheap.

It's the McKinsey model, except these consultants actually write code.

You can kind of think of subagents as the organizing system that will replace the model router. The router doesn't totally go away; it just gets smarter, delegating tasks to subagents (running faster, cheaper models) and abstracting the system one layer deeper, so neither you nor I need to worry about picking and assigning the right model for the task.
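For the curious, the manager-plus-interns pattern can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: `plan`, `run_subagent`, and `orchestrate` are hypothetical names, and the "subagent" here just returns a string where a real system would call a small, fast model. It is not OpenAI's API or how Codex is actually built.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(goal: str) -> list[str]:
    # Hypothetical planner: a real "manager" model would generate these steps.
    steps = ("search codebase", "review files", "run tests")
    return [f"{step} for: {goal}" for step in steps]


def run_subagent(task: str) -> str:
    # Hypothetical stand-in for a call to a small, fast model.
    return f"done -> {task}"


def orchestrate(goal: str) -> dict[str, str]:
    tasks = plan(goal)
    # Fan the subtasks out in parallel, then collect results in task order.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        results = list(pool.map(run_subagent, tasks))
    return dict(zip(tasks, results))


if __name__ == "__main__":
    for task, result in orchestrate("fix login bug").items():
        print(f"{task}: {result}")
```

The point of the shape: the expensive model only does the planning step, while the parallel fan-out is where cheaper workers would slot in.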

Now, faster and cheaper means nothing if it also means dumber. So here's what the benchmarks say:

GPT-5.4 mini scores 54.4% on SWE-Bench Pro (coding benchmark, just 3 points behind the full GPT-5.4) and 72.1% on OSWorld computer-use tasks (testing how good the agent is at using your computer), nearly matching the flagship model.

Pricing: Mini runs $0.75 per million input tokens; nano costs just $0.20 (that's $0.05 less than Mercury 2, if I remember correctly). Mini uses 30% of GPT-5.4's Codex quota, so developers get roughly 3x the throughput.
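If you want to sanity-check that math, here is a small back-of-the-envelope sketch in Python. The per-million input prices and the 30% quota figure come from the numbers above; the 50-million-token workload and the function names are made up purely for illustration, and output-token costs are ignored.

```python
# Input prices per million tokens, as quoted above.
MINI_INPUT_PER_M = 0.75
NANO_INPUT_PER_M = 0.20


def input_cost(tokens: int, price_per_million: float) -> float:
    # Cost in dollars for the given number of input tokens.
    return tokens / 1_000_000 * price_per_million


def throughput_multiplier(quota_fraction: float) -> float:
    # If each task uses this fraction of quota, this is how many more tasks fit.
    return 1 / quota_fraction


if __name__ == "__main__":
    workload = 50_000_000  # hypothetical monthly input tokens
    print(f"mini: ${input_cost(workload, MINI_INPUT_PER_M):.2f}")
    print(f"nano: ${input_cost(workload, NANO_INPUT_PER_M):.2f}")
    print(f"quota multiplier: {throughput_multiplier(0.30):.2f}x")
```

Note that 1 / 0.30 works out to a bit over 3.3, which is where the "roughly 3x" comes from.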

Speed: Over 2x faster than GPT-5 mini, with similar or better quality across coding, tool-calling, and vision tasks.

Can you use it? GPT-5.4 mini = yes. It's live in the API, Codex, and ChatGPT (free users get it through "Thinking" mode). Nano is API-only atm (conspiracy-minded folks would argue this market positioning is meant to compete directly with Mercury 2...).

Why this matters: If you're a regular ChatGPT user, the speed improvements matter most to you. Responses in Thinking mode get faster and better. And if you're mostly using ChatGPT on your phone, it's worth checking out the Codex desktop app (now on Mac and Windows) for heavier work.

Codex is a great app; the only problem is it's built for coders and not all of us. We've been reading a lot of takes lately that argue OpenAI needs to give regular business users the same Codex-app capabilities in ChatGPT, or an equivalent work tool. Anthropic's doing something similar with Cowork, which brings Claude Code-style agent capabilities to non-developers.

And we also read a tweet that hinted at Anthropic launching a "Codex app killer" sometime next week. Smash that Eyes Looking Emoji Button!

Our take: The price is the real story here. We don't really care about "mini" models because frankly, we try to use the best quality model possible, whenever possible. That's cost-prohibitive, of course, so if we're going to use less than the best, it better be free or close to it.

According to OpenAI, Mini delivers ~95% of GPT-5.4's performance on computer use for a fraction of the cost. But compare it to the broader small model market and it's actually one of the pricier options:

Gemini 3 Flash scores 78% on SWE-bench Verified at $0.50/$3.00.

Claude Haiku 4.5 matches Sonnet 4-level quality at $1/$5.

And the wildcard is Mercury 2, a diffusion-based model (generates all tokens in parallel instead of one-by-one) that hits ~1,000 tokens/sec at just $0.25/$0.75 (though Nano has Mercury beat here).

GPT-5.4 mini is a great model, but "cheapest" belongs to someone else. Could it be "Pareto frontier" (the highest intelligence for the lowest cost) level quality though?

It might be (just keep refreshing Artificial Analysis until they benchmark it)...

FROM OUR PARTNERS

Are you risk-ready or risk exposed? Breaches are inevitable. What’s far less clear is whether organizations are truly ready to recover.

Based on insights from security and technology leaders worldwide, Cohesity’s Global Cyber Resilience research reveals what the top 6% of resilient organizations do differently. The findings are sobering. Most have faced material attacks, many more than once.

But the real value lies beyond the statistics. It shows why some teams recover fast while others absorb lasting operational and reputational damage. From recovery speed to data resilience and the real impact of AI and automation, this is a clear view of what modern resilience looks like in practice.

Explore Now

🎓 AI Skill of the Day:

Lock Down Your AI Agent in 60 Seconds

As you probably know by now, OpenClaw is an open-source AI agent that runs tasks autonomously on your machine. The problem? No security guardrails by default. Some of the worst risks included potentially exposed keys, unrestricted file access, and uncontrolled network activity.

Security folks called it all variations of dumpster fire when it was first released.

But now we have NVIDIA NemoClaw, which wraps OpenClaw in a sandboxed runtime called OpenShell that enforces network, filesystem, and privacy policies so your agent can only touch what you approve.

Here’s how you (yes you, a non-technical person!) can install it (docs):

Step 1: Open your terminal (ask Claude / ChatGPT if you don't know what that is) and paste this one line. It downloads everything, walks you through setup, and creates your sandbox automatically:

curl -fsSL https://nvidia.com/nemoclaw.sh | bash

Step 2: Connect to your sandboxed agent and start chatting:

nemoclaw my-assistant connect openclaw tui

That's it. Your agent works, and your data stays locked down.

Two commands between "dumpster fire" and "locked vault" is a pretty good trade.

Don't want to deal with the terminal? Deploy NemoClaw on Brev and NVIDIA hosts the whole thing for you; one click, no setup, for only $0.13 per hour.

If you wanna try the claw but you’ve been too afraid, this is the way.

Want more tips like this? Check out our AI Skill of the Day Digest for this month.

Have a specific skill you want to learn? Request it here.

New episodes air every week on: Spotify | Apple Podcasts | YouTube

🍪 Treats to Try.

Claude Cowork now dispatches tasks from your phone or desktop in one continuous conversation that picks up where you left off; assign Claude a task, walk away, and come back to finished work (early research preview; pair your devices here). Free to try.

Unsloth Studio trains and runs 500+ open models (Qwen, DeepSeek, Gemma, vision, audio) locally on Mac/Windows/Linux 2x faster with 70% less memory, auto-creates datasets from your PDFs and documents via visual workflows, and exports to all formats (GitHub). Free to try.

Gamma Imagine generates brand-specific charts, social graphics, and infographics from text prompts inside Gamma's 100+ presentation templates (integrates with ChatGPT, Claude, Zapier, Atlassian). Free to try.

Mistral Forge builds custom AI models trained on your company's proprietary data, policies, and workflows so the model actually knows your business instead of giving generic answers; covers everything from data prep to alignment to production deployment (intro video). Enterprise pricing.

Hermes Agent v0.3.0 gives you real-time streaming AI agents across CLI and every platform with a plugin system to share tools and skills, plus live Chrome control, VS Code/Zed/JetBrains integration, and local voice mode. Free to try.

Manus My Computer brings Manus's AI agent directly to your desktop so it works on your actual files, apps, and browser without uploading anything.

Proton Mail Born Private reserves a private email address for your child (from birth to age 15) with zero tracking, no ads, and zero-access encryption; you pick the address, donate $1+ to the Proton Foundation, and unlock it whenever they're ready. $1 minimum donation.

📰 Around the Horn

The Pentagon is developing its own large language models to replace Anthropic after their $200M contract collapsed over surveillance and weapons clauses, while OpenAI clinched an AWS deal to expand its government footprint.

OpenAI is cutting back on side projects (Sora video app, Atlas browser, hardware device) to refocus on coding and enterprise after Claude Code's dominance created a "code red"; Codex now has 2M weekly active users. Here's a good recap on all their sidequests.

Amazon CEO Andy Jassy told employees AI will push AWS to a $600B annual run-rate by 2036, double his prior projection.

Microsoft shook up its Copilot AI leadership team, freeing up Mustafa Suleyman from day-to-day management.

Want absolutely EVERYTHING that happened in AI this week? Click here!

FROM OUR PARTNERS

The 2026 B2C ecommerce AI trends

Discover how AI-powered search, agentic AI, and personalization are transforming ecommerce. With 61% planning agentic AI adoption and 63% seeing higher purchase likelihood with AI tools, this report reveals what’s driving revenue, loyalty, and competitive advantage in 2026.

Get the report

📖 Midweek Wisdom:

Your reading list for the middle of the week:

Agents Over Bubbles: Ben Thompson makes the case that the agentic AI wave is fundamentally different from prior tech bubbles, and why that matters for how you think about what's coming.

Post-Apocalyptic Education: Ethan Mollick argues the Homework Apocalypse is already here (82% of undergrads use AI for schoolwork), teachers can't reliably detect it, and students overestimate what they're learning when AI does the work.

The Next Phase of Open Models: Nathan Lambert argues open models will win by specialization, not by chasing closed-model performance, splitting into three distinct classes with different strengths.

The Karpathy Loop: Fortune profiles how Andrej Karpathy ran 700 autonomous AI experiments in 2 days, and what that tells us about where agents are heading.

How to Buy an AI 'Grassroots' Movement: Veronica Irwin investigates the manufactured grassroots campaigns behind AI lobbying and who's really funding them.