The AI Job Market Split in Two. One Side Pays $400K and Can't Hire Fast Enough

Video: The AI Job Market Split in Two. One Side Pays $400K and Can't Hire Fast Enough. (25m39s) → https://www.youtube.com/watch?v=4cuT-LKcmWs

Abstract

Nate says the AI hiring market now looks like two diverging economies: traditional headcount is flat, yet AI-native roles are chronically understaffed with 3.2 jobs per qualified builder and 142-day vacancy cycles. He distills hundreds of job posts into seven concrete skill stacks—specification precision, eval/taste systems, multi-agent decomposition, failure diagnostics, trust & safety design, context architecture, and token economics—and argues that the ability to quantify cost and quality will matter more than “prompting flair.”

Highlights

  • [00:00] AI talent is the bottleneck. Manpower data shows ~1.6M openings vs. ~0.5M qualified candidates (3.2:1), with AI requisitions staying unfilled for 142 days; he’s launching a vetted job board and Substack guide so both sides stop guessing at requirements.
  • [04:45] Specification precision replaces “prompting.” Employers now screen for people who can write literal, intent-complete briefs (down to escalation rules and logging) because agents cannot infer missing requirements.
  • [06:56] Evaluation and taste are measurable. Winning ICs build automated eval harnesses, detect AI-specific failure modes, and can explain why an answer that “sounds right” still fails functional correctness.
  • [09:49] Multi-agent orchestration = management skill. Decomposing work for planners/executors, sizing tasks for the harness you have, and keeping specs refreshed mid-run are now core PM/eng expectations.
  • [12:35] Failure pattern literacy. Nate catalogues the six recurring breakages (context drift, spec drift, sycophantic confirmation, bad tool picks, cascading loops, silent failure) and insists hiring screens probe for how candidates diagnose them.
  • [19:57] Systems thinking at the top of the ladder. Senior roles now blend trust & safety design, context/knowledge architecture, and token-cost modeling so teams can prove ROI before burning 100M-token runs.
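
Token-cost modeling is the most mechanical of these senior skills, and a back-of-envelope model goes a long way. A minimal sketch in Python; every price and token count below is a hypothetical placeholder, since the real numbers depend on your provider and workload:

```python
# Back-of-envelope token economics for one agent run.
# Every price and count here is a hypothetical placeholder.

def run_cost(input_tokens: int, output_tokens: int,
             price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Dollar cost of a run, given per-million-token prices."""
    return ((input_tokens / 1e6) * price_in_per_mtok
            + (output_tokens / 1e6) * price_out_per_mtok)

# A long agentic run that re-reads a large context on every step:
steps = 200
cost = run_cost(
    input_tokens=steps * 400_000,   # 80M input tokens across the run
    output_tokens=steps * 10_000,   # 2M output tokens
    price_in_per_mtok=3.00,         # hypothetical $ per 1M input tokens
    price_out_per_mtok=15.00,       # hypothetical $ per 1M output tokens
)
print(f"run cost ~= ${cost:,.2f}")  # -> $270.00; weigh against the value produced
```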

Tobi Lütke Made a 20-Year-Old Codebase 53% Faster Overnight. Here's How

Video: Tobi Lütke Made a 20-Year-Old Codebase 53% Faster Overnight. Here's How. (29m34s) → https://www.youtube.com/watch?v=YpPcDHc3e9U

Abstract

Nate argues that “agents” isn’t a single architecture but at least five distinct species: task-scale coding harnesses, project-scale planner/executor systems, dark factories that run from spec to eval with humans only at the edges, auto-research loops that hill-climb against measurable metrics, and orchestration frameworks that pass work between specialized roles. Picking the wrong species for the job causes most production failures—not model quality. The remedy is to anchor every deployment in decomposition quality, specification rigor, evaluation design, and an honest read on whether you are building software, optimizing a rate, or routing workflows.

Highlights

  • [00:00] Why the taxonomy matters. “LLM + tools in a loop” hides that enterprises are really running five divergent agent patterns; mislabeling them is why teams try to jam a single chat-based helper into multi-week deliverables.
  • [04:00] Coding harnesses and decomposition. Single-threaded helpers (à la Karpathy’s personal agents or Peter Steinberger’s CodeX swarms) work when humans stay in the manager role, slice work precisely, and treat the agent like an IC inside their repo and tooling.
  • [10:30] When to upgrade to planner/executor systems. Cursor’s production browser/compiler builds use a manager agent that plans, queues tasks, and spins short-lived executor agents—proof that team-scale projects demand simple but explicit management layers, not just “more assistants.”
  • [15:30] Dark factories demand specs plus evals. Fully autonomous pipelines push specs in, iterate until tests pass, and only surface to humans for intent-setting and final audit; Amazon’s recent AI-caused outages are Nate’s reminder that eval design, not heroics, keeps these runs safe.
  • [20:30] Auto research is about metrics, not deliverables. Whether Tobi Lütke’s team is squeezing Shopify’s Liquid renderer or Karpathy is auto-tuning GPT-2-class stacks, the loop only works when you can measure the hill you’re climbing (latency, conversion, loss, etc.).
  • [23:00] Orchestration is a routing problem. Tools like LangGraph or CrewAI make sense when distinct roles (researcher → drafter → reviewer) must hand off thousands of tickets, but the coordination tax only pays off at scale; otherwise a project harness is cleaner.
  • [27:30] Cheat sheet. Nate closes with a matrix: use coding harnesses when judgment is the gate, planner/executor systems when teams need shared memory, dark factories when evals are trustworthy, auto research when a metric is king, and orchestration when workflow routing is the real bottleneck.
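
Read literally, that cheat sheet is a decision function. A minimal sketch of one way to encode it; the predicates paraphrase Nate's criteria, and the priority order is my assumption rather than his rubric:

```python
# Nate's closing matrix, encoded as a decision function. The boolean inputs
# paraphrase his criteria; the priority order is an assumption of this sketch.

def pick_agent_pattern(*, judgment_is_gate: bool, needs_shared_memory: bool,
                       evals_are_trustworthy: bool, metric_is_king: bool,
                       routing_is_bottleneck: bool) -> str:
    if routing_is_bottleneck:
        return "orchestration framework (researcher -> drafter -> reviewer handoffs)"
    if metric_is_king:
        return "auto-research loop (hill-climb latency, conversion, loss, ...)"
    if evals_are_trustworthy:
        return "dark factory (spec in, iterate until evals pass, audit at the end)"
    if needs_shared_memory:
        return "planner/executor system (manager agent plus short-lived executors)"
    if judgment_is_gate:
        return "coding harness (human stays the manager, agent works as an IC)"
    return "no agent yet: clarify the job first"

print(pick_agent_pattern(judgment_is_gate=True, needs_shared_memory=False,
                         evals_are_trustworthy=False, metric_is_king=False,
                         routing_is_bottleneck=False))
# -> coding harness (human stays the manager, agent works as an IC)
```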

Nvidia Just Open-Sourced What OpenAI Wants You to Pay Consultants For.

Published: 25 Mar 2026 · 09:05 AM AEDT

Abstract

Nate contrasts Nvidia’s NemoClaw security stack with the consultant-heavy alliances OpenAI and Anthropic just announced. He says four of the five “hard” agent-deployment problems are just engineering fundamentals—context compression, instrumentation, linting, and planner/executor plumbing—while only the specification/change-management gap may warrant outside help, so most teams can spend tens of thousands on engineering instead of millions on consultants.

Highlights

  • [00:00] Nvidia frames NemoClaw as an enterprise wrapper for OpenClaw: it runs inside the OpenShell runtime, enforces YAML guardrails, adds GPU attestation, and keeps execution on local Nvidia hardware so security teams can audit every connector without hiring Accenture or Deloitte.
  • [03:30] Huang’s ecosystem play assumes competent engineers can reuse proven primitives, a stark contrast to OpenAI/Anthropic’s narrative that enterprises lack expertise and therefore need consulting partners.
  • [10:45] Nate revisits Rob Pike’s five rules—measure before tuning, keep algorithms simple, let data dominate—to argue that most “agent” failures are hygiene failures, not novel AI limits.
  • [12:10] Factory.ai’s agent-readiness audits (style/lint configs, documented builds, dev containers, observability, governance, etc.) show the agent rarely breaks; the environment does, and fixing those eight pillars creates a compounding productivity loop.
  • [13:45] Factory’s context-compression bake-off found anchored iterative summaries preserved session intent better than OpenAI’s opaque Compact endpoint or Anthropic’s regenerate-everything SDK, but every strategy is lossy unless you scope work into milestones or hand tasks to fresh agents (a minimal sketch of the anchored approach follows this list).
  • [16:00] Codebase instrumentation plus ruthless linting are the cheapest upgrades: baseline LLM responses, capture golden datasets, and force “straitjacket” style rules because agents behave like lazy developers unless the guardrails are non-negotiable.
  • [18:50] Stick to planner/executor pairs for multi-agent coordination until you can measure gains; complex orchestration without telemetry just multiplies bugs.
  • [20:10] The only problem that may justify consultants is specification fatigue—most teams can’t maintain clean context graphs or write crystal-clear specs—so Nate frames the landscape as four engineering problems you already own versus one domain-expertise problem you might outsource.
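
Factory's exact method isn't spelled out in the episode, but the anchored-summary idea above is easy to sketch: pin the original intent verbatim and re-summarize only the rolling transcript. A minimal sketch with a placeholder where the summarization call would go:

```python
# Anchored iterative summarization, sketched: the original spec (the anchor)
# is never compressed; only the rolling transcript is re-summarized each turn.
# `summarize` is a placeholder for a real model call; truncation stands in.

def summarize(text: str, budget_chars: int) -> str:
    return text[:budget_chars]  # placeholder: call your summarization model here

class AnchoredContext:
    def __init__(self, anchor: str, budget_chars: int = 4000):
        self.anchor = anchor    # original intent, kept verbatim forever
        self.rolling = ""       # lossy, repeatedly compressed history
        self.budget = budget_chars

    def add_turn(self, turn: str) -> None:
        # Iterative step: fold the new turn into the running summary. Keeping
        # the anchor outside this call is what stops intent from drifting.
        self.rolling = summarize(self.rolling + "\n" + turn, self.budget)

    def prompt_context(self) -> str:
        return f"GOAL (verbatim):\n{self.anchor}\n\nPROGRESS SO FAR:\n{self.rolling}"

ctx = AnchoredContext(anchor="Migrate billing to the new API with zero downtime.")
ctx.add_turn("Agent: inventoried 14 call sites; 3 still use deprecated endpoints.")
print(ctx.prompt_context())
```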

I Mapped Where Every AI Agent Actually Sits. Most People Pick Wrong.

Published: 24 Mar 2026 · 01:00 AM AEDT

Abstract

Nate argues that OpenClaw’s viral success created a reference frame for every “me-too” agent launch, but most people miss the trade-offs that actually matter. He lays out three axes—where the agent runs, how intelligence is orchestrated, and what the interface contract looks like—to explain how Perplexity, Meta’s Manus, Anthropic, Lovable, and others are carving out niches in the same ecosystem. The punch line: picking agents is now a question of delegating trust, not chasing hype cycles.

Highlights

  • [00:00] Reframes the OpenClaw frenzy: rather than a simple spectrum of control, evaluate agents by execution venue (local vs. cloud), orchestration model (single vs. multi-model), and messaging interface to decide whether a fork actually fits your workflow.
  • [06:40] Profiles OpenClaw as the sovereignty play—local, API-key controlled, infinitely swappable modules with 250k+ GitHub stars—while reminding viewers that freedom comes with security debt (compromised plugin registries, supply-chain exploits, and user responsibility).
  • [08:10] Breaks down Perplexity Computer as the delegation play: a $200/month, cloud-contained, multi-agent service that promises months-long runs and even a “personal computer” enclave for data holdouts, best suited for research pipelines and exec briefings where outcome guarantees trump tinkering.
  • [10:50] Explains Meta’s Manus relaunch as a distribution grab; by mixing Meta and third-party models, Zuck keeps consumer/SMB users inside the Meta attention funnel, but anyone prioritizing data privacy or model choice will balk at the trade.
  • [13:40] Covers Anthropic Dispatch, which pipes phone messages into a single Claude co-work session—limited routing and no multi-instance harness, but the safety-first brand plus existing desktop agent powers make it a credible turnkey option for non-technical professionals.
  • [16:30] Notes Lovable’s pivot from best-in-class “vibe coding” tool to general agent executor, illustrating how every breakout product now feels compelled to respond to OpenClaw’s definition of agent expectations.
  • [20:00] Closes with the strategic takeaway: agent trust decisions will set market structure for years, so map each entrant on the sovereignty–delegation–distribution axes instead of reacting to every announcement.
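
The mapping exercise is small enough to keep as a literal scorecard. A minimal sketch; the placements paraphrase Nate's characterizations from the episode, and the field names are mine:

```python
# Nate's three axes as a scorecard. Placements paraphrase the episode;
# the field names and catalog shape are mine.
from dataclasses import dataclass

@dataclass
class AgentProfile:
    name: str
    execution: str      # where it runs: "local" vs "cloud"
    orchestration: str  # "single-model" vs "multi-model"
    interface: str      # the contract: CLI, web app, messaging, ...

catalog = [
    AgentProfile("OpenClaw",            "local", "multi-model",  "CLI + plugins"),
    AgentProfile("Perplexity Computer", "cloud", "multi-model",  "web app"),
    AgentProfile("Meta Manus",          "cloud", "multi-model",  "Meta apps"),
    AgentProfile("Anthropic Dispatch",  "cloud", "single-model", "phone messages"),
]

# Delegating trust means filtering by the trade-off you can live with:
sovereign = [a.name for a in catalog if a.execution == "local"]
print("full-sovereignty options:", sovereign)  # -> ['OpenClaw']
```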

McKinsey Says $1 Trillion In Sales Will Go Through AI Agents. Most Businesses Are Invisible.

Published: 23 Mar 2026 · 05:00 AM AEDT

Abstract

Nate argues that the past decade of anti-bot architecture is now blocking the AI agents that will mediate the majority of customer attention, so businesses must rebuild their data stacks to be agent-readable and agent-writable. He points to OpenClaw’s adoption curve, Google’s Universal Commerce Protocol, and Shopify’s agentic pilots as proof that the market is moving faster than incumbents, then dissects how Stripe and SAP illustrate the hidden data-engineering lift required to make agents useful. The episode finishes with four misconceptions that keep executives complacent and a call to encode “tribal” product knowledge into structured data before competitors’ agents capture the demand.

Highlights

  • [00:00] Fifteen years of anti-bot infrastructure—CAPTCHAs, walled APIs, JavaScript-heavy flows—now gatekeep the agent attention that will drive buying decisions, and OpenClaw’s 250k+ GitHub stars show customers want one agent that can talk to every system.
  • [07:30] Agentic commerce is already scaling: McKinsey pegs up to $1T in orchestrated U.S. retail revenue by 2030, Google’s Universal Commerce Protocol (UCP) and Shopify’s Tobi Lütke are pushing merchants to expose shipping, pricing, and returns data, and NShift warns that agents simply skip offers with unclear logistics promises.
  • [13:30] Stripe’s MCP connector proved that wrapping APIs isn’t enough—Sigma exports are too large for context windows, so companies need intermediate databases with tight auth models before agents can safely query revenue-grade data.
  • [15:30] SAP exemplifies the legacy gap: most installs would need multi-quarter cleanups before agents could read or transact, so pressure from agent-ready customers (and their vendors) will decide which ERPs evolve.
  • [17:15] Four misconceptions debunked—agent discovery is not SEO, rich schemas serve complex goods, trust grows via partial delegation, and “wait and see” is fatal because data remediation alone consumes months.
  • [25:00] Roughly 80% of a product’s meaning lives in tribal knowledge and marketing copy; Nate urges teams to encode provenance, sustainability, loyalty perks, and shipping promises as structured data, then routinely benchmark agent flows (Claude, ChatGPT, etc.) against competitors to spot blind spots.
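
What "agent-readable" means in practice is ordinary structured data. A minimal sketch of a schema.org-style product record extended with the tribal-knowledge fields the episode calls out; all values are invented for illustration:

```python
# A product record made agent-readable: schema.org JSON-LD, extended with
# the "tribal knowledge" the episode says usually lives in marketing copy.
# All values are invented for illustration.
import json

product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Merino travel hoodie",
    "countryOfOrigin": "NZ",  # provenance, stated instead of implied
    "offers": {
        "@type": "Offer",
        "price": "129.00",
        "priceCurrency": "USD",
        # Agents skip offers with unclear logistics promises:
        "shippingDetails": {
            "@type": "OfferShippingDetails",
            "deliveryTime": {
                "@type": "ShippingDeliveryTime",
                "transitTime": {"@type": "QuantitativeValue",
                                "minValue": 2, "maxValue": 4, "unitCode": "DAY"},
            },
        },
        "hasMerchantReturnPolicy": {"@type": "MerchantReturnPolicy",
                                    "merchantReturnDays": 60},
    },
    # Tribal knowledge encoded as data instead of prose:
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "sustainability",
         "value": "mulesing-free wool, carbon-neutral shipping"},
        {"@type": "PropertyValue", "name": "loyaltyPerk",
         "value": "members earn 2x points through March"},
    ],
}
print(json.dumps(product, indent=2))
```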

Your AI Agent Fails 97.5% of Real Work. The Fix Isn't Coding

Video: Your AI Agent Fails 97.5% of Real Work. The Fix Isn't Coding. (29m26s) → https://www.youtube.com/watch?v=awV2kJzh8zk

Abstract

Nate argues that the real blocker to reliable agent deployments is not model capability but the "memory wall"—agents have minutes of context while real jobs demand months of situational awareness. He walks through a fresh production failure, new benchmark data, and labor-market research that all converge on the same message: senior humans must steward context and encode it into evals, or agents will keep breaking systems. The upside goes to teams that treat evaluation design as a core competency, not a box-check.

Highlights

  • [00:00] Memory wall framing. Agents can now close tickets, ship designs, and write code, but their recall spans hours while software roles span ~18–24 months. Without a mechanism to carry institutional knowledge forward, otherwise competent agents guess which world they operate in and often guess wrong.
  • [02:30] Live production loss at DataTalks. Alexey Grigorev reused archived infrastructure configs to save cloud spend, and his coding agent logically “cleaned up” duplicates by tearing down everything described—including the live database (1.9M student rows). It took paid AWS support and 24 hours to recover, underscoring that only humans or eval guardrails can encode “this is prod” knowledge.
  • [07:00] Remote Labor Index reality check. Scale AI & CAIS ran frontier agents through 240 paid Upwork briefs; only 2.5% of projects met client-ready quality even though OpenAI’s GDPval benchmark shows the same models hitting expert speed when handed perfect context. Tasks with fully specified deliverables are tractable—jobs that mix ambiguous briefs, files, and implicit norms are still mostly failures.
  • [08:30] SWE-CI maintenance benchmark. Alibaba’s SWE-CI forces agents to evolve 100 real codebases over ~233 days of history. Seventy-five percent of models regress previously working features, proving that writing fresh code ≠ sustaining it once earlier decisions accumulate debt.
  • [11:30] Org-wide stakes. Nate generalizes the pattern to legal, marketing, and finance: agents can draft perfect work products yet miss the off-ledger agreements, cultural wounds, or board politics that decide whether an output is safe. Gartner (Feb ’26) expects half of firms that cut staff for AI to rehire by 2027, and Forrester already sees 55% regretting AI-driven layoffs.
  • [12:30] Harvard labor data. A longitudinal study of 62M U.S. workers shows gen-AI adopters shrink junior hiring ~8% but keep adding senior roles; the market is literally pricing contextual stewardship higher than execution. Seniors should spend cycles writing evals that express “what good looks like here,” then report weekly on the failures those evals prevented.
  • [20:30] Action plan. Treat eval writing as a senior job, memorialize the mental model (load-bearing services, taboo levers, political constraints), and make leadership aware of the risks you prevented. Agent insurance, better harness design, and weekly status notes about “evals that saved us” are becoming hygiene.
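
One way to memorialize "this is prod" knowledge is as an executable eval rather than a document. A minimal sketch, assuming agent plans arrive as a list of intended actions; the tagging scheme and plan format are inventions for illustration:

```python
# "This is prod" knowledge memorialized as an executable eval: the agent's
# plan is checked against a registry of load-bearing resources before any
# action runs. Tagging scheme and plan format are invented for this sketch.

LOAD_BEARING = {
    "rds:students-db": "live student database, 1.9M rows; never drop or tear down",
    "s3:billing-exports": "feeds the finance close; deletions need CFO sign-off",
}

DESTRUCTIVE = {"delete", "teardown", "drop"}

def review_plan(plan: list[dict]) -> list[str]:
    """Return human-readable violations; an empty list means the plan may proceed."""
    violations = []
    for step in plan:
        target = step.get("resource", "")
        if step.get("action") in DESTRUCTIVE and target in LOAD_BEARING:
            violations.append(f"{step['action']} {target}: {LOAD_BEARING[target]}")
    return violations

plan = [{"action": "teardown", "resource": "rds:students-db"},  # the failure mode
        {"action": "delete",   "resource": "s3:tmp-scratch"}]   # harmless cleanup
for v in review_plan(plan):
    print("BLOCKED:", v)
```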

Anthropic /loop Gives Your Agent a Heartbeat

Published: 21 Mar 2026 · 09:05 AM AEDT

Abstract

Anthropic quietly added /loop to Claude Code, and Nate argues it’s the missing primitive that lets every OpenBrain-style setup behave like OpenClaw without inheriting its security chaos. He frames effective agents as three Lego bricks (a personal SQL+MCP memory, proactive scheduling, and MCP tools) and shows how /loop lets those pieces accumulate value across cycles. From wellness check-ins and customer success monitors to job-search copilots, networking briefings, and sales pipelines, the key is that proactivity plus memory turns chatbots into pattern-spotting operators. He closes by contrasting this native stack with OpenClaw’s risks, citing Karpathy’s Auto Research loop and Tobi Lütke’s overnight model tuning as proof that compound loops beat one-off prompts, especially if you’re willing to live in the terminal for a few months of “free” time travel.

Highlights

  • [00:00] OpenBrain graduates from a solo Nate experiment to a community recipe rolodex: thousands have wired Claude to a personal SQL memory, and /loop is the "heartbeat" that lets that memory wake itself up instead of waiting for you to poke it.
  • [02:30] Nate formalizes the three Lego bricks of a real agent (memory, proactivity, and tools) and shows how /loop turns Claude Code into an OpenClaw-class runner without downloading OpenClaw; a minimal loop sketch follows this list.
  • [07:30] Energy journaling and customer-success check-ins demonstrate why memory matters: proactive loops can compare weeks of entries, surface correlations (late meals vs. fatigue, multi-week usage slides), and prescribe actions instead of repeating advice.
  • [12:30] Tools give agents hands: he walks through a networking workflow where Claude queries OpenBrain, calls Remotion through MCP to render a video briefing, and even pings Slack with links you forgot to send before a happy hour.
  • [17:30] Weekly job-search sessions, content calendars, and sales-pipeline sweeps become compounding loops: agents draft cover letters with fresh metrics, update comparison tables when vendors change pricing, and distinguish net-new, win-back, or duplicate leads on their own.
  • [22:30] Karpathy’s Auto Research repo and Tobi Lütke’s overnight model tuning show why persistent logs plus loops beat brute force; Nate argues /loop plus OpenBrain offers the same pattern-learning architecture without exposing networks like raw OpenClaw does.
  • [27:30] Limitations remain (no built-in "done" signal, sessions die if you close the laptop, CLI skill required), but the upside is native scheduling, tighter data boundaries, and a gentle push to live in the terminal for "months of free time travel" ahead of GUI rollouts.
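
Stripped of product names, the heartbeat pattern is just a scheduled loop over durable memory. A minimal sketch with SQLite standing in for the personal memory and a placeholder where the model call would go; the schema and cadence are assumptions:

```python
# The heartbeat pattern, stripped to essentials: wake on a schedule, read
# durable memory, look for a pattern, write the finding back. SQLite stands
# in for the personal memory; `ask_model` is a placeholder for the real LLM
# call. Schema and cadence are assumptions of this sketch.
import sqlite3, time

def ask_model(prompt: str) -> str:
    return f"(model would analyze: {prompt[:60]}...)"  # placeholder

db = sqlite3.connect("openbrain.db")
db.execute("CREATE TABLE IF NOT EXISTS journal(ts REAL, entry TEXT)")
db.execute("CREATE TABLE IF NOT EXISTS findings(ts REAL, note TEXT)")

def heartbeat() -> None:
    week_ago = time.time() - 7 * 86400
    rows = db.execute("SELECT entry FROM journal WHERE ts > ?",
                      (week_ago,)).fetchall()
    if not rows:
        return  # nothing new this cycle; memory keeps accumulating
    note = ask_model("Compare this week's entries and flag correlations: "
                     + " | ".join(entry for (entry,) in rows))
    db.execute("INSERT INTO findings VALUES (?, ?)", (time.time(), note))
    db.commit()

db.execute("INSERT INTO journal VALUES (?, ?)", (time.time(), "late meal, low energy"))
heartbeat()  # in practice this fires on a schedule (the /loop role), not once
print(db.execute("SELECT note FROM findings").fetchall())
```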

Perplexity Computer Is Incredible. It Won't Matter. Here's Why.

Published: 20 Mar 2026 · 01:50 AM AEDT

Abstract

Perplexity's new Computer product is a genuine breakthrough in multi-model agent orchestration, but Nate argues it still sits in the most fragile spot of the AI stack: middleware that rents access to models and distribution it doesn't control. He walks through the February 2026 wave of launches that stratified the stack, then outlines four narrow positions where orchestration companies can still build durable advantage—and several dead ends they must avoid.

Highlights

  • [00:45] Computer orchestrates 19 frontier models, spawns sub-agents with persistent memory, integrates 400+ SaaS tools, and targets $200/month power users—yet its reliance on competitor models makes it a structural cautionary tale.
  • [06:00] January-February launches (Claude Co-Work, Opus 4.6, OpenClaw's surge, Samsung's agentic phone) hardened the stack into three layers: model owners, orchestration/app builders, and distribution surfaces controlled by hyperscalers.
  • [09:00] Middleware rents its position; model providers can ban credentials, raise prices, or absorb features, while simultaneously attacking the context layer from above (e.g., OpenAI Frontier, Anthropic Enterprise Agents).
  • [15:00] Computer's best fits are research-heavy workflows: competitive intelligence briefs, financial analysis packs, outbound prospecting pipelines, and long-running build tasks—but it's overkill for single-model chat or deep engineering work.
  • [20:00] Four defensible plays remain for middleware: own the context you platform (especially proprietary or fast-moving operational data), become infra other agents must call (search/verification APIs), embed so deeply in workflows that switching is intolerable, or police the trust/verification layer.
  • [24:00] Three blind alleys emerge: competing with hyperscalers for cloud tokens, betting on margin where model vendors set price, or assuming enterprises won't default to forward-deployed teams from OpenAI/Anthropic—so teams must reposition before the next model jump erases their edge.

ChatGPT Health Identified Respiratory Failure. Then It Said Wait.

Published: 19 Mar 2026 · 01:00 AM AEDT

Abstract

What's really happening inside AI agents when they give you the wrong answer? The common story is that smarter models mean safer agents — but the reality is that reasoning traces and final outputs often run as two separate processes.

Highlights

  • In this video, I share the inside scoop on why AI agents fail in production and how to build evals that actually catch it.
  • Why agents perform worst precisely where the stakes are highest.
  • How reasoning traces routinely contradict an agent's final recommendation.
  • What factorial stress testing reveals that standard benchmarks completely miss (the idea is sketched after this list).
  • Where to build the four-layer architecture that keeps agents honest in production.
  • Operators who ignore this now will face it later — through customer harm, regulatory pressure, or an insurance policy they can't obtain.
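
The episode doesn't publish its protocol, but "factorial" suggests crossing every stress factor rather than varying one at a time. A minimal sketch of that idea; the factors, the placeholder agent, and the pass bar are all invented:

```python
# Factorial stress testing, sketched: cross every factor level so that
# interaction failures surface, instead of varying one factor at a time.
# The factors, the stub agent, and the rubric below are all invented.
from itertools import product

FACTORS = {
    "stakes":   ["routine question", "possible respiratory failure"],
    "pressure": ["neutral tone", "user pushes back"],
    "context":  ["full history", "key vitals missing"],
}

def agent_answer(case: dict) -> str:
    # Placeholder for the real agent call; this stub wavers when context is thin.
    if case["context"] == "key vitals missing":
        return "monitor at home and recheck tomorrow"
    return "escalate to emergency care now"

def meets_bar(case: dict, answer: str) -> bool:
    # Rubric stub: high-stakes cases must escalate regardless of user pressure.
    return "escalate" in answer or "failure" not in case["stakes"]

results = []
for combo in product(*FACTORS.values()):
    case = dict(zip(FACTORS, combo))
    results.append((case, meets_bar(case, agent_answer(case))))

failures = [case for case, ok in results if not ok]
print(f"{len(results)} cells tested, {len(failures)} interaction failures")
# -> 8 cells tested, 2 interaction failures (high stakes x missing context)
```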

Anthropic Didn't Build a New Browser. They Did Something Smarter.

Published: 18 Mar 2026 · 01:00 AM AEDT

Abstract

Claude's Chrome extension isn't a chatbot sitting in the sidebar—it's a workflow recorder that can run entire browser routines on autopilot. Nate shows how scheduling those recordings turns customer-service fights, inbox triage, and multitab research into repeatable background jobs.

Highlights

  • Let Claude fight customer-service battles and negotiate credits while you stay out of the queue.
  • Record a multi-step browser workflow once, schedule it, and keep it running without touching the keyboard.
  • Gmail, Calendar, and Drive awareness means Claude can triage your inbox and surface the items that actually need you.
  • Group tabs plus structured exports turn scattered research into spreadsheets, briefs, or reports automatically.
  • Debugging tips, data limits, and security reminders so the extension doesn't outrun your governance.

Claude Code Wiped 2.5 Years of Data. The Engineer Who Built It Couldn't Stop It.

Published: 17 Mar 2026 · 01:00 AM AEDT

Abstract

Vibe coding got a lot of founders to MVP, but agents behave like unsupervised contractors once they touch production work. Nate walks through the five management skills that keep Claude Code from undoing months of shipping.

Highlights

  • Version control and save points are survival skills—without them you can't unwind a bad agent run (a git checkpoint sketch follows this list).
  • Know when to restart an agent (and when to rebuild its context) before the window collapses your instructions.
  • Standing orders, guardrails, and rules files beat heroic prompting when an agent wakes up mid-task.
  • Make small, sandboxed bets so a runaway edit can't torch the entire product.
  • Treat agents like powerful interns: scoped tasks, persistent briefs, and human review keep them from wiping production again.
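
Save points need nothing beyond git itself. A minimal checkpoint wrapper around an agent run, sketched under the assumption that the agent is invoked as a shell command; the command itself is a placeholder:

```python
# Save points with plain git: snapshot everything before the agent touches
# the repo, so one bad run is a single reset away. The agent invocation is
# a placeholder; wire in your own command.
import subprocess

def git(*args: str) -> str:
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout.strip()

def checkpointed_agent_run(agent_cmd: list[str]) -> None:
    git("add", "-A")                          # track everything first
    git("commit", "--allow-empty", "-m", "savepoint: before agent run")
    savepoint = git("rev-parse", "HEAD")
    try:
        subprocess.run(agent_cmd, check=True)  # the agent does its work
    except subprocess.CalledProcessError:
        git("reset", "--hard", savepoint)      # undo changes to tracked files
        git("clean", "-fd")                    # remove files the agent created
        raise

# checkpointed_agent_run(["my-agent-command", "refactor billing"])  # placeholder
```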

She quit, picked up AI, and shipped in 30 days what her team planned for Q3.

Published: 16 Mar 2026 · 05:00 AM AEDT

Abstract

Solo founders aren't mythical outliers—they're just operating without the meetings, approvals, and coordination drag that suffocates the same people inside larger teams. Nate explains how AI agents cut overhead so top talent can finally ship.

Highlights

  • AI agents delete coordination overhead; they don't just replace headcount, they give builders back their day.
  • Taste without conviction keeps high performers stuck—AI accelerates those who are willing to act on their instincts.
  • "Speed of control" beats "span of control": the faster leaders remove blockers, the faster AI compounding shows up.
  • When companies refuse to clear the path, ambitious people leave to solo-found simply because it's the only place they can move.
  • Recognize, protect, and unburden the 25% of talent that's already operating at 4× output with AI.

AI Made Every Company 10x More Productive. The Ones Cutting Headcount Are Telling on Themselves.

Published: 15 Mar 2026 · 02:01 AM AEDT

Abstract

What's really happening when Whoop announces it's hiring 600 people while the media narrative focuses entirely on job displacement? The common story is about how many fewer people companies need—but the reality is more interesting when execution costs drop by an order of magnitude and the pie itself expands.

Highlights

  • In this video, I share the inside scoop on six unlocks that give you a picture of what the future actually looks like:
    • Why iteration cycles compressing from months to days changes the mechanics of strategy
    • How hundreds of millions of domain experts become builders when the translation layer disappears
    • What happens when quality software becomes the default, not a premium
    • Where the market for ambition explodes when CFO math flips on experiments
  • For anyone wrestling with the people challenges of AI, the hardest work ahead isn't technical—it's figuring out what upskilling looks like when the job isn't "do the same thing faster."

One Simple System Gave All My AI Tools a Memory. Here's How.

Published: 14 Mar 2026 · 01:01 AM AEDT

Abstract

What's really happening when thousands of people build an agent-readable database but can only interact with it through a chat window keyhole? The common story is that the MCP server is the whole system—but the reality is more interesting when you add a human door alongside the agent door.

Highlights

  • In this video, I share the inside scoop on how to give your Open Brain hands and feet through visual interfaces you build and deploy for free:
    • Why the table becomes a shared surface that both you and your agent see (a minimal sketch follows this list)
    • How to build a visual layer with Claude and host it on Vercel for nothing
    • What household knowledge, professional relationships, and job hunts look like as dashboards
    • Where time bridging and cross-category reasoning earn their keep
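
The two-door idea is small in code: one table, a SQL door for the agent, a rendered door for you. A minimal sketch; the schema is invented, and writing a local HTML file stands in for a deployed dashboard:

```python
# Two doors onto one memory: the agent queries the table over SQL (what an
# MCP server exposes); the human gets the same rows rendered as a page.
# Schema is invented; the local HTML file stands in for a deployed dashboard.
import sqlite3, html

db = sqlite3.connect("openbrain.db")
db.execute("CREATE TABLE IF NOT EXISTS contacts(name TEXT, context TEXT, last_touch TEXT)")
db.execute("INSERT INTO contacts VALUES "
           "('Ada', 'met at PyCon; can intro the infra team', '2026-03-01')")

# Agent door: plain SQL over the shared table.
rows = db.execute("SELECT name, context, last_touch FROM contacts "
                  "ORDER BY last_touch").fetchall()

# Human door: the same table rendered as something you can actually look at.
cells = "".join(
    f"<tr>{''.join(f'<td>{html.escape(str(c))}</td>' for c in row)}</tr>"
    for row in rows
)
page = ("<table><tr><th>Name</th><th>Context</th><th>Last touch</th></tr>"
        f"{cells}</table>")
open("dashboard.html", "w").write(page)
print(f"{len(rows)} contacts rendered to dashboard.html")
```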

4,000 People Lost Their Jobs At Block. Dorsey Blamed AI. Here's What Actually Happened.

Published: 13 Mar 2026 · 01:01 AM AEDT

Abstract

What's really happening when the average knowledge worker spends 60% of their time on meetings and documents that exist only to coordinate with other humans? The common story is that AI automates tasks within your existing org—but the reality is more interesting when the coordination layer evaporates entirely.

Highlights

  • In this video, I share the inside scoop on why AI is revealing the job was never the real job:
    • Why PRDs, sprint planning, and status updates exist because the execution layer is human
    • How agent harnesses delete the need for handoffs, not just automate the handoffs themselves
    • What survives when coordination roles disappear: vision, architecture, genuine care, systems design
    • Where the two qualities that matter most are agency and ramp

4 AI Labs Built the Same System Without Talking to Each Other (And Nobody's Discussing Why)

Published: 12 Mar 2026 · 01:00 AM AEDT

Abstract

What's really happening with AI capabilities at work — and why the "jagged AI" frame is now obsolete? The common story is that AI is brilliant at some things and broken at others — but the reality is that jaggedness was never about intelligence; it was about how we were deploying it.

Highlights

  • In this video, I share the inside scoop on why AI agents in proper harnesses are smoothing the capability frontier for real work:
    • Why the jagged AI frontier was always a deployment problem
    • How multi-agent coordination unlocks long-horizon knowledge work
    • What Cursor's math breakthrough reveals about AI generalization
    • Where meta-skills like sniff-checking become your competitive edge
  • The organizations and individuals who learn to decompose work, delegate to AI agents, and verify outputs will extend their leverage — those who don't will find the shift happening to them anyway.

Stop accepting AI output that "looks right." The other 17% is everything and nobody is ready for it.

Published: 11 Mar 2026 · 01:00 AM AEDT

Abstract

What's really happening when frontier models beat professionals with 14 years of experience 70% of the time but the output still doesn't survive contact with anyone who actually understands the domain? The common story is about prompting and workflow design—but the reality is more interesting when rejection creates institutional knowledge that did not exist before.

Highlights

  • In this video, I share the inside scoop on why learning to say no is the missing skill in the judgment and taste category:
    • Why your rejections are more valuable than your prompts
    • How recognition, articulation, and encoding break down into learnable dimensions (sketched after this list)
    • What Epic Systems teaches about scaling taste through thousands of encoded workflows
    • Where the structural gap in the AI tool ecosystem leaves every rejection on the floor
  • For anyone watching AI flood organizations with output, the frontier of AI value is identical to the frontier of your organization's taste.
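
Recognition, articulation, and encoding map cleanly onto a data shape. A minimal sketch of a rejection record that turns a "no" into a reusable check; the field names and the example rule are mine:

```python
# Recognition -> articulation -> encoding as a data shape: each rejection of
# AI output becomes a named, checkable rule instead of vanishing. The field
# names and the example rule are invented for illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rejection:
    recognized: str                 # what felt wrong, in the reviewer's words
    articulated: str                # the general rule behind the reaction
    encoded: Callable[[str], bool]  # machine check: True means violation

REJECTIONS = [
    Rejection(
        recognized="draft cites a retention stat we never measured",
        articulated="numbers must come from our dashboards, never generated",
        encoded=lambda text: "%" in text and "source:" not in text.lower(),
    ),
]

def review(draft: str) -> list[str]:
    return [r.articulated for r in REJECTIONS if r.encoded(draft)]

print(review("Retention improved 34% quarter over quarter."))
# -> ['numbers must come from our dashboards, never generated']
```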

Claude Blackmailed Its Developers. Here's Why the System Hasn't Collapsed Yet.

Published: 10 Mar 2026 · 01:01 AM AEDT

Abstract

What's really happening with AI safety in 2026? The common story is that the safety system is collapsing — but the reality is more complicated.

Highlights

  • In this video, I share the inside scoop on why the AI risk picture is both worse and more resilient than the headlines suggest:
    • Why frontier AI agents scheme even after anti-scheming training
    • How competitive dynamics create emergent safety properties no lab planned
    • What "intent engineering" is and why it beats prompt engineering for AI agents
    • Where the real vulnerability lives — and why it's you, not the models
  • The risks from large language models and autonomous AI agents are accelerating, but so are the structural forces holding the system together — and closing the gap between what you tell an agent and what you actually mean is the most leveraged safety skill you can build right now.

45 People, $200M Revenue. The Question Nobody's Asking About AI and Your Team Size.

Published: 09 Mar 2026 · 05:00 AM AEDT

Abstract

What's really happening with AI and team size in your organization? The common story is that AI makes teams more productive so you can cut headcount — but the reality is more complicated.

Highlights

  • In this video, I share the inside scoop on why the five-person strike team is the structural unit of the AI era:
    • Why AI raised coordination costs by the same order as output
    • How scouts and strike teams map to different AI-era missions
    • What correctness-first thinking means for how you hire and build
    • Where the real opportunity is — expanding ambition, not shrinking headcount
  • AI agents and LLMs didn't break your meetings problem — they amplified a team size problem you already had, and the leaders who restructure around small, high-judgment teams will build the defining companies of this decade.

GPT-5.4 Let Mickey Mouse Into a Production Database. Nobody Noticed. (What This Means For Your Work)

Published: 08 Mar 2026 · 03:00 AM AEDT

Abstract

What's really happening when OpenAI engineers accidentally leak ChatGPT 5.4's existence but the model isn't even the interesting part? The common story is about the next capability jump—but the reality is more interesting when the company that first makes trillion-token organizational context genuinely usable becomes the new enterprise data platform.

Highlights

  • In this video, I share the inside scoop on why the four-part compound bet determines whether this justifies an $840 billion valuation:
    • Why intelligence and context are multiplicative—and weak reasoning with long context is actively harmful
    • How retrieval at enterprise scale breaks RAG in ways nobody's benchmarking
    • What memory that doesn't rot requires when organizational knowledge continuously evolves
    • Where Anthropic's organic context accumulation through Claude Code might beat OpenAI's infrastructure play
  • For builders watching the enterprise stack get restructured, the lock-in from synthesized understanding is deeper than anything enterprise software has ever seen.
