Tokenmaxxing: The Hidden Cost That Eats the Savings You Were Promised
Saturday Edition — March 20, 2026
Tuesday flagged a New York Times piece about engineers running autonomous coding agents around the clock and burning billions of tokens a week. The real question nobody is wrestling with yet: what happens to the business case for laying off humans when the engineers who remain are spending more on Claude than some of them earn? That is not theoretical. That is this week.
🧭 The Setup
A term has entered the lexicon: tokenmaxxing. It describes the practice of consuming the largest possible volume of AI tokens, not because the work demands it but because token volume has become a proxy for status, performance, and job security inside a growing number of technology companies. Internal leaderboards are real. Token spend tied to performance reviews is real. One engineer told the Times, “I likely spend more on Claude than I earn.” His employer, apparently, covers the tab.
The behaviour is not irrational from the individual’s perspective. In an environment where AI is used to justify headcount reductions, visible and aggressive AI usage is a form of career insurance. The engineer who tokenmaxxes today is the engineer who cannot be accused of underusing the tools tomorrow. The incentive is perfectly rational. The aggregate consequence for the CFO, though, is a budget leak that nobody has a line item for.
What makes it urgent right now is the arrival of autonomous agents and subagent swarms. Until recently, high token consumption required sustained human effort. You had to type, prompt, revise. Natural limits existed. A student revising an essay might consume around 10,000 tokens across several passes. Reaching millions required hours. Billions seemed nearly impossible. Then agentic coding tools arrived. A single developer can now spin up a swarm of agents running 24 hours a day, seven days a week, each calling tools, looping, iterating, and feeding results back into subsequent prompts. Research from ByteDance noted that in typical agentic loops, token consumption grows quadratically with the number of API interactions — not linearly. The meter doesn’t just run. It accelerates.
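To see why the meter accelerates, here is a minimal sketch of one way quadratic growth can arise. It assumes every call in a loop re-sends the full conversation history and that each turn adds a fixed 2,000 tokens of new context. Both are my illustrative assumptions, not the model from the ByteDance research, and the sketch ignores output tokens and prompt caching.

```python
def tokens_for_loop(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every call re-sends the whole history."""
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # the history grows by one turn's worth
        total += context            # and the full history is billed again
    return total


single_pass = tokens_for_loop(1, 2_000)
for turns in (1, 10, 100):
    total = tokens_for_loop(turns, 2_000)
    print(f"{turns:>3} turns: {total:>12,} tokens ({total // single_pass}x a single pass)")
```

Under these assumptions, ten turns cost 55 times a single pass and a hundred turns cost 5,050 times. Ten times the interactions means roughly a hundred times the tokens.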
Anthropic reportedly doubled its revenue forecasts within two months in early 2026, driven largely by the expansion of agentic coding tools. OpenAI’s Codex agent has tripled its weekly active users since the start of the year, with overall token usage increasing fivefold. Google disclosed that its AI models processed over 1.3 quadrillion tokens monthly. These numbers are not coming from casual users. They are largely the product of agents running agents running agents, and right now very few organisations have any idea what it is costing them.
🔍 What’s Actually Happening
The financial case for replacing human workers with AI is real. The fully loaded cost of a knowledge worker earning $100,000 annually reaches approximately $135,000 when benefits, payroll taxes, office space, and management overhead are factored in, or around $724,000 over five years. The equivalent AI agent, with all infrastructure, operations staffing, error mitigation, guardrail tooling, and risk overhead accounted for, runs roughly $82,000 per year, or about $410,000 over five years. That 1.8x cost advantage is genuine.
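The five-year arithmetic can be checked with a short sketch. Note that $135,000 held flat for five years is $675,000, not $724,000. The larger figure is consistent with roughly 3.5% annual cost growth, but that rate is my inference, not something the underlying analysis states. The function name and the growth parameter are illustrative.

```python
HUMAN_ANNUAL = 135_000     # fully loaded annual cost, from the text
HUMAN_FIVE_YEAR = 724_000  # five-year human cost, from the text
AGENT_ANNUAL = 82_000      # fully loaded agent cost per year, from the text
AGENT_FIVE_YEAR = 410_000  # five-year agent cost, from the text


def five_year_cost(annual: float, growth: float = 0.0, years: int = 5) -> float:
    """Sum of yearly costs, with each year 'growth' higher than the last."""
    return sum(annual * (1 + growth) ** year for year in range(years))


flat_human = five_year_cost(HUMAN_ANNUAL)              # no annual increase
escalated_human = five_year_cost(HUMAN_ANNUAL, 0.035)  # ASSUMED 3.5% per year
agent_total = five_year_cost(AGENT_ANNUAL)
advantage = HUMAN_FIVE_YEAR / AGENT_FIVE_YEAR

print(f"Human, flat:           ${flat_human:,.0f}")
print(f"Human, 3.5% escalator: ${escalated_human:,.0f}")
print(f"Agent, flat:           ${agent_total:,.0f}")
print(f"Cost advantage:        {advantage:.2f}x")
```

The ratio comes out at about 1.77, which rounds to the 1.8x quoted above.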
It is also, critically, the honest number, not the “you can replace a $100K employee with $2,730 in tokens” figure that circulates on LinkedIn. The inference cost is only about 7% of the total. The rest is everything else: the infrastructure underneath it, the AI ops team managing it, the error and compliance overhead surrounding it. Companies that build their AI business cases on inference pricing alone are not doing financial analysis. They are writing optimistic fiction.
Now introduce tokenmaxxing. A moderately heavy AI user might consume 10 million tokens per month. At current pricing, that runs roughly $100 to $200 per month in raw inference cost, or about $1,200 to $2,400 per year. Manageable. But the engineers described in the Times piece are not moderate users. Swarms of agents running continuously can exceed 10 billion tokens per week. One engineer, by the Times’ account, processed the equivalent of 33 times the entire content of Wikipedia in a single week. At that scale, a single individual’s token spend can reach $150,000 per month or more, which is well over $1 million per year in inference cost alone, before a single dollar of infrastructure or oversight cost is added.
The arithmetic is not subtle. A company lays off a senior engineer at $180,000 in fully loaded annual cost. One of the engineers who remains tokenmaxxes their way past $1 million per year in token spend. The headcount saving is erased several times over. The CFO approved a cost optimisation initiative. Finance is looking at a line item labelled “AI productivity tools.” Nobody has connected the dots yet.
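For anyone who wants to rerun the numbers, a minimal sketch follows. The $10 and $20 per million tokens are the blended rates implied by the $100 to $200 figure for 10 million tokens. Real pricing varies by model, by input versus output tokens, and by caching, so treat the rates as assumptions.

```python
def monthly_cost(tokens: float, usd_per_million: float) -> float:
    """Raw inference cost for a month at a blended per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


# Moderately heavy user: 10 million tokens a month at the implied blended rates.
moderate_low = monthly_cost(10_000_000, 10)
moderate_high = monthly_cost(10_000_000, 20)

# Tokenmaxxer: the monthly figure quoted above, annualised.
TOKENMAXXER_MONTHLY = 150_000
LAID_OFF_LOADED_COST = 180_000
tokenmaxxer_annual = TOKENMAXXER_MONTHLY * 12
savings_erased = tokenmaxxer_annual / LAID_OFF_LOADED_COST

print(f"Moderate user: ${moderate_low:,.0f} to ${moderate_high:,.0f} per month")
print(f"Tokenmaxxer:   ${tokenmaxxer_annual:,.0f} per year")
print(f"That is {savings_erased:.0f}x the saving from one $180,000 layoff")
```

At $150,000 a month, one tokenmaxxer consumes the saving from ten such layoffs.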
👤 People
The humans caught in this are in an impossible position. Layoffs are being justified using AI productivity arguments, so the rational response from every remaining employee is to demonstrate the most aggressive possible AI usage. Performance reviews and leaderboards that reward token volume, rather than useful output, have turned a professional tool into a status competition. The anxiety is real and it is being expressed through consumption. Token burn has become career insurance. This is what happens when organisations use AI metrics as a performance lever without defining what those metrics are supposed to measure.
⚙️ Process
Agentic AI makes token costs fundamentally different from anything the enterprise has managed before. Traditional SaaS was priced by seat, which made it predictable, bounded, and controllable. Token spend scales with behaviour: prompt length, retry loops, reasoning depth, context window size, and the number of subagents in a swarm. A Deloitte report found that AI is the fastest-growing expense in corporate technology budgets, with some firms reporting it consuming up to 50% of IT spend. Only 15% of companies can forecast AI costs to within plus or minus 10%, because spending is fragmented across teams, environments, and vendors. An unconstrained agent entering a loop can consume 50 times the tokens of a single clean pass. There are no circuit breakers unless someone builds them.
📋 Policy
The governance gap is severe. Most enterprises that have deployed AI tools have not established token budgets, cost-allocation frameworks, or anomaly detection for usage spikes. Deloitte’s guidance is direct: treat AI tokens like energy or capital, not like SaaS seats. That means metering by use case, per-team budgets, model routing by task complexity, caching of repeated contexts, and chargeback to business units. None of this is technically hard. All of it requires the CFO, the CIO, and the engineering leadership to agree that token spend is a governed cost category, not a productivity perk. Very few organisations have made that agreement yet.
💡 So What?
The 1.8x cost advantage of AI agents over human labour is real. It is also conditional. The condition is that token consumption is governed, budgeted, monitored and attributed to outcomes. Without governance, the advantage reverses. A single tokenmaxxing engineer running agent swarms can burn through more in monthly inference costs than the annual salary of the person who was laid off to fund the AI transition. The “savings” evaporate. The restructuring announcement still went out. The headcount is still gone. The savings just aren’t there.
The deeper issue is that the business cases being used to justify AI-driven workforce reductions are being built on inference pricing, not fully loaded costs. That is a structural flaw. The business case should include infrastructure, AI ops headcount, error overhead, and a realistic assumption about the distribution of token usage across the engineering population, including the top 10% who will tokenmaxx. If the business case can’t survive that stress test, it needs to be revised before the layoffs are announced, not after the quarterly results come in.
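A rough version of that stress test fits in a few lines. The team size, the number of layoffs, and the share of heavy users below are hypothetical. The per-person figures reuse numbers from earlier in this piece: $180,000 fully loaded, $2,400 a year for a moderate user, $1.8 million a year for a tokenmaxxer. The sketch covers token spend only. A real business case would also add infrastructure, AI ops, and error overhead.

```python
def net_annual_saving(laid_off: int, remaining: int, heavy_share: float,
                      loaded_cost: int = 180_000,
                      moderate_spend: int = 2_400,
                      heavy_spend: int = 1_800_000) -> int:
    """Headcount saving minus token spend across the remaining engineers."""
    heavy_users = round(remaining * heavy_share)
    moderate_users = remaining - heavy_users
    token_spend = heavy_users * heavy_spend + moderate_users * moderate_spend
    return laid_off * loaded_cost - token_spend


# Hypothetical team: 5 engineers laid off, 50 remain.
for share in (0.0, 0.02, 0.10):
    saving = net_annual_saving(laid_off=5, remaining=50, heavy_share=share)
    print(f"{share:.0%} tokenmaxxers: net annual saving ${saving:,}")
```

In this toy scenario, a single tokenmaxxer among fifty engineers is enough to turn a $780,000 saving into a loss of about $1 million.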
In my experience, the organisations that get this right are treating AI spend the same way they treat cloud spend: metered, budgeted by team, governed by policy, and tied to outcomes. The organisations that get it wrong are the ones treating AI as a licence fee with no usage ceiling.
🙋 Who Cares?
CFOs should care the most. The tokenmaxxing dynamic means that the financial model underpinning most AI-driven restructuring is incomplete. If token spend is not included in the cost-benefit analysis (and it typically isn’t), the savings projections are overstated. A token governance programme is not optional; it’s the thing that makes the AI investment case defensible.
CIOs and CTOs carry the operational exposure. They are the ones who will be asked to explain why the AI productivity initiative is showing up as a cost overrun. Model routing, agent recursion limits, and per-user token budgets are engineering decisions that need to be made before deployment, not after the first unexpected invoice arrives.
CSM leaders should pay attention because the same dynamic is coming for customer-facing functions. As AI takes on health scoring, renewal forecasting, and automated outreach, the token costs of running those agents will land somewhere. If they land in the CS budget with no governance model attached, the cost-to-serve numbers that justified the AI investment will stop making sense fast. The CSM team doesn’t write the infrastructure policy, but it absolutely bears the consequences when the policy doesn’t exist.
Boards and governance committees need to ask a question that is not yet on most board agendas: is the organisation’s AI cost exposure metered, bounded, and reported? If the answer is no, or uncertain, then the risk-adjusted return on the AI programme cannot be calculated. That is not an acceptable position for a board to be in.
⚖️ The Honest Trade-Offs
Nothing here is all upside. The cost advantage of AI agents is real but fragile, and the table below reflects where the bodies are buried.
🛠️ Making It Real — The Token Accountability Stack
The right response is not to shut down agentic AI. The right response is to govern it the way serious organisations govern cloud spend. The following four-layer framework can be stood up in 90 days.
Layer 1 — Visibility (Weeks 1–4). You cannot govern what you cannot see. Instrument every AI tool and API call with token consumption logging tied to user identity and business unit. Most AI gateways, such as Kong, LangSmith, Helicone, and Bifrost, support this natively. Build a simple dashboard that shows token spend by team, by use case, and by model. Surface the top 10 consumers by volume. Do not act yet. Just look.
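The dashboard behind Layer 1 is, at heart, a handful of group-by queries. A minimal sketch follows, assuming usage records have already been exported from a gateway. The record fields and sample values are hypothetical and do not reflect any particular vendor's schema.

```python
from collections import defaultdict

# Hypothetical log records; a real gateway export will have its own schema.
usage_log = [
    {"user": "ana", "team": "platform", "use_case": "refactoring", "model": "flagship", "tokens": 4_200_000},
    {"user": "ben", "team": "platform", "use_case": "code-review", "model": "small", "tokens": 300_000},
    {"user": "cho", "team": "cs", "use_case": "call-summaries", "model": "small", "tokens": 150_000},
    {"user": "ana", "team": "platform", "use_case": "refactoring", "model": "flagship", "tokens": 1_800_000},
]


def tokens_by(records, key):
    """Total tokens grouped by one field, e.g. 'team', 'use_case' or 'model'."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["tokens"]
    return dict(totals)


def top_consumers(records, n=10):
    """The n users with the highest token volume, largest first."""
    per_user = tokens_by(records, "user")
    return sorted(per_user.items(), key=lambda item: item[1], reverse=True)[:n]


print(tokens_by(usage_log, "team"))
print(tokens_by(usage_log, "use_case"))
print(tokens_by(usage_log, "model"))
print(top_consumers(usage_log))
```

The point at this stage is only to see the distribution, so the sketch reports and does nothing else.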
Layer 2 — Benchmarking (Weeks 3–6). Establish what reasonable token consumption looks like for each role and task type. A CSM using AI to summarise call notes and draft renewal emails should consume a different order of magnitude than an engineer running autonomous refactoring agents. Set consumption benchmarks by role. Flag anything exceeding 3x the benchmark as an anomaly requiring review. Not punishment: review. The goal is understanding, not policing.
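The 3x rule is simple enough to express directly. In the sketch below, the role benchmarks and the usage figures are hypothetical placeholders. Real benchmarks would come from the visibility data gathered in Layer 1.

```python
# Hypothetical monthly token benchmarks per role.
ROLE_BENCHMARKS = {"csm": 2_000_000, "engineer": 40_000_000}


def needs_review(role, monthly_tokens, benchmarks=ROLE_BENCHMARKS, multiple=3):
    """True when usage exceeds the role benchmark by more than the multiple."""
    return monthly_tokens > multiple * benchmarks[role]


monthly_usage = [
    ("dee", "csm", 1_500_000),
    ("eli", "engineer", 90_000_000),
    ("fay", "engineer", 450_000_000),
]

for_review = [user for user, role, tokens in monthly_usage if needs_review(role, tokens)]
print(for_review)
```

Here only "fay" is flagged. "eli" is a heavy user but still inside three times the engineering benchmark, which is the kind of distinction the review conversation should start from.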
Layer 3 — Budgets and Routing (Weeks 5–10). Set explicit token budgets per workflow, per team, and per individual. Implement model routing: send simple, repetitive tasks to smaller and cheaper models and reserve the flagship models for genuinely complex reasoning. Cap agent recursion depth and retry limits. A loop that reruns 10 times consumes 50x the tokens of a single clean pass. Set a turn limit and escalate to a human when the ceiling is reached. Token costs can be reduced 30–80% with routing and capping alone, without quality loss on routine tasks.
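A routing rule and a turn limit can be sketched as follows. The model tier names, the task labels, the default limits, and the `step` callable are all hypothetical stand-ins. In a real deployment `step` would wrap an actual agent call and report the tokens it used.

```python
class EscalateToHuman(Exception):
    """Raised when an agent run hits its turn or token ceiling."""


def pick_model(task_kind: str) -> str:
    """Send only complex reasoning to the expensive tier (names are made up)."""
    return "flagship-model" if task_kind == "complex-reasoning" else "small-model"


def run_with_limits(step, max_turns=5, token_budget=50_000):
    """Call step(turn) until it finishes, or stop at the turn or token ceiling."""
    tokens_used = 0
    for turn in range(1, max_turns + 1):
        finished, tokens = step(turn)
        tokens_used += tokens
        if finished:
            return turn, tokens_used
        if tokens_used >= token_budget:
            raise EscalateToHuman(f"token budget reached after {turn} turns")
    raise EscalateToHuman(f"turn limit of {max_turns} reached")


# Stand-in agents: one converges on turn 3, one never does.
def converges(turn):
    return turn == 3, 1_000


def loops_forever(turn):
    return False, 1_000


print(pick_model("summarise-notes"))
print(run_with_limits(converges))
try:
    run_with_limits(loops_forever)
except EscalateToHuman as reason:
    print("escalated:", reason)
```

Raising an exception rather than stopping silently is a deliberate choice in this sketch: the ceiling hands the task to a person instead of dropping it.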
Layer 4 — Value Attribution (Weeks 8–12). Connect token consumption to business outcomes. The right metric is not tokens consumed per user. It is cost-per-useful-decision, cost-per-resolved-ticket, or cost-per-committed-code-line in production. “Dollar-per-decision” is a better ROI metric than cost-per-inference because it captures both cost and the business value of each autonomous action. Without this layer, tokenmaxxing leaderboards persist because nobody can tell productive token burn apart from performative token burn.
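As a minimal illustration of outcome-based metrics, the sketch below divides token cost by the number of useful outcomes. The workload names, costs, and outcome counts are hypothetical. What counts as an "outcome" is the hard part and has to be defined per workflow.

```python
def cost_per_outcome(token_cost_usd, outcomes):
    """Token cost divided by useful outcomes; None when nothing was delivered."""
    if outcomes == 0:
        return None
    return token_cost_usd / outcomes


# Hypothetical monthly figures: (token cost in USD, outcomes delivered).
workloads = {
    "support-bots": (12_000, 4_000),   # resolved tickets
    "agent-swarm": (150_000, 300),     # merged changes in production
    "leaderboard-run": (40_000, 0),    # nothing shipped
}

for name, (cost, outcomes) in workloads.items():
    unit_cost = cost_per_outcome(cost, outcomes)
    label = "no outcomes" if unit_cost is None else f"${unit_cost:,.2f} per outcome"
    print(f"{name}: {label}")
```

A workload with large spend and no outcomes is the signature of performative burn, and a token-volume leaderboard would rank it near the top.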
📥 Subscriber Resource
The Token Accountability Audit: a one-page diagnostic with 12 questions to assess your organisation’s current exposure to tokenmaxxing risk and readiness to govern AI compute costs. Covers visibility, governance, budgeting, and value attribution. Designed to be brought to your next CFO or CIO conversation.
💬 The Question I’m Sitting With
If token volume is becoming a performance metric, even informally, even just because it appears on a leaderboard, then have we accidentally created an incentive structure that rewards the appearance of productivity over its substance? And if that’s true, is it meaningfully different from the expense-account culture or the billable-hours culture that came before it? I keep turning this over. The technology is new. The human behaviour underneath it is very, very old.
Hit reply; I read everything.
Sources
Cade Metz and Erin Griffith. “More! More! More! Tech Workers Max Out Their A.I. Use.” New York Times, March 20, 2026. https://www.nytimes.com/2026/03/20/technology/tokenmaxxing-ai-agents.html.
MeaningfulTech. “The Token Economy: What a $100,000 Employee Really Costs in the Age of AI.” Meaningful Technology, 2026. https://meaningfultech.com/p/the-token-economy-what-a-100000-employee.
OpenTools AI. “Hard Fork Podcast Dives Into AI Washing, LLM Writing Struggles, and Tokenmaxxing Frenzy.” OpenTools, 2026. https://opentools.ai/news/hard-fork-podcast-dives-into-ai-washing-llm-writing-struggles-and-tokenmaxxing-frenzy.
Deloitte. “AI Tokens: How to Navigate AI’s New Spend Dynamics.” Deloitte Insights, January 17, 2026. https://www.deloitte.com/us/en/insights/topics/emerging-technologies/ai-tokens-how-to-navigate-spend-dynamics.html.
Sahar Hashmi. “Agentic AI’s Token Paradox: When Cheaper Means More Expensive.” Forbes, November 2, 2025. https://www.forbes.com/sites/saharhashmi/2025/11/03/agentic-ais-token-paradox-when-cheaper-means-more-expensive/.
Larridin. “AI Usage and Token Consumption Visibility: How CFOs Regain Control.” Larridin, January 28, 2026. https://larridin.com/blog/ai-usage-token-visibility.
Kong Inc. “Agentic AI Cost Management: Stopping Margin Erosion.” Kong Blog, January 29, 2026. https://konghq.com/blog/enterprise/ai-cost-management-stopping-margin-erosion.
Pankaj Jadeja. “Why Your AI Agent Burns Money: Token Cost Optimization Guide.” LinkedIn Pulse, March 4, 2026. https://www.linkedin.com/pulse/why-your-ai-agent-burns-money-token-cost-optimization-jadeja-nninf.
ZDNet. “Why You’ll Pay More for AI in 2026, and 3 Money-Saving Strategies.” ZDNet, January 12, 2026. https://www.zdnet.com/article/why-ai-costs-increasing-2026-tokens-dram-licensing-how-to-budget/.
Reuters. “Companies Cutting Jobs as Investments Shift Toward AI.” Reuters, March 19, 2026. https://www.reuters.com/business/world-at-work/companies-cutting-jobs-investments-shift-toward-ai-2026-03-19/.
CNBC. “Meta Stock Climbs on Report of Mass Layoff Plans to Offset AI Costs.” CNBC, March 16, 2026. https://www.cnbc.com/2026/03/16/meta-ai-costs-mass-layoffs-20percent-up-premarket.html.
The Conversation. “Tech Companies Are Blaming Massive Layoffs on AI. What’s Really Going On?” The Conversation, January 30, 2026. https://theconversation.com/tech-companies-are-blaming-massive-layoffs-on-ai-whats-really-going-on-278314.
Campbell Robertson. “The Code Revolution Has Second-Order Effects and the Market Is Getting Them Wrong.” AI & CSM: So What? and Who Cares? Substack, March 1, 2026. https://campbellrobertson.substack.com/p/the-code-revolution-has-second-order.



