Tokenomics 2.0: The battle against AI costs

Published on

June 16, 2026

Via IndiaTimes (ET) - Technology

Swathi Moorthy, ETtech

Last Updated: Jun 16, 2026, 06:00:00 AM IST

Synopsis

Staggering token spending is forcing Indian firms to explore ways to rein in AI cost. This is where open-source models, small language models & inference layer startups step in, writes Swathi Moorthy.

400 billion tokens. That translated into a monthly expense of $78,000, or nearly $1 million a year. The bill was run up by a large enterprise in a regulated industry.

“This was an eye-wateringly high number for the firm,” said Karan Kirpalani, chief product officer, Neysa Networks. The company, one of Neysa’s key clients, was not even using the most advanced model.
Strict internal usage protocols did not bring costs down, and latency issues persisted despite the adoption of models launched by frontier labs. The villain here was an unoptimised AI workload.
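
The arithmetic behind that bill can be checked in a few lines. This is a minimal sketch that assumes the 400 billion tokens are the monthly volume and that every token is billed at one blended rate; the article states neither explicitly, and the variable names are illustrative.

```python
# Figures quoted in the article.
MONTHLY_TOKENS = 400_000_000_000   # assumption: this is the monthly volume
MONTHLY_COST_USD = 78_000

# Annualise the monthly bill.
annual_cost_usd = MONTHLY_COST_USD * 12

# Implied blended price per million tokens (assumes one flat rate).
blended_usd_per_million = MONTHLY_COST_USD / (MONTHLY_TOKENS / 1_000_000)

print(f"Annual cost: ${annual_cost_usd:,}")                        # $936,000
print(f"Blended rate: ${blended_usd_per_million:.3f} per million tokens")  # $0.195
```

Under those assumptions the blended rate is well under a dollar per million tokens, which is consistent with the point that the firm was not on the most advanced model: the size of the bill comes from volume.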

Talk to any CTO and they will tell you that tokenmaxxing is one of the biggest challenges they face today. MakeMyTrip’s Sanjay Mohan tells ET that monthly consumption is in the millions, while Mohit Saxena of InMobi Group says it is in the billions.

The staggering level of token spend is forcing Indian companies to explore ways to rein in AI costs. The immediate response is to opt for open-source models and small language models. Another option is to turn to startups such as Pipeshift and Divyam.ai, which offer a range of solutions for inference optimisation, GPU orchestration and model routing that help improve usage and cut costs.

Sandeep Kohli, cofounder, Divyam.ai, said that this comes on the back of huge adoption of AI, as large firms invest significantly in building AI-native solutions. As intelligence is spread across multiple systems in an enterprise, inefficiencies can creep in. Not every aspect of the business requires the latest models, which are token hungry, and using them everywhere drives up cost.

“When you start paying on a per token basis to some of the frontier labs and AI hyperscalers, tokenomics start to become eyewatering so quickly that it is unbelievable. The question then is, when you are at scale, does every request or prompt need to be directed to cutting-edge frontier models, or can you have a team of models in your AI ecosystem, where you determine which request goes through which model,” explained Neysa’s Kirpalani.

“(As much as) 80% of tasks can be solved by models that consume less tokens. There is a demand for rightsizing of model (usage) for the right task,” Kohli said. Enterprise demand for optimisation has spawned the inference market, the business of deploying models efficiently. Kirpalani said that there is no definitive estimate of India’s inference market. The global inference market could be worth about $125 billion in 2025 by conservative estimates, he noted.

The sophistication premium

Costs are shooting up as model sophistication increases, prompting enterprises to actively pursue inference optimisation. Token costs for the latest models such as Claude Opus 4.8 and ChatGPT 5.5 have increased significantly.

For instance, Opus 4.8 costs $25 per million output tokens, compared to $5 for Claude Haiku 3.5. In the case of GPT5.5, a million output tokens cost $30, up from $15 for GPT5.4. Pro versions of both models cost $180. GPT3.5 costs $1.50. For comparison, DeepSeek’s latest model costs $0.28-$0.87.
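
To see what those rate cards mean at volume, the sketch below prices one workload across the models named above. The prices are the per-million-output-token figures quoted in the article; the workload of 1 billion output tokens a month is a hypothetical chosen only for illustration, and input-token charges are ignored.

```python
# USD per million output tokens, as quoted in the article.
PRICE_PER_MILLION_OUTPUT = {
    "Opus 4.8": 25.0,
    "Claude Haiku 3.5": 5.0,
    "GPT5.5": 30.0,
    "GPT5.4": 15.0,
    "GPT3.5": 1.5,
    "DeepSeek (upper bound)": 0.87,  # top of the quoted $0.28-$0.87 range
}

# Hypothetical workload, not a figure from the article.
WORKLOAD_TOKENS = 1_000_000_000


def output_cost_usd(output_tokens: int, price_per_million: float) -> float:
    """Cost of generating `output_tokens` at a per-million-token price."""
    return output_tokens / 1_000_000 * price_per_million


def cost_table(tokens: int = WORKLOAD_TOKENS) -> dict[str, float]:
    """Monthly output cost of the same workload on each model."""
    return {
        name: output_cost_usd(tokens, price)
        for name, price in PRICE_PER_MILLION_OUTPUT.items()
    }


if __name__ == "__main__":
    # Print cheapest first.
    for name, cost in sorted(cost_table().items(), key=lambda kv: kv[1]):
        print(f"{name:<24} ${cost:>10,.2f}")
```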

But not every query or prompt needs GPT5.5 Pro or Opus 4.8. Many can be routed through DeepSeek, Claude Haiku or cheaper open-source models.

Divyam.ai’s Kohli said that the company has built a team of models for each of an enterprise’s AI agents, with queries routed to the right model in that team. The team is a mixture of frontier and open-source models.
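
The routing idea can be illustrated with a toy example. This is a hedged sketch and not Divyam.ai's method, which the article does not describe: the keyword list, the word-count threshold and the two-model team are all hypothetical, and a production router would more likely rely on a trained classifier and evaluation data. Prices are the output-token figures quoted earlier in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_output: float


# Prices as quoted in the article; DeepSeek uses the top of its range.
FRONTIER = Model("Opus 4.8", 25.0)
CHEAP = Model("DeepSeek", 0.87)

# Hypothetical markers of a "hard" request, for illustration only.
HARD_HINTS = ("prove", "multi-step", "legal", "architecture")


def route(prompt: str, max_cheap_words: int = 200) -> Model:
    """Send long or hard-looking prompts to the frontier model, the rest to the cheap one."""
    text = prompt.lower()
    is_long = len(text.split()) > max_cheap_words
    looks_hard = any(hint in text for hint in HARD_HINTS)
    return FRONTIER if (is_long or looks_hard) else CHEAP


def blended_price(cheap_share: float) -> float:
    """Average USD per million output tokens when `cheap_share` of traffic goes to CHEAP."""
    return (
        cheap_share * CHEAP.usd_per_million_output
        + (1 - cheap_share) * FRONTIER.usd_per_million_output
    )


if __name__ == "__main__":
    print(route("What is my booking status?").name)
    print(route("Prove this contract clause is enforceable").name)
    # Illustrative only: the 80% share echoes Kohli's estimate of tasks
    # that lighter models can handle; it is not a measured result.
    print(f"{blended_price(0.8):.3f}")
```

The blended price here counts only output tokens and assumes equal output volume per request, so it should be read as an illustration of why routing lowers the average rate rather than as a savings estimate.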

“We keep updating the models along with the rate cards so that enterprises can get the latest model without them having to adapt every time,” Kohli said. He added that customers are seeing 50-70% savings in generative AI cost through model routing and inferencing. The firm is currently onboarding two US clients.

Pipeshift CEO Arko C said that the company has orchestrated its stack to reduce latency, which is critical for many Indian businesses. Pipeshift has partnered with the GPU-service provider Neysa, an arrangement under which Pipeshift will deploy open-source models to address rising cost and latency. Nurix AI, which has deployed Pipeshift’s tech stack, said in a statement that it is seeing a 3x reduction in tokens.

Beyond model selection, MakeMyTrip’s Mohan said, “There are use cases that need low latency, which do not require reasoning models. We are now focusing on small language models that run on CPUs.” However, these are not without challenges.

Sifting through AI washing

Enterprises’ immediate challenge is to find the right solution amid the hype and AI washing surrounding the market. Kirpalani says clients are struggling to separate the signal from the noise. “That has a direct correlation whether you were able to take your use case into production and see ROI and value capture successfully. So, unfortunately, there’s some amount of disingenuity on that front as well,” he said.

“Beyond digital natives, there is a sort of push that we need to do in terms of explaining to people why Anthropic, OpenAI and Gemini are not the answer to everything, and demonstrate that some use cases can work well on open source,” said Pipeshift’s Arko.

Kohli said one of the challenges the company faces is that while its US counterparts are moving fast, a similar wave of urgency is missing in India, where firms are still running pilots. Divyam.ai is now shifting its focus to the US market, where demand is rising.
