Tokenomics 2.0: The battle against AI costs
“This was an eye-wateringly high number for the firm,” said Karan Kirpalani, chief product officer, Neysa Networks. It is not that the company, which was one of Neysa’s key clients, used the most advanced model.
The strict internal usage protocol didn’t bring costs down. Latency issues persisted despite adopting models launched by frontier labs. The villain here is unoptimised AI workload.
Talk to any CTO and they will tell you that tokenmaxxing is one of the biggest challenges they face today. While MakeMyTrip’s Sanjay Mohan tells ET the company's monthly token consumption is in the millions, Mohit Saxena of InMobi Group says it is in the billions.
The staggering levels of token spend are forcing Indian companies to explore ways to rein in AI costs. The immediate response is to opt for open-source models and small language models. Another option is to turn to startups such as Pipeshift and Divyam.ai. Through a range of solutions for inference optimisation, GPU orchestration and model routing, they help improve usage and cut costs.
Sandeep Kohli, cofounder, Divyam.ai, said this comes on the back of huge adoption of AI, as large firms invest significantly in building AI-native solutions. As intelligence is spread across multiple systems in enterprises, inefficiencies can creep in. Not every aspect of the business requires the latest models, which are token hungry and therefore drive up costs.
“When you start paying on a per token basis to some of the frontier labs and AI hyperscalers, tokenomics start to become eyewatering so quickly that it is unbelievable. The question then is, when you are at scale, does every request or prompt need to be directed to cutting-edge frontier models, or can you have a team of models in your AI ecosystem, where you determine which request goes through which model,” explained Neysa’s Kirpalani.
“(As much as) 80% of tasks can be solved by models that consume less tokens. There is a demand for rightsizing of model (usage) for the right task,” Kohli said. The enterprise demand for optimisation has spawned the inference market or the business of efficient deployment of models. Kirpalani said that there is no definitive estimate of India’s inference market. The global inference market could be worth about $125 billion in 2025 by conservative estimates, he noted.
The sophistication premium
Costs are shooting up as model sophistication increases, prompting enterprises to actively pursue inference optimisation. Token costs for the latest models, such as Claude Opus 4.8 and ChatGPT 5.5, have increased significantly.
For instance, Opus 4.8 costs $25 per million output tokens, compared to $5 for Claude Haiku 3.5. In the case of GPT5.5, a million output tokens cost $30, up from $15 for GPT5.4. Pro versions of both the models cost $180. GPT3.5 costs $1.5. For comparison, DeepSeek’s latest model costs $0.28-$0.87.
But not every query or prompt needs GPT5.5 Pro or Opus 4.8; many can be routed through DeepSeek, Claude Haiku or cheaper open-source models.
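To make the arithmetic concrete, here is a rough sketch in Python. It uses the per-million-output-token prices quoted above and Kohli's estimate that as much as 80% of tasks can go to cheaper models. The function name, the monthly volume of 100 million output tokens, and the assumption that the share of tasks equals the share of tokens are illustrative only, not figures from any company quoted here.

```python
# Prices in USD per million output tokens, as quoted in this article.
PRICE_PER_M_OUTPUT = {
    "opus-4.8": 25.00,
    "haiku-3.5": 5.00,
}


def blended_cost(total_m_tokens, cheap_share, frontier_price, cheap_price):
    """Output-token cost when `cheap_share` of tokens go to the cheaper model.

    Assumes, for illustration only, that the share of tasks routed away
    equals the share of output tokens routed away.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    cheap = total_m_tokens * cheap_share * cheap_price
    frontier = total_m_tokens * (1.0 - cheap_share) * frontier_price
    return cheap + frontier


# Hypothetical workload: 100 million output tokens a month.
volume = 100
frontier_price = PRICE_PER_M_OUTPUT["opus-4.8"]
cheap_price = PRICE_PER_M_OUTPUT["haiku-3.5"]

all_frontier = blended_cost(volume, 0.0, frontier_price, cheap_price)
routed = blended_cost(volume, 0.8, frontier_price, cheap_price)
saving_pct = 100 * (all_frontier - routed) / all_frontier

print(f"All on frontier model: ${all_frontier:,.0f}")
print(f"80% routed to cheaper model: ${routed:,.0f}")
print(f"Saving: {saving_pct:.0f}%")
```

Under these assumptions the monthly bill falls from $2,500 to about $900, a saving of roughly 64%. The sketch ignores input-token prices and assumes routed tasks are no more or less token-heavy than the rest, so real savings will differ.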
Divyam.ai’s Kohli said that they have built a team of models for each of the enterprise’s AI agents, which will route the queries to the right model. These are a mixture of frontier and open source models.
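As a rough illustration of what routing within a "team of models" might look like, the sketch below sends each request to the cheapest tier that can handle a crude complexity estimate. Everything in it is hypothetical: the tier names, the keyword heuristic and the thresholds are assumptions made for illustration, not Divyam.ai's method. Production routers would more likely rely on learned classifiers and quality evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                # hypothetical label, not a real model ID
    usd_per_m_output: float  # rate-card price per million output tokens
    max_complexity: int      # highest complexity score this tier handles


# Hypothetical rate card; the prices echo figures quoted in the article.
RATE_CARD = [
    ModelTier("small-open-model", 0.28, 1),
    ModelTier("mid-tier-model", 5.00, 2),
    ModelTier("frontier-model", 25.00, 3),
]

# Crude, illustrative hints that a prompt needs multi-step reasoning.
REASONING_HINTS = ("prove", "analyse", "analyze", "step by step", "debug")


def estimate_complexity(prompt: str) -> int:
    """Return 1 (simple) to 3 (hard) using a toy keyword and length heuristic."""
    text = prompt.lower()
    score = 1
    if len(text.split()) > 150:
        score += 1
    if any(hint in text for hint in REASONING_HINTS):
        score += 1
    return score


def route(prompt: str, rate_card=RATE_CARD) -> ModelTier:
    """Pick the cheapest tier able to handle the estimated complexity."""
    needed = estimate_complexity(prompt)
    for tier in sorted(rate_card, key=lambda t: t.usd_per_m_output):
        if tier.max_complexity >= needed:
            return tier
    # Fall back to the most capable tier if nothing matches.
    return max(rate_card, key=lambda t: t.max_complexity)


print(route("What is the refund status of booking 1234?").name)
print(route("Debug this stack trace and explain the root cause.").name)
```

Because the rate card is plain data, tiers and prices can be swapped without touching the calling code, which is one way an enterprise could pick up newer models without adapting each agent.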
“We keep updating the models along with the rate cards so that enterprises can get the latest model without them having to adapt every time,” Kohli said. He added that customers are seeing 50-70% savings in generative AI costs through model routing and inferencing. The firm is currently onboarding two US clients.
Pipeshift CEO Arko C said the company has orchestrated its stack to reduce latency, which is critical for many Indian businesses. Pipeshift has partnered with GPU-service provider Neysa, under which Pipeshift will deploy open-source models to address rising cost and latency. Nurix AI, which has deployed Pipeshift’s tech stack, said in a statement that it has seen a 3x reduction in tokens.
Beyond model selection, MakeMyTrip’s Mohan said, “There are use cases that need low latency, which do not require reasoning models. We are now focusing on small language models that run on CPUs.” However, these are not without challenges.
Sifting through AI washing
Enterprises’ immediate challenge is to find the right solution amid hype and AI washing surrounding the market. Kirpalani says clients are struggling to differentiate the signal from noise. “That has a direct correlation whether you were able to take your use case into production and see ROI and value capture successfully. So, unfortunately, there’s some amount of disingenuity on that front as well,” he said.
“Beyond digital natives, there is a sort of push that we need to do in terms of explaining to people why Anthropic, OpenAI and Gemini are not the answer to everything, and demonstrate that some use cases can work well on open source,” said Pipeshift’s Arko.
Kohli said one of the challenges is that while US counterparts are moving fast, a similar wave and urgency is missing in India, where firms are still running pilots. Divyam.ai is now shifting its focus to the US market, where demand is rising.