Google Unveils Low-Cost AI Model Gemini 3.1 Flash-Lite

Google has unveiled one of the cheapest artificial intelligence models available

Google has introduced a new artificial intelligence model, Gemini 3.1 Flash-Lite. The developers call it the fastest and most cost-effective model in the Gemini lineup, aimed at workloads that involve a large number of requests.

The new model is designed primarily for developers and companies that need to process large amounts of data in real time. It is already available in a pre-release version through Google’s AI Studio platform and the Vertex AI cloud service, Dataconomy reports.

Betting on speed and low cost

According to the company, the model costs $0.25 per million input tokens and $1.50 per million output tokens to run, making it one of the cheapest models in the Gemini ecosystem.
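To put those prices in perspective, the short Python sketch below estimates the bill for a batch of requests. Only the two per-token prices come from the figures quoted above; the function name and the sample workload (one million requests, each with 500 input tokens and 100 output tokens) are hypothetical and chosen purely for illustration.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 0.25
OUTPUT_PRICE_PER_MILLION = 1.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for the given token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload (an assumption, not a figure from Google):
# 1 million requests, each with 500 input tokens and 100 output tokens.
num_requests = 1_000_000
total = estimate_cost(num_requests * 500, num_requests * 100)
print(f"Estimated cost: ${total:,.2f}")  # prints "Estimated cost: $275.00"
```

In this made-up scenario the output tokens account for more than half of the total even though there are five times fewer of them, because the output price is six times the input price.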

At the same time, the model generates output quickly. According to Google’s internal tests, its time to first response is up to 2.5 times faster than that of the earlier Gemini 2.5 Flash, and its text generation speed is about 45% higher.

Tulsee Doshi, senior director of product management for the Gemini team, said the new approach to model architecture allows for a combination of performance and low cost.

“Gemini 3.1 Flash-Lite is our fastest and most cost-effective model variant. It is designed for large-scale developer tasks where speed, low latency and cost optimization are important,” Doshi said.

According to Google, the model is suited to a wide range of tasks, from automatic translation and content moderation to interface generation and the processing of large data streams.

In Arena.ai’s ranking, the new model earned an Elo score of 1,432, putting it on par with many previous-generation commercial models, according to Dataconomy.

Experts believe that the launch of Gemini 3.1 Flash-Lite reflects a new trend in the artificial intelligence market: a shift away from the most powerful models toward cheaper, more scalable systems that can be used in millions of applications and services.