DeepSeek has unveiled V3.2-exp, an experimental model designed to significantly reduce inference costs on long contexts. The announcement was made on Hugging Face, alongside a research paper posted on GitHub. The key innovation is DeepSeek Sparse Attention, which combines a "lightning indexer" module that selects important fragments of the context with a fine-grained token selection system that picks individual tokens within those fragments. This lets the model process long sequences without placing a heavy load on the server.
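The two-stage idea can be illustrated with a minimal sketch: a cheap scoring pass ranks all tokens, and full attention then runs only over the selected subset. This is an illustration of the general technique, not DeepSeek's actual implementation; the plain dot-product indexer, the `k_top` parameter, and the function name are all assumptions for the example.

```python
import numpy as np

def sparse_attention(q, K, V, k_top=32):
    """Sketch of two-stage sparse attention: a lightweight indexer
    scores every key, then ordinary softmax attention runs only
    over the top-k selected tokens (hypothetical simplification)."""
    # Stage 1: cheap indexer -- a plain dot-product score stands in
    # for the lightning indexer here (assumption, not the real design).
    scores = K @ q                       # (seq_len,)
    idx = np.argsort(scores)[-k_top:]    # indices of the top-k tokens

    # Stage 2: standard scaled softmax attention, restricted to the
    # selected subset -- cost now scales with k_top, not seq_len.
    K_sel, V_sel = K[idx], V[idx]
    logits = K_sel @ q / np.sqrt(q.shape[0])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ V_sel               # (d_model,)

rng = np.random.default_rng(0)
seq_len, d = 1024, 64
out = sparse_attention(rng.normal(size=d),
                       rng.normal(size=(seq_len, d)),
                       rng.normal(size=(seq_len, d)))
print(out.shape)  # (64,)
```

The point of the sketch is the cost structure: the indexer touches every token once with a cheap operation, while the expensive attention computation only sees the selected fragment, which is why API costs can drop sharply on long inputs.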
Initial tests suggest that on long-context tasks the cost of an API request can be almost halved. A full evaluation will require further testing, but the open weights and free availability on Hugging Face make independent verification of the performance claims straightforward.
The development of V3.2-exp is part of DeepSeek's strategy of cutting the operational costs of AI models by optimising the architecture rather than the training process. The earlier R1 model showed that the company could compete with US developers on cost, although it did not make a lasting impression on the market.
The new technology is unlikely to make the same splash as R1, but it could offer an important lesson for the global industry, helping AI service providers reduce the cost of running models without sacrificing performance.