Microsoft AI CEO Mustafa Suleyman announced on X the launch of the company’s first homegrown AI models, MAI-VOICE-1 and MAI-1-Preview. Alongside them comes Copilot Audio Expressions, an experimental Copilot Labs feature that turns text directly into lifelike speech.
Two Models to Power the Future of Copilot
MAI-VOICE-1
Microsoft describes MAI-VOICE-1 as a speech generation model built for expressiveness and speed. According to the company, it can generate a full minute of audio in under a second on a single GPU.
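Microsoft’s speed figure translates into a speedup over real time: seconds of audio produced per second of compute. The minimal Python sketch below does that arithmetic. The function name `realtime_speedup` is our own illustration, not a Microsoft API, and because the stated compute time is “under a second,” plugging in 1.0 s gives a lower bound rather than a measured result.

```python
def realtime_speedup(audio_seconds: float, compute_seconds: float) -> float:
    """Return seconds of audio generated per second of compute (hypothetical helper)."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


# Microsoft's stated figure: 60 s of audio in under 1 s on one GPU.
# Using 1.0 s as an upper bound on compute time yields a lower bound on speedup.
speedup = realtime_speedup(audio_seconds=60.0, compute_seconds=1.0)
print(f"At least {speedup:.0f}x faster than real time")
```

Note that speech researchers often report the inverse, a “real-time factor” of compute time divided by audio duration, where lower is better; by that convention the claim corresponds to a factor below 1/60.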
The model is already being integrated into Copilot Daily and podcast-style formats. Users can also try out different voices, styles, and tones through the new Copilot Audio Expressions experiment inside Copilot Labs.
MAI-1-Preview
MAI-1-Preview is a large language model with a Mixture-of-Experts (MoE) architecture, trained on roughly 15,000 Nvidia H100 GPUs. Positioned as a consumer-focused model, it is designed for instruction following and answering everyday queries.
It is already available for public testing on LMArena (formerly LMSYS Chatbot Arena) and will roll out gradually in Copilot, alongside other models, over the coming weeks.
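Microsoft has not published how MAI-1-Preview routes tokens between its experts, so the following is only a generic sketch of the Mixture-of-Experts idea, not Microsoft’s implementation. It assumes the common top-k gating scheme: a gate scores every expert for an input, only the k best-scoring experts actually run, and their outputs are blended using softmax weights. The experts here are hypothetical toy functions standing in for the feed-forward sub-networks of a real model.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate_weights: List[Vector], experts: List[Expert], k: int = 2) -> Vector:
    """Route x to its top-k experts and return the weighted mix of their outputs."""
    # One gating score per expert: dot product of x with that expert's gate vector.
    scores = [sum(w * xi for w, xi in zip(g, x)) for g in gate_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores so the mixing weights sum to 1.
    weights = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


def make_expert(scale: float) -> Expert:
    # Hypothetical toy expert: scales its input by a fixed factor.
    return lambda v: [scale * vi for vi in v]


random.seed(0)
dim, num_experts = 4, 8
gates = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
experts = [make_expert(float(i + 1)) for i in range(num_experts)]
token = [0.5, -1.0, 0.25, 2.0]
print(moe_forward(token, gates, experts, k=2))
```

The point of the design is cost: a model can hold many experts’ worth of parameters while each input pays the compute for only k of them.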
Shift Toward Independence from OpenAI
The move underscores Microsoft’s push to reduce its dependence on third-party AI providers such as OpenAI while keeping those partnerships in place. By building its own models, Microsoft stands to gain:
- Faster integration cycles for Copilot updates.
- Greater cost control across training and deployment.
- Easier regulatory compliance through ownership of the full AI pipeline.
- Tighter control over the AI value chain.
The initiative follows significant investment in Microsoft AI, including high-profile hires such as Suleyman himself, co-founder of DeepMind and Inflection AI, who now leads the division. These efforts have surfaced in recent weeks through updates such as semantic search in Copilot, a new “study and learning” mode, a redesigned Copilot app for Windows 11, and a Samsung partnership that brings Copilot to monitors and TVs.
How to Try It
To test MAI-VOICE-1 now:
- Go to Copilot Labs.
- Open Audio Expressions.
- Paste in text, choose a voice, style, and tone, and generate an audio clip (with the option to download).
As with all Copilot Labs experiments, the feature is a concept test: it may evolve over time or eventually be retired.