AI Models Guide
A quick reference to the major AI models, who makes them, and what they do best.
Updated April 10, 2026
Quick Reference
| Model | Company | Best For | Key Differentiator |
| --- | --- | --- | --- |
| GPT-5.x | OpenAI | General purpose | Dynamic routing picks the right sub-model per request |
| Claude 4.6 | Anthropic | Coding & reasoning | Top human-preference scores; agentic autonomy |
| Gemini 3.x | Google DeepMind | Multimodal | Best benchmark breadth; strong pricing; Workspace integration |
| Grok 4.x | xAI | Real-time info | Live X/Twitter data; multi-agent parallel reasoning |
| Llama 4 | Meta | Open-source | 10M token context; fully self-hostable |
| DeepSeek V4 | DeepSeek | Cost efficiency | Launched Mar 2026; 1T MoE, 1M context, $0.30/MTok input; open-source; Engram conditional memory |
| Mistral 3 Family | Mistral | EU compliance | Large 3, Small 4 (MoE), and Voxtral TTS — enterprise-safe with data sovereignty |
| Qwen 3.6 | Alibaba | Multilingual | 119 languages; 1M context (Qwen3.6-Plus); native audio/video/text (Qwen3.5-Omni) |
| Microsoft MAI | Microsoft | Speech & media AI | MAI-Transcribe-1, MAI-Voice-1, MAI-Image-2 — Microsoft's own foundation stack on Foundry |
| Gemma 4 | Google | On-device open-weight | 31B ranks #3 on Arena AI (1452 Elo); E2B/E4B optimized for Android; Apache 2.0 |
| Kimi K2.5 | Moonshot AI | Agentic open-source | 1T params; Agent Swarm coordinates up to 100 sub-agents |
| GLM-5 | Zhipu AI | Open frontier | 744B MoE; MIT license; trained on zero NVIDIA GPUs |
| Sonar | Perplexity | Search & research | Search-grounded answers at 1200 tok/s; built on Llama 3.3 + Cerebras |
| Composer 2 | Cursor | AI-native coding | Frontier-level coding via RL on long-horizon tasks; 73.7 SWE-bench Multilingual |
Frontier Language Models
GPT-5.x Series
OpenAI · San Francisco The versatile all-rounder with dynamic internal routing.
- GPT-5.4 is the current flagship with a 1M token context window
- Uses an internal router to select the right sub-model per request in real time
- GPT-5.4 mini and nano offer GPT-5-class performance at lower cost; mini comes within 5% of flagship on coding benchmarks
- GPT-5.4 Thinking (Mar 5, 2026): reasoning-first variant integrated into ChatGPT — strongest at math, spreadsheets, research, and document tasks; comparable to a dedicated o-series model
- GPT-5.3-Codex: dedicated agentic coding variant combining Codex and GPT-5 training stacks
- Native computer use and tool calling for agentic automation
- Strong at documentation, unit tests, and complex SQL queries
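As a concrete illustration of tool calling, here is a minimal sketch of an OpenAI-style chat request that exposes one callable function. The model identifier "gpt-5.4" and the `get_weather` tool are illustrative assumptions, not a documented API surface; the payload shape follows the established chat-completions tool-calling convention.

```python
# Hypothetical sketch: an OpenAI-style chat request with tool calling.
# The model name "gpt-5.4" and the get_weather tool are assumptions.
import json

def build_tool_call_request(user_message: str) -> dict:
    """Build a chat-completions payload that exposes one callable tool."""
    return {
        "model": "gpt-5.4",  # assumed model identifier
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",  # let the model decide when to call the tool
    }

payload = build_tool_call_request("What's the weather in Oslo?")
print(json.dumps(payload, indent=2))
```

The same payload works whether you POST it yourself or pass the pieces to an SDK; the model returns either a plain message or a structured tool call you execute and feed back.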
Claude 4.6 Family
Anthropic · San Francisco The developer favorite for coding, reasoning, and safety.
- Opus 4.6 (flagship), Sonnet 4.6 (best value), Haiku 4.5 (fast/cheap)
- Leads human-preference leaderboards; strong ARC-AGI-2 scores
- Agentic capabilities: autonomous multi-step coding tasks
- Known for safety, steerability, and high-quality long-form writing
- Claude Mythos Preview (Apr 7, 2026): released as gated preview via Project Glasswing — limited to ~50 partner orgs; 93.9% SWE-bench Verified, 97.6% USAMO 2026; not publicly available due to cybersecurity risks
Gemini 3.x
Google DeepMind · Mountain View Multimodal powerhouse with top benchmark breadth.
- Gemini 3.1 Pro leads the Artificial Analysis Intelligence Index
- Native multimodal: processes text, images, audio, and video natively
- 1M token context window; deep Google Workspace integration
- Gemini 3 Flash: current default model in the Gemini app — strong balance of speed and capability
- Gemini 3.1 Flash-Lite (Mar 3): $0.25/1M input, 2.5× faster than prior Flash — cost-efficient tier for high-volume workloads
- Gemini 3.1 Flash Live: audio/voice model with 90+ language support and lower latency (March 26)
Grok 4.x
xAI · San Francisco Bay Area Real-time data meets raw reasoning power.
- Grok 4.20 uses native multi-agent architecture with four specialist agents that debate before responding
- Grok 4.20 Beta 2 (March 3): improved instruction following and hallucination reduction
- Real-time integration with X (Twitter) for current events
- Scored 100% on AIME 2025 math competition (Heavy variant)
- Grok 5 in training — reported ~6T parameter MoE model training on xAI's Colossus 2 supercluster; Q2 2026 target
Sonar (Perplexity)
Perplexity AI · San Francisco Search-native AI built for grounded, cited answers.
- Built on Llama 3.3 70B, further trained for search-grounded factuality
- Runs at 1,200 tokens/sec on Cerebras inference hardware
- Model family: Sonar, Sonar Pro, Reasoning Pro, Deep Research
- Matches GPT-4o on user satisfaction benchmarks
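The quoted 1,200 tokens/sec translates directly into user-facing latency. A quick back-of-envelope helper, assuming decode speed is the only bottleneck:

```python
# Latency estimate at the quoted 1,200 tokens/sec decode speed.
# Ignores network and time-to-first-token; decode-only approximation.
def decode_seconds(output_tokens: int, tokens_per_sec: float = 1200.0) -> float:
    return output_tokens / tokens_per_sec

# A 600-token answer streams in about half a second:
print(decode_seconds(600))  # → 0.5
```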
Composer 2 (Cursor)
Cursor · San Francisco Frontier-level coding model trained via RL on long-horizon software engineering tasks.
- 73.7 on SWE-bench Multilingual — top scores across CursorBench and Terminal-Bench 2.0
- Trained with RL to solve complex tasks requiring hundreds of sequential actions
- Fast variant is now default: frontier-level intelligence at lower cost than competing fast models
- Background agents run tasks autonomously while you work
Microsoft MAI
Microsoft · Redmond Microsoft's own foundation model stack — independent of OpenAI, built for speech, voice, and vision.
- MAI-Transcribe-1: speech-to-text across 25 languages; outperforms Whisper-large-v3 on accuracy
- MAI-Voice-1: generates 60s of audio in 1s; supports voice cloning
- MAI-Image-2: high-quality image generation
- All available on Microsoft Foundry — signals Microsoft building foundational AI independent of OpenAI (Apr 2, 2026)
Open Source & Cost-Efficient
Llama 4
Meta · Menlo Park The leading open-source model family.
- Llama 4 Scout: industry-leading 10M token context window
- Llama 4 Maverick: 17B active / 128 experts — outperforms GPT-4o and Gemini 2.0 Flash on key benchmarks
- Fully open weights; can be self-hosted for complete data control
- Llama 4 Behemoth (288B active) still in training; widely covered as next milestone
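When sizing hardware for self-hosting, remember that MoE models must keep every expert's weights resident even though only a fraction are active per token. A rough sizing sketch (the 109B total-parameter figure below is an illustrative example, not a spec):

```python
# Rough weight-memory sizing for self-hosting an open-weight model.
# Total parameters (not active parameters) determine weight memory,
# since all MoE experts stay loaded.
def weight_memory_gb(total_params_billions: float, bytes_per_param: float) -> float:
    """billions of params × bytes/param = GB of weights (1e9 params ≈ 1 GB per byte)."""
    return total_params_billions * bytes_per_param

# Illustrative 109B-total MoE in 8-bit quantization:
print(weight_memory_gb(109, 1))  # → 109 GB
# Same model in fp16:
print(weight_memory_gb(109, 2))  # → 218 GB
```

Add headroom for the KV cache, which grows with context length and batch size, before choosing GPUs.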
DeepSeek V4
DeepSeek · Hangzhou, China Cost-redefining open-source model with 1T parameters and conditional memory.
- Launched early March 2026: 1T MoE parameters, 1M context window, $0.30/MTok input pricing
- Engram conditional memory and Manifold-Constrained Hyper-Connections for improved long-context performance
- 81% on SWE-bench Verified; native multimodal: text, image, and video
- DeepSeek V3.2 (open-source): bridges V3 and V4 — optimized for reasoning + agentic long-context and tool-use workloads
- Entire V3/V4 lineage trained for under $6M — redefining AI cost efficiency
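The $0.30/MTok input price makes cost estimation trivial. A minimal sketch of the arithmetic (input side only, since output pricing is not quoted here):

```python
# Back-of-envelope input cost at $0.30 per million input tokens.
# Output pricing isn't quoted above, so this covers input only.
INPUT_PRICE_PER_MTOK = 0.30  # USD

def input_cost_usd(input_tokens: int) -> float:
    return input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK

# e.g., 10,000 requests averaging 5,000 input tokens each = 50M tokens:
total_tokens = 10_000 * 5_000
print(round(input_cost_usd(total_tokens), 2))  # → 15.0
```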
Mistral 3 Family
Mistral AI · Paris, France The enterprise-safe European model family, now spanning text, edge, and speech.
- Mistral Large 3: EU AI Act-compliant flagship for regulated industries (finance, healthcare, gov)
- Mistral Small 4 (March 17): MoE architecture, 119B total / 6B active, 128 experts — fast and efficient
- Voxtral TTS (March 26): open-source 4B text-to-speech, 9 languages, runs on consumer hardware
- Strong European data sovereignty guarantees across the full model family
Kimi K2.5
Moonshot AI · Beijing, China Open-source agentic powerhouse with Agent Swarm.
- 1T total parameters (32B active), MoE architecture with native vision-language
- Agent Swarm coordinates up to 100 specialized sub-agents in parallel
- Kimi Code CLI agent rivals Claude Code and Gemini CLI
- Backed by Alibaba and HongShan; strong global traction
GLM-5
Zhipu AI · Beijing, China Frontier-class model under an MIT license.
- 744B parameter MoE model (44B active) with 200K context window
- Released under MIT license; trained entirely on Huawei Ascend chips
- 77.8% on SWE-bench Verified; 50.4% on Humanity's Last Exam
- Priced roughly 6x cheaper than comparable proprietary models
Gemma 4
Google · Mountain View Open-weight models from Gemini 3 research — optimized for on-device and frontier-class performance.
- Four Apache 2.0 models: E2B (2.3B), E4B (4.5B), 26B MoE (4B active), 31B dense (Apr 2, 2026)
- 31B ranks #3 on Arena AI leaderboard at 1452 Elo — outperforms models 20× its size
- E2B/E4B optimized for on-device Android: up to 4× faster and 60% less battery than prior Gemma
- All models natively multimodal; larger variants support 256K context
Qwen 3.6
Alibaba Cloud · Hangzhou, China The multilingual giant — now split into multimodal and agentic branches.
- Qwen3.5-Omni (Feb 2026): native audio/video/text multimodal — Thinker architecture, 256K context, 113-language speech recognition
- Qwen3.6-Plus (Apr 2, 2026): 1M context; matches Claude Opus 4.5 on SWE-bench and Terminal-Bench 2.0
- 0.6B to 235B open-weight; Qwen3-Max (1T+) API-only; supports 119 languages
- Qwen3-Coder achieves 69.6% on SWE-Bench Verified, surpassing many frontier models
Image & Video Generation
Video generation leaders: Google Veo 3.1 (native 4K + vertical video), Kling 3.0 (native 4K/60fps), Runway Gen-4.5 (creative/cinematic), and Seedance 2.0 (ByteDance — notable for Identity Lock, which maintains consistent faces across multi-scene video). Sora 2 (OpenAI) remains available via ChatGPT, though the standalone app shut down in March 2026.
When to Use What
Building Software
- Build a full-stack app from scratch Claude 4.6
- Debug a complex codebase Claude 4.6
- Generate unit tests and docs GPT-5.x
- Rapid UI prototyping GPT-5.x
- Background agents for parallel development Composer 2 (Cursor)
- Open-source agentic coding Kimi K2.5
Research & Analysis
- Analyze a long PDF or contract Gemini 3.x
- Summarize a YouTube video Gemini 3.x
- Get real-time data on a trending topic Grok 4.x
- Get sourced answers with citations Sonar (Perplexity)
- Deep competitive research Claude 4.6
Creative & Visual
- Create stylized hero images Midjourney v7
- Generate photorealistic product shots Imagen 4
- Edit and remix existing images Nano Banana 2
- Generate a short video from a prompt Veo 3.1
Data & Math
- Solve complex math problems step-by-step Grok 4.x
- Write and optimize SQL queries GPT-5.x
- Transparent chain-of-thought reasoning DeepSeek R1
- Analyze spreadsheet data Gemini 3.x
Self-Hosting & Privacy
- Run a model on your own infrastructure Llama 4
- Fine-tune for a domain-specific task Llama 4
- Deploy in EU-regulated environments Mistral Large 3
- Budget-friendly open-source alternative DeepSeek V4
- MIT-licensed frontier alternative GLM-5
Writing & Communication
- Write long-form technical content Claude 4.6
- Draft emails and business writing GPT-5.x
- Translate content across 100+ languages Qwen 3.6
- Summarize meeting transcripts Gemini 3.x
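The guidance above collapses naturally into a lookup table. A minimal sketch, using a sample of this guide's own recommendations (the task keys are illustrative labels, and the fallback default is an arbitrary choice):

```python
# A sample of the recommendations above as a task → model lookup.
RECOMMENDATIONS = {
    "build_full_stack_app": "Claude 4.6",
    "generate_unit_tests": "GPT-5.x",
    "analyze_long_pdf": "Gemini 3.x",
    "realtime_topic": "Grok 4.x",
    "cited_answers": "Sonar",
    "self_host": "Llama 4",
    "eu_regulated": "Mistral Large 3",
    "budget_open_source": "DeepSeek V4",
}

def pick_model(task: str) -> str:
    """Return the recommended model, or a general-purpose default."""
    return RECOMMENDATIONS.get(task, "GPT-5.x")  # default is an assumption

print(pick_model("analyze_long_pdf"))  # → Gemini 3.x
print(pick_model("unknown_task"))      # → GPT-5.x
```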
There is no single "best" model in 2026. The landscape has shifted from a winner-take-all race to specialized excellence. Match the model to the task.
Sourced directly from company websites and documentation. Updated weekly.