AI Models Guide
A quick-reference to the major AI models, who makes them, and what they do best.
Updated March 2026
Quick Reference
| Model | Company | Best For | Key Differentiator |
| --- | --- | --- | --- |
| GPT-5.x | OpenAI | General purpose | Dynamic routing picks the right sub-model per request |
| Claude 4.6 | Anthropic | Coding & reasoning | Top human-preference scores; agentic autonomy |
| Gemini 3.x | Google DeepMind | Multimodal | Best benchmark breadth; strong pricing; Workspace integration |
| Grok 4.x | xAI | Real-time info | Live X/Twitter data; multi-agent parallel reasoning |
| Llama 4 | Meta | Open-source | 10M token context; fully self-hostable |
| DeepSeek V3/R1 | DeepSeek | Cost efficiency | Frontier performance trained for under $6M; open source |
| Mistral Large | Mistral | EU compliance | Enterprise-safe; strong open-weight models for regulated industries |
| Qwen 3 | Alibaba | Multilingual | 119 languages; models from 0.6B to 1T+ parameters |
| Kimi K2.5 | Moonshot AI | Agentic open-source | 1T params; Agent Swarm coordinates up to 100 sub-agents |
| GLM-5 | Zhipu AI | Open frontier | 744B MoE; MIT license; trained on zero NVIDIA GPUs |
| Sonar | Perplexity | Search & research | Search-grounded answers at 1,200 tokens/sec; built on Llama 3.3 + Cerebras |
| Composer | Cursor | AI-native coding | MoE model trained via RL for software engineering; background agents |
Frontier Language Models
GPT-5.x Series
OpenAI · San Francisco
The versatile all-rounder with dynamic internal routing.
- GPT-5.4 is the current flagship with a 1M token context window
- Uses an internal router to select the right sub-model per request in real time
- Native computer use and tool calling for agentic automation
- Strong at documentation, unit tests, and complex SQL queries
Claude 4.6 Family
Anthropic · San Francisco
The developer favorite for coding, reasoning, and safety.
- Opus 4.6 (flagship), Sonnet 4.6 (best value), Haiku 4.5 (fast/cheap)
- Leads human-preference leaderboards; strong ARC-AGI-2 scores
- Agentic capabilities: autonomous multi-step coding tasks
- Known for safety, steerability, and high-quality long-form writing
Gemini 3.x
Google DeepMind · Mountain View
Multimodal powerhouse with top benchmark breadth.
- Gemini 3.1 Pro leads the Artificial Analysis Intelligence Index
- Natively multimodal: processes text, images, audio, and video
- 1M token context window; deep Google Workspace integration
- Strong value at $2/$12 per million input/output tokens
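To make the per-token pricing concrete, here is a minimal sketch of the arithmetic. It assumes the $2/$12 rates quoted above apply as flat rates, with no tiering, caching, or batch discounts (actual pricing may include these). The function name and the sample token counts are illustrative, not part of any vendor API.

```python
# Rates are the $2 / $12 per million input/output tokens quoted in this guide.
INPUT_USD_PER_MILLION = 2.0
OUTPUT_USD_PER_MILLION = 12.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars (flat-rate assumption)."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical request: a 200,000-token document summarized in 2,000 tokens.
cost = estimate_cost_usd(200_000, 2_000)
print(f"${cost:.3f}")  # prints $0.424
```

Output tokens cost six times as much as input tokens at these rates, so long-input, short-output tasks such as document summarization are comparatively cheap.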
Grok 4.x
xAI
Real-time data meets raw reasoning power.
- Grok 4.20 uses native multi-agent architecture with four specialist agents that debate before responding
- Real-time integration with X (Twitter) for current events
- Scored 100% on AIME 2025 math competition (Heavy variant)
- Less filtered personality; positioned as an alternative to corporate AI
Sonar (Perplexity)
Perplexity AI · San Francisco
Search-native AI built for grounded, cited answers.
- Built on Llama 3.3 70B, further trained for search-grounded factuality
- Runs at 1,200 tokens/sec on Cerebras inference hardware
- Model family: Sonar, Sonar Pro, Reasoning Pro, Deep Research
- Matches GPT-4o on user satisfaction benchmarks
Composer 1.5 (Cursor)
Cursor · San Francisco
AI-native coding model built for software engineering.
- Mixture-of-experts model trained via RL in real development environments
- 4x faster generation than comparable frontier coding models
- Background agents run tasks autonomously while you work
- Cursor Automations trigger agents from GitHub PRs, Slack, Linear, PagerDuty
Open Source & Cost-Efficient
Llama 4
Meta · Menlo Park
The leading open-source model family.
- Llama 4 Scout: industry-leading 10M token context window
- Fully open weights; can be self-hosted for complete data control
- Performance competitive with paid frontier models
- Requires powerful hardware for full-scale deployment
DeepSeek V3.2 / R1
DeepSeek · Hangzhou, China
Frontier performance at a fraction of the cost.
- 671B parameter MoE model famously trained for under $6M
- R1 variant excels at transparent, step-by-step reasoning
- V3.2-Speciale matches GPT-5 level performance on key benchmarks
- Open-source with competitive benchmark scores; V4 multimodal imminent
Mistral Large 3
Mistral AI · Paris, France
The enterprise-safe European choice.
- Strong open-weight models designed for EU AI Act compliance
- Default choice for regulated industries (finance, healthcare, gov)
- Good multilingual support, especially European languages
- Balances capability with privacy and sovereignty requirements
Kimi K2.5
Moonshot AI · Beijing, China
Open-source agentic powerhouse with Agent Swarm.
- 1T total parameters (32B active), MoE architecture with native vision-language
- Agent Swarm coordinates up to 100 specialized sub-agents in parallel
- Kimi Code CLI agent rivals Claude Code and Gemini CLI
- Backed by Alibaba and HongShan; strong global traction
GLM-5
Zhipu AI · Beijing, China
Frontier-class model on an MIT license.
- 744B parameter MoE model (44B active) with 200K context window
- Released under MIT license; trained entirely on Huawei Ascend chips
- 77.8% on SWE-bench Verified; 50.4% on Humanity's Last Exam
- Priced roughly 6x cheaper than comparable proprietary models
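The "total vs. active" parameter figures quoted for the MoE models above can be compared with a short calculation. This is a sketch using only the numbers in this guide; it treats "1T" as 1,000B for Kimi K2.5, which is an approximation, and the dictionary and function names are illustrative.

```python
# Total and active parameter counts in billions, as quoted in this guide.
# "1T" for Kimi K2.5 is taken as 1000B (an approximation).
MOE_MODELS = {
    "Kimi K2.5": {"total_b": 1000, "active_b": 32},
    "GLM-5": {"total_b": 744, "active_b": 44},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are active for a given token."""
    return active_b / total_b


for name, params in MOE_MODELS.items():
    print(f"{name}: {active_fraction(**params):.1%} of parameters active")
# prints:
# Kimi K2.5: 3.2% of parameters active
# GLM-5: 5.9% of parameters active
```

The small active share is why a mixture-of-experts model can carry a very large total parameter count while keeping per-token compute closer to that of a much smaller dense model.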
Qwen 3
Alibaba Cloud · Hangzhou, China
The multilingual giant.
- Supports 119 languages with hybrid Mixture-of-Experts architecture
- 0.6B to 235B open-weight; Qwen3-Max (1T+) API-only; Qwen3-Coder (480B) for code
- Competitive with DeepSeek-R1 and OpenAI o1 on reasoning benchmarks
- Qwen3-Coder achieves 69.6% on SWE-bench Verified, surpassing many frontier models
Image & Video Generation
Video generation leaders: Google Veo 3.1 (native 4K + vertical video), Sora 2 (OpenAI, up to 25s + Disney partnership), Kling 3.0 (native 4K/60fps), and Runway Gen-4 (creative/cinematic).
When to Use What
Building Software
- Build a full-stack app from scratch: Claude 4.6
- Debug a complex codebase: Claude 4.6
- Generate unit tests and docs: GPT-5.x
- Rapid UI prototyping: GPT-5.x
- Background agents for parallel development: Composer 1.5 (Cursor)
- Open-source agentic coding: Kimi K2.5
Research & Analysis
- Analyze a long PDF or contract: Gemini 3.x
- Summarize a YouTube video: Gemini 3.x
- Get real-time data on a trending topic: Grok 4.x
- Get sourced answers with citations: Sonar (Perplexity)
- Deep competitive research: Claude 4.6
Creative & Visual
- Create stylized hero images: Midjourney v7
- Generate photorealistic product shots: Imagen 4
- Edit and remix existing images: Nano Banana 2
- Generate a short video from a prompt: Veo 3 / Sora
Data & Math
- Solve complex math problems step-by-step: Grok 4.x
- Write and optimize SQL queries: GPT-5.x
- Transparent chain-of-thought reasoning: DeepSeek R1
- Analyze spreadsheet data: Gemini 3.x
Self-Hosting & Privacy
- Run a model on your own infrastructure: Llama 4
- Fine-tune for a domain-specific task: Llama 4
- Deploy in EU-regulated environments: Mistral Large
- Budget-friendly open-source alternative: DeepSeek V3
- MIT-licensed frontier alternative: GLM-5
Writing & Communication
- Write long-form technical content: Claude 4.6
- Draft emails and business writing: GPT-5.x
- Translate content across 100+ languages: Qwen 3
- Summarize meeting transcripts: Gemini 3.x
There is no single "best" model in 2026. The landscape has shifted from a winner-take-all race to specialized excellence. Match the model to the task.
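For teams that want to encode this guidance in tooling, the task-to-model pairings can be kept as a simple lookup table. The sketch below copies a handful of pairings from the lists above; the `RECOMMENDATIONS` table and the `recommend` helper are hypothetical names for illustration, and exact-match lookup is a deliberate simplification.

```python
# A few task-to-model pairings copied from this guide (keys are lowercase).
RECOMMENDATIONS = {
    "build a full-stack app from scratch": "Claude 4.6",
    "generate unit tests and docs": "GPT-5.x",
    "analyze a long pdf or contract": "Gemini 3.x",
    "get real-time data on a trending topic": "Grok 4.x",
    "get sourced answers with citations": "Sonar (Perplexity)",
    "run a model on your own infrastructure": "Llama 4",
    "translate content across 100+ languages": "Qwen 3",
}


def recommend(task: str, default: str = "no recommendation in this guide") -> str:
    """Look up a task (case-insensitive, exact match) and return the suggested model."""
    return RECOMMENDATIONS.get(task.strip().lower(), default)


print(recommend("Generate unit tests and docs"))  # prints GPT-5.x
print(recommend("Compose a symphony"))  # prints no recommendation in this guide
```

Keeping the mapping in one table makes it easy to revise as models change, which matters for a guide that is updated frequently.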
Sourced directly from company websites and documentation. Updated weekly.