AI Models Guide
A quick reference to the major AI models, who makes them, and what they do best.
Updated April 24, 2026
Quick Reference
| Model | Company | Best For | Key Differentiator |
| --- | --- | --- | --- |
| GPT-5.x | OpenAI | General purpose | GPT-5.5 flagship (Apr 23, 2026) — $5/1M input, $30/1M output; stronger coding, computer use, and research; dynamic routing across sub-models |
| Claude Opus 4.7 | Anthropic | Coding & reasoning | Opus 4.7 (Apr 15, 2026) — latest GA model; 1M context, step-change agentic coding; Sonnet 4.6 best for speed/cost |
| Gemini 3.x | Google DeepMind | Multimodal | Best benchmark breadth; strong pricing; Workspace integration; Gemini 3.1 Pro flagship |
| Grok 4.x | xAI | Real-time info | Grok 4.20 flagship (Mar 2026): 2M context, three variants; live X/Twitter data; multi-agent parallel reasoning |
| Llama 4 | Meta | Open-source | 10M token context; fully self-hostable; Behemoth still in training/unreleased |
| DeepSeek V4 | DeepSeek | Cost efficiency | Released April 24, 2026 — V4-Pro (1.6T MoE, $1.74/1M input) and V4-Flash (284B, $0.14/1M input); Apache 2.0, 1M context |
| Mistral 3 Family | Mistral | EU compliance | Large 3, Magistral (reasoning), Devstral (open-source coding agent), Small 4, Voxtral — enterprise-safe with data sovereignty |
| Qwen 3.6 | Alibaba | Multilingual | Qwen3.5 open-weight (397B); Qwen3.6-Plus closed-source agentic (1M context, Apr 2026); 119 languages |
| Microsoft MAI | Microsoft | Speech & media AI | MAI-Transcribe-1, MAI-Voice-1, MAI-Image-2 — Microsoft's own foundation stack on Foundry |
| Gemma 4 | Google | On-device open-weight | 31B ranks #3 on Arena AI (1452 Elo); E2B/E4B optimized for Android; Apache 2.0 |
| Kimi K2.6 | Moonshot AI | Agentic open-source | Released Apr 20 — 1T MoE, 32B active; Agent Swarm scales to 300 sub-agents; 58.6 SWE-Bench Pro (top open model); Modified MIT |
| GLM-5.1 | Zhipu AI | Open frontier | Updated Apr 6, 2026; 744B MoE; MIT license; GPQA 0.9; trained on zero NVIDIA GPUs |
| GPT-5.4-Cyber | OpenAI | Defensive cybersecurity | TAC-gated fine-tune; lowered security refusals; binary reverse engineering; partners: CrowdStrike, Cloudflare, Palo Alto, Cisco |
| Claude Mythos Preview | Anthropic | Security research | Project Glasswing invite-only; GPQA 0.9; ~12 enterprise partners; $2,500/1M tokens to gate access |
| Sonar | Perplexity | Search & research | Search-grounded answers at 1200 tok/s; built on Llama 3.3 + Cerebras |
| Composer 2 | Cursor | AI-native coding | Frontier-level coding via RL on long-horizon tasks; 73.7 SWE-bench Multilingual |
Frontier Language Models
GPT-5.x Series
OpenAI · San Francisco The versatile all-rounder with dynamic internal routing.
- GPT-5.5 is the current flagship (Apr 23, 2026) — $5/1M input, $30/1M output; 1M token context; improved coding, computer use, and multi-step research
- GPT-5.5 rolling out to Plus, Pro, Business, and Enterprise via ChatGPT and Codex
- Uses an internal router to select the right sub-model per request in real time
- GPT-5.4 Thinking (Mar 5, 2026): reasoning-first variant — strongest at math, spreadsheets, research, and document tasks
- GPT-5.3-Codex (Feb 5, 2026): dedicated agentic coding variant combining Codex and GPT-5 training stacks
- GPT-5.3 Instant: fast chat variant for low-latency applications
- Native computer use and tool calling for agentic automation
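The dynamic-routing idea above can be sketched as a toy dispatcher. The sub-model names are taken from the bullets; the routing heuristics are illustrative assumptions, not OpenAI's actual logic:

```python
# Toy sketch of per-request model routing, loosely modeled on the
# "internal router" idea described above. Heuristics are assumptions.

def route_request(prompt: str, needs_code: bool = False,
                  latency_sensitive: bool = False) -> str:
    """Pick a hypothetical sub-model from simple request traits."""
    if needs_code:
        return "gpt-5.3-codex"      # agentic coding variant
    if latency_sensitive:
        return "gpt-5.3-instant"    # fast chat variant
    if len(prompt) > 2000 or "prove" in prompt.lower():
        return "gpt-5.4-thinking"   # reasoning-first variant
    return "gpt-5.5"                # flagship default

route_request("Fix this bug", needs_code=True)   # → "gpt-5.3-codex"
```

A production router would weigh many more signals (cost budgets, conversation history, tool availability), but the dispatch shape is the same.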
Claude 4.x Family
Anthropic · San Francisco The developer favorite for coding, reasoning, and safety.
- Opus 4.7 (Apr 15, 2026): latest GA model — 1M context window, $5/1M input, $25/1M output; step-change agentic coding, substantially improved vision resolution
- Sonnet 4.6: best speed/cost balance for most production workloads
- Haiku 4.5: fast and cheap for high-volume tasks
- Leads human-preference leaderboards; strong ARC-AGI-2 scores
- Agentic capabilities: autonomous multi-step coding and long-horizon tasks
- Claude Mythos Preview (Apr 6–7, 2026): invite-only cybersecurity research model via Project Glasswing — limited to ~12 enterprise partners; not publicly available
Gemini 3.x
Google DeepMind · Mountain View Multimodal powerhouse with top benchmark breadth.
- Gemini 3.1 Pro leads the Artificial Analysis Intelligence Index
- Native multimodal: processes text, images, audio, and video natively
- 1M token context window; deep Google Workspace integration
- Gemini 3 Flash: current default model in the Gemini app — strong balance of speed and capability
- Gemini 3.1 Flash-Lite (Mar 3): $0.25/1M input, 2.5× faster than prior Flash — cost-efficient tier for high-volume workloads
- Gemini 3.1 Flash Live: audio/voice model with 90+ language support and lower latency (March 26)
- Gemini 3.1 Flash TTS: native text-to-speech model with audio generation capabilities
Grok 4.x
xAI · Palo Alto Real-time data meets raw reasoning power.
- Grok 4.20 uses native multi-agent architecture with four specialist agents that debate before responding
- Grok 4.20 Beta 2 (March 3): improved instruction following and hallucination reduction
- Real-time integration with X (Twitter) for current events
- Grok 4 Heavy (extended inference variant) scored 100% on AIME 2025 — not the base Grok 4.20 model; attribution is widely misreported
- Grok 5 in training — reported ~6T parameter MoE model training on xAI's Colossus 2 supercluster; Q2 2026 target
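The debate-then-answer pattern can be illustrated with a minimal voting sketch; the agent names, proposals, and tie-breaking rule are assumptions for illustration, not xAI's architecture:

```python
# Minimal "specialist agents debate, then answer" sketch. Each agent
# submits a proposal; the majority answer wins, ties broken by the
# order proposals arrived. All names here are illustrative.

from collections import Counter

def debate(proposals: list[tuple[str, str]]) -> str:
    """proposals: (agent_name, answer) pairs; returns the winning answer."""
    if not proposals:
        return ""
    counts = Counter(answer for _, answer in proposals)
    best = max(counts.values())
    for _, answer in proposals:   # first-proposed among the tied wins
        if counts[answer] == best:
            return answer

answers = [("math", "42"), ("search", "42"), ("code", "41"), ("logic", "42")]
debate(answers)   # → "42"
```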
Sonar (Perplexity)
Perplexity AI · San Francisco Search-native AI built for grounded, cited answers.
- Built on Llama 3.3 70B, further trained for search-grounded factuality
- Runs at 1,200 tokens/sec on Cerebras inference hardware
- Model family: Sonar, Sonar Pro, Reasoning Pro, Deep Research
- Matches GPT-4o on user satisfaction benchmarks
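At the quoted 1,200 tokens/sec, decode latency is simple arithmetic; a rough estimate that ignores network overhead and time-to-first-token:

```python
# Back-of-envelope decode latency at the throughput figure quoted above.

def generation_seconds(tokens: int, tokens_per_sec: float = 1200.0) -> float:
    """Pure decode time; excludes network and time-to-first-token."""
    return tokens / tokens_per_sec

generation_seconds(2400)   # a 2,400-token answer decodes in 2.0 s
```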
Composer 2 (Cursor)
Cursor · San Francisco Frontier-level coding model trained via RL on long-horizon software engineering tasks.
- 73.7 on SWE-bench Multilingual — top scores across CursorBench and Terminal-Bench 2.0
- Trained with RL to solve complex tasks requiring hundreds of sequential actions
- Fast variant is now default: frontier-level intelligence at lower cost than competing fast models
- Background agents run tasks autonomously while you work
Microsoft MAI
Microsoft · Redmond Microsoft's own foundation model stack — independent of OpenAI, built for speech, voice, and vision.
- MAI-Transcribe-1: speech-to-text across 25 languages; outperforms Whisper-large-v3 on accuracy
- MAI-Voice-1: generates 60s of audio in 1s; supports voice cloning
- MAI-Image-2: high-quality image generation
- All available on Microsoft Foundry — signals Microsoft building foundational AI independent of OpenAI (Apr 2, 2026)
Restricted & Specialized
The biggest story of April 2026: Anthropic and OpenAI each released a cybersecurity-focused model within days of each other — gated, expensive, and limited to enterprise partners. These are not general-purpose models.
GPT-5.4-Cyber
OpenAI · San Francisco Fine-tuned GPT-5.4 for enterprise defensive cybersecurity — gated behind OpenAI's Trusted Access for Cyber program.
- Released April 14, 2026 — fine-tune of GPT-5.4 for dual-use security research
- Lowered refusal thresholds for defensive cybersecurity tasks; native binary reverse engineering without source code
- Gated behind OpenAI's Trusted Access for Cyber (TAC) program; enterprise partners include CrowdStrike, Cloudflare, Palo Alto Networks, Cisco, JPMorgan, Goldman Sachs
- No public API pricing; $10M in API credits committed via Cybersecurity Grant Program
- Direct counterpart to Anthropic's Claude Mythos Preview — part of a matched pair of restricted cyber-focused models released within days of each other
Claude Mythos Preview
Anthropic · San Francisco Invite-only cybersecurity research model via Project Glasswing — Anthropic's counterpart to GPT-5.4-Cyber.
- Released April 6–7, 2026 via Project Glasswing — not publicly available
- GPQA 0.9; 93.9% SWE-bench Verified; 97.6% USAMO 2026
- Limited to ~12 enterprise partners: AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan, Microsoft, NVIDIA, and others
- Priced at ~$2,500/1M tokens to gate general use and prevent misuse
- Cybersecurity-focused capabilities not available in standard Claude models
Open Source & Cost-Efficient
Llama 4
Meta · Menlo Park The leading open-source model family.
- Llama 4 Scout: industry-leading 10M token context window
- Llama 4 Maverick: 17B active / 128 experts — outperforms GPT-4o and Gemini 2.0 Flash on key benchmarks
- Fully open weights; can be self-hosted for complete data control
- Llama 4 Behemoth (288B active) still in training and unreleased as of April 2026; widely covered as the next milestone
DeepSeek V4
DeepSeek · Hangzhou, China Cost-redefining open-source frontier — released April 24, 2026 in two variants.
- V4-Pro: 1.6T total parameters / 49B active MoE; $1.74/1M input, $3.48/1M output — cheapest frontier-class open model
- V4-Flash: 284B total / 13B active; $0.14/1M input, $0.28/1M output — ultra-budget tier
- Apache 2.0 license; native 1M context window; built-in agentic long-context and tool-use
- Engram conditional memory and Manifold-Constrained Hyper-Connections for improved long-context performance
- Entire V3/V4 lineage trained for under $6M — redefining AI cost efficiency
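The per-million-token prices quoted in this guide make request costs easy to estimate. A minimal sketch using the figures listed above, not an official billing formula:

```python
# Request-cost estimator from the (input, output) USD-per-1M-token
# prices quoted in this guide. A sketch, not a vendor billing API.

PRICES = {
    "deepseek-v4-pro":   (1.74, 3.48),
    "deepseek-v4-flash": (0.14, 0.28),
    "gpt-5.5":           (5.00, 30.00),
}

def request_cost(model: str, input_toks: int, output_toks: int) -> float:
    """USD cost for one request, linear in token counts."""
    inp, out = PRICES[model]
    return (input_toks * inp + output_toks * out) / 1_000_000

# 100K input / 10K output tokens:
#   V4-Pro:  ~$0.21
#   GPT-5.5:  $0.80
```

The same shape works for any model in the guide; swap in the listed prices.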
Mistral 3 Family
Mistral AI · Paris, France The enterprise-safe European model family, now spanning text, reasoning, code, and speech.
- Mistral Large 3: EU AI Act-compliant flagship for regulated industries (finance, healthcare, gov); 675B total MoE
- Magistral: Mistral's reasoning model — multilingual, transparent chain-of-thought
- Mistral Small 4 (March 16): 119B/6.5B-active MoE unifying reasoning (Magistral), vision (Pixtral), and coding (Devstral) in one endpoint
- Devstral 2: 72.2% SWE-bench Verified — top open agentic coding model; 123B params, 256K context, MIT license; competitive with Composer 2
- Voxtral (March 26): open-source 4B text-to-speech, 9 languages, runs on consumer hardware
- Strong European data sovereignty guarantees across the full model family
Kimi K2.6
Moonshot AI · Beijing, China Open-source agentic leader — top SWE-Bench Pro score and the largest open agent swarm.
- Released April 20, 2026 — 1T parameter MoE, 32B active, 262.1K context window
- Agent Swarm scales to 300 specialized sub-agents with up to 4,000 coordinated steps (up from 100)
- 58.6 on SWE-Bench Pro — edges GPT-5.4's 57.7; top open model on this benchmark
- Open-weight, Modified MIT license; weights on Hugging Face
- Kimi Code CLI agent rivals Claude Code and Gemini CLI
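The fan-out/fan-in shape of an agent swarm can be sketched with a thread pool; the worker function and merge step are illustrative stand-ins, not Moonshot's implementation:

```python
# Toy fan-out/fan-in: dispatch N sub-tasks to parallel "sub-agents"
# and collect results in order. The worker is a stand-in stub.

from concurrent.futures import ThreadPoolExecutor

def sub_agent(task: str) -> str:
    """Stand-in for one specialized sub-agent working a task."""
    return f"done:{task}"

def swarm(tasks: list[str], max_agents: int = 8) -> list[str]:
    """Run tasks concurrently; map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        return list(pool.map(sub_agent, tasks))

results = swarm([f"task-{i}" for i in range(5)])
# results[0] == "done:task-0"
```

A real swarm adds scheduling, inter-agent messaging, and result reconciliation on top of this skeleton.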
GLM-5.1
Zhipu AI · Beijing, China Frontier-class model under an MIT license.
- Updated Apr 6, 2026 — improved from GLM-5 baseline
- 744B parameter MoE model (44B active) with 200K context window
- Released under MIT license; trained entirely on Huawei Ascend chips (zero NVIDIA GPUs)
- GPQA 0.9; 77.8% on SWE-bench Verified; 50.4% on Humanity's Last Exam
- Priced roughly 6x cheaper than comparable proprietary models
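The MoE figures quoted throughout this section pair a total parameter count with a much smaller active count per token; the active fraction is simple division:

```python
# Active-parameter fraction for the MoE sizes quoted in this guide
# (parameters actually used per token vs. total). Plain arithmetic.

def active_fraction(total_b: float, active_b: float) -> float:
    """Both arguments in billions of parameters."""
    return active_b / total_b

round(active_fraction(744, 44), 3)   # GLM-5.1: 0.059, ~6% active per token
```

This is why a 744B MoE can serve far more cheaply than a 744B dense model: only ~6% of its weights run per token.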
NVIDIA Nemotron 3
NVIDIA · Santa Clara NVIDIA's open agentic reasoning stack — Nano, Super, and Ultra sizes on Bedrock.
- Released March 2026; available on Amazon Bedrock and NVIDIA NIM
- Three sizes: Nano (edge/on-device), Super (balanced), Ultra (frontier-class agentic reasoning)
- Nemotron 3 Super peers with Llama 4 Maverick on open-model benchmarks
- Leading open model for agentic reasoning and multi-step tool use
- Optimized for NVIDIA hardware; available for self-hosting via NIM microservices
Gemma 4
Google · Mountain View Open-weight models from Gemini 3 research — optimized for on-device and frontier-class performance.
- Four Apache 2.0 models: E2B (2.3B), E4B (4.5B), 26B MoE (4B active), 31B dense (Apr 2, 2026)
- 31B ranks #3 on Arena AI leaderboard at 1452 Elo — outperforms models 20× its size
- E2B/E4B optimized for on-device Android: up to 4× faster and 60% less battery than prior Gemma
- All models natively multimodal; larger variants support 256K context
Qwen 3.6
Alibaba Cloud · Hangzhou, China The multilingual giant — open-weight Qwen3.5 and closed-source agentic Qwen3.6 Plus.
- Qwen3.5 (Feb 2026): open-weight, 397B parameters — available for self-hosting and fine-tuning
- Qwen3.5-Omni: native audio/video/text multimodal — Thinker architecture, 256K context, 113-language speech recognition
- Qwen3.6-Plus (Apr 2, 2026): closed-source API-only, 1M context; agentic — matches Claude Opus 4.5 on SWE-bench and Terminal-Bench 2.0
- 0.6B to 235B open-weight range; Qwen3-Max (1T+) API-only; supports 119 languages
- Qwen3-Coder achieves 69.6% on SWE-Bench Verified, surpassing many frontier models
Image & Video Generation
Video generation leaders: Google Veo 3.1 (native 4K + vertical video), Kling 3.0 (native 4K/60fps), Runway Gen-4.5 (creative/cinematic), and Seedance 2.0 (ByteDance — notable for Identity Lock, which maintains consistent faces across multi-scene video). Sora 2 (OpenAI) remains available via ChatGPT, though the standalone app shut down in March 2026.
When to Use What
Building Software
- Build a full-stack app from scratch Claude 4.x
- Debug a complex codebase Claude 4.x
- Generate unit tests and docs GPT-5.x
- Rapid UI prototyping GPT-5.x
- Background agents for parallel development Composer 2 (Cursor)
- Open-source agentic coding Kimi K2.6
Research & Analysis
- Analyze a long PDF or contract Gemini 3.x
- Summarize a YouTube video Gemini 3.x
- Get real-time data on a trending topic Grok 4.x
- Get sourced answers with citations Sonar (Perplexity)
- Deep competitive research Claude 4.x
Creative & Visual
- Create stylized hero images Midjourney v7
- Generate photorealistic product shots Imagen 4
- Edit and remix existing images Nano Banana 2
- Generate a short video from a prompt Veo 3.1
Data & Math
- Solve complex math problems step-by-step Grok 4.x
- Write and optimize SQL queries GPT-5.x
- Transparent chain-of-thought reasoning DeepSeek R1
- Analyze spreadsheet data Gemini 3.x
Self-Hosting & Privacy
- Run a model on your own infrastructure Llama 4
- Fine-tune for a domain-specific task Llama 4
- Deploy in EU-regulated environments Mistral Large 3
- Budget-friendly open-source alternative DeepSeek V4
- MIT-licensed frontier alternative GLM-5.1
Writing & Communication
- Write long-form technical content Claude 4.x
- Draft emails and business writing GPT-5.x
- Translate content across 100+ languages Qwen 3.6
- Summarize meeting transcripts Gemini 3.x
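The decision tables above collapse naturally into a lookup for scripting; the task keys here are ad-hoc labels, and the mapping follows this guide's recommendations rather than any API:

```python
# "Match the model to the task" as a minimal lookup. Keys are ad-hoc
# labels for the categories above; values are this guide's picks.

TASK_TO_MODEL = {
    "build-app":     "Claude 4.x",
    "realtime-info": "Grok 4.x",
    "cited-answers": "Sonar",
    "long-pdf":      "Gemini 3.x",
    "self-host":     "Llama 4",
    "budget-open":   "DeepSeek V4",
}

def pick_model(task: str, default: str = "GPT-5.x") -> str:
    """Fall back to the general-purpose pick for unlisted tasks."""
    return TASK_TO_MODEL.get(task, default)
```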
There is no single "best" model in 2026. The landscape has shifted from a winner-take-all race to specialized excellence. Match the model to the task.
Sourced directly from company websites and documentation. Updated weekly.