AI Models Guide

A quick reference to the major AI models, who makes them, and what they do best.

Updated April 24, 2026

Quick Reference

Model	Company	Best For	Key Differentiator
GPT-5.x	OpenAI	General purpose	GPT-5.5 flagship (Apr 23, 2026): $5/1M input, $30/1M output; stronger coding, computer use, and research; dynamic routing across sub-models
Claude Opus 4.7	Anthropic	Coding & reasoning	Opus 4.7 (Apr 15, 2026) — latest GA model; 1M context, step-change agentic coding; Sonnet 4.6 best for speed/cost
Gemini 3.x	Google DeepMind	Multimodal	Best benchmark breadth; strong pricing; Workspace integration; Gemini 3.1 Pro flagship
Grok 4.x	xAI	Real-time info	Grok 4.20 flagship (Mar 2026): 2M context, three variants; live X/Twitter data; multi-agent parallel reasoning
Llama 4	Meta	Open-source	10M token context; fully self-hostable; Behemoth still in training/unreleased
DeepSeek V4	DeepSeek	Cost efficiency	Released April 24, 2026 — V4-Pro (1.6T MoE, $1.74/1M input) and V4-Flash (284B, $0.14/1M input); Apache 2.0, 1M context
Mistral 3 Family	Mistral	EU compliance	Large 3, Magistral (reasoning), Devstral (open-source coding agent), Small 4, Voxtral — enterprise-safe with data sovereignty
Qwen 3.6	Alibaba	Multilingual	Qwen3.5 open-weight (397B); Qwen3.6-Plus closed-source agentic (1M context, Apr 2026); 119 languages
Microsoft MAI	Microsoft	Speech & media AI	MAI-Transcribe-1, MAI-Voice-1, MAI-Image-2 — Microsoft's own foundation stack on Foundry
Gemma 4	Google	On-device open-weight	31B ranks #3 on Arena AI (1452 Elo); E2B/E4B optimized for Android; Apache 2.0
Kimi K2.6	Moonshot AI	Agentic open-source	Released Apr 20 — 1T MoE, 32B active; Agent Swarm scales to 300 sub-agents; 58.6 SWE-Bench Pro (top open model); Modified MIT
GLM-5.1	Zhipu AI	Open frontier	Updated Apr 6, 2026; 744B MoE; MIT license; GPQA 0.9; trained on zero NVIDIA GPUs
GPT-5.4-Cyber	OpenAI	Defensive cybersecurity	TAC-gated fine-tune; lowered security refusals; binary reverse engineering; partners: CrowdStrike, Cloudflare, Palo Alto, Cisco
Claude Mythos Preview	Anthropic	Security research	Project Glasswing invite-only; GPQA 0.9; ~12 enterprise partners; $2,500/1M tokens to gate access
Sonar	Perplexity	Search & research	Search-grounded answers at 1200 tok/s; built on Llama 3.3 + Cerebras
Composer 2	Cursor	AI-native coding	Frontier-level coding via RL on long-horizon tasks; 73.7 SWE-bench Multilingual

Frontier Language Models

GPT-5.x Series

OpenAI · San Francisco

The versatile all-rounder with dynamic internal routing.

GPT-5.5 is the current flagship (Apr 23, 2026) — $5/1M input, $30/1M output; 1M token context; improved coding, computer use, and multi-step research
GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users via ChatGPT and Codex
Uses an internal router to select the right sub-model per request in real time
GPT-5.4 Thinking (Mar 5, 2026): reasoning-first variant — strongest at math, spreadsheets, research, and document tasks
GPT-5.3-Codex (Feb 5, 2026): dedicated agentic coding variant combining Codex and GPT-5 training stacks
GPT-5.3 Instant: fast chat variant for low-latency applications
Native computer use and tool calling for agentic automation

Best for: Rapid prototyping, general-purpose tasks, API automation

Claude 4.x Family

Anthropic · San Francisco

The developer favorite for coding, reasoning, and safety.

Opus 4.7 (Apr 15, 2026): latest GA model — 1M context window, $5/1M input, $25/1M output; step-change agentic coding, substantially improved vision resolution
Sonnet 4.6: best speed/cost balance for most production workloads
Haiku 4.5: fast and cheap for high-volume tasks
Leads human-preference leaderboards; strong ARC-AGI-2 scores
Agentic capabilities: autonomous multi-step coding and long-horizon tasks
Claude Mythos Preview (Apr 6–7, 2026): invite-only cybersecurity research model via Project Glasswing — limited to ~12 enterprise partners; not publicly available

Best for: Software development, agentic workflows, analysis, writing

Gemini 3.x

Google DeepMind · Mountain View

Multimodal powerhouse with top benchmark breadth.

Gemini 3.1 Pro leads the Artificial Analysis Intelligence Index
Native multimodal: processes text, images, audio, and video natively
1M token context window; deep Google Workspace integration
Gemini 3 Flash: current default model in the Gemini app — strong balance of speed and capability
Gemini 3.1 Flash-Lite (Mar 3): $0.25/1M input, 2.5× faster than prior Flash — cost-efficient tier for high-volume workloads
Gemini 3.1 Flash Live: audio/voice model with 90+ language support and lower latency (March 26)
Gemini 3.1 Flash TTS: native text-to-speech model with audio generation capabilities

Best for: Multimodal apps, Google Workspace users, cost-conscious teams

Grok 4.x

xAI · Austin

Real-time data meets raw reasoning power.

Grok 4.20 uses a native multi-agent architecture with four specialist agents that debate before responding
Grok 4.20 Beta 2 (March 3): improved instruction following and hallucination reduction
Real-time integration with X (Twitter) for current events
Grok 4 Heavy (extended inference variant) scored 100% on AIME 2025 — not the base Grok 4.20 model; attribution is widely misreported
Grok 5 in training — reported ~6T parameter MoE model training on xAI's Colossus 2 supercluster; Q2 2026 target

Best for: Real-time info, math/science, users who want fewer guardrails

Sonar (Perplexity)

Perplexity AI · San Francisco

Search-native AI built for grounded, cited answers.

Built on Llama 3.3 70B, further trained for search-grounded factuality
Runs at 1,200 tokens/sec on Cerebras inference hardware
Model family: Sonar, Sonar Pro, Reasoning Pro, Deep Research
Matches GPT-4o on user satisfaction benchmarks

Best for: Research with citations, competitive analysis, fact-checking

Composer 2 (Cursor)

Cursor · San Francisco

Frontier-level coding model trained via RL on long-horizon software engineering tasks.

73.7 on SWE-bench Multilingual — top scores across CursorBench and Terminal-Bench 2.0
Trained with RL to solve complex tasks requiring hundreds of sequential actions
Fast variant is now default: frontier-level intelligence at lower cost than competing fast models
Background agents run tasks autonomously while you work

Best for: Agentic coding, background development, automated workflows

Microsoft MAI

Microsoft · Redmond

Microsoft's own foundation model stack — independent of OpenAI, built for speech, voice, and vision.

MAI-Transcribe-1: speech-to-text across 25 languages; outperforms Whisper-large-v3 on accuracy
MAI-Voice-1: generates 60s of audio in 1s; supports voice cloning
MAI-Image-2: high-quality image generation
All available on Microsoft Foundry (Apr 2, 2026); signals that Microsoft is building foundation models independently of OpenAI

Best for: Speech transcription, voice synthesis, enterprise media AI on Azure

Restricted & Specialized

The biggest story of April 2026: Anthropic and OpenAI each released a cybersecurity-focused model within days of each other — gated, expensive, and limited to enterprise partners. These are not general-purpose models.

GPT-5.4-Cyber

OpenAI · San Francisco

Fine-tuned GPT-5.4 for enterprise defensive cybersecurity — gated behind OpenAI's Trusted Access for Cyber program.

Released April 14, 2026 — fine-tune of GPT-5.4 for dual-use security research
Lowered refusal thresholds for defensive cybersecurity tasks; native binary reverse engineering without source code
Gated behind OpenAI's Trusted Access for Cyber (TAC) program; enterprise partners include CrowdStrike, Cloudflare, Palo Alto Networks, Cisco, JPMorgan, Goldman Sachs
No public API pricing; $10M in API credits committed via Cybersecurity Grant Program
Direct counterpart to Anthropic's Claude Mythos Preview — part of a matched pair of restricted cyber-focused models released within days of each other

Best for: Enterprise defensive security, vulnerability research, threat analysis (TAC partners only)

Claude Mythos Preview

Anthropic · San Francisco

Invite-only cybersecurity research model via Project Glasswing — Anthropic's counterpart to GPT-5.4-Cyber.

Released April 6–7, 2026 via Project Glasswing — not publicly available
GPQA 0.9; 93.9% SWE-bench Verified; 97.6% USAMO 2026
Limited to ~12 enterprise partners: AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan, Microsoft, NVIDIA, and others
Priced at ~$2,500/1M tokens to gate general use and prevent misuse
Cybersecurity-focused capabilities not available in standard Claude models

Best for: Invite-only cybersecurity research (Project Glasswing partners only)

Open Source & Cost-Efficient

Llama 4

Meta · Menlo Park

The leading open-source model family.

Llama 4 Scout: industry-leading 10M token context window
Llama 4 Maverick: 17B active / 128 experts — outperforms GPT-4o and Gemini 2.0 Flash on key benchmarks
Fully open weights; can be self-hosted for complete data control
Llama 4 Behemoth (288B active) still in training and unreleased as of April 2026; widely covered as next milestone

Best for: Self-hosting, privacy-sensitive orgs, custom fine-tuning

DeepSeek V4

DeepSeek · Hangzhou, China

Cost-redefining open-source frontier — released April 24, 2026 in two variants.

V4-Pro: 1.6T total parameters / 49B active MoE; $1.74/1M input, $3.48/1M output — cheapest frontier-class open model
V4-Flash: 284B total / 13B active; $0.14/1M input, $0.28/1M output — ultra-budget tier
Apache 2.0 license; native 1M context window; built-in agentic long-context and tool-use
Engram conditional memory and Manifold-Constrained Hyper-Connections for improved long-context performance
Entire V3/V4 lineage trained for under $6M — redefining AI cost efficiency

Best for: Budget-conscious teams, math/reasoning, open-source self-hosting
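The per-token prices quoted in this guide can be turned into a rough workload estimate. The sketch below is illustrative only: the prices are the list prices stated above for GPT-5.5, Claude Opus 4.7, and the two DeepSeek V4 variants, while the function name and the monthly token volumes are hypothetical. It ignores caching discounts, batch pricing, and tiered rates, which this guide does not describe.

```python
# Per-1M-token list prices in USD (input, output), as quoted in this guide.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "DeepSeek V4-Pro": (1.74, 3.48),
    "DeepSeek V4-Flash": (0.14, 0.28),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload at list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical monthly workload: 200M input tokens and 40M output tokens.
for model in PRICES:
    cost = estimate_cost(model, 200_000_000, 40_000_000)
    print(f"{model}: ${cost:,.2f}")
```

For this assumed workload the estimate comes to about $2,200 for GPT-5.5, $2,000 for Claude Opus 4.7, $487 for V4-Pro, and $39 for V4-Flash. Output-heavy workloads widen the gap, because output tokens carry the higher rate on every model listed.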

Mistral 3 Family

Mistral AI · Paris, France

The enterprise-safe European model family, now spanning text, reasoning, code, and speech.

Mistral Large 3: EU AI Act-compliant flagship for regulated industries (finance, healthcare, gov); 675B total MoE
Magistral: Mistral's reasoning model — multilingual, transparent chain-of-thought
Mistral Small 4 (March 16): 119B/6.5B-active MoE unifying reasoning (Magistral), vision (Pixtral), and coding (Devstral) in one endpoint
Devstral 2: 72.2% SWE-bench Verified — top open agentic coding model; 123B params, 256K context, MIT license; competitive with Composer 2
Voxtral (March 26): open-source 4B text-to-speech, 9 languages, runs on consumer hardware
Strong European data sovereignty guarantees across the full model family

Best for: EU-regulated enterprises, edge deployment, privacy-first apps

Kimi K2.6

Moonshot AI · Beijing, China

Open-source agentic leader: top open-model SWE-Bench Pro score and the largest open agent swarm.

Released April 20, 2026 — 1T parameter MoE, 32B active, 262.1K context window
Agent Swarm scales to 300 specialized sub-agents with up to 4,000 coordinated steps (was 100)
58.6 on SWE-Bench Pro — edges GPT-5.4's 57.7; top open model on this benchmark
Open-weight, Modified MIT license; weights on Hugging Face
Kimi Code CLI agent rivals Claude Code and Gemini CLI

Best for: Agentic workflows, open-source self-hosting, long-horizon coding tasks

GLM-5.1

Zhipu AI · Beijing, China

Frontier-class model on an MIT license.

Updated Apr 6, 2026 — improved from GLM-5 baseline
744B parameter MoE model (44B active) with 200K context window
Released under MIT license; trained entirely on Huawei Ascend chips (zero NVIDIA GPUs)
GPQA 0.9; 77.8% on SWE-bench Verified; 50.4% on Humanity's Last Exam
Priced roughly 6x cheaper than comparable proprietary models

Best for: Budget-conscious teams, open-source deployment, coding
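Several of the open models above are mixture-of-experts (MoE) designs, quoted as total versus active parameters. As a rough guide, per-token compute tracks the active count, while the memory needed to self-host tracks the total. The sketch below computes the active share from the figures quoted in this guide; the dictionary and function names are hypothetical.

```python
# (total parameters, active parameters) in billions, as quoted in this guide.
MOE_MODELS = {
    "DeepSeek V4-Pro": (1600, 49),
    "DeepSeek V4-Flash": (284, 13),
    "Mistral Small 4": (119, 6.5),
    "Kimi K2.6": (1000, 32),
    "GLM-5.1": (744, 44),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_b / total_b


for name, (total_b, active_b) in MOE_MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

All five models activate roughly 3% to 6% of their weights per token, which is why a 1T-parameter model can be priced like a much smaller dense one while still needing the full weights in memory.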

NVIDIA Nemotron 3

NVIDIA · Santa Clara

NVIDIA's open agentic reasoning stack — Nano, Super, and Ultra sizes on Bedrock.

Released March 2026; available on Amazon Bedrock and NVIDIA NIM
Three sizes: Nano (edge/on-device), Super (balanced), Ultra (frontier-class agentic reasoning)
Nemotron 3 Super is on par with Llama 4 Maverick on open-model benchmarks
Leading open model for agentic reasoning and multi-step tool use
Optimized for NVIDIA hardware; available for self-hosting via NIM microservices

Best for: Agentic reasoning, AWS/Bedrock deployments, NVIDIA-native infrastructure

Gemma 4

Google · Mountain View

Open-weight models built from Gemini 3 research, optimized for on-device use and frontier-class performance.

Four Apache 2.0 models: E2B (2.3B), E4B (4.5B), 26B MoE (4B active), 31B dense (Apr 2, 2026)
31B ranks #3 on Arena AI leaderboard at 1452 Elo — outperforms models 20× its size
E2B/E4B optimized for on-device Android: up to 4× faster and 60% less battery than prior Gemma
All models natively multimodal; larger variants support 256K context

Best for: On-device Android apps, open-weight deployment, resource-constrained environments
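To put a rating like 1452 Elo in context, the standard Elo expected-score formula converts a rating gap into an approximate head-to-head preference rate. The sketch below assumes that formula applies to the quoted score, which the leaderboard's exact methodology may not guarantee. The rival rating is a made-up value for illustration, not a leaderboard entry.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model.

    The result is a win probability with ties counted as half a win.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


GEMMA_31B_RATING = 1452  # rating quoted in this guide
HYPOTHETICAL_RIVAL_RATING = 1400  # illustrative only, not from any leaderboard

share = elo_expected_score(GEMMA_31B_RATING, HYPOTHETICAL_RIVAL_RATING)
print(f"Expected preference rate vs. rival: {share:.1%}")
```

Under these assumptions a 52-point lead corresponds to being preferred in roughly 57% of comparisons, so small Elo gaps near the top of a leaderboard reflect modest differences in practice.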

Qwen 3.6

Alibaba Cloud · Hangzhou, China

The multilingual giant: open-weight Qwen3.5 and closed-source agentic Qwen3.6-Plus.

Qwen3.5 (Feb 2026): open-weight, 397B parameters — available for self-hosting and fine-tuning
Qwen3.5-Omni: native audio/video/text multimodal — Thinker architecture, 256K context, 113-language speech recognition
Qwen3.6-Plus (Apr 2, 2026): closed-source API-only, 1M context; agentic — matches Claude Opus 4.5 on SWE-bench and Terminal-Bench 2.0
0.6B to 235B open-weight range; Qwen3-Max (1T+) API-only; supports 119 languages
Qwen3-Coder achieves 69.6% on SWE-Bench Verified, surpassing many frontier models

Best for: International businesses, multilingual apps, agentic coding, flexible deployment

Image & Video Generation

Midjourney v7

Midjourney

Artistic, stylized visuals with strong aesthetic control

Midjourney V1 Video

Midjourney

Midjourney's first video model: 5s clips extendable to 20s; ~25× cheaper than competitors

Imagen 4

Google

Photorealistic composition, spelling, and typography accuracy

Nano Banana 2

Google

Fast AI image editing, remixing, and style transfers; built on Gemini Flash

DALL-E 4

OpenAI

Integrated with ChatGPT; strong prompt adherence

Stable Diffusion 3.5

Stability AI

Open-source; self-hostable; highly customizable

FLUX.2

Black Forest Labs

From the Stable Diffusion creators; up to 4MP; open-weight Klein variant

LTX-2.3

Lightricks

Open-weights video+audio in one pass; 22B params; 4K at 50 FPS, up to 20s; one of the most capable open video models available

Video generation leaders: Google Veo 3.1 (native 4K + vertical video), Kling 3.0 (native 4K/60fps), Runway Gen-4.5 (creative/cinematic), and Seedance 2.0 (ByteDance — notable for Identity Lock, which maintains consistent faces across multi-scene video). Sora 2 (OpenAI) remains available via ChatGPT, though the standalone app shut down in March 2026.

When to Use What

Building Software

Build a full-stack app from scratch	Claude 4.x
Debug a complex codebase	Claude 4.x
Generate unit tests and docs	GPT-5.x
Rapid UI prototyping	GPT-5.x
Background agents for parallel development	Composer 2 (Cursor)
Open-source agentic coding	Kimi K2.6

Research & Analysis

Analyze a long PDF or contract	Gemini 3.x
Summarize a YouTube video	Gemini 3.x
Get real-time data on a trending topic	Grok 4.x
Get sourced answers with citations	Sonar (Perplexity)
Deep competitive research	Claude 4.x

Creative & Visual

Create stylized hero images	Midjourney v7
Generate photorealistic product shots	Imagen 4
Edit and remix existing images	Nano Banana 2
Generate a short video from a prompt	Veo 3.1

Data & Math

Solve complex math problems step-by-step	Grok 4.x
Write and optimize SQL queries	GPT-5.x
Transparent chain-of-thought reasoning	DeepSeek R1
Analyze spreadsheet data	Gemini 3.x

Self-Hosting & Privacy

Run a model on your own infrastructure	Llama 4
Fine-tune for a domain-specific task	Llama 4
Deploy in EU-regulated environments	Mistral Large 3
Budget-friendly open-source alternative	DeepSeek V4
MIT-licensed frontier alternative	GLM-5.1

Writing & Communication

Write long-form technical content	Claude 4.x
Draft emails and business writing	GPT-5.x
Translate content across 100+ languages	Qwen 3.6
Summarize meeting transcripts	Gemini 3.x

There is no single "best" model in 2026. The landscape has shifted from a winner-take-all race to specialized excellence. Match the model to the task.
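One simple way to apply "match the model to the task" in an application is a lookup table keyed by task. The sketch below copies a handful of rows from the "When to Use What" lists. The table and function names are hypothetical, and a real router would likely need fuzzier matching than an exact string lookup.

```python
from typing import Optional

# A few task-to-model rows copied from the "When to Use What" lists in this guide.
RECOMMENDATIONS = {
    "analyze a long pdf or contract": "Gemini 3.x",
    "get sourced answers with citations": "Sonar (Perplexity)",
    "write and optimize sql queries": "GPT-5.x",
    "generate a short video from a prompt": "Veo 3.1",
    "run a model on your own infrastructure": "Llama 4",
}


def recommend(task: str) -> Optional[str]:
    """Return the guide's pick for a task, or None if the guide has no row for it.

    Matching is exact after trimming whitespace and lowercasing.
    """
    return RECOMMENDATIONS.get(task.strip().lower())


print(recommend("Write and optimize SQL queries"))
print(recommend("Compose a symphony"))
```

Returning None for unknown tasks, rather than a default model, keeps the caller responsible for deciding what to do when the guide offers no recommendation.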

Sourced directly from company websites and documentation. Updated weekly.
