
AI Models Guide

A quick-reference to the major AI models, who makes them, and what they do best.

Updated April 10, 2026

Model | Company | Best For | Key Differentiator
GPT-5.x | OpenAI | General purpose | Dynamic routing picks the right sub-model per request
Claude 4.6 | Anthropic | Coding & reasoning | Top human-preference scores; agentic autonomy
Gemini 3.x | Google DeepMind | Multimodal | Best benchmark breadth; strong pricing; Workspace integration
Grok 4.x | xAI | Real-time info | Live X/Twitter data; multi-agent parallel reasoning
Llama 4 | Meta | Open-source | 10M token context; fully self-hostable
DeepSeek V4 | DeepSeek | Cost efficiency | Launched Mar 2026; 1T MoE, 1M context, $0.30/MTok input; open-source; Engram conditional memory
Mistral 3 Family | Mistral | EU compliance | Large 3, Small 4 (MoE), and Voxtral TTS; enterprise-safe with data sovereignty
Qwen 3.6 | Alibaba | Multilingual | 119 languages; 1M context (Qwen3.6-Plus); native audio/video/text (Qwen3.5-Omni)
Microsoft MAI | Microsoft | Speech & media AI | MAI-Transcribe-1, MAI-Voice-1, MAI-Image-2; Microsoft's own foundation stack on Foundry
Gemma 4 | Google | On-device open-weight | 31B ranks #3 on Arena AI (1452 Elo); E2B/E4B optimized for Android; Apache 2.0
Kimi K2.5 | Moonshot AI | Agentic open-source | 1T params; Agent Swarm coordinates up to 100 sub-agents
GLM-5 | Zhipu AI | Open frontier | 744B MoE; MIT license; trained on zero NVIDIA GPUs
Sonar | Perplexity | Search & research | Search-grounded answers at 1,200 tok/s; built on Llama 3.3 + Cerebras
Composer 2 | Cursor | AI-native coding | Frontier-level coding via RL on long-horizon tasks; 73.7 SWE-bench Multilingual

GPT-5.x Series

OpenAI · San Francisco

The versatile all-rounder with dynamic internal routing.

  • GPT-5.4 is the current flagship with a 1M token context window
  • Uses an internal router to select the right sub-model per request in real time
  • GPT-5.4 mini and nano offer GPT-5-class performance at lower cost; mini comes within 5% of flagship on coding benchmarks
  • GPT-5.4 Thinking (Mar 5, 2026): reasoning-first variant integrated into ChatGPT — strongest at math, spreadsheets, research, and document tasks; comparable to a dedicated o-series model
  • GPT-5.3-Codex: dedicated agentic coding variant combining Codex and GPT-5 training stacks
  • Native computer use and tool calling for agentic automation
  • Strong at documentation, unit tests, and complex SQL queries
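The dynamic-routing idea can be illustrated with a toy dispatcher. The variant names mirror the GPT-5.4 lineup above, but the selection heuristics are invented for demonstration only; this is a sketch of the concept, not OpenAI's actual routing logic.

```python
# Toy illustration of per-request model routing (not OpenAI's real logic).
# Variant names mirror the GPT-5.4 lineup described above; the heuristics
# below are illustrative assumptions.

def route(prompt: str, latency_sensitive: bool = False) -> str:
    """Pick a hypothetical sub-model based on crude request features."""
    if latency_sensitive:
        return "gpt-5.4-nano"            # cheapest/fastest tier
    reasoning_markers = ("prove", "derive", "step by step", "why")
    if any(m in prompt.lower() for m in reasoning_markers):
        return "gpt-5.4-thinking"        # reasoning-first variant
    if len(prompt) < 200:
        return "gpt-5.4-mini"            # short requests rarely need the flagship
    return "gpt-5.4"                     # default to the flagship

print(route("Explain step by step why the sky is blue"))  # gpt-5.4-thinking
```

A production router would weigh far richer signals (cost budgets, tool availability, past quality feedback), but the shape is the same: classify the request, dispatch to a tier.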

Claude 4.6 Family

Anthropic · San Francisco

The developer favorite for coding, reasoning, and safety.

  • Opus 4.6 (flagship), Sonnet 4.6 (best value), Haiku 4.5 (fast/cheap)
  • Leads human-preference leaderboards; strong ARC-AGI-2 scores
  • Agentic capabilities: autonomous multi-step coding tasks
  • Known for safety, steerability, and high-quality long-form writing
  • Claude Mythos Preview (Apr 7, 2026): released as gated preview via Project Glasswing — limited to ~50 partner orgs; 93.9% SWE-bench Verified, 97.6% USAMO 2026; not publicly available due to cybersecurity risks

Gemini 3.x

Google DeepMind · Mountain View

Multimodal powerhouse with top benchmark breadth.

  • Gemini 3.1 Pro leads the Artificial Analysis Intelligence Index
  • Native multimodal: processes text, images, audio, and video natively
  • 1M token context window; deep Google Workspace integration
  • Gemini 3 Flash: current default model in the Gemini app — strong balance of speed and capability
  • Gemini 3.1 Flash-Lite (Mar 3): $0.25/1M input, 2.5× faster than prior Flash — cost-efficient tier for high-volume workloads
  • Gemini 3.1 Flash Live: audio/voice model with 90+ language support and lower latency (March 26)

Grok 4.x

xAI · Austin

Real-time data meets raw reasoning power.

  • Grok 4.20 uses a native multi-agent architecture with four specialist agents that debate before responding
  • Grok 4.20 Beta 2 (March 3): improved instruction following and hallucination reduction
  • Real-time integration with X (Twitter) for current events
  • Scored 100% on the AIME 2025 math competition (Heavy variant)
  • Grok 5 in training: reportedly a ~6T-parameter MoE model, training on xAI's Colossus 2 supercluster with a Q2 2026 target

Sonar (Perplexity)

Perplexity AI · San Francisco

Search-native AI built for grounded, cited answers.

  • Built on Llama 3.3 70B, further trained for search-grounded factuality
  • Runs at 1,200 tokens/sec on Cerebras inference hardware
  • Model family: Sonar, Sonar Pro, Reasoning Pro, Deep Research
  • Matches GPT-4o on user satisfaction benchmarks
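At the quoted 1,200 tokens/sec, decode latency is easy to estimate. A back-of-the-envelope sketch; the answer lengths below are illustrative assumptions, not Perplexity figures:

```python
# Rough latency estimate at Sonar's quoted 1,200 tokens/sec decode rate.
# Answer lengths are illustrative assumptions.

TOKENS_PER_SEC = 1200

def decode_seconds(output_tokens: int, rate: float = TOKENS_PER_SEC) -> float:
    """Seconds spent decoding, ignoring network and search-retrieval time."""
    return output_tokens / rate

for label, tokens in [("short answer", 300), ("long cited report", 6000)]:
    print(f"{label}: {decode_seconds(tokens):.2f}s")
# A 300-token answer decodes in 0.25s; even a 6,000-token report takes 5s.
```

At these speeds, end-to-end latency is dominated by search retrieval rather than generation.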

Composer 2 (Cursor)

Cursor · San Francisco

Frontier-level coding model trained via RL on long-horizon software engineering tasks.

  • 73.7 on SWE-bench Multilingual — top scores across CursorBench and Terminal-Bench 2.0
  • Trained with RL to solve complex tasks requiring hundreds of sequential actions
  • The Fast variant is now the default: frontier-level intelligence at lower cost than competing fast models
  • Background agents run tasks autonomously while you work

Microsoft MAI

Microsoft · Redmond

Microsoft's own foundation model stack — independent of OpenAI, built for speech, voice, and vision.

  • MAI-Transcribe-1: speech-to-text across 25 languages; outperforms Whisper-large-v3 on accuracy
  • MAI-Voice-1: generates 60s of audio in 1s; supports voice cloning
  • MAI-Image-2: high-quality image generation
  • All available on Microsoft Foundry (Apr 2, 2026), signaling Microsoft's push to build foundation models independent of OpenAI

Llama 4

Meta · Menlo Park

The leading open-source model family.

  • Llama 4 Scout: industry-leading 10M token context window
  • Llama 4 Maverick: 17B active / 128 experts — outperforms GPT-4o and Gemini 2.0 Flash on key benchmarks
  • Fully open weights; can be self-hosted for complete data control
  • Llama 4 Behemoth (288B active) still in training; widely covered as next milestone
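A 10M-token context is demanding to serve, and a rough KV-cache estimate shows why. The architecture numbers below are illustrative assumptions, not Llama 4 Scout's actual configuration:

```python
# Rough KV-cache size estimate for very long contexts.
# Layer/head counts below are illustrative assumptions, NOT Llama 4
# Scout's real configuration.

def kv_cache_gib(seq_len: int, n_layers: int = 48, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """GiB of KV cache for one sequence at fp16/bf16 precision."""
    # 2x for the separate key and value tensors at every layer
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value
    return total_bytes / 2**30

print(f"{kv_cache_gib(10_000_000):.0f} GiB")  # ~1831 GiB (~1.8 TiB) under these toy assumptions
```

Even with aggressive grouped-query attention, multi-terabyte caches explain why extreme context windows lean on quantization, cache eviction, or offloading in practice.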

DeepSeek V4

DeepSeek · Hangzhou, China

Cost-redefining open-source model with 1T parameters and conditional memory.

  • Launched early March 2026: 1T MoE parameters, 1M context window, $0.30/MTok input pricing
  • Engram conditional memory and Manifold-Constrained Hyper-Connections for improved long-context performance
  • 81% on SWE-bench Verified; native multimodal: text, image, and video
  • DeepSeek V3.2 (open-source): bridges V3 and V4 — optimized for reasoning + agentic long-context and tool-use workloads
  • Entire V3/V4 lineage trained for under $6M — redefining AI cost efficiency
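The quoted $0.30 per million input tokens makes cost estimation trivial. A minimal sketch; the workload sizes below are illustrative assumptions:

```python
# Input-token cost at DeepSeek V4's quoted $0.30 per million input tokens.
# Workload sizes below are illustrative assumptions.

INPUT_PRICE_PER_MTOK = 0.30  # USD per 1M input tokens (from the section above)

def input_cost_usd(tokens: int, price: float = INPUT_PRICE_PER_MTOK) -> float:
    return tokens / 1_000_000 * price

# Filling the full 1M-token context once costs 30 cents:
print(f"${input_cost_usd(1_000_000):.2f}")        # $0.30
# A month of 10,000 requests averaging 5,000 input tokens each:
print(f"${input_cost_usd(10_000 * 5_000):.2f}")   # $15.00
```

Note this covers input tokens only; output pricing is not quoted above and would be billed separately.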

Mistral 3 Family

Mistral AI · Paris, France

The enterprise-safe European model family, now spanning text, edge, and speech.

  • Mistral Large 3: EU AI Act-compliant flagship for regulated industries (finance, healthcare, gov)
  • Mistral Small 4 (March 17): MoE architecture, 119B total / 6B active, 128 experts — fast and efficient
  • Voxtral TTS (March 26): open-source 4B text-to-speech, 9 languages, runs on consumer hardware
  • Strong European data sovereignty guarantees across the full model family

Kimi K2.5

Moonshot AI · Beijing, China

Open-source agentic powerhouse with Agent Swarm.

  • 1T total parameters (32B active), MoE architecture with native vision-language
  • Agent Swarm coordinates up to 100 specialized sub-agents in parallel
  • Kimi Code CLI agent rivals Claude Code and Gemini CLI
  • Backed by Alibaba and HongShan; strong global traction

GLM-5

Zhipu AI · Beijing, China

Frontier-class model under an MIT license.

  • 744B parameter MoE model (44B active) with 200K context window
  • Released under MIT license; trained entirely on Huawei Ascend chips
  • 77.8% on SWE-bench Verified; 50.4% on Humanity's Last Exam
  • Priced roughly 6× cheaper than comparable proprietary models

Gemma 4

Google · Mountain View

Open-weight models from Gemini 3 research — optimized for on-device and frontier-class performance.

  • Four Apache 2.0 models: E2B (2.3B), E4B (4.5B), 26B MoE (4B active), 31B dense (Apr 2, 2026)
  • 31B ranks #3 on Arena AI leaderboard at 1452 Elo — outperforms models 20× its size
  • E2B/E4B optimized for on-device Android: up to 4× faster and 60% less battery than prior Gemma
  • All models natively multimodal; larger variants support 256K context

Qwen 3.6

Alibaba Cloud · Hangzhou, China

The multilingual giant — now split into multimodal and agentic branches.

  • Qwen3.5-Omni (Feb 2026): native audio/video/text multimodal — Thinker architecture, 256K context, 113-language speech recognition
  • Qwen3.6-Plus (Apr 2, 2026): 1M context; matches Claude Opus 4.5 on SWE-bench and Terminal-Bench 2.0
  • 0.6B to 235B open-weight; Qwen3-Max (1T+) API-only; supports 119 languages
  • Qwen3-Coder achieves 69.6% on SWE-bench Verified, surpassing many frontier models

Midjourney v7

Midjourney

Artistic, stylized visuals with strong aesthetic control

Midjourney V1 Video

Midjourney

Midjourney's first video model: 5s clips extendable to 20s; ~25× cheaper than competitors

Imagen 4

Google

Photorealistic composition, spelling, and typography accuracy

Nano Banana 2

Google

Fast AI image editing, remixing, and style transfers; built on Gemini Flash

DALL-E 4

OpenAI

Integrated with ChatGPT; strong prompt adherence

Stable Diffusion 3.5

Stability AI

Open-source; self-hostable; highly customizable

FLUX.2

Black Forest Labs

From the Stable Diffusion creators; up to 4MP; open-weight Klein variant

LTX-2.3

Lightricks

Open-weights video+audio in one pass; 22B params; 4K at 50 FPS, up to 20s; one of the most capable open video models available

Video generation leaders: Google Veo 3.1 (native 4K + vertical video), Kling 3.0 (native 4K/60fps), Runway Gen-4.5 (creative/cinematic), and Seedance 2.0 (ByteDance — notable for Identity Lock, which maintains consistent faces across multi-scene video). Sora 2 (OpenAI) remains available via ChatGPT, though the standalone app shut down in March 2026.

Building Software

  • Build a full-stack app from scratch → Claude 4.6
  • Debug a complex codebase → Claude 4.6
  • Generate unit tests and docs → GPT-5.x
  • Rapid UI prototyping → GPT-5.x
  • Background agents for parallel development → Composer 2 (Cursor)
  • Open-source agentic coding → Kimi K2.5

Research & Analysis

  • Analyze a long PDF or contract → Gemini 3.x
  • Summarize a YouTube video → Gemini 3.x
  • Get real-time data on a trending topic → Grok 4.x
  • Get sourced answers with citations → Sonar (Perplexity)
  • Deep competitive research → Claude 4.6

Creative & Visual

  • Create stylized hero images → Midjourney v7
  • Generate photorealistic product shots → Imagen 4
  • Edit and remix existing images → Nano Banana 2
  • Generate a short video from a prompt → Veo 3.1

Data & Math

  • Solve complex math problems step-by-step → Grok 4.x
  • Write and optimize SQL queries → GPT-5.x
  • Transparent chain-of-thought reasoning → DeepSeek R1
  • Analyze spreadsheet data → Gemini 3.x

Self-Hosting & Privacy

  • Run a model on your own infrastructure → Llama 4
  • Fine-tune for a domain-specific task → Llama 4
  • Deploy in EU-regulated environments → Mistral Large 3
  • Budget-friendly open-source alternative → DeepSeek V4
  • MIT-licensed frontier alternative → GLM-5

Writing & Communication

  • Write long-form technical content → Claude 4.6
  • Draft emails and business writing → GPT-5.x
  • Translate content across 100+ languages → Qwen 3.6
  • Summarize meeting transcripts → Gemini 3.x

There is no single "best" model in 2026. The landscape has shifted from a winner-take-all race to specialized excellence. Match the model to the task.

Sourced directly from company websites and documentation. Updated weekly.
