New: Fix Your Agent Harness

AI Models
Guide

A quick-reference to the major AI models, who makes them, and what they do best.

Updated April 24, 2026
Model Company Best For Key Differentiator
GPT-5.x OpenAI General purpose GPT-5.5 flagship (Apr 23, 2026) — $5/1M input, $30/M output; stronger coding, computer use, and research; dynamic routing across sub-models
Claude Opus 4.7 Anthropic Coding & reasoning Opus 4.7 (Apr 15, 2026) — latest GA model; 1M context, step-change agentic coding; Sonnet 4.6 best for speed/cost
Gemini 3.x Google DeepMind Multimodal Best benchmark breadth; strong pricing; Workspace integration; Gemini 3.1 Pro flagship
Grok 4.x xAI Real-time info Grok 4.20 flagship (Mar 2026): 2M context, three variants; live X/Twitter data; multi-agent parallel reasoning
Llama 4 Meta Open-source 10M token context; fully self-hostable; Behemoth still in training/unreleased
DeepSeek V4 DeepSeek Cost efficiency Released April 24, 2026 — V4-Pro (1.6T MoE, $1.74/1M input) and V4-Flash (284B, $0.14/1M input); Apache 2.0, 1M context
Mistral 3 Family Mistral EU compliance Large 3, Magistral (reasoning), Devstral (open-source coding agent), Small 4, Voxtral — enterprise-safe with data sovereignty
Qwen 3.6 Alibaba Multilingual Qwen3.5 open-weight (397B); Qwen3.6-Plus closed-source agentic (1M context, Apr 2026); 119 languages
Microsoft MAI Microsoft Speech & media AI MAI-Transcribe-1, MAI-Voice-1, MAI-Image-2 — Microsoft's own foundation stack on Foundry
Gemma 4 Google On-device open-weight 31B ranks #3 on Arena AI (1452 Elo); E2B/E4B optimized for Android; Apache 2.0
Kimi K2.6 Moonshot AI Agentic open-source Released Apr 20 — 1T MoE, 32B active; Agent Swarm scales to 300 sub-agents; 58.6 SWE-Bench Pro (top open model); Modified MIT
GLM-5.1 Zhipu AI Open frontier Updated Apr 6, 2026; 744B MoE; MIT license; GPQA 0.9; trained on zero NVIDIA GPUs
GPT-5.4-Cyber OpenAI Defensive cybersecurity TAC-gated fine-tune; lowered security refusals; binary reverse engineering; partners: CrowdStrike, Cloudflare, Palo Alto, Cisco
Claude Mythos Preview Anthropic Security research Project Glasswing invite-only; GPQA 0.9; ~12 enterprise partners; $2,500/1M tokens to gate access
Sonar Perplexity Search & research Search-grounded answers at 1200 tok/s; built on Llama 3.3 + Cerebras
Composer 2 Cursor AI-native coding Frontier-level coding via RL on long-horizon tasks; 73.7 SWE-bench Multilingual

GPT-5.x Series

OpenAI · San Francisco

The versatile all-rounder with dynamic internal routing.

  • GPT-5.5 is the current flagship (Apr 23, 2026) — $5/1M input, $30/1M output; 1M token context; improved coding, computer use, and multi-step research
  • GPT-5.5 rolling out to Plus, Pro, Business, and Enterprise via ChatGPT and Codex
  • Uses an internal router to select the right sub-model per request in real time
  • GPT-5.4 Thinking (Mar 5, 2026): reasoning-first variant — strongest at math, spreadsheets, research, and document tasks
  • GPT-5.3-Codex (Feb 5, 2026): dedicated agentic coding variant combining Codex and GPT-5 training stacks
  • GPT-5.3 Instant: fast chat variant for low-latency applications
  • Native computer use and tool calling for agentic automation

Claude 4.x Family

Anthropic · San Francisco

The developer favorite for coding, reasoning, and safety.

  • Opus 4.7 (Apr 15, 2026): latest GA model — 1M context window, $5/1M input, $25/1M output; step-change agentic coding, substantially improved vision resolution
  • Sonnet 4.6: best speed/cost balance for most production workloads
  • Haiku 4.5: fast and cheap for high-volume tasks
  • Leads human-preference leaderboards; strong ARC-AGI-2 scores
  • Agentic capabilities: autonomous multi-step coding and long-horizon tasks
  • Claude Mythos Preview (Apr 6–7, 2026): invite-only cybersecurity research model via Project Glasswing — limited to ~12 enterprise partners; not publicly available

Gemini 3.x

Google DeepMind · Mountain View

Multimodal powerhouse with top benchmark breadth.

  • Gemini 3.1 Pro leads the Artificial Analysis Intelligence Index
  • Native multimodal: processes text, images, audio, and video natively
  • 1M token context window; deep Google Workspace integration
  • Gemini 3 Flash: current default model in the Gemini app — strong balance of speed and capability
  • Gemini 3.1 Flash-Lite (Mar 3): $0.25/1M input, 2.5× faster than prior Flash — cost-efficient tier for high-volume workloads
  • Gemini 3.1 Flash Live: audio/voice model with 90+ language support and lower latency (March 26)
  • Gemini 3.1 Flash TTS: native text-to-speech model with audio generation capabilities

Grok 4.x

xAI · Austin

Real-time data meets raw reasoning power.

  • Grok 4.20 uses native multi-agent architecture with four specialist agents that debate before responding
  • Grok 4.20 Beta 2 (March 3): improved instruction following and hallucination reduction
  • Real-time integration with X (Twitter) for current events
  • Grok 4 Heavy (extended inference variant) scored 100% on AIME 2025 — not the base Grok 4.20 model; attribution is widely misreported
  • Grok 5 in training — reported ~6T parameter MoE model training on xAI's Colossus 2 supercluster; Q2 2026 target

Sonar (Perplexity)

Perplexity AI · San Francisco

Search-native AI built for grounded, cited answers.

  • Built on Llama 3.3 70B, further trained for search-grounded factuality
  • Runs at 1,200 tokens/sec on Cerebras inference hardware
  • Model family: Sonar, Sonar Pro, Reasoning Pro, Deep Research
  • Matches GPT-4o on user satisfaction benchmarks

Composer 2 (Cursor)

Cursor · San Francisco

Frontier-level coding model trained via RL on long-horizon software engineering tasks.

  • 73.7 on SWE-bench Multilingual — top scores across CursorBench and Terminal-Bench 2.0
  • Trained with RL to solve complex tasks requiring hundreds of sequential actions
  • Fast variant is now default: frontier-level intelligence at lower cost than competing fast models
  • Background agents run tasks autonomously while you work

Microsoft MAI

Microsoft · Redmond

Microsoft's own foundation model stack — independent of OpenAI, built for speech, voice, and vision.

  • MAI-Transcribe-1: speech-to-text across 25 languages; outperforms Whisper-large-v3 on accuracy
  • MAI-Voice-1: generates 60s of audio in 1s; supports voice cloning
  • MAI-Image-2: high-quality image generation
  • All available on Microsoft Foundry — signals Microsoft building foundational AI independent of OpenAI (Apr 2, 2026)

The biggest story of April 2026: Anthropic and OpenAI each released a cybersecurity-focused model within days of each other — gated, expensive, and limited to enterprise partners. These are not general-purpose models.

GPT-5.4-Cyber

OpenAI · San Francisco

Fine-tuned GPT-5.4 for enterprise defensive cybersecurity — gated behind OpenAI's Trusted Access for Cyber program.

  • Released April 14, 2026 — fine-tune of GPT-5.4 for dual-use security research
  • Lowered refusal thresholds for defensive cybersecurity tasks; native binary reverse engineering without source code
  • Gated behind OpenAI's Trusted Access for Cyber (TAC) program; enterprise partners include CrowdStrike, Cloudflare, Palo Alto Networks, Cisco, JPMorgan, Goldman Sachs
  • No public API pricing; $10M in API credits committed via Cybersecurity Grant Program
  • Direct counterpart to Anthropic's Claude Mythos Preview — part of a matched pair of restricted cyber-focused models released within days of each other

Claude Mythos Preview

Anthropic · San Francisco

Invite-only cybersecurity research model via Project Glasswing — Anthropic's counterpart to GPT-5.4-Cyber.

  • Released April 6–7, 2026 via Project Glasswing — not publicly available
  • GPQA 0.9; 93.9% SWE-bench Verified; 97.6% USAMO 2026
  • Limited to ~12 enterprise partners: AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan, Microsoft, NVIDIA, and others
  • Priced at ~$2,500/1M tokens to gate general use and prevent misuse
  • Cybersecurity-focused capabilities not available in standard Claude models

Llama 4

Meta · Menlo Park

The leading open-source model family.

  • Llama 4 Scout: industry-leading 10M token context window
  • Llama 4 Maverick: 17B active / 128 experts — outperforms GPT-4o and Gemini 2.0 Flash on key benchmarks
  • Fully open weights; can be self-hosted for complete data control
  • Llama 4 Behemoth (288B active) still in training and unreleased as of April 2026; widely covered as next milestone

DeepSeek V4

DeepSeek · Hangzhou, China

Cost-redefining open-source frontier — released April 24, 2026 in two variants.

  • V4-Pro: 1.6T total parameters / 49B active MoE; $1.74/1M input, $3.48/1M output — cheapest frontier-class open model
  • V4-Flash: 284B total / 13B active; $0.14/1M input, $0.28/1M output — ultra-budget tier
  • Apache 2.0 license; native 1M context window; built-in agentic long-context and tool-use
  • Engram conditional memory and Manifold-Constrained Hyper-Connections for improved long-context performance
  • Entire V3/V4 lineage trained for under $6M — redefining AI cost efficiency

Mistral 3 Family

Mistral AI · Paris, France

The enterprise-safe European model family, now spanning text, reasoning, code, and speech.

  • Mistral Large 3: EU AI Act-compliant flagship for regulated industries (finance, healthcare, gov); 675B total MoE
  • Magistral: Mistral's reasoning model — multilingual, transparent chain-of-thought
  • Mistral Small 4 (March 16): 119B/6.5B-active MoE unifying reasoning (Magistral), vision (Pixtral), and coding (Devstral) in one endpoint
  • Devstral 2: 72.2% SWE-bench Verified — top open agentic coding model; 123B params, 256K context, MIT license; competitive with Composer 2
  • Voxtral (March 26): open-source 4B text-to-speech, 9 languages, runs on consumer hardware
  • Strong European data sovereignty guarantees across the full model family

Kimi K2.6

Moonshot AI · Beijing, China

Open-source agentic leader — top SWE-Bench Pro score and the largest open agent swarm.

  • Released April 20, 2026 — 1T parameter MoE, 32B active, 262.1K context window
  • Agent Swarm scales to 300 specialized sub-agents with up to 4,000 coordinated steps (was 100)
  • 58.6 on SWE-Bench Pro — edges GPT-5.4's 57.7; top open model on this benchmark
  • Open-weight, Modified MIT license; weights on Hugging Face
  • Kimi Code CLI agent rivals Claude Code and Gemini CLI

GLM-5.1

Zhipu AI · Beijing, China

Frontier-class model on a MIT license.

  • Updated Apr 6, 2026 — improved from GLM-5 baseline
  • 744B parameter MoE model (44B active) with 200K context window
  • Released under MIT license; trained entirely on Huawei Ascend chips (zero NVIDIA GPUs)
  • GPQA 0.9; 77.8% on SWE-bench Verified; 50.4% on Humanity's Last Exam
  • Priced roughly 6x cheaper than comparable proprietary models

NVIDIA Nemotron 3

NVIDIA · Santa Clara

NVIDIA's open agentic reasoning stack — Nano, Super, and Ultra sizes on Bedrock.

  • Released March 2026; available on Amazon Bedrock and NVIDIA NIM
  • Three sizes: Nano (edge/on-device), Super (balanced), Ultra (frontier-class agentic reasoning)
  • Nemotron 3 Super peers with Llama 4 Maverick on open-model benchmarks
  • Leading open model for agentic reasoning and multi-step tool use
  • Optimized for NVIDIA hardware; available for self-hosting via NIM microservices

Gemma 4

Google · Mountain View

Open-weight models from Gemini 3 research — optimized for on-device and frontier-class performance.

  • Four Apache 2.0 models: E2B (2.3B), E4B (4.5B), 26B MoE (4B active), 31B dense (Apr 2, 2026)
  • 31B ranks #3 on Arena AI leaderboard at 1452 Elo — outperforms models 20× its size
  • E2B/E4B optimized for on-device Android: up to 4× faster and 60% less battery than prior Gemma
  • All models natively multimodal; larger variants support 256K context

Qwen 3.6

Alibaba Cloud · Hangzhou, China

The multilingual giant — open-weight Qwen3.5 and closed-source agentic Qwen3.6 Plus.

  • Qwen3.5 (Feb 2026): open-weight, 397B parameters — available for self-hosting and fine-tuning
  • Qwen3.5-Omni: native audio/video/text multimodal — Thinker architecture, 256K context, 113-language speech recognition
  • Qwen3.6-Plus (Apr 2, 2026): closed-source API-only, 1M context; agentic — matches Claude Opus 4.5 on SWE-bench and Terminal-Bench 2.0
  • 0.6B to 235B open-weight range; Qwen3-Max (1T+) API-only; supports 119 languages
  • Qwen3-Coder achieves 69.6% on SWE-Bench Verified, surpassing many frontier models

Midjourney v7

Midjourney

Artistic, stylized visuals with strong aesthetic control

Midjourney V1 Video

Midjourney

First video model from MJ — 5s clips extendable to 20s; ~25× cheaper than competitors

Imagen 4

Google

Photorealistic composition, spelling, and typography accuracy

Nano Banana 2

Google

Fast AI image editing, remixing, and style transfers; built on Gemini Flash

DALL-E 4

OpenAI

Integrated with ChatGPT; strong prompt adherence

Stable Diffusion 3.5

Stability AI

Open-source; self-hostable; highly customizable

FLUX.2

Black Forest Labs

From the Stable Diffusion creators; up to 4MP; open-weight Klein variant

LTX-2.3

Lightricks

Open-weights video+audio in one pass; 22B params; 4K at 50 FPS, up to 20s; one of the most capable open video models available

Video generation leaders: Google Veo 3.1 (native 4K + vertical video), Kling 3.0 (native 4K/60fps), Runway Gen-4.5 (creative/cinematic), and Seedance 2.0 (ByteDance — notable for Identity Lock, which maintains consistent faces across multi-scene video). Sora 2 (OpenAI) remains available via ChatGPT, though the standalone app shut down in March 2026.

Building Software

  • Build a full-stack app from scratch Claude 4.6
  • Debug a complex codebase Claude 4.6
  • Generate unit tests and docs GPT-5.x
  • Rapid UI prototyping GPT-5.x
  • Background agents for parallel development Composer 2 (Cursor)
  • Open-source agentic coding Kimi K2.6

Research & Analysis

  • Analyze a long PDF or contract Gemini 3.x
  • Summarize a YouTube video Gemini 3.x
  • Get real-time data on a trending topic Grok 4.x
  • Get sourced answers with citations Sonar (Perplexity)
  • Deep competitive research Claude 4.6

Creative & Visual

  • Create stylized hero images Midjourney v7
  • Generate photorealistic product shots Imagen 4
  • Edit and remix existing images Nano Banana 2
  • Generate a short video from a prompt Veo 3.1

Data & Math

  • Solve complex math problems step-by-step Grok 4.x
  • Write and optimize SQL queries GPT-5.x
  • Transparent chain-of-thought reasoning DeepSeek R1
  • Analyze spreadsheet data Gemini 3.x

Self-Hosting & Privacy

  • Run a model on your own infrastructure Llama 4
  • Fine-tune for a domain-specific task Llama 4
  • Deploy in EU-regulated environments Mistral Large
  • Budget-friendly open-source alternative DeepSeek V4
  • MIT-licensed frontier alternative GLM-5

Writing & Communication

  • Write long-form technical content Claude 4.6
  • Draft emails and business writing GPT-5.x
  • Translate content across 100+ languages Qwen 3.6
  • Summarize meeting transcripts Gemini 3.x

There is no single "best" model in 2026. The landscape has shifted from a winner-take-all race to specialized excellence. Match the model to the task.

Sourced directly from company websites and documentation. Updated weekly.

Developer Writing Assistant

ESC