Our AI Models
22 AI video models, 4 image models, and specialist tools — text-to-video, image-to-video, portrait animation, 3D generation and more. Compare specifications and choose the right engine for your project.
WAN
14B Parameter Text-to-Video • Rapid Generation
WAN is a 14-billion parameter diffusion model by Alibaba, designed for rapid text-to-video generation. Using a two-pass KSamplerAdvanced pipeline with LoRA-enhanced distilled weights, it produces quality videos in just 4 inference steps — making it ideal for fast iteration and experimentation.
The workflow builds on the Hunyuan latent video format and applies ModelSamplingSD3 shift control separately to the high-noise and low-noise passes, giving fine-grained control over generation while keeping the step count low.
Technical Specifications
| Architecture | Two-Pass KSamplerAdvanced + LoRA Distillation |
| Parameters | 14 Billion |
| Default Resolution | 848 × 480 |
| Frame Rate | 16 fps |
| Default Steps | 4 (rapid) |
| Sampler | Euler |
| Scheduler | Simple |
| CFG Scale | 1.0 |
| Shift | 5.0 |
| Output Format | MP4 (via SaveVideo) |
| Checkpoint | wan2.2-t2v-rapid-aio-v10 |
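The Shift value in the table reshapes the noise schedule. As a rough sketch, assuming the commonly used SD3-style time shift `shift * sigma / (1 + (shift - 1) * sigma)` and an evenly spaced base schedule (a simplification, not the exact Simple scheduler), the sigmas for 4 steps at shift 5.0 could be computed as follows. The function names are illustrative and do not come from the workflow.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Apply an SD3-style time shift to a sigma in [0, 1]."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(steps: int, shift: float) -> list[float]:
    """Evenly spaced sigmas from 1.0 down to 0.0, then shifted.

    Assumption: a linear base schedule stands in for the real scheduler.
    """
    base = [1.0 - i / steps for i in range(steps + 1)]
    return [shift_sigma(s, shift) for s in base]


if __name__ == "__main__":
    # WAN defaults from the table: 4 steps, shift 5.0
    for value in shifted_schedule(steps=4, shift=5.0):
        print(f"{value:.4f}")
```

Under these assumptions the first intermediate sigma is 0.9375 rather than the linear 0.75, so more of the four steps are spent at high noise levels.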
Best For
Get results in seconds with just 4 steps
Quickly test different prompts and settings
Natural movement with two-pass sampling
Low step count enables high throughput
LTX 2.3
Distilled Two-Pass Pipeline • HD Output
LTX 2.3 is a sophisticated two-pass generation pipeline that first generates at a lower resolution and then upsamples to produce crisp 1280×720 HD video. The LTXVLatentUpsampler ensures sharp details and consistent motion across the upscaling process.
It uses Euler Ancestral CFG++ sampling with manual sigma scheduling for precise noise control, dual CFGGuider nodes for both passes, and LoRA-enhanced model weights for optimal quality.
Technical Specifications
| Architecture | Two-Pass with LTXVLatentUpsampler |
| Text Encoder | LTXAVTextEncoder (Dedicated) |
| Default Resolution | 1280 × 720 (HD) |
| Frame Rate | 25 fps |
| Default Steps | 8 per pass |
| Sampler | Euler Ancestral CFG++ |
| Scheduler | Manual Sigmas |
| CFG Scale | 1.0 (both passes) |
| Shift | 2.05 |
| Output Format | MP4 (via SaveVideo) |
| Enhancement | LoRA Model Weights |
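Manual sigma scheduling means the noise levels are supplied as an explicit list instead of being derived by a scheduler, so N steps need N + 1 sigma values. The sketch below runs a plain Euler loop over such a list. It is not the Euler Ancestral CFG++ sampler listed in the table, the sigma values are invented for illustration, and `toy_denoiser` is a stand-in for the real model.

```python
from typing import Callable, Sequence


def euler_sample(
    x: float,
    sigmas: Sequence[float],
    denoise: Callable[[float, float], float],
) -> float:
    """Plain Euler sampling over an explicit sigma list (steps = len(sigmas) - 1)."""
    if len(sigmas) < 2:
        raise ValueError("need at least two sigma values")
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)          # model's estimate of the clean signal
        slope = (x - denoised) / sigma        # estimated dx/dsigma
        x = x + slope * (sigma_next - sigma)  # Euler update toward sigma_next
    return x


def toy_denoiser(x: float, sigma: float) -> float:
    """Stand-in for the real model: always predicts a clean value of 0.5."""
    return 0.5


# Hypothetical schedule: 9 values define 8 steps. Not the values used by LTX 2.3.
MANUAL_SIGMAS = [1.0, 0.99, 0.97, 0.93, 0.85, 0.72, 0.5, 0.25, 0.0]

if __name__ == "__main__":
    print(euler_sample(1.0, MANUAL_SIGMAS, toy_denoiser))
```

Because the final sigma is 0.0, the last step lands on the denoiser's own prediction, which is why the shape of the list matters more for the path than for the end point in this toy case.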
Best For
Two-pass pipeline for crisp HD results
Publication-ready 1280×720 video
LoRA enhancement for fine details
Advanced sigma scheduling for smooth motion
LTX Quality
Two-Pass Pipeline • Up to 4K Output
LTX Quality is the premium configuration of the LTX 2.3 pipeline, generating at half resolution then performing a 2× spatial upscale for ultra-sharp output up to 4K (3840×2176). It uses LoRA-enhanced distilled weights and a dedicated LTXVLatentUpsampler for the refinement pass.
The two-pass approach with separate CFGGuider nodes, configurable sigma schedules, and optional tiled VAE decoding makes it ideal for high-resolution final production work.
Technical Specifications
| Architecture | Two-Pass with 2× LTXVLatentUpsampler |
| Text Encoder | LTXAVTextEncoder (Dedicated) |
| Default Resolution | 960 × 544 → 1920 × 1088 |
| Max Resolution | 1920 × 1088 → 3840 × 2176 (4K) |
| Frame Rate | 24 fps |
| Default Steps | 20 (first pass) |
| Sampler | Euler |
| Scheduler | LTXVScheduler (custom shift) |
| CFG Scale | 3.0 (first pass) / 1.0 (upscale) |
| Shift Range | 0.95 – 2.05 |
| Output Format | MP4 (via SaveVideo) |
| Enhancement | LoRA Distill + Spatial Upscaler |
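The resolution pairs in the table follow from the 2× upscale: the first pass runs at half the target size. A minimal sketch of that arithmetic is shown below. The multiple-of-32 check is an assumption about the model's size constraints and is not stated on this page, and `first_pass_size` is an illustrative name.

```python
def first_pass_size(
    target_w: int,
    target_h: int,
    scale: int = 2,
    multiple: int = 32,  # assumed size granularity, not stated in the spec table
) -> tuple[int, int]:
    """Return the first-pass resolution for a target that is upscaled by `scale`."""
    if target_w % scale or target_h % scale:
        raise ValueError("target size must be divisible by the upscale factor")
    width, height = target_w // scale, target_h // scale
    if width % multiple or height % multiple:
        raise ValueError(
            f"first-pass size {width}x{height} is not a multiple of {multiple}"
        )
    return width, height


if __name__ == "__main__":
    print(first_pass_size(1920, 1088))  # default output
    print(first_pass_size(3840, 2176))  # maximum output
```

Under that assumption a 3840×2160 target would be rejected, because its half-height of 1080 is not a multiple of 32. That may be why the maximum is listed as 3840×2176.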
Best For
Generate up to 3840×2176 with 2× upscale
LoRA + spatial upscaler preserves fine details
Configurable sigma schedules for smooth motion
Separate controls for both generation passes
Hunyuan Video
13B Parameter Text-to-Video • High Motion Quality
Hunyuan Video is a 13-billion parameter diffusion model by Tencent, built on a Dual CLIP text encoder architecture (CLIP-L + LLaVA-LLaMA3) for rich text understanding. It uses FluxGuidance for conditioning control and ModelSamplingSD3 for shift-based sampling.
The model generates at 848×480 by default and uses tiled VAE decoding to manage VRAM, producing videos with exceptional motion quality and character consistency. It is well suited to anime-style and character-driven animation.
Technical Specifications
| Architecture | Hunyuan Video + SD3 Sampling + FluxGuidance |
| Parameters | 13 Billion |
| Text Encoder | Dual CLIP (CLIP-L + LLaVA-LLaMA3) |
| Default Resolution | 848 × 480 |
| Frame Rate | 24 fps |
| Default Steps | 20 |
| Sampler | Euler (SamplerCustomAdvanced) |
| Scheduler | Simple (BasicScheduler) |
| Guidance | 6.0 (FluxGuidance) |
| Shift | 7.0 |
| VAE Decode | Tiled (256 tile, 64 overlap) |
| Output Format | MP4 (via SaveVideo) |
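Tiled decoding trades a single large decode for several smaller overlapping ones. To give a feel for the numbers, the sketch below counts tiles for a frame using a generic overlapping-tile layout with the 256 tile size and 64 overlap from the table. It assumes both values are in pixels and that tiles advance by `tile - overlap`. The actual decode node may lay out tiles differently.

```python
import math


def tiles_along(length: int, tile: int, overlap: int) -> int:
    """Number of overlapping tiles needed to cover `length` along one axis."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if length <= tile:
        return 1
    stride = tile - overlap
    return math.ceil((length - tile) / stride) + 1


def tile_grid(width: int, height: int, tile: int = 256, overlap: int = 64) -> tuple[int, int]:
    """Tiles per row and per column for one frame."""
    return tiles_along(width, tile, overlap), tiles_along(height, tile, overlap)


if __name__ == "__main__":
    cols, rows = tile_grid(848, 480)
    print(cols, rows, cols * rows)
```

For the default 848×480 frame this layout gives a 5 × 3 grid, so each frame is decoded in 15 smaller pieces instead of one large one.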
Best For
Exceptional temporal coherence and fluid movement
Excels at character animation and anime styles
Dual CLIP encoder understands complex descriptions
Maintains character identity across frames
Hunyuan 1.5
Next-Gen Text-to-Video • Native 720p • Dual CLIP v2
Hunyuan 1.5 is the next generation of Tencent’s video diffusion model, featuring an upgraded Dual CLIP encoder (Qwen 2.5 VL 7B + Byt5 Glyph XL) for significantly improved text comprehension and prompt following. It generates natively at 720p (1280×720) without upscaling.
The model uses CFGGuider with SamplerCustomAdvanced for precise generation control, and includes an optional super-resolution path to upscale output to 1080p using a dedicated latent upsampler.
Technical Specifications
| Architecture | Hunyuan Video 1.5 + SD3 Sampling + CFGGuider |
| Text Encoder | Dual CLIP (Qwen 2.5 VL 7B + Byt5 Glyph XL) |
| Default Resolution | 1280 × 720 (HD) |
| Frame Rate | 24 fps |
| Default Steps | 20 |
| Sampler | Euler (SamplerCustomAdvanced) |
| Scheduler | Simple (BasicScheduler) |
| CFG Scale | 6.0 |
| Shift | 7.0 |
| VAE Decode | Standard (VAEDecode) |
| Optional Upscale | 1080p Super Resolution (disabled by default) |
| Output Format | MP4 (via SaveVideo) |
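The CFG Scale row controls how strongly the prompt steers each step. As an illustration, the standard classifier-free guidance formula combines a prompt-conditioned and an unconditioned prediction as shown below. This is the textbook formula on plain lists of numbers, not a description of CFGGuider's internals, and `cfg_combine` is a hypothetical helper.

```python
def cfg_combine(cond: list[float], uncond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: move from the unconditional prediction
    toward (and past) the conditional one by `scale`."""
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond = [1.0, 2.0]    # toy prediction with the prompt
    uncond = [0.0, 1.0]  # toy prediction without the prompt
    print(cfg_combine(cond, uncond, 6.0))  # Hunyuan 1.5 default
    print(cfg_combine(cond, uncond, 1.0))  # scale used by the distilled pipelines
```

At a scale of 1.0 the result equals the conditional prediction, which is consistent with the distilled WAN and LTX 2.3 pipelines listing CFG 1.0 while this model uses 6.0.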
Best For
Qwen 2.5 VL encoder understands complex, detailed descriptions
Generates at 1280×720 without upscaling artefacts
Inherits Hunyuan’s exceptional temporal coherence
Built-in super-resolution upscale path when needed
More AI Engines
CogVideoX
Tsinghua University • Open-Source
CogVideoX is a high-quality open-source video generation model from Tsinghua University. It delivers strong text comprehension and artistic output quality with a straightforward single-pass pipeline.
The model excels at stylised and artistic video generation, producing visually striking results with excellent prompt adherence for creative and research use cases.
Technical Specifications
| Architecture | CogVideoX Transformer |
| Default Resolution | 720 × 480 |
| Frame Rate | 8 fps |
| Default Steps | 50 |
| CFG Scale | 6 |
| Frame Count | 49 frames (~6s) |
| Sampler | Euler |
| Output Format | MP4 |
Best For
Excels at stylised and creative video output
Strong understanding of complex prompts
Open-source model with academic backing
49 frames for ~6 seconds of output
AnimateDiff
Stable Diffusion Animation • Motion Module
AnimateDiff turns any Stable Diffusion checkpoint into a video animation engine by inserting a temporal motion module. This enables the vast SD ecosystem of models, LoRAs, and styles to produce animated output.
With fast 20-step generation and a lightweight architecture, AnimateDiff is ideal for quick iterations, anime-style content, and leveraging the massive library of community SD checkpoints.
Technical Specifications
| Architecture | SD 1.5 + Motion Module |
| Default Resolution | 512 × 512 |
| Frame Rate | 8 fps |
| Default Steps | 20 |
| CFG Scale | 7 |
| Frame Count | 32 frames (~4s) |
| Sampler | Euler |
| Output Format | MP4 |
Best For
20-step generation for rapid prototyping
Leverage SD checkpoints for any art style
Compatible with community LoRAs and models
Perfect for 4-second animated sequences
Image-to-Video Models
Reference Image Animation • Multi-Engine
Image-to-Video (I2V) models animate a reference image into motion video. Upload a still image and the AI generates natural movement, camera motion, and scene dynamics while preserving the original composition.
Available I2V Engines
| Engine | Resolution • Frame Rate • Frames | Notes |
|---|---|---|
| Hunyuan 1.5 I2V | 1280×720 • 24 fps • 121 frames | Best motion quality from a reference image |
| LTX 2.3 I2V | 1280×720 • 25 fps • 121 frames | Fast 8-step inference, great detail |
| SVD XT I2V | 1024×576 • 6 fps • 25 frames | Stability AI's Stable Video Diffusion |
| WAN 2.2 I2V | 832×480 • 16 fps • 81 frames | 14B parameter model with Lightx2v turbo |
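Clip length follows directly from the frame count and frame rate in the table. A small sketch of that arithmetic, with the values copied from the rows above and `clip_seconds` as an illustrative helper name:

```python
# (frames, fps) per engine, taken from the table above
I2V_ENGINES = {
    "Hunyuan 1.5 I2V": (121, 24),
    "LTX 2.3 I2V": (121, 25),
    "SVD XT I2V": (25, 6),
    "WAN 2.2 I2V": (81, 16),
}


def clip_seconds(frames: int, fps: float) -> float:
    """Duration of a clip in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


if __name__ == "__main__":
    for name, (frames, fps) in I2V_ENGINES.items():
        print(f"{name}: {clip_seconds(frames, fps):.2f} s")
```

All four engines produce clips of roughly 4 to 5 seconds at their default settings.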
Advanced Video Modes
First-Last Frame • Sound-to-Video • Camera Control
Specialised generation modes that extend beyond simple text or image inputs. Interpolate between keyframes, drive video from audio, or control camera movements.
Available Modes
| Mode | Resolution • Frame Rate • Frames | Notes |
|---|---|---|
| LTX First-Last Frame | 768×512 • 25 fps • 97 frames | Interpolate between two keyframe images with prompt guidance |
| WAN First-Last Frame | 832×480 • 16 fps • 81 frames | WAN 14B first & last frame interpolation |
| WAN Sound-to-Video | 832×480 • 16 fps • 81 frames | Generate talking-head video from portrait + audio |
| LTX Camera Control | 768×512 • 25 fps • 97 frames | Camera dolly (in, out, left, right) via LoRA |
| LTX ControlNet | 768×512 • 25 fps • 97 frames | Depth, edge, or pose-guided generation |
| WAN VACE | 832×480 • 16 fps • 81 frames | Video conditioning & editing with subject/scene control |
| WAN Animate | 832×480 • 16 fps • 81 frames | Character animation from a single reference image |
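The frame counts in the table fit the pattern k·n + 1: 97 = 8·12 + 1 for the LTX modes and 81 = 4·20 + 1 for the WAN modes. Assuming that pattern is a hard requirement of each model family, which this page does not state, a requested frame count could be checked or snapped as in the sketch below. The stride values and helper names are assumptions.

```python
# Assumed temporal strides, inferred from the frame counts in the table
FRAME_STRIDE = {"LTX": 8, "WAN": 4}


def is_valid_frame_count(frames: int, stride: int) -> bool:
    """True if frames has the form stride * n + 1."""
    return frames >= 1 and (frames - 1) % stride == 0


def nearest_valid(frames: int, stride: int) -> int:
    """Snap a requested frame count to the closest stride * n + 1 value."""
    return max(1, round((frames - 1) / stride) * stride + 1)


if __name__ == "__main__":
    print(is_valid_frame_count(97, FRAME_STRIDE["LTX"]))
    print(is_valid_frame_count(81, FRAME_STRIDE["WAN"]))
    print(nearest_valid(100, FRAME_STRIDE["LTX"]))
    print(nearest_valid(100, FRAME_STRIDE["WAN"]))
```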
Specialised Models
Portrait Animation • Face Transfer • 3D Generation
Purpose-built models for specific creative tasks including portrait animation, face identity transfer, audio-driven talking heads, and 3D asset generation.
Available Specialist Models
| Model | Output | Notes |
|---|---|---|
| LivePortrait | 512×512 • 24 fps • 81 frames | Animate portrait photos with facial expressions from a driving video |
| DreamID-V | 832×480 • 24 fps • 81 frames | Face identity transfer — insert your face into generated video |
| EchoMimic | 512×512 • 25 fps • 100 frames | Audio-driven talking portrait from a single photo + audio clip |
| Hunyuan3D v2 | 512×512 single output | Generate 3D model assets from text or reference image |
Image Generation Models
AI Image Generation • Text-to-Image & Image Editing
Four image generation models ranging from the fast and lightweight Stable Diffusion 1.5 to the powerful Flux 2 family. Use the Image Edit model to modify existing images with text instructions.
Available Image Models
| Model | Resolution • Steps • CFG | Notes • Cost |
|---|---|---|
| Stable Diffusion 1.5 | 512×512 • 25 steps • CFG 7 | Fast & lightweight • 2 credits |
| Flux 2 Klein | 1024×1024 • 20 steps • CFG 4 | Balanced quality & speed • 5 credits |
| Flux 2 Dev | 1024×1024 • 28 steps • CFG 4 | Highest quality • 8 credits |
| Flux 2 Image Edit | 1024×1024 • 20 steps • CFG 4 | Edit images with text instructions • 6 credits |
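To compare costs across the four image models, the credit values from the table can be multiplied by the number of images. The sketch below assumes credits are charged per generated image, which the table implies but does not state, and `batch_cost` is an illustrative helper.

```python
# Credits per image, copied from the table above
CREDITS_PER_IMAGE = {
    "Stable Diffusion 1.5": 2,
    "Flux 2 Klein": 5,
    "Flux 2 Dev": 8,
    "Flux 2 Image Edit": 6,
}


def batch_cost(model: str, images: int) -> int:
    """Total credits for generating `images` images with `model`.

    Assumption: one charge per generated image.
    """
    if images < 0:
        raise ValueError("images must be non-negative")
    return CREDITS_PER_IMAGE[model] * images


if __name__ == "__main__":
    for model in CREDITS_PER_IMAGE:
        print(f"{model}: {batch_cost(model, 10)} credits for 10 images")
```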
Model Comparison
| Feature | WAN | LTX 2.3 | LTX Quality | Hunyuan | Hunyuan 1.5 |
|---|---|---|---|---|---|
| Max Resolution | 848 × 480 | 1280 × 720 | 3840 × 2176 (4K) | 848 × 480 | 1280 × 720 |
| Frame Rate | 16 fps | 25 fps | 24 fps | 24 fps | 24 fps |
| Inference Steps | 4 (fastest) | 8 × 2 passes | 20 + upscale pass | 20 | 20 |
| Generation Speed | Fastest | Moderate | Slowest | Slow | Moderate |
| Output Quality | Good | Excellent | Best | Excellent | Excellent |
| Motion Quality | Good | Good | Excellent | Best | Best |
| Sampler | Euler | Euler Ancestral CFG++ | Euler | Euler | Euler |
| Upscaling | None | Built-in (LTXVLatentUpsampler) | 2× Spatial Upscaler | None | Optional 1080p SR |
| Best Use Case | Prototyping & iteration | HD production | 4K final production | Character & anime | HD motion & prompts |
| Image-to-Video | ✓ Available | ✓ Available | Coming Soon | Coming Soon | ✓ Available |
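The comparison table can also be read as a simple filter: pick the engines whose maximum resolution covers the target and, if needed, that already offer image-to-video. The sketch below encodes only the Max Resolution and Image-to-Video rows. The `Engine` class and `candidates` function are illustrative and not part of the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    max_width: int
    max_height: int
    i2v_available: bool  # False where the table says "Coming Soon"


# Values copied from the Max Resolution and Image-to-Video rows
ENGINES = [
    Engine("WAN", 848, 480, True),
    Engine("LTX 2.3", 1280, 720, True),
    Engine("LTX Quality", 3840, 2176, False),
    Engine("Hunyuan", 848, 480, False),
    Engine("Hunyuan 1.5", 1280, 720, True),
]


def candidates(min_width: int, min_height: int, need_i2v: bool = False) -> list[str]:
    """Names of engines that reach the requested size (and I2V, if required)."""
    return [
        e.name
        for e in ENGINES
        if e.max_width >= min_width
        and e.max_height >= min_height
        and (e.i2v_available or not need_i2v)
    ]


if __name__ == "__main__":
    print(candidates(1280, 720))
    print(candidates(1280, 720, need_i2v=True))
```

The filter ignores speed and quality ratings, so it narrows the choice rather than making it.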
Powered by NVIDIA DGX B200
NVIDIA DGX B200
8x Blackwell GPUs • 1,440 GB HBM3e • 144 PFLOPS FP4 (sparse)
The foundation for your AI factory. NVIDIA DGX B200 is equipped with eight NVIDIA Blackwell GPUs interconnected with fifth-generation NVLink, delivering 3X the training performance and 15X the inference performance of previous-generation systems.
NVIDIA Blackwell architecture with NVLink interconnect and 2x NVSwitch
64 TB/s aggregate memory bandwidth across all GPUs
72 PFLOPS FP8 Tensor Core compute power
14.4 TB/s fifth-generation NVLink aggregate bandwidth
Industry-standard node-based workflow engine for reproducible generation
Self-hosted infrastructure — your data never leaves the server
Full Specifications
| GPU | 8x NVIDIA Blackwell GPUs |
| GPU Memory | 1,440 GB total HBM3e — 64 TB/s aggregate bandwidth |
| FP4 Tensor Core | 144 PFLOPS (sparse) |
| FP8 Tensor Core | 72 PFLOPS (sparse) |
| NVLink | 14.4 TB/s aggregate — 5th generation via 2x NVSwitch |
| CPU | 2x Intel Xeon Platinum 8570, 112 cores total, 2.1 GHz base / 4 GHz max boost |
| System Memory | 2 TB DDR5 (configurable to 4 TB) |
| Storage | 2x 1.9 TB NVMe M.2 (OS) + 8x 3.84 TB NVMe U.2 (data) |
| Networking | 8x 400 Gb/s ConnectX-7 + 2x 400 Gb/s BlueField-3 DPU |
| System Power | ~14.3 kW max |
| Form Factor | 10 RU (rack units) |
| Software | NVIDIA AI Enterprise + Mission Control + DGX OS |
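The figures above are system totals. Dividing by the eight GPUs gives a rough per-GPU view, assuming the totals are split evenly across GPUs. The dictionary keys below are illustrative names.

```python
GPU_COUNT = 8

# System totals copied from the specification table
SYSTEM_TOTALS = {
    "gpu_memory_gb": 1440,
    "memory_bandwidth_tb_per_s": 64,
    "fp4_sparse_pflops": 144,
    "fp8_sparse_pflops": 72,
    "nvlink_tb_per_s": 14.4,
}

# Assumption: every total divides evenly across the eight GPUs
PER_GPU = {name: total / GPU_COUNT for name, total in SYSTEM_TOTALS.items()}

if __name__ == "__main__":
    for name, value in PER_GPU.items():
        print(f"{name}: {value:g}")
```

On that basis each GPU has about 180 GB of HBM3e, which is the number that matters when checking whether a single model fits on one GPU.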
Choose your model and start creating
Create a free account to access all models — text-to-video and image-to-video. Switch between them anytime.