OpenFork Desktop

The local GPU worker for the AI movie studio your agent can operate.

Download OpenFork Desktop (Windows, Linux)

View Source Code

Core Client Desktop App

Available Services

WAN 2.2

8-24GB

High-quality video (Text/Image-to-Video)

LTX-2.3 Official

16-32GB

Official Lightricks LTX-2 Trainer package with the LTX-2.3 22B dev checkpoint. Standard upstream training targets 80GB+ VRAM; this lane targets the upstream 32GB low-VRAM INT8 config. 24GB is not an officially supported trainer target.

DreamID-Omni

24GB

DreamID-Omni FP8 talking-head video with identity and voice reference inputs

Z-Image Turbo

8GB

ControlNet & Flux Image Gen

Z-Image

8-24GB

High-quality Z-Image (Q4_K_M GGUF)

Krea 2

8-24GB

Experimental Krea 2 Turbo GGUF Q3 text-to-image via ComfyUI-GGUF, with the Qwen3-VL text encoder and the Qwen Image VAE

PiD

16GB

NVIDIA PiD 4x image super-resolution using the Flux/Z-Image-compatible VAE decoder path.

Anima

8-16GB

Anima text-to-image illustration model

FLUX Kontext

8-24GB

FLUX.1 Kontext [dev] GGUF Q4_K_M for 8GB VRAM; supports optional two-image composition by precomposing the source image and the identity reference

Qwen Edit

8-12GB

Qwen-Image-Edit-2511 instruction editing plus Qwen-Image-2512 character LoRA text-to-image inference

Qwen LoRA

24GB

Qwen-Image-2512 character LoRA text-to-image inference using the Unsloth Q4_K_M GGUF diffusion model for 24GB GPUs

Qwen Turbo

8GB

Ultra-fast instruction-based editing with an optional second reference image (2 steps)

VibeVoice

8GB

Clone-ready text-to-speech

Chatterbox

8GB

High-quality voice cloning

Qwen3-TTS

8-16GB

Alibaba's multilingual TTS with 9 premium voices

dots.tts MF

6GB

Rednote HiLab dots.tts MeanFlow-distilled continuous TTS and zero-shot voice cloning; smoke-tested on the 6GB image

Scenema Audio

16GB

XML-driven expressive speech generation with zero-shot voice cloning

DiffRhythm

8GB

AI-powered music composition

Stable Audio 3

2GB

Stable Audio 3 Small-SFX sound-effects-only text-to-audio

AudioX

16-24GB

AudioX text-to-audio and video-conditioned sound generation

Local LLM

4GB

Local Qwen3 4B LLM for workflow planning on low-VRAM providers

HeartMuLa

16-24GB

4-bit quantized music generation (~8GB VRAM)

ACE-Step 1.5

8-16GB

ACE-Step 1.5 XL music generation for 16GB VRAM

MMAudio V2A

8-16GB

Video-to-audio synthesis (MMAudio small_44k)

Speech Enhancement

2GB

High-efficiency speech restoration and enhancement

PRiSM Audio

8-16GB

High-fidelity video-to-audio generation (PRiSM)

daVinci MagiHuman

16-32GB

daVinci-MagiHuman through Wan2GP with Magi Human Distill SR1080 quanto int8 weights. Use for realistic portrait talking-head/lip-sync shots only. Requires a realistic portrait start image; generates synchronized speech/audio from the text prompt.

SCAIL-2

16-24GB

Experimental SCAIL-2 14B WanGP candidate using the DeepBeepMeep int8 SCAIL-2 weights, SAM3 Magic Mask assets, and the native scail2_14B WanGP model type. Use for OpenFork 16GB smoke testing before production runs.

Vista4D

24GB

Vista4D 384p49 through Wan2GP for source-video novel-view reshooting with predefined camera trajectories.

SparkVSR

24GB

State-of-the-art video super-resolution (24GB VRAM required)

ERNIE-Image

8-24GB

Baidu ERNIE-Image Turbo text-to-image with CPU offload for 8GB GPUs

Ideogram 4

16-32GB

Ideogram 4 text-to-image using NF4 weights with CPU offload and structured JSON layer prompts; a smoke test on a true 16GB Vast instance passed on 2026-06-09

DomainShuttle

80GB

DomainShuttle subject-driven image-to-video using Wan2.2-DomainShuttle-A14B; registered as an 80GB service because the closest published upstream requirement is Wan2.2 A14B inference

TeleStyleV2

80GB

TeleStyleV2 content/style reference transfer on Qwen-Image-Edit-2509 with the TeleStyle and DMD LoRAs; upstream target is H100 80GB

NVIDIA GPU only. 8GB+ VRAM recommended.