- Wan: Open and Advanced Large-Scale Video Generative Models
In this repository, we present Wan2.1, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. Wan2.1 offers several key features.
- GitHub - stepfun-ai/Step-Video-T2V
We present Step-Video-T2V, a state-of-the-art (SoTA) text-to-video pre-trained model with 30 billion parameters and the capability to generate videos up to 204 frames. To enhance both training and inference efficiency, we propose a deep compression VAE for videos, achieving 16x16 spatial and 8x temporal compression ratios (see the shape sketch after this list).
- HunyuanVideo: A Systematic Framework For Large Video … - GitHub
HunyuanVideo introduces the Transformer design and employs a Full Attention mechanism for unified image and video generation. Specifically, we use a "Dual-stream to Single-stream" hybrid model design for video generation. In the dual-stream phase, video and text tokens are processed independently through multiple Transformer blocks, enabling each modality to learn its own appropriate modulation mechanisms without interference (see the transformer sketch after this list).
- DepthAnything/Video-Depth-Anything - GitHub
This work presents Video Depth Anything, based on Depth Anything V2, which can be applied to arbitrarily long videos without compromising quality, consistency, or generalization ability. Compared with other diffusion-based models, it enjoys faster inference speed, fewer parameters, and higher depth accuracy.
- GitHub - Lightricks/LTX-Video: Official repository for LTX-Video
Official repository for LTX-Video.
- GitHub - Lightricks/ComfyUI-LTXVideo: LTX-Video Support for ComfyUI
LTX-Video support for ComfyUI.
- GitHub - kijai/ComfyUI-WanVideoWrapper
ComfyUI wrapper nodes for the Wan video models.
- HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Multimodal video customization: HunyuanCustom supports inputs in the form of text, images, audio, and video. Specifically, it can handle single or multiple image inputs to enable customized video generation for one or more subjects. Additionally, it can incorporate extra audio inputs to drive the subject to speak the corresponding audio.
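
The compression figures in the Step-Video-T2V entry are easy to sanity-check with simple shape arithmetic. Below is a minimal Python sketch under stated assumptions: the function name `latent_shape`, the latent channel count, and the example resolution are illustrative, not the repository's actual API, and plain floor division glosses over how causal video VAEs treat the boundary frame.

```python
# Shape arithmetic for a deep-compression video VAE with 16x16 spatial
# and 8x temporal downsampling, as described for Step-Video-T2V.
# latent_channels=16 is an assumed value, not taken from the repo.

def latent_shape(frames: int, height: int, width: int, latent_channels: int = 16):
    """Return an approximate latent shape (T', C, H', W') for a video of
    `frames` RGB frames at `height` x `width` resolution."""
    assert height % 16 == 0 and width % 16 == 0, "spatial dims must divide by 16"
    # Floor division is a simplification; real causal VAEs often map
    # 1 + 8k input frames to 1 + k latent frames.
    return (frames // 8, latent_channels, height // 16, width // 16)

# Example: a 204-frame clip at 544x992 maps to a (25, 16, 34, 62) latent grid,
# shrinking the spatio-temporal token count the diffusion backbone attends
# over by roughly 8 * 16 * 16 = 2048x (ignoring channel width).
print(latent_shape(204, 544, 992))
```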
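
The "dual-stream to single-stream" design the HunyuanVideo entry describes can likewise be sketched in a few lines of PyTorch: per-modality blocks first, then shared blocks over the concatenated sequence. Everything here (class name, layer counts, dimensions, the use of stock `nn.TransformerEncoderLayer`) is an illustrative assumption rather than the repository's implementation.

```python
import torch
import torch.nn as nn

class DualStreamToSingleStream(nn.Module):
    """Sketch of a dual-stream -> single-stream hybrid transformer."""

    def __init__(self, dim: int = 512, heads: int = 8,
                 n_dual: int = 2, n_single: int = 2):
        super().__init__()
        def block():
            return nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                              batch_first=True)
        # Dual-stream phase: each modality has its own Transformer blocks,
        # so video and text tokens are processed independently.
        self.video_blocks = nn.ModuleList([block() for _ in range(n_dual)])
        self.text_blocks = nn.ModuleList([block() for _ in range(n_dual)])
        # Single-stream phase: shared blocks apply full attention across
        # the concatenated video and text tokens.
        self.joint_blocks = nn.ModuleList([block() for _ in range(n_single)])

    def forward(self, video_tokens: torch.Tensor, text_tokens: torch.Tensor):
        for vb, tb in zip(self.video_blocks, self.text_blocks):
            video_tokens = vb(video_tokens)
            text_tokens = tb(text_tokens)
        x = torch.cat([video_tokens, text_tokens], dim=1)  # fuse sequences
        for jb in self.joint_blocks:
            x = jb(x)
        return x

# Toy usage: 128 video tokens and 32 text tokens, both embedded to dim 512.
model = DualStreamToSingleStream()
out = model(torch.randn(1, 128, 512), torch.randn(1, 32, 512))
print(out.shape)  # torch.Size([1, 160, 512])
```

The split mirrors the rationale quoted above: each modality first develops its own representations without interference, and only then does full attention mix the two streams.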