Video-to-Anime: Mainly based on the StyleGAN2 implementation by rosinality, with parts adapted from U-GAT-IT.
Video-Object-Replacement: Based on One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation (Tune-A-Video).
Video-Colorization: An implementation of the ECCV 2022 paper Colorization using a Generative Color Prior for Natural Images.
Make-Any-Image-Talk: Based on the pix2pix-pytorch framework, drawing on MakeItTalk, ATVG, RhythmicHead, and Speech-Driven Animation.
Image-to-Text: The CLIP Interrogator is a prompt engineering tool that combines OpenAI's CLIP and Salesforce's BLIP to optimize text prompts to match a given image. Use the resulting prompts with text-to-image models like Stable Diffusion to create cool art!
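For reference, a minimal sketch of how this can be invoked, assuming the `clip-interrogator` Python package; the image file name is a placeholder:

```python
# Minimal sketch: generate a prompt that describes an input image,
# assuming the clip-interrogator package (pip install clip-interrogator).
from PIL import Image
from clip_interrogator import Config, Interrogator

# Load the image to describe; the file name is a placeholder.
image = Image.open("photo.jpg").convert("RGB")

# BLIP captions the image, then CLIP ranks candidate prompt fragments against it.
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
prompt = ci.interrogate(image)

print(prompt)  # feed this into a text-to-image model such as Stable Diffusion
```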
AI-Logo-Designer: Erlich is the text2image latent diffusion model from CompVis (with additions from glid-3-xl), fine-tuned on the Large Logo Dataset, a collection of roughly 1000K logo images gathered from LAION-5B with captions generated via BLIP using aggressive re-ranking.
Image-Style-Transfer: Image Style Transfer with a Single Text Condition.
Image-Segment: Image segmentation based on the Segment Anything Model (SAM).
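A minimal usage sketch, assuming the `segment-anything` package and a downloaded SAM checkpoint (checkpoint path and image name are placeholders):

```python
# Minimal sketch: automatic mask generation with SAM.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Checkpoint path and image file are placeholders.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects an RGB uint8 array of shape (H, W, 3).
image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # one dict per detected segment

print(len(masks), "segments; keys:", list(masks[0].keys()))
```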
Image-Restoration: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder.
Object-Removal: Combines semantic segmentation and EdgeConnect architectures, with minor changes, to remove specified objects from photos.
Speech-to-Text: A general-purpose speech transcription model trained on a large dataset of diverse audio; it is a multi-task model that performs multilingual speech transcription, speech translation, and language identification.
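A minimal transcription sketch, assuming the `openai-whisper` package; the model size and audio file name are arbitrary placeholders:

```python
# Minimal sketch: transcribe an audio file with Whisper.
import whisper

model = whisper.load_model("base")        # model size is an arbitrary choice
result = model.transcribe("speech.mp3")   # file name is a placeholder

print(result["text"])       # transcription
print(result["language"])   # detected language
```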
Text-to-Video: Based on a multi-stage text-to-video diffusion model that takes a text description as input and returns a video matching that description.
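A minimal sketch using the diffusers text-to-video pipeline with the public ModelScope checkpoint, assuming that is the underlying model; output handling varies across diffusers versions:

```python
# Minimal sketch: text-to-video generation with diffusers.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
).to("cuda")

# Depending on the diffusers version, .frames is either a flat list of frames
# or a batched array; recent versions need the [0] index.
frames = pipe("a panda playing guitar on a stage", num_inference_steps=25).frames[0]
export_to_video(frames, "panda.mp4")
```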
Image-Super-Resolution: Image super-resolution with Stable Diffusion 2.0.
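A minimal sketch using diffusers' Stable Diffusion 2.0 x4 upscaler pipeline; the prompt and file names are placeholders:

```python
# Minimal sketch: 4x super-resolution with the Stable Diffusion 2.0 upscaler.
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

# Keep the input small (e.g. ~128x128) to stay within GPU memory.
low_res = Image.open("low_res.png").convert("RGB")
upscaled = pipe(prompt="a sharp, detailed photo", image=low_res).images[0]
upscaled.save("upscaled.png")
```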
Text-to-Music: A Stable Diffusion 2.0 model fine-tuned on images of spectrograms paired with text; audio processing happens downstream of the model.
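To illustrate the downstream audio step only, here is a hypothetical sketch that reconstructs a waveform from a generated spectrogram image with Griffin-Lim; the dB range, STFT size, sample rate, and file names are assumptions, not the model's actual parameters:

```python
# Illustrative sketch (not the model's exact pipeline): spectrogram image -> audio.
import numpy as np
import torch
import torchaudio
from PIL import Image

N_FFT = 1024          # assumed STFT size; image height must be N_FFT // 2 + 1
SAMPLE_RATE = 44100   # assumed output sample rate

# Read the generated spectrogram as grayscale in [0, 1], low frequencies at the bottom.
img = np.asarray(Image.open("spectrogram.png").convert("L"), dtype=np.float32) / 255.0
img = np.ascontiguousarray(np.flipud(img))  # flip so row 0 is the lowest frequency

# Undo an assumed [-80 dB, 0 dB] mapping to recover linear magnitudes.
magnitude = torch.from_numpy(10.0 ** ((img * 80.0 - 80.0) / 20.0))

# Griffin-Lim iteratively estimates the phase the image no longer contains.
griffin_lim = torchaudio.transforms.GriffinLim(n_fft=N_FFT, n_iter=64, power=1.0)
waveform = griffin_lim(magnitude)  # expects (freq, time) magnitudes

torchaudio.save("out.wav", waveform.unsqueeze(0), SAMPLE_RATE)
```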
Text-Recognition: Based on the PaddleOCR ch_ppocr_server_v2.0_xx model.
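A minimal recognition sketch, assuming the `paddleocr` Python package; the result nesting differs slightly between PaddleOCR versions:

```python
# Minimal sketch: detect and recognize text in an image with PaddleOCR.
from paddleocr import PaddleOCR

# lang="ch" selects the Chinese+English model family mentioned above.
ocr = PaddleOCR(use_angle_cls=True, lang="ch")

result = ocr.ocr("document.jpg", cls=True)  # file name is a placeholder
for box, (text, confidence) in result[0]:   # older versions omit the outer list
    print(text, confidence)
```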
Generate-Detailed-Images-from-Scribbled-Drawings: A ControlNet model that adapts Stable Diffusion 2.0 to use a line drawing (or "scribble") in addition to a text prompt when generating the output image.
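A minimal sketch of the scribble + text workflow using diffusers' ControlNet pipeline; the checkpoints below are the public community ones (built on SD 1.5) and are used only as an illustration, not necessarily the exact weights behind this feature:

```python
# Minimal sketch: scribble-conditioned image generation with ControlNet.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

scribble = Image.open("scribble.png").convert("RGB")  # white strokes on black
image = pipe("a cozy cabin in the woods, detailed illustration", image=scribble).images[0]
image.save("output.png")
```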
AI-Face-Swap: Based on GHOST (Generative High-fidelity One Shot Transfer).
Voice-Changing: Adopts the end-to-end VITS framework for high-quality waveform reconstruction and proposes strategies for clean content-information extraction without text annotation.
AI-Clothes-Changer: A Stable Diffusion model fine-tuned for clothes changing.
AI-Interior-Design: Generates interior designs based on Stable Diffusion.
Age-Prediction: Predicts age from an input image by computing CLIP similarity scores.
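A minimal zero-shot sketch of the idea: score the image against age-labelled text prompts with CLIP and pick the best match; the prompt wording, age range, and checkpoint are assumptions for illustration:

```python
# Minimal sketch: zero-shot age estimation via CLIP image-text similarity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

ages = list(range(1, 101))
prompts = [f"a photo of a {age} year old person" for age in ages]
image = Image.open("face.jpg").convert("RGB")  # placeholder input

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # similarity of the image to each prompt

probs = logits.softmax(dim=-1).squeeze(0)
print("estimated age:", ages[int(probs.argmax())])
```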
Style-Your-Hair: Latent optimization for pose-invariant hairstyle transfer via local-style-aware hair alignment, based on Barbershop.