Modal lets you run or deploy machine learning models, massively parallel compute jobs, task queues, web apps, and much more, without your own infrastructure.
Link
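As a rough illustration of what "running without your own infrastructure" looks like, below is a minimal sketch of a function that executes in Modal's cloud. The names (Stub, function, local_entrypoint, .call) follow my recollection of Modal's quickstart and may differ between SDK versions, so treat them as assumptions rather than the exact API.

    # Hedged sketch of Modal's decorator-based API (names assumed from the quickstart).
    import modal

    stub = modal.Stub("hello-modal")  # illustrative app name

    @stub.function(gpu="any")  # request a GPU for the remote container (assumed value)
    def square(x: int) -> int:
        # This body runs in Modal's cloud, not on your machine.
        return x * x

    @stub.local_entrypoint()
    def main():
        # Invoked locally with `modal run this_file.py`; square() executes remotely.
        print(square.call(42))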
OpenPose is the first real-time multi-person system to jointly detect human body, hand, facial, and foot keypoints (135 keypoints in total) on single images.
Link
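For a sense of how this is used programmatically, here is a small sketch along the lines of the Python examples shipped with the OpenPose repo; the parameter keys and wrapper calls are reproduced from memory and should be treated as assumptions, and the model folder path and image name are placeholders.

    # Hedged sketch of OpenPose's Python bindings (calls recalled from the repo's examples).
    import cv2
    from openpose import pyopenpose as op

    params = {
        "model_folder": "models/",  # placeholder path to the downloaded OpenPose models
        "hand": True,               # also detect hand keypoints
        "face": True,               # also detect facial keypoints
    }

    opWrapper = op.WrapperPython()
    opWrapper.configure(params)
    opWrapper.start()

    datum = op.Datum()
    datum.cvInputData = cv2.imread("person.jpg")  # placeholder input image
    opWrapper.emplaceAndPop(op.VectorDatum([datum]))

    # poseKeypoints has shape (num_people, num_body_keypoints, 3): x, y, confidence.
    print(datum.poseKeypoints)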
Text-to-image generation has advanced at a breathless pace in 2021–2022, starting with DALL·E, then DALL·E 2, Imagen, and now Stable Diffusion. I dug into a couple of papers to learn more about the space and organized my understanding into a few key concepts.
Link
Stable Diffusion is a text-to-image latent diffusion model created by researchers and engineers from CompVis, Stability AI, and LAION. It was trained on 512×512 images from a subset of the LAION-5B database, the largest freely accessible multi-modal dataset that currently exists.
Link
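The linked post describes the model itself; for orientation, here is a minimal sketch of generating an image from a prompt with the Hugging Face diffusers library. The library choice, model ID, and the availability of a CUDA GPU are assumptions, not part of the post.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load the v1-4 weights (model ID assumed) in half precision and move to a GPU.
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")

    # Generate a 512x512 image from a text prompt and save it.
    image = pipe("a photograph of an astronaut riding a horse").images[0]
    image.save("astronaut.png")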
This video presents our tutorial, Denoising Diffusion-Based Generative Modeling: Foundations and Applications. The tutorial was originally presented at CVPR 2022 in New Orleans and received a lot of interest from the research community.
Link
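As a pointer to the foundations the tutorial covers, here is a short sketch of the DDPM forward (noising) process, which lets you sample x_t directly from x_0 in closed form; the linear beta schedule below is the common default and is used here as an assumption.

    import torch

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule (assumed values)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)   # cumulative product alpha_bar_t

    def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
        """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
        eps = torch.randn_like(x0)
        return alpha_bars[t].sqrt() * x0 + (1.0 - alpha_bars[t]).sqrt() * eps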
PYSKL is a PyTorch toolbox for action recognition from skeleton data. It supports a variety of skeleton-based action recognition algorithms and is built on the open-source project MMAction2.
Link
Learn advanced computer vision using Python in this full course. You will learn state-of-the-art computer vision techniques by building five projects with libraries such as OpenCV and MediaPipe. If you are a beginner, don't be afraid of the term "advanced": even though the concepts are advanced, they are not difficult to follow.
Link
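To give a flavor of the kind of project the course builds, here is a small hand-tracking sketch with MediaPipe's Python API; the input file name and parameter values are illustrative, not the course's own code.

    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands

    image = cv2.imread("hand.jpg")  # placeholder input image
    with mp_hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
        # MediaPipe expects RGB input; OpenCV loads images as BGR.
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            # Each detected hand has 21 landmarks with normalized x, y, z coordinates.
            tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
            print(tip.x, tip.y, tip.z)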