Computer Vision Datasets

Sep 24, 2015


Datasets: Who is the best at X?

Computer Vision Datasets

Introducing the Open Images Dataset

A parallel download util for Google’s open image dataset

Image & Vision Group - Datasets

Huizhong Chen - Datasets

Classification

A Large-Scale Car Dataset for Fine-Grained Categorization and Verification

The CIFAR-10 dataset

  • intro: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
  • homepage: http://www.cs.toronto.edu/~kriz/cifar.html
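The homepage also offers a binary version of the dataset. In that version, each record is one label byte (0 to 9) followed by 3,072 pixel bytes: 1,024 red values, then 1,024 green values, then 1,024 blue values, with each channel stored row by row. The sketch below decodes that layout using only the standard library and assumes the archive has already been downloaded and extracted. `parse_record` and `parse_batch` are hypothetical helper names.

```python
from typing import List, Tuple

# Layout of one record in the binary CIFAR-10 files, per the homepage:
# 1 label byte, then 1024 red, 1024 green and 1024 blue bytes (row-major 32x32).
SIDE = 32
CHANNEL_BYTES = SIDE * SIDE           # 1024 values per colour channel
RECORD_BYTES = 1 + 3 * CHANNEL_BYTES  # 3073 bytes per image

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def parse_record(record: bytes) -> Tuple[int, Image]:
    """Split one 3073-byte record into (label, 32x32 grid of RGB tuples)."""
    if len(record) != RECORD_BYTES:
        raise ValueError(f"expected {RECORD_BYTES} bytes, got {len(record)}")
    label = record[0]
    red = record[1:1 + CHANNEL_BYTES]
    green = record[1 + CHANNEL_BYTES:1 + 2 * CHANNEL_BYTES]
    blue = record[1 + 2 * CHANNEL_BYTES:]
    image = [[(red[r * SIDE + c], green[r * SIDE + c], blue[r * SIDE + c])
              for c in range(SIDE)]
             for r in range(SIDE)]
    return label, image


def parse_batch(blob: bytes) -> List[Tuple[int, Image]]:
    """Decode a whole binary batch file into (label, image) pairs."""
    if len(blob) % RECORD_BYTES:
        raise ValueError("file size is not a multiple of the 3073-byte record size")
    return [parse_record(blob[i:i + RECORD_BYTES])
            for i in range(0, len(blob), RECORD_BYTES)]
```

A typical call would be `parse_batch(open("cifar-10-batches-bin/data_batch_1.bin", "rb").read())`, but that path is an assumption about where the archive was extracted. The nested lists only make the byte layout explicit. For training pipelines, a NumPy reshape of the same bytes is much faster.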

Face Recognition

The MegaFace Benchmark: 1 Million Faces for Recognition at Scale

MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

MSR Image Recognition Challenge (IRC)

UMDFaces: An Annotated Face Dataset for Training Deep Networks

Scene Recognition

Places: An Image Database for Deep Scene Understanding

Places2

The Places365-CNNs for Scene Classification

MNIST

EMNIST: an extension of MNIST to handwritten letters

Food

3 Million Instacart Orders, Open Sourced

https://tech.instacart.com/3-million-instacart-orders-open-sourced-d40d29ead6f2

Detection

YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

Face Detection

FDDB: Face Detection Data Set and Benchmark

WIDER FACE: A Face Detection Benchmark

Pedestrian Detection

Caltech Pedestrian Detection Benchmark

Saliency Detection

MSRA10K Salient Object Database

http://mmcheng.net/msra10k/

Detection From Video

YouTube-Objects dataset v2.2

ILSVRC2015: Object detection from video (VID)

Segmentation

Mapillary Vistas Dataset

Releasing the World’s Largest Street-level Imagery Dataset for Teaching Machines to See

http://blog.mapillary.com/product/2017/05/03/mapillary-vistas-dataset.html

PASCAL VOC

Augmented Pascal VOC

http://home.bharathh.info/pubs/codes/SBD/download.html

Microsoft COCO

The Oxford-IIIT Pet Dataset

  • intro: a 37-category pet dataset with roughly 200 images per class. All images have an associated ground-truth annotation of breed, head ROI, and pixel-level trimap segmentation.
  • homepage: http://www.robots.ox.ac.uk/~vgg/data/pets/
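Each pixel in a trimap is labelled with a small integer. The sketch below assumes the convention described in the dataset's README: 1 = pet, 2 = background, and 3 = not classified, which is the uncertain band along the pet's outline. It converts a trimap, already decoded into a 2-D list of ints, into a pet/background mask. Boundary pixels get 255, a hypothetical "ignore" value that many segmentation losses accept.

```python
from typing import List

# Assumed trimap values (per the dataset README): 1 = pet, 2 = background,
# 3 = not classified (the boundary band around the pet).
TRIMAP_TO_MASK = {1: 1, 2: 0, 3: 255}  # 255 is a hypothetical "ignore" label


def trimap_to_mask(trimap: List[List[int]]) -> List[List[int]]:
    """Convert a trimap into a pet (1) / background (0) mask, boundary = 255."""
    mask: List[List[int]] = []
    for row in trimap:
        try:
            mask.append([TRIMAP_TO_MASK[value] for value in row])
        except KeyError as err:
            raise ValueError(f"unexpected trimap value {err.args[0]}") from None
    return mask
```

Decoding the trimap PNG files requires an image library such as Pillow. The mapping itself is independent of how the pixels were loaded.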

COCO-Stuff

COCO-Stuff: Thing and Stuff Classes in Context

COCO-Stuff 10K dataset v1.1

  • arxiv: https://arxiv.org/abs/1612.03716
  • github: https://github.com/nightrome/cocostuff

Scene Parsing

MIT Scene Parsing Benchmark

http://sceneparsing.csail.mit.edu/

ADE20K

  • intro: train: 20,210 images, val: 2,000 images. Contains 150 stuff/object category labels (e.g., wall, sky, and tree) and 1,038 image-level scene descriptors (e.g., airport terminal, bedroom, and street).
  • homepage: http://groups.csail.mit.edu/vision/datasets/ADE20K/
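Scene parsing on ADE20K is usually scored with pixel accuracy and mean intersection-over-union (IoU) over the 150 categories. The sketch below computes both metrics for flattened label maps. It assumes labels 1 to 150, with 0 reserved for unlabeled pixels, so check this against the release you download. It is not the official benchmark toolkit, and `scene_parsing_scores` is a hypothetical name.

```python
from typing import Dict, Sequence, Tuple


def scene_parsing_scores(pred: Sequence[int], gt: Sequence[int],
                         num_classes: int = 150,
                         ignore_label: int = 0) -> Tuple[float, float]:
    """Return (pixel accuracy, mean IoU) for two flattened label maps.

    Assumes classes are numbered 1..num_classes and that ignore_label marks
    unlabeled ground-truth pixels, which are skipped entirely.
    """
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same length")
    intersection: Dict[int, int] = {}
    union: Dict[int, int] = {}
    correct = labeled = 0
    for p, g in zip(pred, gt):
        if g == ignore_label:
            continue  # unlabeled ground-truth pixel: not scored
        labeled += 1
        union[g] = union.get(g, 0) + 1
        if p == g:
            correct += 1
            intersection[g] = intersection.get(g, 0) + 1
        elif 1 <= p <= num_classes:
            union[p] = union.get(p, 0) + 1  # false positive for class p
    if labeled == 0:
        raise ValueError("no labeled pixels to score")
    # Average IoU only over classes that occur in either map.
    ious = [intersection.get(c, 0) / u for c, u in union.items()]
    return correct / labeled, sum(ious) / len(ious)
```

To score a full validation set rather than a single image, concatenate the flattened maps of all images before calling the function. The intersection and union counts are then accumulated over the whole set.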

Semantic Understanding of Scenes through the ADE20K Dataset

https://arxiv.org/abs/1608.05442

ImageNet

ImageNet-Utils

Captioning / Description

TGIF: A New Dataset and Benchmark on Animated GIF Description

Collecting Multilingual Parallel Video Descriptions Using Mechanical Turk

Video

Dataset       # Videos   # Classes   Year
Kodak         1,358      25          2007
HMDB51        7,000      51
Charades      9,848      157
MCG-WEBV      234,414    15          2009
CCV           9,317      20          2011
UCF-101       13,320     101         2012
THUMOS-2      18,394     101         2014
MED-2014      ≈28,000    20          2014
Sports-1M     1M         487         2014
ActivityNet   27,801     203         2015
FCVID         91,223     239         2015

UCF101 - Action Recognition Data Set

HMDB51: A Large Video Database for Human Motion Recognition

ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding

Sports-1M

Charades Dataset

  • intro: This dataset guides our research into unstructured video activity recognition and commonsense reasoning for daily human activities.
  • intro: The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos.
  • homepage: http://allenai.org/plato/charades/
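Each temporal annotation pairs an action class with a start time and an end time in seconds. The sketch below parses one video's annotation string. It assumes the semicolon-separated layout of the released CSV files, with entries such as `c092 11.90 21.20` where `c092` is the class ID. `ActionSegment`, `parse_actions` and `active_classes` are hypothetical names.

```python
from typing import List, NamedTuple


class ActionSegment(NamedTuple):
    class_id: int  # numeric part of the class ID, e.g. 92 for 'c092'
    start: float   # seconds from the start of the video
    end: float     # seconds from the start of the video


def parse_actions(field: str) -> List[ActionSegment]:
    """Parse an annotation string such as 'c092 11.90 21.20;c147 0.00 12.60'."""
    segments: List[ActionSegment] = []
    for chunk in field.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # an empty field yields no segments
        label, start, end = chunk.split()
        segments.append(ActionSegment(int(label.lstrip("c")), float(start), float(end)))
    return segments


def active_classes(segments: List[ActionSegment], t: float) -> List[int]:
    """Class IDs whose annotated interval contains time t (multi-label)."""
    return sorted({s.class_id for s in segments if s.start <= t <= s.end})
```

Intervals can overlap, so a single timestamp may carry several labels. This is why `active_classes` returns a list rather than one class. The string itself would usually come from a CSV column read with `csv.DictReader`. The column name `actions` is an assumption about the file layout.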

FCVID: Fudan-Columbia Video Dataset

YouTube-8M: A Large-Scale Video Classification Benchmark

stabilized video frames

The Kinetics Human Action Video Dataset

e-Lab Video Data Set(s)

  • intro: “Currently, e-VDS35 has 35 classes and a total of 2050 videos of roughly 10 seconds each. We are aiming to collect overall 1750 (50 × 35) videos with your help.”
  • homepage: https://engineering.purdue.edu/elab/eVDS

Scene

SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

OCR

COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images

Retrieval

Oxford5k

Paris6k

Oxford105k

UKB

NUS-WIDE

ImageNet-YahooQA

DeepFashion: In-shop Clothes Retrieval

Person Re-id

PRW (Person Re-identification in the Wild) Dataset

Person Re-identification in the Wild

DukeMTMC-reID

  • intro: DukeMTMC-reID is a subset of the DukeMTMC dataset for image-based re-identification, in the format of the Market-1501 dataset.
  • intro: 16,522 training images of 702 identities, 2,228 query images of the other 702 identities, and 17,661 gallery images.
  • github: https://github.com/layumi/DukeMTMC-reID_evaluation
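Because DukeMTMC-reID follows the Market-1501 format, the person ID and camera ID can be read from each file name. The usual rank-1 check also skips gallery images of the same person taken by the same camera as the query. The sketch below applies both conventions. The naming pattern and the function names are illustrative assumptions, not code from the evaluation repository.

```python
import re
from typing import Sequence, Tuple

# Assumed Market-1501-style names: '<person id>_c<camera>...', e.g.
# '0005_c2_f0046985.jpg' or '0001_c1s1_001051_00.jpg'.
NAME_PATTERN = re.compile(r"^(-?\d+)_c(\d+)")


def parse_name(filename: str) -> Tuple[int, int]:
    """Return (person id, camera id) encoded at the start of a file name."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(match.group(1)), int(match.group(2))


def rank1_accuracy(distances: Sequence[Sequence[float]],
                   query_names: Sequence[str],
                   gallery_names: Sequence[str]) -> float:
    """Fraction of queries whose closest eligible gallery image shares their ID.

    distances[i][j] is the distance between query i and gallery image j.
    Gallery images with the same ID *and* camera as the query are skipped.
    """
    if not query_names:
        raise ValueError("no queries to evaluate")
    gallery = [parse_name(name) for name in gallery_names]
    hits = 0
    for query_name, row in zip(query_names, distances):
        q_id, q_cam = parse_name(query_name)
        candidates = [(dist, g_id) for dist, (g_id, g_cam) in zip(row, gallery)
                      if not (g_id == q_id and g_cam == q_cam)]
        if candidates and min(candidates)[1] == q_id:
            hits += 1
    return hits / len(query_names)
```

Without the same-camera filter, a near-duplicate frame of the query from the same camera would be an easy match. That would inflate the score, so excluding those images keeps the test focused on cross-camera matching.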

Fashion

Large-scale Fashion (DeepFashion) Database

Apparel classification with Style

Attribute Datasets

Pedestrian Attribute Recognition

A Richly Annotated Dataset for Pedestrian Attribute Recognition

Pedestrian Attribute Recognition At Far Distance

Market-1501_Attribute

DukeMTMC-attribute

Tracking

UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking

DukeMTMC: Duke Multi-Target, Multi-Camera Tracking Project

  • intro: DukeMTMC aims to accelerate advances in multi-target, multi-camera tracking. It provides a tracking system that works within and across cameras, and a new large-scale HD video dataset recorded by 8 synchronized cameras, with more than 7,000 single-camera trajectories and over 2,000 unique identities.
  • homepage: http://vision.cs.duke.edu/DukeMTMC/

Tools

LabelImg: a graphical image annotation tool for labeling object bounding boxes in images
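By default, LabelImg saves its annotations as PASCAL VOC-style XML. Each box is stored in an `<object>` element that holds a `<name>` and a `<bndbox>` with pixel coordinates. The sketch below reads those boxes using only the standard library. `Box` and `parse_voc_xml` are hypothetical names, and coordinates are parsed through `float` in case a tool writes decimal values.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Box(NamedTuple):
    label: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def parse_voc_xml(xml_text: str) -> List[Box]:
    """Collect (label, xmin, ymin, xmax, ymax) for every <object> in the file."""
    root = ET.fromstring(xml_text)
    boxes: List[Box] = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects that carry no bounding box
        # A missing coordinate tag raises an error instead of guessing a value.
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(Box(obj.findtext("name", ""), *coords))
    return boxes
```

The same reader also works for the original PASCAL VOC annotation files, because they use the same `<object>`/`<bndbox>` structure.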

Pychet Labeller

ml-pyxis: Tool for reading and writing datasets of tensors (numpy.ndarray) with MessagePack and Lightning Memory-Mapped Database (LMDB).

  • intro: Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
  • github: https://github.com/vicolab/ml-pyxis

Open Image Dataset downloader

Artist

BAM! The Behance Artistic Media Dataset

Resources

CV Datasets on the web

http://www.cvpapers.com/datasets.html

Awesome Public Datasets

Machine Learning Repository

https://archive.ics.uci.edu/ml/datasets.html