Datasets
Who is the best at X?
Computer Vision Datasets
- website: http://clickdamage.com/sourcecode/index.html
- code: http://clickdamage.com/sourcecode/cv_datasets.php
- mirror: http://pan.baidu.com/s/1pJmqD4n
Introducing the Open Images Dataset
- blog: https://research.googleblog.com/2016/09/introducing-open-images-dataset.html
- github: https://github.com/openimages/dataset
- Academic Torrents: http://academictorrents.com/details/9e9194e21ce045deee8d811481b4cd676b20b06b
A parallel download util for Google’s open image dataset
Image & Vision Group - Datasets
- intro: Image & Vision, Clothing & Fashion, Computer Graphics, Video Sequences
- homepage: http://caiivg.weebly.com/dataset.html
Huizhong Chen - Datasets
- intro: Google I/O Dataset, Names 100 Dataset, Clothing Attributes Dataset, Stanford Mobile Visual Search Dataset, CNN 2-Hours Videos Dataset
- homepage: http://huizhongchen.github.io/datasets.html#clothingattributedataset
Classification
A Large-Scale Car Dataset for Fine-Grained Categorization and Verification
- project page: http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html
- arxiv: http://arxiv.org/abs/1506.08959
The CIFAR-10 dataset
- intro: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
- homepage: http://www.cs.toronto.edu/~kriz/cifar.html
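As a rough illustration of the numbers above, the sketch below decodes one batch from the Python version of the archive linked on the homepage. It assumes the layout documented there (a pickled dict whose `data` rows hold 3072 values each, stored as 1024 red, 1024 green, then 1024 blue values in row-major order, plus a `labels` list). The helper names `load_batch`, `row_to_rgb`, and `class_counts` are hypothetical, not part of any official tooling.

```python
import pickle

PIXELS = 32 * 32  # one colour plane of a 32x32 image


def load_batch(path):
    """Unpickle one batch file (e.g. data_batch_1) into a dict."""
    with open(path, "rb") as f:
        # The batches were pickled under Python 2, so keys come back as bytes.
        return pickle.load(f, encoding="bytes")


def row_to_rgb(row):
    """Turn one 3072-value row into 1024 (r, g, b) tuples in row-major order."""
    if len(row) != 3 * PIXELS:
        raise ValueError("expected 3072 values per image")
    red = row[:PIXELS]
    green = row[PIXELS:2 * PIXELS]
    blue = row[2 * PIXELS:]
    return list(zip(red, green, blue))


def class_counts(labels, num_classes=10):
    """Count how many images fall into each class."""
    counts = [0] * num_classes
    for label in labels:
        counts[label] += 1
    return counts
```

The real files store `data` as a NumPy array, so NumPy has to be installed to unpickle them. Summing `class_counts(batch[b"labels"])` over the five training batches should give 5000 per class, consistent with the 50000 training images described above.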
Face Recognition
The MegaFace Benchmark: 1 Million Faces for Recognition at Scale
MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition
MSR Image Recognition Challenge (IRC)
UMDFaces: An Annotated Face Dataset for Training Deep Networks
Scene Recognition
Places: An Image Database for Deep Scene Understanding
- project page: http://places.csail.mit.edu/index.html
- arxiv: https://arxiv.org/abs/1610.02055
Places2
- intro: Places2 contains more than 10 million images comprising 400+ unique scene categories
- homepage: http://places2.csail.mit.edu/
The Places365-CNNs for Scene Classification
MNIST
EMNIST: an extension of MNIST to handwritten letters
Food
3 Million Instacart Orders, Open Sourced
https://tech.instacart.com/3-million-instacart-orders-open-sourced-d40d29ead6f2
Detection
YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video
- intro: YouTube-BoundingBoxes (YT-BB)
- homepage: https://research.google.com/youtubebb/
- arxiv: https://arxiv.org/abs/1702.00824
Face Detection
FDDB: Face Detection Data Set and Benchmark
- homepage: http://vis-www.cs.umass.edu/fddb/index.html
- results: http://vis-www.cs.umass.edu/fddb/results.html
WIDER FACE: A Face Detection Benchmark
Pedestrian Detection
Caltech Pedestrian Detection Benchmark
Saliency Detection
MSRA10K Salient Object Database
Detection From Video
YouTube-Objects dataset v2.2
ILSVRC2015: Object detection from video (VID)
Segmentation
Mapillary Vistas Dataset
Mapillary Vistas Dataset
- intro: 25,000 high-resolution images, 100 object categories, 60 of those instance-specific
- homepage: https://www.mapillary.com/dataset/
Releasing the World’s Largest Street-level Imagery Dataset for Teaching Machines to See
http://blog.mapillary.com/product/2017/05/03/mapillary-vistas-dataset.html
PASCAL VOC
Augmented Pascal VOC
http://home.bharathh.info/pubs/codes/SBD/download.html
Microsoft COCO
- homepage: http://mscoco.org/
- github: https://github.com/pdollar/coco
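COCO annotations ship as JSON, so a first look at them needs nothing beyond the standard library. The sketch below groups bounding boxes by image, assuming the instance-annotation layout (`images`, `annotations`, `categories`, with `bbox` given as `[x, y, width, height]`). The function name `boxes_by_image` and the toy record are invented for illustration; the API in the GitHub repository above is the usual route for real work.

```python
import json
from collections import defaultdict


def boxes_by_image(coco):
    """Map each image file name to a list of (category name, [x1, y1, x2, y2])."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    files = {im["id"]: im["file_name"] for im in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # top-left corner plus width and height
        grouped[files[ann["image_id"]]].append(
            (names[ann["category_id"]], [x, y, x + w, y + h])
        )
    return dict(grouped)


# Toy stand-in for an annotation file; every value here is made up.
toy = json.loads("""
{"images": [{"id": 1, "file_name": "a.jpg"}],
 "categories": [{"id": 18, "name": "dog"}],
 "annotations": [{"image_id": 1, "category_id": 18, "bbox": [10, 20, 30, 40]}]}
""")
print(boxes_by_image(toy))  # {'a.jpg': [('dog', [10, 20, 40, 60])]}
```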
The Oxford-IIIT Pet Dataset
- intro: a 37 category pet dataset with roughly 200 images for each class. All images have an associated ground truth annotation of breed, head ROI, and pixel level trimap segmentation
- homepage: http://www.robots.ox.ac.uk/~vgg/data/pets/
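The trimap annotation mentioned above is often collapsed to a binary mask before training. A minimal sketch follows, assuming the label values are 1 for foreground, 2 for background, and 3 for unclassified pixels; check the annotation README in the download before relying on that mapping. The function name `trimap_to_mask` is hypothetical, and the trimap is taken as plain rows of integers rather than a loaded image.

```python
FOREGROUND, BACKGROUND, UNSURE = 1, 2, 3  # assumed trimap label values


def trimap_to_mask(trimap, unsure_as_foreground=False):
    """Convert a trimap (rows of ints) into a 0/1 foreground mask."""
    mask = []
    for row in trimap:
        out = []
        for value in row:
            if value == FOREGROUND:
                out.append(1)
            elif value == UNSURE:
                # Boundary pixels: caller decides which side they fall on.
                out.append(1 if unsure_as_foreground else 0)
            elif value == BACKGROUND:
                out.append(0)
            else:
                raise ValueError(f"unexpected trimap value: {value}")
        mask.append(out)
    return mask
```

Whether the unclassified band counts as foreground is a design choice; many pipelines instead ignore those pixels in the loss, which this sketch does not cover.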
COCO-Stuff
COCO-Stuff: Thing and Stuff Classes in Context
COCO-Stuff 10K dataset v1.1
- arxiv: https://arxiv.org/abs/1612.03716
- github: https://github.com/nightrome/cocostuff
Scene Parsing
MIT Scene Parsing Benchmark
http://sceneparsing.csail.mit.edu/
ADE20K
- intro: train: 20,120 images, val: 2,000 images. Contains 150 stuff/object category labels (e.g., wall, sky, and tree) and 1,038 image-level scene descriptors (e.g., airport terminal, bedroom, and street).
- homepage: http://groups.csail.mit.edu/vision/datasets/ADE20K/
Semantic Understanding of Scenes through the ADE20K Dataset
https://arxiv.org/abs/1608.05442
ImageNet
ImageNet-Utils
- intro: Utils to help download images by id, crop bounding box, label images, etc.
- github: https://github.com/tzutalin/ImageNet_Utils
Captioning / Description
TGIF: A New Dataset and Benchmark on Animated GIF Description
Collecting Multilingual Parallel Video Descriptions Using Mechanical Turk
- intro: 1970 YouTube video snippets: 1200 training, 100 validation, 670 test
- homepage: http://www.cs.utexas.edu/users/ml/clamp/videoDescription/
Video
Dataset | # Videos | # Classes | Year | Manually Labeled? |
---|---|---|---|---|
Kodak | 1,358 | 25 | 2007 | ✓ |
HMDB51 | 7,000 | 51 | | |
Charades | 9,848 | 157 | | |
MCG-WEBV | 234,414 | 15 | 2009 | ✓ |
CCV | 9,317 | 20 | 2011 | ✓ |
UCF-101 | 13,320 | 101 | 2012 | ✓ |
THUMOS-2 | 18,394 | 101 | 2014 | ✓ |
MED-2014 | ≈28,000 | 20 | 2014 | ✓ |
Sports-1M | 1M | 487 | 2014 | ✗ |
ActivityNet | 27,801 | 203 | 2015 | ✓ |
FCVID | 91,223 | 239 | 2015 | ✓ |
UCF101 - Action Recognition Data Set
- homepage: http://crcv.ucf.edu/data/UCF101.php
HMDB51: A Large Video Database for Human Motion Recognition
ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding
- homepage: http://activity-net.org/
- download: http://activity-net.org/download.html
- github: https://github.com/activitynet
Sports-1M
- homepage: https://github.com/gtoderici/sports-1m-dataset/blob/wiki/ProjectHome.md
- github: https://github.com/gtoderici/sports-1m-dataset/
- thumbnails: http://cs.stanford.edu/people/karpathy/deepvideo/classes.html
Charades Dataset
- intro: This dataset guides our research into unstructured video activity recognition and commonsense reasoning for daily human activities.
- intro: The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos.
- homepage: http://allenai.org/plato/charades/
FCVID: Fudan-Columbia Video Dataset
- homepage: http://bigvid.fudan.edu.cn/FCVID/
YouTube-8M: A Large-Scale Video Classification Benchmark
- homepage: http://research.google.com/youtube8m/
- arxiv: http://arxiv.org/abs/1609.08675
stabilized video frames
- intro: 9 TB, 35,000,000 clips, 32 frames
- intro: Generating Videos with Scene Dynamics
- homepage: http://web.mit.edu/vondrick/tinyvideo/#data
The Kinetics Human Action Video Dataset
- intro: Google
- homepage: https://deepmind.com/research/open-source/open-source-datasets/kinetics/
- arxiv: https://arxiv.org/abs/1705.06950
e-Lab Video Data Set(s)
- intro: “Currently, e-VDS35 has 35 classes and a total of 2050 videos of roughly 10 seconds each (see histogram below). We are aiming to collect overall 1750 (50 × 35) videos with your help.”
- homepage: https://engineering.purdue.edu/elab/eVDS
Scene
SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth
- intro: Imperial College London
- project page: https://robotvault.bitbucket.org/scenenet-rgbd.html
- arxiv: https://arxiv.org/abs/1612.05079
- github: https://github.com/jmccormac/pySceneNetRGBD
OCR
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
- homepage: http://vision.cornell.edu/se3/coco-text/
- arxiv: http://arxiv.org/abs/1601.07140
Retrieval
Oxford5k
Paris6k
Oxford105k
UKB
NUS-WIDE
ImageNet-YahooQA
DeepFashion: In-shop Clothes Retrieval
- intro: 7,982 clothing items; 52,712 in-shop clothes images; ~200,000 cross-pose/scale pairs. Each image is annotated with a bounding box, clothing type, and pose type.
- homepage: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion/InShopRetrieval.html
Person Re-id
PRW (Person Re-identification in the Wild) Dataset
- homepage: http://www.liangzheng.com.cn/Project/project_prw.html
- github: https://github.com/liangzheng06/PRW-baseline
Person Re-identification in the Wild
- intro: CVPR 2017 spotlight
- arxiv: https://arxiv.org/abs/1604.02531
DukeMTMC-reID
- intro: DukeMTMC-reID is a subset of the DukeMTMC for image-based re-identification, in the format of the Market-1501 dataset
- intro: 16,522 training images of 702 identities, 2,228 query images of the other 702 identities and 17,661 gallery images
- github: https://github.com/layumi/DukeMTMC-reID_evaluation
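Because the images follow the Market-1501 format, identity and camera labels are usually read from the file name. The sketch below assumes names that start with a person id, an underscore, and `c` followed by the camera number (for example `0001_c2_f0046182.jpg`, or `0002_c1s1_000451_03.jpg` in Market-1501). That naming scheme and the sample names are assumptions here, and `parse_reid_name` is a hypothetical helper.

```python
import re

# Person id (may be -1 for junk images in Market-1501), then camera number.
_NAME = re.compile(r"^(-?\d+)_c(\d+)")


def parse_reid_name(filename):
    """Return (person_id, camera_id) parsed from a re-id image file name."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename}")
    return int(match.group(1)), int(match.group(2))


print(parse_reid_name("0001_c2_f0046182.jpg"))     # (1, 2)
print(parse_reid_name("0002_c1s1_000451_03.jpg"))  # (2, 1)
```

Evaluation protocols in this format typically drop gallery images that share both person id and camera id with the query, which is why both fields are extracted.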
Fashion
Large-scale Fashion (DeepFashion) Database
- intro: Attribute Prediction, Consumer-to-shop Clothes Retrieval, In-shop Clothes Retrieval, and Landmark Detection
- homepage: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
Apparel classification with Style
- intro: 15 clothing classes, 88951 images
- homepage: http://people.ee.ethz.ch/~lbossard/projects/accv12/index.html
Attribute Datasets
Attribute Datasets
- intro: in total 41,585 pedestrian samples, each of which is annotated with 72 attributes as well as viewpoints, occlusions, body parts information
- homepage: https://www.ecse.rpi.edu/homepages/cvrl/database/AttributeDataset.htm
Pedestrian Attribute Recognition
A Richly Annotated Dataset for Pedestrian Attribute Recognition
- homepage: http://rap.idealtest.org/
- arxiv: https://arxiv.org/abs/1603.07054
Pedestrian Attribute Recognition At Far Distance
- intro: PEdesTrian Attribute (PETA)
- homepage: http://mmlab.ie.cuhk.edu.hk/projects/PETA.html
- paper: http://personal.ie.cuhk.edu.hk/~pluo/pdf/mm14.pdf
Market-1501_Attribute
DukeMTMC-attribute
Tracking
UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking
- homepage: http://detrac-db.rit.albany.edu/
- arxiv: https://arxiv.org/abs/1511.04136
DukeMTMC: Duke Multi-Target, Multi-Camera Tracking Project
- intro: DukeMTMC aims to accelerate advances in multi-target multi-camera tracking. It provides a tracking system that works within and across cameras, a new large scale HD video data set recorded by 8 synchronized cameras with more than 7,000 single camera trajectories and over 2,000 unique identities
- homepage: http://vision.cs.duke.edu/DukeMTMC/
Tools
LabelImg: a graphical image annotation tool for labeling object bounding boxes in images
Pychet Labeller
- intro: A python based annotation/labelling toolbox for images. The program allows the user to annotate individual objects in images.
- github: https://github.com/sbargoti/pychetlabeller
ml-pyxis: Tool for reading and writing datasets of tensors (numpy.ndarray) with MessagePack and Lightning Memory-Mapped Database (LMDB).
- intro: Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
- github: https://github.com/vicolab/ml-pyxis
Open Image Dataset downloader
Artist
BAM! The Behance Artistic Media Dataset
- intro: 2.5M artwork urls, 393K attribute labels, 74K short image descriptions/captions
- project page: https://bam-dataset.org/
- arxiv: https://arxiv.org/abs/1704.08614
Resources
CV Datasets on the web
http://www.cvpapers.com/datasets.html
Awesome Public Datasets
- intro: An awesome list of high-quality open datasets in public domains (on-going). By everyone, for everyone!
- github: https://github.com/caesar0301/awesome-public-datasets
Machine Learning Repository