Video Applications

Oct 9, 2015


Papers

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images

Video Fill in the Blank with Merging LSTMs

  • intro: for Large Scale Movie Description and Understanding Challenge (LSMDC) 2016, “Movie fill-in-the-blank” Challenge, UCF_CRCV
  • intro: Video-Fill-in-the-Blank (ViFitB)
  • arxiv: https://arxiv.org/abs/1610.04062

Video Pixel Networks

Robust Video Synchronization using Unsupervised Deep Learning

Video Propagation Networks

Video Frame Synthesis using Deep Voxel Flow

Optimizing Deep CNN-Based Queries over Video Streams at Scale

NoScope: 1000x Faster Deep Learning Queries over Video

http://dawn.cs.stanford.edu/2017/06/22/noscope/

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

ProcNets: Learning to Segment Procedures in Untrimmed and Unconstrained Videos

https://arxiv.org/abs/1703.09788

Unsupervised Learning Layers for Video Analysis

  • intro: Baidu Research
  • intro: “The experiments demonstrated the potential applications of UL layers and online learning algorithm to head orientation estimation and moving object localization”
  • arxiv: https://arxiv.org/abs/1705.08918

Look, Listen and Learn

Video Imagination from a Single Image with Transformation Generation

Learning to Learn from Noisy Web Videos

Video Classification

Large-scale Video Classification with Convolutional Neural Networks

Exploiting Image-trained CNN Architectures for Unconstrained Video Classification

Beyond Short Snippets: Deep Networks for Video Classification

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

Video Content Recognition with Deep Learning

Efficient Large Scale Video Classification

Fusing Multi-Stream Deep Networks for Video Classification

Learning End-to-end Video Classification with Rank-Pooling

Deep Learning for Video Classification and Captioning

Fast Video Classification via Adaptive Cascading of Deep Models

Deep Feature Flow for Video Recognition

Large-Scale YouTube-8M Video Understanding with Deep Neural Networks

https://arxiv.org/abs/1706.04488

Deep Learning Methods for Efficient Large Scale Video Labeling

Learnable pooling with Context Gating for video classification

Aggregating Frame-level Features for Large-Scale Video Classification

Tensor-Train Recurrent Neural Networks for Video Classification

https://arxiv.org/abs/1707.01786

Hierarchical Deep Recurrent Architecture for Video Understanding

Large-scale Video Classification guided by Batch Normalized LSTM Translator

UTS submission to Google YouTube-8M Challenge 2017

A spatiotemporal model with visual attention for video classification

https://arxiv.org/abs/1707.02069

Cultivating DNN Diversity for Large Scale Video Labelling

Attention Transfer from Web Images for Video Recognition

Action Detection / Activity Recognition

3d convolutional neural networks for human action recognition

Sequential Deep Learning for Human Action Recognition

Two-stream convolutional networks for action recognition in videos

Finding action tubes

  • intro: “built action models from shape and motion cues. They start from the image proposals and select the motion salient subset of them and extract spatio-temporal features to represent the video using the CNNs.”
  • arxiv: http://arxiv.org/abs/1411.6031

Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition

Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors

Action Recognition by Hierarchical Mid-level Action Elements

Contextual Action Recognition with R*CNN

Towards Good Practices for Very Deep Two-Stream ConvNets

Action Recognition using Visual Attention

End-to-end Learning of Action Detection from Frame Glimpses in Videos

Multi-velocity neural networks for gesture recognition in videos

Active Learning for Online Recognition of Human Activities from Streaming Videos

Convolutional Two-Stream Network Fusion for Video Action Recognition

Deep, Convolutional, and Recurrent Models for Human Activity Recognition using Wearables

Unsupervised Semantic Action Discovery from Video Collections

Anticipating Visual Representations from Unlabeled Video

VideoLSTM Convolves, Attends and Flows for Action Recognition

Hierarchical Attention Network for Action Recognition in Videos (HAN)

Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition

Connectionist Temporal Modeling for Weakly Supervised Action Labeling

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

Actionness Estimation Using Hybrid FCNs

Real-time Action Recognition with Enhanced Motion Vector CNNs

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

Temporal Segment Networks for Action Recognition in Videos

Hierarchical Attention Network for Action Recognition in Videos

DeepCAMP: Deep Convolutional Action & Attribute Mid-Level Patterns

Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition

Dynamic Image Networks for Action Recognition

Human Action Recognition without Human

Temporal Convolutional Networks: A Unified Approach to Action Segmentation

Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Sequential Deep Trajectory Descriptor for Action Recognition with Three-stream CNN

Semi-Coupled Two-Stream Fusion ConvNets for Action Recognition at Extremely Low Resolutions

Spatiotemporal Residual Networks for Video Action Recognition

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

Deep Recurrent Neural Network for Mobile Human Activity Recognition with High Throughput

Joint Network based Attention for Action Recognition

Temporal Convolutional Networks for Action Segmentation and Detection

AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos

ActionFlowNet: Learning Motion Representation for Action Recognition

Higher-order Pooling of CNN Features via Kernel Linearization for Action Recognition

Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos

https://arxiv.org/abs/1703.10664

Temporal Action Detection with Structured Segment Networks

Recurrent Residual Learning for Action Recognition

https://arxiv.org/abs/1706.08807

Projects

A Torch Library for Action Recognition and Detection Using CNNs and LSTMs

2016 ActivityNet action recognition challenge. CNN + LSTM approach. Multi-threaded loading.

LSTM for Human Activity Recognition

Scanner: Efficient Video Analysis at Scale

Charades Starter Code for Activity Classification and Localization

Event Recognition

TagBook: A Semantic Video Representation without Supervision for Event Detection

AENet: Learning Deep Audio Features for Video Analysis

Event Detection

DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting

Detecting events and key actors in multi-person videos

Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection

Efficient Action Detection in Untrimmed Videos via Multi-Task Learning

Abnormality / Anomaly Detection

Fully Convolutional Neural Network for Fast Anomaly Detection in Crowded Scenes

Anomaly Detection in Video Using Predictive Convolutional Long Short-Term Memory Networks

Video Prediction

Deep multi-scale video prediction beyond mean square error

Unsupervised Learning for Physical Interaction through Video Prediction

Generating Videos with Scene Dynamics

PredNet

Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning

Diversity encouraged learning of unsupervised LSTM ensemble for neural activity video prediction

Video Ladder Networks

Unsupervised Learning of Long-Term Motion Dynamics for Videos

One-Step Time-Dependent Future Video Frame Prediction with a Convolutional Encoder-Decoder Neural Network

Video Tagging

Automatic Image and Video Tagging

Tagging YouTube music videos with deep learning - Alexandre Passant

Shot Boundary Detection

Large-scale, Fast and Accurate Shot Boundary Detection through Spatio-temporal Convolutional Neural Networks

https://arxiv.org/abs/1705.03281

Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

Video Action Segmentation

TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

Video2GIF

Video2GIF: Automatic Generation of Animated GIFs from Video (Robust Deep RankNet)

Creating Animated GIFs Automatically from Video

https://yahooresearch.tumblr.com/post/148009705216/creating-animated-gifs-automatically-from-video

Video2Speech

Vid2speech: Speech Reconstruction from Silent Video

Video Captioning

http://handong1587.github.io/deep_learning/2015/10/09/image-video-captioning.html#video-captioning

Video Summarization

Video summarization produces a short summary of a full-length video that ideally encapsulates its most informative parts, easing video browsing, editing, and indexing.
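
As a rough illustration of the task (not the method of any paper listed here), a summary can be formed by keeping the highest-scoring frames under a length budget. The `summarize` helper and the per-frame importance scores below are hypothetical; in practice the scores would come from a learned model.

```python
from typing import List, Sequence


def summarize(scores: Sequence[float], budget: int) -> List[int]:
    """Return indices of the `budget` highest-scoring frames, in temporal order."""
    if budget <= 0:
        return []
    # Rank frame indices by score, highest first; ties keep the earlier frame.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Keep the top `budget` frames and restore temporal order for playback.
    return sorted(ranked[:budget])


# Hypothetical per-frame importance scores for a 6-frame clip.
frame_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
summary = summarize(frame_scores, budget=3)
print(summary)
```

Top-k selection is the simplest possible choice; it ignores redundancy between neighbouring frames, which is one reason more elaborate sequence models are used for this task.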

Video Summarization with Long Short-term Memory

DeepVideo: Video Summarization using Temporal Sequence Modelling

Semantic Video Trailers

Video Summarization using Deep Semantic Features

Video Highlight Detection

Unsupervised Extraction of Video Highlights Via Robust Recurrent Auto-encoders

  • intro: ICCV 2015
  • intro: relies on the assumption that highlights of an event category are captured more frequently in short videos than non-highlights
  • arxiv: http://arxiv.org/abs/1510.01442

Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization

Using Deep Learning to Find Basketball Highlights

Real-Time Video Highlights for Yahoo Esports

Video Understanding

Scale Up Video Understanding with Deep Learning

Slicing Convolutional Neural Network for Crowd Video Understanding

Challenges

THUMOS Challenge 2014

THUMOS Challenge 2015

ActivityNet Challenge 2016