Papers
High-Performance Neural Networks for Visual Object Classification
- intro: “reduced network parameters by randomly removing connections before training”
- arxiv: http://arxiv.org/abs/1102.0183
Predicting Parameters in Deep Learning
- intro: “decomposed the weight matrix into two low-rank matrices” (a minimal sketch follows below)
- arxiv: http://arxiv.org/abs/1306.0543
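A minimal numpy sketch of the low-rank idea summarized above (not the paper's exact procedure, which predicts most weights from a learned subset): a weight matrix `W` is replaced by the product of two thin factors, obtained here via truncated SVD.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) by U_r @ V_r with rank-r factors via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

# Example: a 1024x512 fully-connected weight matrix compressed to rank 64
W = np.random.randn(1024, 512).astype(np.float32)
U_r, V_r = low_rank_factorize(W, rank=64)
print("params:", W.size, "->", U_r.size + V_r.size)
print("relative error:", np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W))
```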
Neurons vs Weights Pruning in Artificial Neural Networks
Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- intro: “presented a series of low-rank decomposition designs for convolutional kernels. Singular value decomposition was adopted for the matrix factorization”
- paper: http://papers.nips.cc/paper/5544-exploiting-linear-structure-within-convolutional-networks-for-efficient-evaluation.pdf
cuDNN: Efficient Primitives for Deep Learning
- arxiv: https://arxiv.org/abs/1410.0759
- download: https://developer.nvidia.com/cudnn
Efficient and accurate approximations of nonlinear convolutional networks
- intro: “considered the subsequent nonlinear units while learning the low-rank decomposition”
- arxiv: http://arxiv.org/abs/1411.4229
Convolutional Neural Networks at Constrained Time Cost
Flattened Convolutional Neural Networks for Feedforward Acceleration
- intro: ICLR 2015
- arxiv: http://arxiv.org/abs/1412.5474
- github: https://github.com/jhjin/flattened-cnn
Compressing Deep Convolutional Networks using Vector Quantization
- intro: “this paper showed that vector quantization had a clear advantage over matrix factorization methods in compressing fully-connected layers.” (a minimal k-means sketch follows below)
- arxiv: http://arxiv.org/abs/1412.6115
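The simplest variant of the idea (scalar k-means over the entries of a fully-connected weight matrix; the paper also evaluates product and residual quantization) can be sketched with scipy's k-means, storing an 8-bit index per weight plus a small float codebook.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def kmeans_quantize(W, n_clusters=256):
    """Quantize the entries of W to a shared codebook of n_clusters centroids;
    only the uint8 indices and the small codebook need to be stored."""
    codebook, labels = kmeans2(W.reshape(-1, 1).astype(np.float64),
                               n_clusters, minit='points')
    indices = labels.astype(np.uint8).reshape(W.shape)
    W_hat = codebook[indices, 0]          # reconstructed (dequantized) weights
    return codebook, indices, W_hat

W = np.random.randn(512, 512).astype(np.float32)
codebook, indices, W_hat = kmeans_quantize(W)   # ~4x smaller than float32 storage
```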
Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition
- intro: “a low-rank CP-decomposition was adopted to transform a convolutional layer into multiple layers of lower complexity”
- arxiv: http://arxiv.org/abs/1412.6553
Deep Fried Convnets
- intro: “fully-connected layers were replaced by a single ‘Fastfood’ layer for end-to-end training with convolutional layers” (a rough sketch of one Fastfood block follows below)
- arxiv: http://arxiv.org/abs/1412.7149
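A rough sketch of one Fastfood block, y = S·H·G·Π·H·B·x, assuming the input dimension is a power of two. In Deep Fried Convnets the diagonals S, G, B are learned end to end, and a real implementation would use a fast Walsh–Hadamard transform in O(d log d) rather than the dense Hadamard matrix built here.

```python
import numpy as np
from scipy.linalg import hadamard

def fastfood_block(x, B, G, S, perm):
    """y = S * (H @ (G * ((H @ (B * x))[perm]))), with H the orthonormal Hadamard
    matrix; B, G, S are diagonal matrices stored as vectors, perm a fixed permutation."""
    d = x.shape[0]
    H = hadamard(d) / np.sqrt(d)
    y = H @ (B * x)
    y = y[perm]
    y = H @ (G * y)
    return S * y

d = 1024                                  # must be a power of two
rng = np.random.default_rng(0)
B = rng.choice([-1.0, 1.0], size=d)       # random signs (learned in Deep Fried Convnets)
G = rng.standard_normal(d)                # Gaussian scaling diagonal
S = rng.standard_normal(d)                # output scaling diagonal
perm = rng.permutation(d)
y = fastfood_block(rng.standard_normal(d), B, G, S, perm)
```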
Fast Convolutional Nets With fbfft: A GPU Performance Evaluation
- intro: Facebook. ICLR 2015
- arxiv: http://arxiv.org/abs/1412.7580
- github: http://facebook.github.io/fbcunn/fbcunn/
Caffe con Troll: Shallow Ideas to Speed Up Deep Learning
- intro: a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals
- arxiv: http://arxiv.org/abs/1504.04343
Compressing Neural Networks with the Hashing Trick
- intro: HashedNets. ICML 2015
- intro: “randomly grouped connection weights into hash buckets, and then fine-tuned network parameters with back-propagation” (a minimal sketch follows below)
- project page: http://www.cse.wustl.edu/~wenlinchen/project/HashedNets/index.html
- arxiv: http://arxiv.org/abs/1504.04788
- code: http://www.cse.wustl.edu/~wenlinchen/project/HashedNets/HashedNets.zip
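A minimal sketch of the hashing trick for one layer: the layer stores only a small parameter vector, and every entry of the virtual weight matrix is looked up from it. A seeded RNG stands in here for the real hash functions (HashedNets uses xxHash for the bucket and a second hash for the sign).

```python
import numpy as np

def hashed_weights(params, out_dim, in_dim, seed=0):
    """Build a virtual out_dim x in_dim weight matrix whose entries are shared
    values from `params`, selected by a (stand-in) hash of the position (i, j)."""
    rng = np.random.default_rng(seed)                        # stands in for h(i, j)
    bucket = rng.integers(0, params.size, size=(out_dim, in_dim))
    sign = rng.choice([-1.0, 1.0], size=(out_dim, in_dim))   # stands in for the sign hash
    return sign * params[bucket]

params = np.random.randn(1000).astype(np.float32)            # the only trainable parameters
W_virtual = hashed_weights(params, out_dim=256, in_dim=512)  # 131,072 virtual weights
```

During training, the gradient contributions of all virtual entries mapping to the same bucket are accumulated onto the single shared parameter.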
PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions
- intro: NIPS 2016
- arxiv: https://arxiv.org/abs/1504.08362
- github: https://github.com/mfigurnov/perforated-cnn-matconvnet
- github: https://github.com/mfigurnov/perforated-cnn-caffe
Accelerating Very Deep Convolutional Networks for Classification and Detection
- intro: “considered the subsequent nonlinear units while learning the low-rank decomposition”
- arxiv: http://arxiv.org/abs/1505.06798
Fast ConvNets Using Group-wise Brain Damage
- intro: “applied group-wise pruning to the convolutional tensor to decompose it into the multiplications of thinned dense matrices”
- arxiv: http://arxiv.org/abs/1506.02515
Learning both Weights and Connections for Efficient Neural Networks
Data-free parameter pruning for Deep Neural Networks
- intro: “proposed to remove redundant neurons instead of network connections”
- arxiv: http://arxiv.org/abs/1507.06149
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- intro: ICLR 2016 Best Paper
- intro: “reduced the size of AlexNet by 35x from 240MB to 6.9MB, and the size of VGG16 by 49x from 552MB to 11.3MB, with no loss of accuracy” (the pruning and weight-sharing stages are sketched below)
- arxiv: http://arxiv.org/abs/1510.00149
- video: http://videolectures.net/iclr2016_han_deep_compression/
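A minimal sketch of the first two Deep Compression stages (magnitude pruning, then k-means weight sharing); Huffman coding of the indices and the retraining between stages are omitted, and `kmeans2` is just a convenient stand-in for fitting the codebook.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def prune_and_share(W, sparsity=0.9, n_clusters=32):
    """Zero out the smallest-magnitude weights, then cluster the survivors so they
    share n_clusters centroid values (5-bit indices for 32 clusters)."""
    thresh = np.quantile(np.abs(W), sparsity)
    mask = np.abs(W) > thresh                     # keep only the largest weights
    survivors = W[mask].reshape(-1, 1).astype(np.float64)
    codebook, labels = kmeans2(survivors, n_clusters, minit='points')
    W_hat = np.zeros_like(W)
    W_hat[mask] = codebook[labels, 0]             # surviving weights share centroid values
    return W_hat, mask, codebook

W = np.random.randn(512, 512).astype(np.float32)
W_hat, mask, codebook = prune_and_share(W)
```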
Structured Transforms for Small-Footprint Deep Learning
- intro: NIPS 2015
- arxiv: https://arxiv.org/abs/1510.01722
- paper: https://papers.nips.cc/paper/5869-structured-transforms-for-small-footprint-deep-learning
ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines
Reducing the Training Time of Neural Networks by Partitioning
Convolutional neural networks with low-rank regularization
CNNdroid: Open Source Library for GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android
- arxiv: https://arxiv.org/abs/1511.07376
- paper: http://dl.acm.org/authorize.cfm?key=N14731
- slides: http://sharif.edu/~matin/pub/2016_mm_slides.pdf
- github: https://github.com/ENCP/CNNdroid
EIE: Efficient Inference Engine on Compressed Deep Neural Network
- intro: ISCA 2016
- arxiv: http://arxiv.org/abs/1602.01528
- slides: http://on-demand.gputechconf.com/gtc/2016/presentation/s6561-song-han-deep-compression.pdf
- slides: http://web.stanford.edu/class/ee380/Abstracts/160106-slides.pdf
Convolutional Tables Ensemble: classification in microseconds
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- intro: DeepScale & UC Berkeley
- arxiv: http://arxiv.org/abs/1602.07360
- github: https://github.com/DeepScale/SqueezeNet
- homepage: http://songhan.github.io/SqueezeNet-Deep-Compression/
- github: https://github.com/songhan/SqueezeNet-Deep-Compression
- note: https://www.evernote.com/shard/s146/sh/108eea91-349b-48ba-b7eb-7ac8f548bee9/5171dc6b1088fba05a4e317f7f5d32a3
- github(Keras): https://github.com/DT42/squeezenet_demo
- github(Keras): https://github.com/rcmalli/keras-squeezenet
- github(PyTorch): https://github.com/gsp-27/pytorch_Squeezenet
SqueezeNet-Residual
- intro: Residual-SqueezeNet improves the top-1 accuracy of SqueezeNet by 2.9% on ImageNet without changing the model size (only 4.8 MB).
- github: https://github.com/songhan/SqueezeNet-Residual
Lab41 Reading Group: SqueezeNet
Simplified_SqueezeNet
- intro: An improved version of SqueezeNet networks
- github(Caffe): https://github.com/NidabaSystems/Simplified_SqueezeNet
SqueezeNet Keras Dogs vs. Cats demo
Convolutional Neural Networks using Logarithmic Data Representation
DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices
Hardware-oriented Approximation of Convolutional Neural Networks
- arxiv: http://arxiv.org/abs/1604.03168
- homepage: http://ristretto.lepsucd.com/
- github(“Ristretto: Caffe-based approximation of convolutional neural networks”): https://github.com/pmgysel/caffe
Deep Neural Networks Under Stress
- intro: ICIP 2016
- arxiv: http://arxiv.org/abs/1605.03498
- github: https://github.com/MicaelCarvalho/DNNsUnderStress
ASP Vision: Optically Computing the First Layer of Convolutional Neural Networks using Angle Sensitive Pixels
Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups
- intro: “for ResNet 50, our model has 40% fewer parameters, 45% fewer floating point operations, and is 31% (12%) faster on a CPU (GPU). For the deeper ResNet 200 our model has 25% fewer floating point operations and 44% fewer parameters, while maintaining state-of-the-art accuracy. For GoogLeNet, our model has 7% fewer parameters and is 21% (16%) faster on a CPU (GPU).”
- arxiv: https://arxiv.org/abs/1605.06489
Functional Hashing for Compressing Neural Networks
- intro: FunHashNN
- arxiv: http://arxiv.org/abs/1605.06560
Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks
YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights
Learning Structured Sparsity in Deep Neural Networks
Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial “Bottleneck” Structure
https://arxiv.org/abs/1608.04337
Dynamic Network Surgery for Efficient DNNs
- intro: NIPS 2016
- intro: compresses the number of parameters in LeNet-5 and AlexNet by factors of 108× and 17.7×, respectively
- arxiv: http://arxiv.org/abs/1608.04493
- github(official. Caffe): https://github.com/yiwenguo/Dynamic-Network-Surgery
Scalable Compression of Deep Neural Networks
- intro: ACM Multimedia 2016
- arxiv: http://arxiv.org/abs/1608.07365
Pruning Filters for Efficient ConvNets
- intro: NIPS Workshop on Efficient Methods for Deep Neural Networks (EMDNN), 2016
- arxiv: http://arxiv.org/abs/1608.08710
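A sketch of the paper's filter-ranking step: each output filter is scored by the L1 norm of its weights and the smallest ones are dropped, shrinking the layer's output channels (the next layer's input channels must then be pruned to match, followed by fine-tuning).

```python
import numpy as np

def prune_filters_l1(conv_w, keep_ratio=0.7):
    """Keep the keep_ratio fraction of filters with the largest L1 norms.
    conv_w has shape (out_channels, in_channels, kH, kW)."""
    norms = np.abs(conv_w).sum(axis=(1, 2, 3))            # L1 norm per output filter
    n_keep = int(round(keep_ratio * conv_w.shape[0]))
    keep = np.sort(np.argsort(norms)[-n_keep:])           # surviving filter indices, in order
    return conv_w[keep], keep

conv_w = np.random.randn(64, 32, 3, 3).astype(np.float32)
pruned_w, kept_idx = prune_filters_l1(conv_w)
print(conv_w.shape, "->", pruned_w.shape)
```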
Accelerating Deep Convolutional Networks using low-precision and sparsity
- intro: Intel Labs
- arxiv: https://arxiv.org/abs/1610.00324
Deep Model Compression: Distilling Knowledge from Noisy Teachers
- arxiv: https://arxiv.org/abs/1610.09650
- github: https://github.com/chengshengchan/model_compression
Fixed-point Factorized Networks
Ultimate tensorization: compressing convolutional and FC layers alike
- intro: NIPS 2016 workshop: Learning with Tensors: Why Now and How?
- arxiv: https://arxiv.org/abs/1611.03214
- github: https://github.com/timgaripov/TensorNet-TF
Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- intro: “the energy consumption of AlexNet and GoogLeNet are reduced by 3.7x and 1.6x, respectively, with less than 1% top-5 accuracy loss”
- arxiv: https://arxiv.org/abs/1611.05128
Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks
LCNN: Lookup-based Convolutional Neural Network
- intro: “Our fastest LCNN offers 37.6x speed up over AlexNet while maintaining 44.3% top-1 accuracy.”
- arxiv: https://arxiv.org/abs/1611.06473
Deep Tensor Convolution on Multicores
- intro: presents the first practical CPU implementation of tensor convolution optimized for deep networks of small kernels
- arxiv: https://arxiv.org/abs/1611.06565
Training Sparse Neural Networks
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
- intro: Xilinx Research Labs & Norwegian University of Science and Technology & University of Sydney
- intro: 25th International Symposium on Field-Programmable Gate Arrays
- keywords: FPGA
- paper: http://www.idi.ntnu.no/~yamanu/2017-fpga-finn-preprint.pdf
- arxiv: https://arxiv.org/abs/1612.07119
- github: https://github.com/Xilinx/BNN-PYNQ
Deep Learning with INT8 Optimization on Xilinx Devices
- intro: “Xilinx’s integrated DSP architecture can achieve 1.75X solution-level performance for INT8 deep learning operations compared with other FPGA DSP architectures”
- paper: https://www.xilinx.com/support/documentation/white_papers/wp486-deep-learning-int8.pdf
Parameter Compression of Recurrent Neural Networks and Degredation of Short-term Memory
An OpenCL(TM) Deep Learning Accelerator on Arria 10
- intro: FPGA 2017
- arxiv: https://arxiv.org/abs/1701.03534
The Incredible Shrinking Neural Network: New Perspectives on Learning Representations Through The Lens of Pruning
- intro: CMU & Universität Paderborn
- arxiv: https://arxiv.org/abs/1701.04465
Deep Learning with Low Precision by Half-wave Gaussian Quantization
- intro: HWGQ-Net
- arxiv: https://arxiv.org/abs/1702.00953
DL-gleaning: An Approach For Improving Inference Speed And Accuracy
- intro: Electronics Telecommunications Research Institute (ETRI)
- paper: https://openreview.net/pdf?id=Hynn8SHOx
Energy Saving Additive Neural Network
- intro: Middle East Technical University & Bilkent University
- arxiv: https://arxiv.org/abs/1702.02676
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
- intro: ICLR 2017
- arxiv: https://arxiv.org/abs/1702.03044
Soft Weight-Sharing for Neural Network Compression
- intro: ICLR 2017. University of Amsterdam
- arxiv: https://arxiv.org/abs/1702.04008
- github: https://github.com/KarenUllrich/Tutorial-SoftWeightSharingForNNCompression
A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation
- intro: CVPR 2017
- arxiv: https://arxiv.org/abs/1703.04071
Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations
- intro: ARM Research
- arxiv: https://arxiv.org/abs/1703.03073
DyVEDeep: Dynamic Variable Effort Deep Neural Networks
https://arxiv.org/abs/1704.01137
Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights
https://openreview.net/forum?id=HyQJ-mclg&noteId=HyQJ-mclg
Bayesian Compression for Deep Learning
https://arxiv.org/abs/1705.08665
A Kernel Redundancy Removing Policy for Convolutional Neural Network
https://arxiv.org/abs/1705.10748
Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework
- keywords: discrete state transition (DST)
- arxiv: https://arxiv.org/abs/1705.09283
SEP-Nets: Small and Effective Pattern Networks
- intro: The University of Iowa & Snap Research
- arxiv: https://arxiv.org/abs/1706.03912
MEC: Memory-efficient Convolution for Deep Neural Network
- intro: ICML 2017
- arxiv: https://arxiv.org/abs/1706.06873
Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM
- intro: Alibaba Group
- keywords: alternating direction method of multipliers (ADMM)
- arxiv: https://arxiv.org/abs/1707.09870
An End-to-End Compression Framework Based on Convolutional Neural Networks
https://arxiv.org/abs/1708.00838
Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization
- intro: BMVC 2017 Oral
- arxiv: https://arxiv.org/abs/1708.01001
Pruning
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- intro: ICCV 2017. Nanjing University & Shanghai Jiao Tong University
- arxiv: https://arxiv.org/abs/1707.06342
- github(Caffe): https://github.com/Roll920/ThiNet
Neuron Pruning for Compressing Deep Networks using Maxout Architectures
- intro: GCPR 2017
- arxiv: https://arxiv.org/abs/1707.06838
Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
- intro: BMVC 2017 oral. Simon Fraser University
- arxiv: https://arxiv.org/abs/1707.09102
Prune the Convolutional Neural Networks with Sparse Shrink
https://arxiv.org/abs/1708.02439
Quantized Neural Networks
Quantized Convolutional Neural Networks for Mobile Devices
- intro: Q-CNN
- intro: “Extensive experiments on the ILSVRC-12 benchmark demonstrate 4 ∼ 6× speed-up and 15 ∼ 20× compression with merely one percentage loss of classification accuracy”
- arxiv: http://arxiv.org/abs/1512.06473
- github: https://github.com/jiaxiang-wu/quantized-cnn
Training Quantized Nets: A Deeper Understanding
- intro: University of Maryland & Cornell University
- arxiv: https://arxiv.org/abs/1706.02379
Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks
- intro: the top-5 error rate of the 4-bit quantized GoogLeNet model is 12.7%
- arxiv: https://arxiv.org/abs/1706.07145
Binary Convolutional Neural Networks
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- arxiv: http://arxiv.org/abs/1603.05279
- github(Torch): https://github.com/mrastegari/XNOR-Net
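A numpy sketch of the weight-binarization step from this paper (the Binary-Weight-Network variant): each filter is approximated as α·sign(W) with a per-filter scale α = mean(|W|); full XNOR-Net additionally binarizes the activations so convolutions reduce to XNOR and popcount operations.

```python
import numpy as np

def binarize_filters(conv_w):
    """Approximate each filter W by alpha * sign(W), alpha = mean(|W|) per filter.
    conv_w has shape (out_channels, in_channels, kH, kW)."""
    signs = np.sign(conv_w)
    signs[signs == 0] = 1.0                                   # avoid zero entries
    alpha = np.abs(conv_w).mean(axis=(1, 2, 3), keepdims=True)  # one scale per output filter
    return alpha * signs, alpha, signs

conv_w = np.random.randn(64, 32, 3, 3).astype(np.float32)
W_bin, alpha, signs = binarize_filters(conv_w)
```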
A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks
Espresso: Efficient Forward Propagation for BCNNs
BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet
- keywords: Binary Neural Networks (BNNs)
- arxiv: https://arxiv.org/abs/1705.09864
- github: https://github.com/hpi-xnor
ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
https://arxiv.org/abs/1706.02393
Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- intro: Embedded Vision Workshop (CVPRW). UC San Diego & UC Los Angeles & Cornell University
- arxiv: https://arxiv.org/abs/1707.04693
Accelerating / Fast Algorithms
Fast Algorithms for Convolutional Neural Networks
- intro: “2.6x as fast as Caffe when comparing CPU implementations”
- keywords: Winograd’s minimal filtering algorithms (an F(2,3) example is sketched below)
- arxiv: http://arxiv.org/abs/1509.09308
- github: https://github.com/andravin/wincnn
- slides: http://homes.cs.washington.edu/~cdel/presentations/Fast_Algorithms_for_Convolutional_Neural_Networks_Slides_reading_group_uw_delmundo_slides.pdf
- discussion: https://github.com/soumith/convnet-benchmarks/issues/59#issuecomment-150111895
- reddit: https://www.reddit.com/r/MachineLearning/comments/3nocg5/fast_algorithms_for_convolutional_neural_networks/?
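A worked F(2,3) example of Winograd's minimal filtering using the transform matrices from the paper: two outputs of a 3-tap filter are computed with 4 element-wise multiplications instead of 6 (the 2-D tile version nests the same transforms).

```python
import numpy as np

# Winograd F(2,3) transform matrices (Lavin & Gray)
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=np.float64)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float64)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)

d = np.random.randn(4)                       # input tile of 4 samples
g = np.random.randn(3)                       # 3-tap filter
y_winograd = AT @ ((G @ g) * (BT @ d))       # 4 multiplications in the element-wise product

# Reference: direct (valid) correlation producing the same 2 outputs
y_direct = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(y_winograd, y_direct)
```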
Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units
- intro: Hong Kong Baptist University
- arxiv: https://arxiv.org/abs/1704.07724
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
https://arxiv.org/abs/1706.01406
Channel Pruning for Accelerating Very Deep Neural Networks
- intro: ICCV 2017. Megvii Inc
- arxiv: https://arxiv.org/abs/1707.06168
Knowledge Distillation / Knowledge Transfer
Distilling the Knowledge in a Neural Network
- intro: NIPS 2014 Deep Learning Workshop
- author: Geoffrey Hinton, Oriol Vinyals, Jeff Dean
- arxiv: http://arxiv.org/abs/1503.02531
- blog: http://fastml.com/geoff-hintons-dark-knowledge/
- notes: https://github.com/dennybritz/deeplearning-papernotes/blob/master/notes/distilling-the-knowledge-in-a-nn.md
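A numpy sketch of the distillation objective described in the paper: the student matches the teacher's temperature-softened probabilities while also fitting the hard labels; `T` and `alpha` here are illustrative hyper-parameters.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Cross-entropy against the teacher's temperature-softened probabilities plus
    ordinary cross-entropy against the hard labels; the soft term is scaled by T^2
    so its gradients keep the same magnitude as the hard term."""
    p_teacher = softmax(teacher_logits, T)
    p_student_T = softmax(student_logits, T)
    p_student = softmax(student_logits, 1.0)
    soft = -np.mean(np.sum(p_teacher * np.log(p_student_T + 1e-12), axis=-1))
    hard = -np.mean(np.log(p_student[np.arange(len(labels)), labels] + 1e-12))
    return alpha * (T ** 2) * soft + (1 - alpha) * hard

student = np.random.randn(8, 10)
teacher = np.random.randn(8, 10)
labels = np.random.randint(0, 10, size=8)
loss = distillation_loss(student, teacher, labels)
```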
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- intro: TuSimple
- arxiv: https://arxiv.org/abs/1707.01219
DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
- intro: TuSimple
- keywords: pedestrian re-identification
- arxiv: https://arxiv.org/abs/1707.01220
Code Optimization
Production Deep Learning with NVIDIA GPU Inference Engine
- intro: convolution, bias, and ReLU layers are fused to form a single layer: CBR
- blog: https://devblogs.nvidia.com/parallelforall/production-deep-learning-nvidia-gpu-inference-engine/
speed improvement by merging batch normalization and scale #5
- github issue: https://github.com/sanghoon/pva-faster-rcnn/issues/5
Add a tool to merge ‘Conv-BN-Scale’ into a single ‘Conv’ layer.
https://github.com/sanghoon/pva-faster-rcnn/commit/39570aab8c6513f0e76e5ab5dba8dfbf63e9c68c/
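The arithmetic behind the ‘Conv-BN-Scale → Conv’ merge linked above, sketched for inference: because BatchNorm (plus Scale) is a per-channel affine map with fixed statistics at test time, it folds into the preceding convolution's weights and bias.

```python
import numpy as np

def fold_bn_into_conv(conv_w, conv_b, bn_mean, bn_var, gamma, beta, eps=1e-5):
    """Absorb y = gamma * (conv(x) + b - mean) / sqrt(var + eps) + beta into the
    convolution weights and bias. conv_w has shape (out, in, kH, kW); all BN
    parameters are per output channel."""
    scale = gamma / np.sqrt(bn_var + eps)             # per-output-channel factor
    w_folded = conv_w * scale[:, None, None, None]
    b_folded = (conv_b - bn_mean) * scale + beta
    return w_folded, b_folded

conv_w = np.random.randn(64, 32, 3, 3).astype(np.float32)
conv_b = np.zeros(64, dtype=np.float32)
w_f, b_f = fold_bn_into_conv(conv_w, conv_b,
                             bn_mean=np.random.randn(64),
                             bn_var=np.abs(np.random.randn(64)) + 0.5,
                             gamma=np.ones(64), beta=np.zeros(64))
```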
Projects
Accelerate Convolutional Neural Networks
- intro: “This tool aims to accelerate the test-time computation and decrease number of parameters of deep CNNs.”
- github: https://github.com/dmlc/mxnet/tree/master/tools/accnn
OptNet
- intro: reducing memory usage in Torch neural networks
NNPACK: Acceleration package for neural networks on multi-core CPUs
- github: https://github.com/Maratyszcza/NNPACK
- comments(Yann LeCun): https://www.facebook.com/yann.lecun/posts/10153459577707143
Deep Compression on AlexNet
Tiny Darknet
CACU: Calculate deep convolution neurAl network on Cell Unit
model_compression: Implementation of model compression with knowledge distilling method
keras_compressor: Model Compression CLI Tool for Keras
- blog: https://nico-opendata.jp/ja/casestudy/model_compression/index.html
- github: https://github.com/nico-opendata/keras_compressor
Blogs
Neural Networks Are Impressively Good At Compression
https://probablydance.com/2016/04/30/neural-networks-are-impressively-good-at-compression/
“Mobile friendly” deep convolutional neural networks
- part 1: https://medium.com/@sidd_reddy/mobile-friendly-deep-convolutional-neural-networks-part-1-331120ad40f9#.uy64ladvz
- part 2: https://medium.com/@sidd_reddy/mobile-friendly-deep-convolutional-neural-networks-part-2-making-deep-nets-shallow-701b2fbd3ca9#.u58fkuak3
Lab41 Reading Group: Deep Compression
Accelerating Machine Learning
Compressing and regularizing deep neural networks
https://www.oreilly.com/ideas/compressing-and-regularizing-deep-neural-networks
Talks / Videos
Deep compression and EIE: Deep learning model compression, design space exploration and hardware acceleration
Deep Compression, DSD Training and EIE: Deep Neural Network Model Compression, Regularization and Hardware Acceleration
http://research.microsoft.com/apps/video/default.aspx?id=266664
Tailoring Convolutional Neural Networks for Low-Cost, Low-Power Implementation
- intro: tutorial at the May 2015 Embedded Vision Summit
- youtube: https://www.youtube.com/watch?v=xACJBACStaU
Resources
Embedded-Neural-Network
- intro: a collection of works on reducing model size and on ASIC/FPGA accelerators for machine learning
- github: https://github.com/ZhishengWang/Embedded-Neural-Network