Courses
MIT 6.S897: Large-Scale Systems (Matei Zaharia)
- instructor: Matei Zaharia
- homepage: http://people.csail.mit.edu/matei/courses/2015/6.S897/
Papers
Learning to Hash for Indexing Big Data - A Survey
Random Forests for Big Data
Big data analytics: a survey
A Comparison of Big Data Frameworks on a Layered Dataflow Model
A survey of machine learning for big data processing
Projects
Open Big Data Group
- intro: This website contains a collection of libraries for processing massive datasets in highly distributed and parallel environments
- homepage: http://openbigdatagroup.github.io/
PLDA: Parallel C++ implementation of Latent Dirichlet Allocation
PSVM: Parallelizing Support Vector Machines on Distributed Computers
- homepage: http://openbigdatagroup.github.io/psvm/
- paper: http://papers.nips.cc/paper/3202-parallelizing-support-vector-machines-on-distributed-computers.pdf
- github: https://github.com/openbigdatagroup/psvm
PFP: Parallel FP-Growth for Query Recommendation
Pspectralclustering: A parallel C++ implementation of Parallel Spectral Clustering
- homepage: http://openbigdatagroup.github.io/pspectralclustering/
- github: https://github.com/openbigdatagroup/pspectralclustering
Speedo: Parallelizing Stochastic Gradient Descent for Deep Convolutional Neural Network
- homepage: http://openbigdatagroup.github.io/speedo/
- github: https://github.com/openbigdatagroup/speedo
Videos
Awesome Big Data Algorithms
Blog
Uncovering Big Bias with Big Data