Big Data Resources

Sep 22, 2015


Courses

MIT 6.S897: Large-Scale Systems (Matei Zaharia)

Papers

Learning to Hash for Indexing Big Data - A Survey

Random Forests for Big Data

Big data analytics: a survey

A Comparison of Big Data Frameworks on a Layered Dataflow Model

A survey of machine learning for big data processing

Projects

Open Big Data Group

  • intro: A collection of libraries for processing massive data sets in highly distributed, parallel environments
  • homepage: http://openbigdatagroup.github.io/

PLDA: Parallel C++ implementation of Latent Dirichlet Allocation

PSVM: Parallelizing Support Vector Machines on Distributed Computers

PFP: Parallel FP-Growth for Query Recommendation

Pspectralclustering: A parallel C++ implementation of Parallel Spectral Clustering

Speedo: Parallelizing Stochastic Gradient Descent for Deep Convolutional Neural Network

Videos

Awesome Big Data Algorithms

Blog

Uncovering Big Bias with Big Data