BigData Applications links
Tue 28 July 2015
Adobe Spindle: Next-generation web analytics processing with Scala, Spark, and Parquet
Apache Kiji: framework to collect and analyze data in real-time, based on HBase
Apache Nutch: open source web crawler
Apache OODT: capturing, processing and sharing of data for NASA's scientific archives
Apache Tika: content analysis toolkit
Domino: Run ...
BigData Benchmarking links
Tue 28 July 2015
Apache Hadoop Benchmarking: micro-benchmarks for testing Hadoop performance
Berkeley SWIM Benchmark: real-world big data workload benchmark
Big-Bench: Big Bench Workload Development
Hive-benchmarks: some benchmarking queries for Apache Hive
Hive-testbench: testbench for experimenting with Apache Hive at any data scale
Intel HiBench: a Hadoop benchmark suite
Netflix Inviso: performance focused Big ...
BigData Columnar Databases links
Tue 28 July 2015
Amazon Redshift: data warehouse service, based on PostgreSQL
C-Store: column-oriented DBMS
Google BigQuery: framework for interactive analysis, implementation of Dremel
Google Dremel: framework for interactive analysis
MonetDB: column store database
Parquet: columnar storage format for Hadoop
Pivotal Greenplum: purpose-built, dedicated analytic data warehouse
Vertica: is designed ...
BigData Data Warehouse links
Tue 28 July 2015
Google Mesa: highly scalable analytic data warehousing system
IBM BigInsights: data processing, warehousing and analytics
Microsoft Cosmos: Microsoft's internal BigData analysis platform
BigData Distributed Filesystem links
Tue 28 July 2015
Apache HDFS: a way to store large files across multiple machines
BeeGFS: formerly FhGFS, parallel distributed file system
Ceph Filesystem: software storage platform designed
Disco DDFS: distributed filesystem
Facebook Haystack: object storage system
Google Colossus: distributed filesystem (GFS2)
Google GFS: distributed filesystem
Google Megastore: scalable, highly available storage
GridGain: GGFS ...
BigData Distributed Programming links
Tue 28 July 2015
AddThis Hydra: distributed data processing and storage system originally developed at AddThis
Akela: Mozilla's utility library for Hadoop, HBase, Pig, etc.
AMPLab SIMR: run Spark on Hadoop MapReduce v1
AMPLab Succinct: enabling queries on compressed data
Apache Crunch: Java library that provides a framework for writing, testing, and running MapReduce ...
BigData Embedded Databases links
Tue 28 July 2015
Actian PSQL: ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications
BerkeleyDB: a software library that provides a high-performance embedded database for key/value data
HamsterDB: transactional key-value database
HanoiDB: Erlang LSM BTree Storage
LevelDB: a fast key-value storage library written at Google that provides an ordered mapping ...
BigData Frameworks links
Tue 28 July 2015
Apache Hadoop: framework for distributed processing. Integrates MapReduce (parallel processing), YARN (job scheduling) and HDFS (distributed file system)
BigData Graph Data Model links
Tue 28 July 2015
Apache Giraph: implementation of Pregel, based on Hadoop
Apache Spark Bagel: implementation of Pregel, part of Spark
ArangoDB: multi-model distributed database
Facebook TAO: the distributed data store that is widely used at Facebook to store and serve the social graph
Faunus: Hadoop-based graph analytics engine for analyzing ...