ownport.github.io/notes

BigData Benchmarking links

Tue 28 July 2015
Apache Hadoop Benchmarking: micro-benchmarks for testing Hadoop performance
Berkeley SWIM Benchmark: real-world big data workload benchmark
Big-Bench: Big Bench workload development
Hive-benchmarks: some benchmarking queries for Apache Hive
Hive-testbench: testbench for experimenting with Apache Hive at any data scale
Intel HiBench: a Hadoop benchmark suite
Netflix Inviso: performance-focused Big ...

BigData Columnar Databases links

Tue 28 July 2015
Amazon Redshift: data warehouse service, based on PostgreSQL
C-Store: column-oriented DBMS
Google BigQuery: framework for interactive analysis, implementation of Dremel
Google Dremel: Google's system for interactive analysis of large datasets
MonetDB: column store database
Parquet: columnar storage format for Hadoop
Pivotal Greenplum: purpose-built, dedicated analytic data warehouse
Vertica: is designed ...

BigData Data Warehouse links

Tue 28 July 2015
Google Mesa: highly scalable analytic data warehousing system
IBM BigInsights: data processing, warehousing and analytics
Microsoft Cosmos: Microsoft's internal BigData analysis platform

BigData Distributed Filesystem links

Tue 28 July 2015
Apache HDFS: a way to store large files across multiple machines
BeeGFS: formerly FhGFS, parallel distributed file system
Ceph Filesystem: software storage platform designed
Disco DDFS: distributed filesystem
Facebook Haystack: object storage system
Google Colossus: distributed filesystem (GFS2)
Google GFS: distributed filesystem
Google Megastore: scalable, highly available storage
GridGain: GGFS ...

BigData Distributed Programming links

Tue 28 July 2015
AddThis Hydra: distributed data processing and storage system originally developed at AddThis
Akela: Mozilla's utility library for Hadoop, HBase, Pig, etc.
AMPLab SIMR: run Spark on Hadoop MapReduce v1
AMPLab Succinct: enabling queries on compressed data
Apache Crunch: Java library that provides a framework for writing, testing, and running MapReduce ...

BigData Embedded Databases links

Tue 28 July 2015
Actian PSQL: ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications
BerkeleyDB: a software library that provides a high-performance embedded database for key/value data
HamsterDB: transactional key-value database
HanoiDB: Erlang LSM BTree storage
LevelDB: a fast key-value storage library written at Google that provides an ordered mapping ...

BigData Frameworks links

Tue 28 July 2015
Apache Hadoop: framework for distributed processing. Integrates MapReduce (parallel processing), YARN (job scheduling) and HDFS (distributed file system)

BigData Graph Data Model links

Tue 28 July 2015
Apache Giraph: implementation of Pregel, based on Hadoop
Apache Spark Bagel: implementation of Pregel, part of Spark
ArangoDB: multi-model distributed database
Facebook TAO: the distributed data store widely used at Facebook to store and serve the social graph
Faunus: Hadoop-based graph analytics engine for analyzing ...

BigData Integrated Development Environments links

Tue 28 July 2015
R-Studio: IDE for R

BigData Key-Map Data Model links

Tue 28 July 2015
Actian Vector: column-oriented analytic database
Apache Accumulo: distributed key/value store, built on Hadoop
Apache Cassandra: column-oriented distributed datastore, inspired by BigTable
Apache HBase: column-oriented distributed datastore, inspired by BigTable
Facebook HydraBase: evolution of HBase made by Facebook
Google BigTable: column-oriented distributed datastore
Google Cloud Datastore: is a fully managed ...