BigData Data Ingestion links
Sun 20 September 2015
Amazon Kinesis: real-time processing of streaming data at massive scale
Apache Chukwa: data collection system
Apache Flume: service to manage large amounts of log data
Apache Samza: stream processing framework, based on Kafka and YARN
Apache Sqoop: tool to transfer data between Hadoop and a structured datastore
Apache UIMA: Unstructured ...
Hadoop Streaming
Tue 15 September 2015
Hadoop Streaming Made Simple using Joins and Keys with Python
BigData Data Visualization links
Mon 14 September 2015
Arbor: graph visualization library using web workers and jQuery
Bokeh: a Python interactive visualization library for large datasets that natively uses the latest web technologies. Its goal is to provide elegant, concise construction of novel graphics in the style of Protovis/D3, while delivering high-performance interactivity over large data ...
Spark Links
Mon 14 September 2015
Articles: A Docker Image for Graph Analytics on Neo4j with Apache Spark GraphX
Hive useful Links
Mon 14 September 2015
Articles: Using GenericUDFs to return multiple values in Apache Hive
BigData Testing
Mon 14 September 2015
Slides: Testing Big Data: Automated Testing of Hadoop with QuerySurge
BigData Business Intelligence links
Mon 14 September 2015
ActivePivot: Java in-memory OLAP cube stored in columns, with clearly decoupled pre/post processing
Adatao: business intelligence and data science platform
Apama analytics: platform for streaming analytics and intelligent automated action
Atigeo xPatterns: data analytics platform
BIME Analytics: business intelligence platform in the cloud
Chartio: lean business intelligence platform to ...
BigData System Deployment links
Mon 14 September 2015
Ankush: big data cluster management tool that creates and manages clusters of different technologies
Apache Ambari: operational framework for Hadoop management
Apache Bigtop: system deployment framework for the Hadoop ecosystem
Apache Helix: cluster management framework
Apache Mesos: cluster manager
Apache Slider: a YARN application to deploy existing distributed ...
The R-Hadoop technology stack (notes)
Sun 13 September 2015
R is a free, open-source statistical programming language originally based on the S programming language. Here are a few reasons why R is a great place to start for data analysis:
It’s completely free: SAS and SPSS are expensive to get started with, and you often need to buy ...
BigData Document Data Model links
Sun 23 August 2015
Actian Versant: commercial object-oriented database management system
Crate Data: open source, massively scalable data store that requires zero administration
Facebook Apollo: Facebook’s Paxos-like NoSQL database
jumboDB: document-oriented datastore over Hadoop
LinkedIn Espresso: horizontally scalable document-oriented NoSQL data store
MarkLogic: schema-agnostic enterprise NoSQL database technology
Microsoft DocumentDB ...