
Connecting to Databases

Various Resources for Administration of Databases

  1. Postgres
  2. Elasticsearch
  3. Neo4j
  4. Cassandra

Data Warehouse Engines

Cube-Replacements

  • Apache Druid - A high-performance analytics data store for event-driven data.
  • Apache Pinot - A realtime distributed OLAP datastore.
  • Apache Kylin - An open source distributed analytics engine, contributed by eBay Inc., that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets.
  • AtScale - Not open source.
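
The engines above answer multi-dimensional (OLAP) queries quickly, and one common way to do that is to aggregate ahead of time. The sketch below illustrates that idea in plain Python. It is not how any of the listed engines is implemented, and the `EVENTS` rows, the dimension names, and the `build_cube` helper are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows; a real engine would read these from event storage.
EVENTS = [
    {"country": "US", "device": "mobile", "revenue": 10},
    {"country": "US", "device": "desktop", "revenue": 5},
    {"country": "DE", "device": "mobile", "revenue": 7},
    {"country": "US", "device": "mobile", "revenue": 3},
]
DIMENSIONS = ("country", "device")


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the measure for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                # Group by the values of the chosen dimensions only.
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


cube = build_cube(EVENTS, DIMENSIONS, "revenue")

# A query is now a dictionary lookup instead of a scan over the raw rows.
print(cube[("country",)][("US",)])
print(cube[("country", "device")][("US", "mobile")])
print(cube[()][()])
```

The trade-off is storage: the number of pre-aggregated groupings doubles with each added dimension, which is why the choice of dimensions matters in practice.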

Commercial Data Warehouses

Advantages: Joins, ...

Serviced Cloud, Data Visualization, and Analytics

These services provide querying and dashboarding in a single product.

Distributed Compute Tools

  • Dask
  • Apache Spark
  • Apache Flink
  • Apache Kafka

Orchestrators, Job Schedulers, Monitors

  • Apache Airflow (created at Airbnb)
  • Luigi (created at Spotify)
  • Azkaban (created at LinkedIn)
  • Apache Oozie (for Hadoop systems)
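
The tools above all share one core job: running tasks in an order that respects their dependencies. The sketch below shows that idea using only the Python standard library (`graphlib`, Python 3.9+). It does not use the API of any listed tool, and the `PIPELINE` task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_pipeline(dependencies, actions):
    """Run each task's action once all of its dependencies have run."""
    order = []
    for task in TopologicalSorter(dependencies).static_order():
        actions[task]()
        order.append(task)
    return order


log = []
# Stand-in actions that only record their own name when called.
ACTIONS = {name: (lambda n=name: log.append(n)) for name in PIPELINE}

executed = run_pipeline(PIPELINE, ACTIONS)
print(executed)
```

Real orchestrators add scheduling, retries, and monitoring on top of this ordering step. `TopologicalSorter` raises `CycleError` if the dependencies contain a cycle.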

Database Best Practices