Ask HN: What are recommended tools for OLAP for event data with durations
5 points by ninjakeyboard on Jan 21, 2018 | 2 comments
We need to query "activities" that exist over a period of time. E.g., consider a user tracking their time online: if we want to track efficiency and show trends over time of working vs. messing about on distracting sites, which datastores are good for this? I was looking at Druid (https://en.wikipedia.org/wiki/Druid_(open-source_data_store)) and Elasticsearch. Elasticsearch is a full-text search engine first and happens to be able to do some of this, but it can't really calculate derived values from the data. E.g., if I capture productivity % for 10-minute buckets and then also want productivity % for 20-minute buckets and hour buckets, and to show trends over time, I need to recalculate the productivity % at each bucket size. I'm wondering if there is a good set of tools for doing this. This isn't my forte, so I'd really appreciate some direction.
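To make the re-bucketing requirement concrete, here is a minimal Python sketch that is independent of any datastore. It assumes each activity is a hypothetical `(start, end, is_productive)` tuple with times in epoch seconds; the function name and the sample events are illustrative only. An activity that spans a bucket boundary is split so each bucket only counts the seconds that fall inside it.

```python
from collections import defaultdict

def bucket_productivity(events, bucket_seconds):
    """Return {bucket_start: productive_fraction} for fixed-size time buckets.

    events: iterable of (start, end, is_productive), times in epoch seconds
    (an assumed layout, not one taken from any particular datastore).
    """
    productive = defaultdict(float)
    total = defaultdict(float)
    for start, end, is_productive in events:
        t = start
        while t < end:
            bucket_start = (t // bucket_seconds) * bucket_seconds
            # Clip the activity at the end of the current bucket.
            chunk_end = min(end, bucket_start + bucket_seconds)
            total[bucket_start] += chunk_end - t
            if is_productive:
                productive[bucket_start] += chunk_end - t
            t = chunk_end
    return {b: productive[b] / total[b] for b in sorted(total)}

# Hypothetical sample data: 15 min working, 5 min distracted, 5 min working.
events = [
    (0, 900, True),
    (900, 1200, False),
    (1200, 1500, True),
]

print(bucket_productivity(events, 600))   # {0: 1.0, 600: 0.5, 1200: 1.0}
print(bucket_productivity(events, 1200))  # {0: 0.75, 1200: 1.0}
```

Because the percentage is derived from raw seconds each time, the same events can be re-bucketed at 10 minutes, 20 minutes, or an hour without storing a percentage anywhere.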


Take a look at Yandex ClickHouse, an open-source, append-only analytical database. It offers excellent performance for OLAP queries over data like event logs, and its SQL dialect includes a lot of specialized functions for calculating metrics.


How much data? Probably plays a big part in picking a solution.

I've most recently been using BigQuery for stuff like this: streaming the data into BQ, running rollups using SQL there, and either saving those results into new tables or pulling back the result set and inserting it into another database.
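The rollup step can be sketched in plain Python rather than BigQuery SQL. This assumes the fine-grained table stores additive columns per bucket, here a hypothetical `(bucket_start, productive_seconds, total_seconds)` row layout, so that coarser buckets are built by summing and the percentage is computed last. Averaging stored percentages instead would weight a half-empty bucket the same as a full one.

```python
from collections import defaultdict

def roll_up(rows, coarse_seconds):
    """Sum fine-grained bucket rows into coarser buckets.

    rows: iterable of (bucket_start, productive_seconds, total_seconds);
    this row layout is an assumption made for illustration.
    Returns {coarse_bucket_start: (productive_seconds, total_seconds)}.
    """
    sums = defaultdict(lambda: [0, 0])
    for bucket_start, productive, total in rows:
        coarse_start = (bucket_start // coarse_seconds) * coarse_seconds
        sums[coarse_start][0] += productive
        sums[coarse_start][1] += total
    return {b: (p, t) for b, (p, t) in sorted(sums.items())}

# Hypothetical 10-minute rows: a full productive bucket, then a
# half-tracked bucket that was only 50% productive.
ten_minute_rows = [
    (0, 600, 600),
    (600, 150, 300),
]

twenty_minute = roll_up(ten_minute_rows, 1200)
productive, total = twenty_minute[0]
print(twenty_minute)       # {0: (750, 900)}
print(productive / total)  # about 0.833, not the 0.75 a mean of percentages gives
```

The same function can be applied again to its own output (converted back to rows) to go from 20-minute buckets to hourly ones, since sums of sums stay correct.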



