I keep hearing the promise of GPU databases but they don't seem to be terribly useful for most real world workloads.
It reminds me of the big hoopla for GPU h264 encoders. When they came out everyone realized the quality was worse and not much faster.
Some things don't lend themselves to parallel processing, notably anything linear like transactions.
I mean yeah the GPU can sort a hundred billion items a second but how often do you really need to sort that many items using a database? In 99.9% of uses you have indexing or limits on the number of results.
Just saying, this program looks more like a stream processing platform with a SQL-like frontend than a full database
You're thinking about transactional databases, and you're right. Transactional databases will probably not benefit hugely from a GPU.
That's not saying it's impossible, but probably not worth the effort.
However, there are many types of databases around. Lambda architectures are all the rage now: you keep one database for your transactional workload, and another for analytics. Analytics is a huge business, worth multiple billions of dollars every year, and it has become one of the most important inputs for steering a business and deciding on new strategy.
Larger businesses don't just 'go for it' anymore, they analyze, and inspect, and dig deep into their historical data to find out if something is worth doing.
GPUs tend to lend themselves well to analytics, contrary to transactions. Specifically, columnar databases. When the columns are all of the same data type, and the data locality is high, GPUs perform /very/ well.
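To make the locality point concrete, here is a tiny CPU-side sketch (numpy; the table and column names are hypothetical) of why columnar layout suits data-parallel hardware: each column is one contiguous, uniformly-typed array, and an analytic query becomes a few vectorized passes.

```python
import numpy as np

# Hypothetical "sales" table stored column-wise: each column is one
# contiguous, uniformly-typed array (the layout GPUs and SIMD CPUs like).
price = np.array([9.99, 24.50, 3.75, 24.50, 100.00])
qty = np.array([3, 1, 10, 2, 1])

# Something like SELECT SUM(price * qty) WHERE price > 10 becomes a
# pair of data-parallel passes over the columns:
mask = price > 10.0                        # parallel filter
revenue = np.sum(price[mask] * qty[mask])  # parallel map + reduce
print(revenue)  # 173.5
```

The same query over a row-oriented layout would have to stride past every other field of every row, wrecking cache (and coalesced-memory-access) behavior.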
Regarding your sorting point: you're right that you may not really want to sort everything. But what if you want to perform a `JOIN` on a bunch of data?
It often makes more sense to sort it first, because the JOIN then becomes much faster: matching keys against sorted inputs is much easier.
So if you can perform a really fast SORT on a GPU, you're saving precious processing time.
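A minimal sort-merge join sketch (pure Python, all names hypothetical) of that idea: sort both inputs once, and matching keys reduces to a single linear merge pass, which is exactly the step a fast GPU sort accelerates.

```python
# Sort-merge equi-join sketch. left/right are lists of (key, value)
# pairs; the result is a list of (key, left_value, right_value).
def sort_merge_join(left, right):
    left = sorted(left)
    right = sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # emit every right row sharing this key, then advance left
            j2 = j
            while j2 < len(right) and right[j2][0] == lk:
                out.append((lk, left[i][1], right[j2][1]))
                j2 += 1
            i += 1
    return out

print(sort_merge_join([(1, "a"), (2, "b")], [(2, "x"), (3, "y")]))
# [(2, 'b', 'x')]
```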
Doesn't the overhead of moving things back and forth between GPU memory and main memory wipe out most potential gains, though?
If you're running analytical workloads on big data sets, you're typically I/O bound to start with. It seems like managing moving little pieces of it back and forth to the GPU to compute is going to be a big PITA, add lots of little latencies, and gain you absolutely nothing. What am I missing there?
1. Not everything needs to be pushed up to the GPU. Some things are better left in RAM.
2. What if you only push indexes or similar up to the GPU, like an AB-tree index? You're keeping all of the 'heavy' stuff down, and only uploading a representation of it, to be later replaced with the actual data.
3. Think compression/decompression done on the GPU directly.
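As a rough illustration of point 3 (pure Python/numpy, values hypothetical), dictionary encoding is one such scheme: ship a small dictionary plus one small integer code per row, and decoding is just a gather, which parallelizes trivially on a GPU after the (much smaller) transfer.

```python
import numpy as np

# Dictionary-encode a low-cardinality string column: the wire format is
# the dictionary plus one uint8 code per row, instead of raw strings.
values = ["US", "DE", "US", "FR", "DE", "US"]
dictionary = sorted(set(values))  # ["DE", "FR", "US"]
codes = np.array([dictionary.index(v) for v in values], dtype=np.uint8)

# On the device, decompression is a single gather (shown here in Python):
decoded = [dictionary[c] for c in codes]
print(codes.nbytes, decoded)  # 6 bytes of codes reproduce the column
```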
At Blazing we also build a GPU database and have always loved what this project (Alenka) is doing. First of all, when you talk about being I/O bound, which I/O are you talking about? From disk? From RAM? There are many ways of getting around some of these I/O bottlenecks, like sending compressed data or processing while transferring. Your assumption that these workloads are typically I/O bound is correct, but then again GPU databases aren't always going after the most "typical" workloads.

If you are doing large amounts of transformations, or complicated joins, then you can also benefit hugely from the use of a GPU. Ever try to join several tables together across multiple columns? If you do, then you should probably use a hash join, and if you are using a hash join you better believe you are going to want to be doing computationally intensive things like sorting and hash generation. Have you tried any GPU databases to see if this concern is valid? GPU databases can take advantage of things like very expensive cascading compression schemes that many normal databases can't.
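For readers unfamiliar with the hash join mentioned above, a minimal CPU-side sketch (pure Python, names hypothetical): build a hash table on the smaller input, then probe it with each row of the larger one. The build and probe phases are the hash-heavy, data-parallel work in question.

```python
from collections import defaultdict

# Hash equi-join: small/large are lists of (key, value) pairs; the
# result is a list of (key, small_value, large_value).
def hash_join(small, large):
    table = defaultdict(list)
    for k, v in small:              # build phase
        table[k].append(v)
    out = []
    for k, v in large:              # probe phase
        for sv in table.get(k, []):
            out.append((k, sv, v))
    return out

print(hash_join([(1, "a"), (2, "b")], [(2, "x"), (2, "y"), (3, "z")]))
# [(2, 'b', 'x'), (2, 'b', 'y')]
```

For a multi-column join key, the `k` here would simply be a tuple of the columns' values, which is where the hash-generation cost grows.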
Different approaches and solutions for transactional processing and analytics, decision-making informed by data analysis, all that's been around for decades along with its own silly jargon - OLAP, OLTP, data mining...
> Transactional databases will probably not benefit hugely from a GPU.
What about parallel queries over a Restriction-Union normalized data model?
I think the benefits would be similar in nature to columnar stores as you note:
> GPUs tend to lend themselves well to analytics, contrary to transactions. Specifically, columnar databases. When the columns are all of the same data type, and the data locality is high, GPUs perform /very/ well.
Not every database is a transactional database. The precise goal of databases like Redshift, Vertica, MemSQL, SAP HANA, Exasol and Impala is to crunch multi-billion-row datasets as fast as possible. Indexes often don't help with analytic queries, since good chunks of tables frequently need to be scanned. You might apply a limit at the end of the query, but that doesn't mean you don't need to scan billions of rows to get to the result that you are applying a limit to. There are many use cases for speed, but the simplest one I can think of is powering Tableau/other BI products. If GPUs help make your dashboard refresh interactively instead of taking 30 seconds when filtering/exploring the data, and allow tens to hundreds of concurrent users on a system rather than just a few, that's a huge win for most of the Fortune 1000.
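A toy sketch of the "a limit doesn't save you the scan" point (pure Python, data hypothetical): even a top-3-with-LIMIT query still has to visit every row once, because you can't know the top rows without looking at all of them.

```python
import heapq

# Hypothetical table of (name, amount). LIMIT trims the *result*,
# not the *work*: the scan below touches every row exactly once.
rows = [("a", 5), ("b", 42), ("c", 17), ("d", 8), ("e", 29)]

scanned = 0
heap = []  # running top-3 by amount
for name, amount in rows:
    scanned += 1
    heapq.heappush(heap, (amount, name))
    if len(heap) > 3:
        heapq.heappop(heap)  # evict current minimum

top3 = sorted(heap, reverse=True)
print(scanned, top3)  # 5 rows scanned for a 3-row result
```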
Data warehousing is a $30B/year business and it often boils down to a price/performance game. Any technology (including but not limited to GPUs) that can change the equation by at least an order-of-magnitude will be disruptive in that space.
You're confused by the word transactional. The parent meant transactional in the sense that it's oriented for transactional systems (OLTP). Redshift and other columnar DBMSs are designed for analytic workloads, rather than transactional. Nevertheless, they are relational and support database transactions.
People in the database space use the term "transactional database" loosely to refer to databases optimized for handling simultaneous inserts, deletes and updates at rates often measured in thousands per second - the kind that you would use to say power an airline ticketing system or inventory management for a retailer.
Just because a database can support transactions does not mean that it is geared for transactional workloads, and hence it would likely not be called a transactional database.
OLTP+OLAP combined is called HTAP. It's an active area of research, but to my knowledge there is still no silver bullet there: systems that do it often store data in both row and columnar format to ensure they are optimized for both, with the associated overhead.
I guess I have always thought of a transaction as an ACID-compliant concept and "transactional database" to refer to a DB that supports transactions[1].
To that end, I always thought of OLTP vs. OLAP model and engine optimizations as more or less orthogonal to whether or not a DB is transactional. I would even cite the inclusion of transactions in Redshift as justification for my seemingly unconventional view:
> Some PostgreSQL features that are suited to smaller-scale OLTP processing, such as secondary indexes and efficient single-row data manipulation operations, have been omitted to improve performance.[2]
But who knows, maybe I am barking up the wrong tree so-to-speak.
There are dedicated fixed-function hardware encoders (VCE and NVENC) on graphics cards, and they're very popular, because they're the best way to record game footage if you don't have dedicated capture hardware. You don't want x264/5 hogging the CPU when you're playing games!
The fixed-function encoders are now less than half as good as x264 (that is, they require double the bitrate for the same quality). Anyone who is serious about streaming gets an 8-core CPU or maybe a second computer to do the encoding.
When x264 is properly optimized for video games, having the hardware encoders require only double the bitrate for similar results is probably optimistic, if anything. Especially in DOTA/LOL, x264 is just leagues apart in dealing with the semi-static backgrounds. You can tell the people who use the hw encoders because the stream looks like mush, even at high quality settings.
The target for most streamers is Twitch 1080p H.264, because that's the platform where the money is made. 4K is not needed or useful, and H.265 won't help simply because twitch won't stream it.
> they don't seem to be terribly useful for most real world workloads
It seems to me like there are mainly two kinds of cost-limited database workloads - complicated operations on datasets that fit in RAM, or simple operations on datasets that need to be distributed. For the former you're better off writing custom software, and for the latter you're I/O rather than CPU limited... Maybe SSDs or low-power-high-memory ARM servers could change that equation
By your logic the entire in-memory CPU database space is wasted energy. I'm sure every analyst who wants to do some filters, group bys, joins and subqueries over a big dataset wants to write custom code (in C/C++ to be fast!), hoping their code will be as fast as a database optimized for such purposes, and then rewrite it all as soon as they need to tweak their query.
I agree that at one point GPU h264 encoding was lower quality. But it's always been much faster, for me.
And today's NVENC h264 encoding quality is approaching x264 levels, until you get to the lower end of the bitrate spectrum. x264 really shines at bpp values of 0.05 and lower, a feat NVENC has yet to achieve.
> everyone realized the quality was worse and not much faster
I can't speak for the h264 situation, as I wasn't involved at the time, but with the newer h265 hardware, it is obnoxiously faster with minimal (in terms of end-user, not distributor) quality loss.
I don't know anything about h265 CPU accelerators, I'm comparing GPU HEVC (h265) encoding with a dedicated chip on the GPU to standard CPU encoding. (Using libx265 IIRC)
1. GPU RAM storage isn't as fine-grained as CPU virtual memory. While GPUs have virtual memory, they don't have the same degree of copy-on-write hardware support. This makes rolling back snapshots and transactions within memory very difficult.
2. GPUs don't have dedicated non-volatile storage (well, some do, but it is used more as a cache, and is treated as volatile). So for loading data you go:
SSD -> RAM -> CPU *copy* CPU -> RAM -> GPU
This isn't really a question of technological maturity; it is more that PCIe doesn't allow an NVMe SSD to talk directly to a PCIe GPU. Nor do modern OSes have any model for how to do this, nor do GPUs support file systems.
3. SQL-based data querying is parallel friendly. At its core, SQL is 99% Filter/Map operations. The original design of SQL assumed being bottlenecked by HDD access times, so the vast majority of the query work is done in O(n). With a
SSD -> RAM -> CPU *copy* CPU -> RAM -> GPU
you are already paying that O(n) load+process price at copy time (minus branches).
So the savings are only present if the data can persist on the GPU. Otherwise, having multiple CPU threads do the Map/Filter does the same job.
There isn't just one problem:
1. GPUs don't support the features needed to make holding data in GPU RAM useful.
2. PCIe doesn't support features to make loading data into the GPU fast.
3. The very design of SQL makes copying data into the GPU moot. Spreading a Map/Filter over 10-20 threads isn't rocket science.
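For point 3, a minimal sketch (Python, names hypothetical) of spreading a Map/Filter over a handful of threads: chunk the column, filter each chunk in parallel, concatenate. (Real engines would use processes or vectorized kernels to dodge the GIL; the shape of the work is the point here.)

```python
from concurrent.futures import ThreadPoolExecutor

def filter_chunk(chunk):
    # the "WHERE" predicate: keep even values
    return [x for x in chunk if x % 2 == 0]

def parallel_filter(data, n_threads=4):
    # split the column into roughly equal chunks, one per worker
    size = max(1, len(data) // n_threads)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = pool.map(filter_chunk, chunks)  # order-preserving
    return [x for part in results for x in part]

print(parallel_filter(list(range(10))))  # [0, 2, 4, 6, 8]
```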
Your first point is an implementation detail. Why couldn't CPU RAM be used as well in a GPU database system?
Re CPU vs GPU performance you are neglecting the fact that GPU RAM can be an order-of-magnitude faster than CPU RAM, not to mention the fact that complex queries can often become compute bound (geospatial queries are a prominent example).
> Why couldn't CPU RAM be used as well in a GPU database system?
There is no market demand for this currently. GPUs don't need advanced MMUs because nobody wants them. The same is true for PCIe data loading.
> Re CPU vs GPU performance you are neglecting the fact that GPU RAM can be an order-of-magnitude faster than CPU RAM
No, I'm not. If your data set can remain in GPU RAM long term, then yes, there is a big performance speed-up. The problems here (as I outlined before):
1. GPU MMUs are less advanced, so they don't handle transactions well (see above comments).
2. As you can't perform transactions efficiently, you need read-only data. That is possible, but many databases aren't read-only; databases NOT being written to are rare.
3. Ensuring your full dataset can fit in GPU memory (typically <20GB) is a damning limitation. Streaming data into the GPU is rather sloppy and limits compute throughput.
The CPU / RAM is a bottleneck between the persistent storage and the fast GPU.
There's a well-known way to increase network throughput and reduce latency by running a whole dedicated IP stack + hardware driver in the user space of the process that needs it. This removes the bottleneck of the OS kernel.
I wonder if there's a way to remove the bottleneck in the GPU case by something similar, by dedicating a piece of hardware to the SSD interconnect without having a CPU as an intermediary. AFAICT you still cannot DMA from a disk directly to GPU RAM, even though you have enough PCIe lanes coming to the GPU. Is this true? If so, can it change in a way that is still compatible with traditional, "normal" operation of the PC architecture?
We don't really call pandas a database either. It looks like a data processing tool/library. "real" databases have integrated persistence models, as well as discussions and design tradeoffs regarding ACID transactions and scalability. There are query planners and indexes, constraints (type, value, foreign key) and other logic that can help enforce business rules around the data. There are triggers and embedded functions too.
Still, it is pretty cool technology and I think something that will be integrated with a "real" database near you sometime soon.
BT wasn't actually billed as "database" internally, until they had to sell it in Google Cloud where the definition of what constitutes a database is much looser. Inside Google it's known as a multidimensional hash table.
I would definitely agree that's a more precise definition. Constraining "database" to only ever refer to RDBMS is what I really have a problem with.
But there really isn't a bright line between what's a table, what's a filesystem, and what's a database. You can put a blob in a database or BigTable, and MongoDB is really close to being a flat file conceptually. You can have a filesystem or database that is content-addressable. You can have a filesystem that is atomic and supports rollbacks and can store relational data like symlinks. A virtual filesystem like LVM can support schema-like volumes on top of it.
At the end of the day it's all just technology that lets me abstract my writes so I can deal with a more simplistic model backed by certain guarantees about behavior. I want to write a program that does XYZ, not write a filesystem/database driver. From there it's all just various tradeoffs.
otoy.com claimed to have developed a CUDA -> OpenCL and CUDA -> CPU compiler, but they've not released anything and they've been very quiet about it for the past 9 months or so.