I keep hearing the promise of GPU databases but they don't seem to be terribly useful for most real world workloads.
It reminds me of the big hoopla for GPU h264 encoders. When they came out everyone realized the quality was worse and not much faster.
Some things don't lend themselves to parallel processing, notably anything linear like transactions.
I mean yeah the GPU can sort a hundred billion items a second but how often do you really need to sort that many items using a database? In 99.9% of uses you have indexing or limits on the number of results.
Just saying, this program looks more like a stream processing platform with a SQL-like frontend than a full database
You're thinking about transactional databases, and you're right. Transactional databases will probably not benefit hugely from a GPU.
That's not saying it's impossible, but probably not worth the effort.
However, there are many types of databases around. Lambda architectures are all the rage now: you keep one database for your transactional workload, and another for analytics. Analytics is a huge business, worth multiple billions of dollars every year, and it has become one of the most important inputs for steering a business and deciding on new strategy.
Larger businesses don't just 'go for it' anymore, they analyze, and inspect, and dig deep into their historical data to find out if something is worth doing.
GPUs tend to lend themselves well to analytics, contrary to transactions. Specifically, columnar databases. When the columns are all of the same data type, and the data locality is high, GPUs perform /very/ well.
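To make the locality point concrete, here is a tiny CPU-side sketch (numpy; the table and column names are hypothetical) of why columnar layout suits data-parallel hardware: each column is one contiguous, uniformly-typed array, and an analytic query becomes a few vectorized passes.

```python
import numpy as np

# Hypothetical "sales" table stored column-wise: each column is one
# contiguous, uniformly-typed array (the layout GPUs and SIMD CPUs like).
price = np.array([9.99, 24.50, 3.75, 24.50, 100.00])
qty = np.array([3, 1, 10, 2, 1])

# Something like SELECT SUM(price * qty) WHERE price > 10 becomes a
# pair of data-parallel passes over the columns:
mask = price > 10.0                        # parallel filter
revenue = np.sum(price[mask] * qty[mask])  # parallel map + reduce
print(revenue)  # 173.5
```

The same query over a row-oriented layout would have to stride past every other field of every row, wrecking cache (and coalesced-memory-access) behavior.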
Regarding your sorting point: you're right that you may not really want to sort everything. But what if you want to perform a `JOIN` on a bunch of data?
It often makes more sense to sort it first, because the JOIN then becomes much faster: matching keys against sorted inputs is much easier.
So if you can perform a really fast SORT on a GPU, you're saving precious processing time.
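A minimal sort-merge join sketch (pure Python, all names hypothetical) of that idea: sort both inputs once, and matching keys reduces to a single linear merge pass, which is exactly the step a fast GPU sort accelerates.

```python
# Sort-merge equi-join sketch. left/right are lists of (key, value)
# pairs; the result is a list of (key, left_value, right_value).
def sort_merge_join(left, right):
    left = sorted(left)
    right = sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # emit every right row sharing this key, then advance left
            j2 = j
            while j2 < len(right) and right[j2][0] == lk:
                out.append((lk, left[i][1], right[j2][1]))
                j2 += 1
            i += 1
    return out

print(sort_merge_join([(1, "a"), (2, "b")], [(2, "x"), (3, "y")]))
# [(2, 'b', 'x')]
```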
Doesn't the overhead of moving things back and forth between GPU memory and main memory wipe out most potential gains, though?
If you're running analytical workloads on big data sets, you're typically I/O bound to start with. It seems like managing moving little pieces of it back and forth to the GPU to compute is going to be a big PITA, add lots of little latencies, and gain you absolutely nothing. What am I missing there?
1. Not everything needs to be pushed up to the GPU. Some things are better left in RAM.
2. What if you only push indexes or similar up to the GPU, like an AB-tree index? You're keeping all of the 'heavy' stuff down, and only uploading a representation of it, to be later replaced with the actual data.
3. Think compression/decompression done on the GPU directly.
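As a rough illustration of point 3 (pure Python/numpy, values hypothetical), dictionary encoding is one such scheme: ship a small dictionary plus one small integer code per row, and decoding is just a gather, which parallelizes trivially on a GPU after the (much smaller) transfer.

```python
import numpy as np

# Dictionary-encode a low-cardinality string column: the wire format is
# the dictionary plus one uint8 code per row, instead of raw strings.
values = ["US", "DE", "US", "FR", "DE", "US"]
dictionary = sorted(set(values))  # ["DE", "FR", "US"]
codes = np.array([dictionary.index(v) for v in values], dtype=np.uint8)

# On the device, decompression is a single gather (shown here in Python):
decoded = [dictionary[c] for c in codes]
print(codes.nbytes, decoded)  # 6 bytes of codes reproduce the column
```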
At Blazing we also build a GPU database and have always loved what this project (Alenka) is doing. First of all, when you talk about being I/O bound, which I/O are you talking about? From disk? From RAM? There are many ways of getting around some of these I/O bottlenecks, like sending compressed data or processing while transferring. Your assumption that these workloads are typically I/O bound is correct, but then again GPU databases aren't always going after the most "typical" workloads.

If you are doing large amounts of transformations, or complicated joins, then you can also benefit hugely from the use of a GPU. Ever try to join several tables together across multiple columns? If you do, then you should probably use a hash join, and if you are using a hash join you better believe you are going to want to be doing computationally intensive things like sorting and hash generation. Have you tried any GPU databases to see if this concern is valid? GPU databases can take advantage of things like very expensive cascading compression schemes that many normal databases can't.
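For readers unfamiliar with the hash join mentioned above, a minimal CPU-side sketch (pure Python, names hypothetical): build a hash table on the smaller input, then probe it with each row of the larger one. The build and probe phases are the hash-heavy, data-parallel work in question.

```python
from collections import defaultdict

# Hash equi-join: small/large are lists of (key, value) pairs; the
# result is a list of (key, small_value, large_value).
def hash_join(small, large):
    table = defaultdict(list)
    for k, v in small:              # build phase
        table[k].append(v)
    out = []
    for k, v in large:              # probe phase
        for sv in table.get(k, []):
            out.append((k, sv, v))
    return out

print(hash_join([(1, "a"), (2, "b")], [(2, "x"), (2, "y"), (3, "z")]))
# [(2, 'b', 'x'), (2, 'b', 'y')]
```

For a multi-column join key, the `k` here would simply be a tuple of the columns' values, which is where the hash-generation cost grows.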
Different approaches and solutions for transactional processing and analytics, decision-making informed by data analysis, all that's been around for decades along with its own silly jargon - OLAP, OLTP, data mining...
> Transactional databases will probably not benefit hugely from a GPU.
What about parallel queries over a Restriction-Union normalized data model?
I think the benefits would be similar in nature to columnar stores as you note:
> GPUs tend to lend themselves well to analytics, contrary to transactions. Specifically, columnar databases. When the columns are all of the same data type, and the data locality is high, GPUs perform /very/ well.
Not every database is a transactional database. The precise goal of databases like Redshift, Vertica, MemSQL, SAP HANA, Exasol and Impala is to crunch multi-billion-row datasets as fast as possible. Indexes often don't help with analytic queries, since good chunks of tables frequently need to be scanned. You might apply a limit at the end of the query, but that doesn't mean you don't need to scan billions of rows to get to the result that you are applying a limit to. There are many use cases for speed, but the simplest one I can think of is powering Tableau/other BI products. If GPUs help make your dashboard refresh interactively instead of taking 30 seconds when filtering/exploring the data, and allow tens to hundreds of concurrent users on a system rather than just a few, that's a huge win for most of the Fortune 1000.
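A toy sketch of the "a limit doesn't save you the scan" point (pure Python, data hypothetical): even a top-3-with-LIMIT query still has to visit every row once, because you can't know the top rows without looking at all of them.

```python
import heapq

# Hypothetical table of (name, amount). LIMIT trims the *result*,
# not the *work*: the scan below touches every row exactly once.
rows = [("a", 5), ("b", 42), ("c", 17), ("d", 8), ("e", 29)]

scanned = 0
heap = []  # running top-3 by amount
for name, amount in rows:
    scanned += 1
    heapq.heappush(heap, (amount, name))
    if len(heap) > 3:
        heapq.heappop(heap)  # evict current minimum

top3 = sorted(heap, reverse=True)
print(scanned, top3)  # 5 rows scanned for a 3-row result
```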
Data warehousing is a $30B/year business and it often boils down to a price/performance game. Any technology (including but not limited to GPUs) that can change the equation by at least an order-of-magnitude will be disruptive in that space.
You're confused by the word transactional. The parent meant transactional in the sense that it's oriented for transactional systems (OLTP). Redshift and other columnar DBMSs are designed for analytic workloads, rather than transactional. Nevertheless, they are relational and support database transactions.
People in the database space use the term "transactional database" loosely to refer to databases optimized for handling simultaneous inserts, deletes and updates at rates often measured in thousands per second - the kind that you would use to say power an airline ticketing system or inventory management for a retailer.
Just because a database can support transactions does not mean that it is geared for transactional workloads, and hence it would likely not be called a transactional database.
OLTP+OLAP combined is called HTAP. It's an active area of research, but to my knowledge there is still no silver bullet there: systems that do it often store data in both row and columnar format to ensure they are optimized for both, with the associated overhead.
I guess I have always thought of a transaction as an ACID-compliant concept and "transactional database" to refer to a DB that supports transactions[1].
To that end, I always thought of OLTP vs. OLAP model and engine optimizations as more or less orthogonal to whether or not a DB is transactional. I would even cite the inclusion of transactions in Redshift as justification for my seemingly unconventional view:
> Some PostgreSQL features that are suited to smaller-scale OLTP processing, such as secondary indexes and efficient single-row data manipulation operations, have been omitted to improve performance.[2]
But who knows, maybe I am barking up the wrong tree so-to-speak.
There are dedicated fixed-function hardware encoders (VCE and NVENC) on graphics cards, and they're very popular, because they're the best way to record game footage if you don't have dedicated capture hardware. You don't want x264/5 hogging the CPU when you're playing games!
The fixed-function encoders are now less than half as good as x264 (that is, they require double the bitrate for the same quality). Anyone who is serious about streaming gets an 8-core CPU or maybe a second computer to do the encoding.
When x264 is properly optimized for video games, having the hardware encoders require only double the bitrate for similar results is probably optimistic, if anything. Especially in DOTA/LOL, x264 is just leagues apart in dealing with the semi-static backgrounds. You can tell the people who use the hw encoders because the stream looks like mush, even at high quality settings.
The target for most streamers is Twitch 1080p H.264, because that's the platform where the money is made. 4K is not needed or useful, and H.265 won't help simply because twitch won't stream it.
> they don't seem to be terribly useful for most real world workloads
It seems to me like there are mainly two kinds of cost-limited database workloads - complicated operations on datasets that fit in RAM, or simple operations on datasets that need to be distributed. For the former you're better off writing custom software, and for the latter you're I/O rather than CPU limited... Maybe SSDs or low-power-high-memory ARM servers could change that equation
By your logic the entire in-memory CPU database space is wasted energy. I'm sure every analyst who wants to do some filters, group bys, joins and subqueries over a big dataset wants to write custom code (in C/C++ to be fast!), hoping their code will be as fast as a database optimized for such purposes, and then rewrite it all as soon as they need to tweak their query.
I agree that at one point GPU h264 encoding was lower quality. But it's always been much faster, for me.
And today's NVENC h264 encoding quality is approaching x264 levels, until you get to the lower end of the bitrate spectrum. x264 really shines at bpp values of 0.05 and lower, a feat NVENC has yet to achieve.
> everyone realized the quality was worse and not much faster
I can't speak for the h264 situation, as I wasn't involved at the time, but with the newer h265 hardware, it is obnoxiously faster with minimal (in terms of end-user, not distributor) quality loss.
I don't know anything about h265 CPU accelerators, I'm comparing GPU HEVC (h265) encoding with a dedicated chip on the GPU to standard CPU encoding. (Using libx265 IIRC)
1. GPU RAM storage isn't as fine-grained as CPU virtual memory. While GPUs have virtual memory, they don't have the same degree of copy-on-write hardware support. This makes rolling back snapshots and transactions within memory very difficult.
2. GPUs don't have dedicated non-volatile storage (well, some do, but it is used more as a cache, and is treated as volatile). So for loading data you go:
SSD -> RAM -> CPU *copy* CPU -> RAM -> GPU
This isn't really a question of technological maturity; it is more that PCIe doesn't allow an NVMe SSD to talk directly to a PCIe GPU. Nor do modern OSes have any model for how to do this, nor do GPUs support file systems.
3. SQL-based data querying is parallel friendly. At its core, SQL is 99% Filter/Map operations. The original design of SQL assumed being bottlenecked by HDD access times, so the vast majority of the query work is done in O(n). With a
SSD -> RAM -> CPU *copy* CPU -> RAM -> GPU
you are already paying that O(n) load+process price at copy time (minus branches).
So the savings are only present if the data can persist on the GPU. Otherwise, having multiple CPU threads do the Map/Filter does the same job.
There isn't just one problem:
1. GPUs don't support the features needed to make holding data in GPU RAM useful.
2. PCIe doesn't support features to make loading data into the GPU fast.
3. The very design of SQL makes copying data into the GPU moot. Spreading a Map/Filter over 10-20 threads isn't rocket science.
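For point 3, a minimal sketch (Python, names hypothetical) of spreading a Map/Filter over a handful of threads: chunk the column, filter each chunk in parallel, concatenate. (Real engines would use processes or vectorized kernels to dodge the GIL; the shape of the work is the point here.)

```python
from concurrent.futures import ThreadPoolExecutor

def filter_chunk(chunk):
    # the "WHERE" predicate: keep even values
    return [x for x in chunk if x % 2 == 0]

def parallel_filter(data, n_threads=4):
    # split the column into roughly equal chunks, one per worker
    size = max(1, len(data) // n_threads)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = pool.map(filter_chunk, chunks)  # order-preserving
    return [x for part in results for x in part]

print(parallel_filter(list(range(10))))  # [0, 2, 4, 6, 8]
```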
Your first point is an implementation detail. Why couldn't CPU RAM be used as well in a GPU database system?
Re CPU vs GPU performance you are neglecting the fact that GPU RAM can be an order-of-magnitude faster than CPU RAM, not to mention the fact that complex queries can often become compute bound (geospatial queries are a prominent example).
> Why couldn't CPU RAM be used as well in a GPU database system?
There is no market demand for this currently. GPUs don't need advanced MMUs because nobody wants them. The same is true for PCIe data loading.
> Re CPU vs GPU performance you are neglecting the fact that GPU RAM can be an order-of-magnitude faster than CPU RAM
No, I'm not. If your data set can remain in GPU RAM long term, then yes, there is a big performance speed-up. The problems here (as I outlined before):
1. GPU MMUs are less advanced, so they don't handle transactions well (see above comments).
2. As you can't perform transactions efficiently, you need read-only data. That is possible, but many databases aren't read-only; databases NOT being written to are rare.
3. Ensuring your full dataset can fit in GPU memory (typically <20GB) is a damning limitation. Streaming data into the GPU is rather sloppy and limits compute throughput.
The CPU / RAM is a bottleneck between the persistent storage and the fast GPU.
There's a well-known way to increase network throughput and reduce latency by running a whole dedicated IP stack + hardware driver in the user space of the process that needs it. This removes the bottleneck of the OS kernel.
I wonder if there's a way to remove the bottleneck in the GPU case by something similar, by dedicating a piece of hardware to the SSD interconnect without having a CPU as an intermediary. AFAICT you still cannot DMA from a disk directly to GPU RAM, even though you have enough PCIe lanes coming to the GPU. Is this true? If so, can it change in a way that is still compatible with traditional, "normal" operation of the PC architecture?
We don't really call pandas a database either. It looks like a data processing tool/library. "real" databases have integrated persistence models, as well as discussions and design tradeoffs regarding ACID transactions and scalability. There are query planners and indexes, constraints (type, value, foreign key) and other logic that can help enforce business rules around the data. There are triggers and embedded functions too.
Still, it is pretty cool technology and I think something that will be integrated with a "real" database near you sometime soon.
BT wasn't actually billed as "database" internally, until they had to sell it in Google Cloud where the definition of what constitutes a database is much looser. Inside Google it's known as a multidimensional hash table.
I would definitely agree that's a more precise definition. Constraining "database" to only ever refer to RDBMS is what I really have a problem with.
But there really isn't a bright line between what's a table, what's a filesystem, and what's a database. You can put a blob in a database or BigTable, and MongoDB is really close to being a flat file conceptually. You can have a filesystem or database that is content-addressable. You can have a filesystem that is atomic and supports rollbacks and can store relational data like symlinks. A virtual filesystem like LVM can support schema-like volumes on top of it.
At the end of the day it's all just technology that lets me abstract my writes so I can deal with a more simplistic model backed by certain guarantees about behavior. I want to write a program that does XYZ, not write a filesystem/database driver. From there it's all just various tradeoffs.
otoy.com claimed to have developed a CUDA -> OpenCL and CUDA -> CPU compiler, but they've not released anything and they've been very quiet about it for the past 9 months or so.