Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
20 tips for MySQL db architects (scribd.com)
58 points by nocivus on Oct 20, 2008 | hide | past | favorite | 26 comments


Obligatory complaint about how scribd takes 5 minutes to load and ends up as a worse user experience than linking to the pdf.

Wait 5 minutes, then click on 'download'. sigh


axod, you can download these slides and other MySQL related presentations directly from http://ronaldbradford.com/presentations/


Very cool. Thanks I'll take a look there :)


The meta-point for me is that SQL is just broken, as are its many implementations. We'd never tolerate a scripting language that made you learn all the implementation details before you were effective with it. And yet we tolerate this from database engines and the languages we use to interact with them.

The opposite end of the spectrum is modern compilers. C/C++ compilers are generally so good that most attempts at nuanced optimization are just pointless; The compiler saw you coming and optimized your code behind your back before you got there. Why can't storage engines be like this? Why do you have to think not only about the type of your data, but how you want it implemented as well? Are there technical reasons why a db can't take generic data declarations ('string' instead of 'varchar(255)') and do the right thing with it when you populate your db? Or treat the use of a transaction as a hint that you'd like a storage engine that supports transactions?

I realize there comes a point where the db design has to be clamped down for production, but in the design stages there's a lot of optimization that a 'SQL compiler' could do behind your back. It currently seems like database product designers have taken the lazy way out and decided that optimization should live in the developers' heads rather than in the engine.

It just seems like SQL is this awful holdover from the days of COBOL and that we seriously need a modern product that let's us think about our programming problems rather than SQL's hangups. Am I missing something really basic, here?

Let me emphasize that, afaik, ronaldbradford knows his stuff, and I'm glad for him if he can make a living off of his MySQL knowledge. I just wish that that knowledge was embedded in the products themselves.


I just wish that that knowledge was embedded in the products themselves

Err, it is. Oracle, Sybase, MS SQL, Informix, all the major database products have sophisticated query optimizers. You use SQL to describe the result set you want, and it figures out the best way to get it.

Sadly the Web 2.0 world is full of people who actually believe MySQL is as good as databases get, having never used anything else. It's probably a good 10-15 years behind the state-of-the-art, at least (and Oracle et al are behind in niche areas).


I hear the points about query optimizers that you and the other repliers have made. (And they are valid points, all. I was just misinformed about MySQL.) I was more addressing the points made about database design decisions w/respect to data type choice and their effect on storage requirements and speed. I was wondering, specifically, if it is technically feasible to allow designers to specify 'text' or 'string' at design time, enter in some typical data, and have the engine choose the optimum type subject to its implementation constraints.

That said, it seems like I need to educate myself on the various products, though any prototypes I build will still use MySQL just because of the price point. :o)


Check out PostgreSQL, Firebird and SAP/DB. I can't think of a single technical or commercial reason to use MySQL.


Correction. There is one reason. Its a lot easier to find MySQL devs. than postgres, firebird, or SAP/DB devs even though the latter are a far superior product technically.


To give an example on varchars. Postgres never uses more space for varchars than strictly necessary. In fact, it is common to use text columns which are varchars extended to 2 gigabytes. Furthermore, postgres is capable of automatically compressing and decompressing data on the fly, no interaction required.

It also has specialized data types for pretty much any task you can imagine and a very robust extension system in case you need to roll your own types.


Agreed. A single look at a performance chart for MySQL under simultaneous transaction load shows it crumbling under. Furthermore, it has had substandard ACID support and substandard ANSI SQL support for generations and these problems are just starting to get fixed. Using MySQL as an indicator for the rest of the industry is deeply misguided.


s/SQL/MySQL

Other dbs like oracle, db2, postgres, and sql server are indeed much better about this. Ie. they only require optimizations on statistics like resident memory, etc... The actual queries do not need to be optimized except in extreme cases.


This seems like it could be really handy if someone could perhaps narrate along with it and help me understand what all of his talking points are about.


A video of the presentation can be found at http://technocation.org/node/533/play


Excellent! Thank you.


Small note about using unsigned ints: In PHP (and possibly others) they will turn into signed ints when you read them.

If you read and write them and never show them, it's probably OK, but if you have logic that checks for < 0, or the like, be aware of it.

But then again, if you have so many rows in a table that your auto_number has reached over 2 billion, you probably have other issues as well.


    ❖Average 25% - 30% saving on engagements
    ❖Better 60% (200GB System)
    ❖Best 78% (8GB per master with 12 masters)
What are they wibbling about?


As a database architect expert, I generally average a 25-30% reduction in the Disk Footprint of the database, sometimes with zero impact for application changes.

I use a very simple equation. DISK = MEMORY = PERFORMANCE Reducing the size of the disk footprint, and therefore the amount of disk reads (disk is the slowest of the 4 physical mediums) you improve performance. By then enabling more data to be stored in the appropriately tuned MySQL Memory buffers, you get a further increase in performance.

For more information on this point, you can contact me at http://ronaldbradford.com/contact


Why does he say varchar(255) is bad design?


If the MySQL documentation is right, it doesn't matter at all in MySQL. VARCHARs up to 255 uses an extra byte for counting the size of the column. There is a reading problem with variable-size records of any kind compared to fixed-size records.

It goes to say that other databases uses different allocation strategies for VARCHARs so the advice can't be carried over. For instance, they might define a fixed-size area together with an optional overflow area. This way one can make it such that most records have a fixed size and gain speedup.


Because 255 is now just an arbitrary choice for a VARCHAR length.

Explanation: Prior to MySQL 5.0.3 (give or take a few point releases - I forget) a VARCHAR column could be 255 characters in length maximum, so VARCHAR(255) was often used as a default. Now, however, you can go up to 65,535 characters on VARCHAR, so if you're still using "255" then that seems arbitrary and not well thought out (or your schema is just old).


That's not totally true. Using a varchar of 255 or less saves one byte vs one over 255. So if you don't know any particular size to use, 255 is reasonable.

And varchar(20) vs varchar(255) for a 10 character string use _exactly_ the same space. So there is no reason to artificially use a smaller value.


> So there is no reason to artificially use a smaller value.

That's not true either. As the document explains, MySQL's internal buffering uses fixed sized columns matching the length limit of the VARCHAR column. So a VARCHAR(255) will cause 255 bytes of memory to be allocated versus only 20 for VARCHAR(20). Of course, it depends on what range of sizes your data fit as to what you do :)


This is also multiplied by the character encoding, so when you have a table with 10 columns at VARCHAR(255) because your too lazy to define them, and they are all UTF8, so that's 255*3 for each column for each row when used in a MySQL Internal buffer (not, these buffers may or may not be used depending on the type of query).

When you have 100s to 1000s of queries per second, knowing and using your memory wisely is very important, because MySQL unlike Oracle for example, does not cap the amount of memory used for the Process, again using the Oracle approach the PGA (Process Global Area).

It's always easy to take sort cuts in design, when your DB will never grow above a few GB's and your users will never grow above say 10.

VARCHAR(255) is just as much a sign of little to no design, and therefore this and many other beginner 101 mistakes are likely, and therefore large performance gains are generally easily made in a system that now is needed to perform.


Kinda the point of UTF encoding is it's variable-width; if you only need ASCII then that's all you'll store. The MySQL people seem to have chosen to use the worst possible case every time regardless; this is not a "feature".


It depends on what you mean by artificial. It's a good idea to use a smaller value if you have an idea of what the actual limit should be for you want the column to contain.


Watch out for using int's for storing IP addresses. It won't work with IPv6.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: