Sunday, August 20, 2006

MySQL Connection Management in PHP - How (Not) To Do Things

OK, I think I see what he is explaining, he has a point on the wordpress constructor... they should create the resource static and check if it has been defined already... he focuses on the reuse of the DB object, so if you create a new DB object only use 1 connection. I was hoping for sharing DB connection between sessions.

Sharing a DB connection between PHP instances can be done with pconnect.. but it's not a good idea.

The problem with wordpress is that the connection is opened at the beginning of every page. Multiple times. (There's a separate connection for the body of the page, and the sidebar, and the individual includes in the sidebar, etc etc).

He solved this problem by only opening a connection if a page is actually going to make a request.

The other problem (which he didn't address), is that wordpress holds open a connection for too long, and does too many queries. Loading the frontpage of a standard configuration wordpress blog issues 27 queries over 3 separate connections. Basically, wordpress is a mess. They went include crazy (just look at how many different includes there are for the header alone.. he went 3 includes deep to find the mysql connection).

#####################
First off, Wordpress only uses one connection for everything. Really. Unless you have a plugin or some other code making its own connection, all Wordpress queries are handled through the same connection.

The way it works is that, at the beginning of the execution, an instance of the wpdb class is created. Every query Wordpress does is through this one instance. It has one connection and it maintains it for the life of the execution cycle (until you see the generated page, basically).

What he's talking about is "lazy loading". See, when the instance of this class is created, it connects to the database right then and there. If the rest of the code then goes on and never uses that instance, you wasted your time connecting, yeah? His solution is to wait to actually connect until you need that connection. Basically, his patch eliminates the connection from the class constructor and creates a separate connect() function. Then, the query function is modified to check that a connection exists, and if not, call the connect() function to build one.

The benefit here is that if your page never hits the database for anything, then it never connects at all. This is smarter than the current Wordpress code.

However, it's also unnecessary, really. With Wordpress in particular, it would be extremely difficult to imagine a scenario where it doesn't actually hit the database. Everything comes from the database. Posts, sidebar content, anything dynamic, it all hits it. So this really isn't saving you anything for your average blog. Yes, he is correct that making the connection lazy makes more sense. However, it's a poor example, because Wordpress virtually *always* uses that database connection. Several times.

He also goes on about caching, and yes, caching is good. He doesn't talk about caching with Wordpress, but there are caching hooks in there and plugins which can use them (WP-Cache, for example). This sort of thing implements caching in a very smart manner... smarter than what he's talking about in his code snippets there, certainly. The upshot is that if you use something like WP-Cache, you get everything he's talking about and then some, making Wordpress extremely quick indeed.

Takes some setup, but what doesn't?
##################

Anyone who uses func_get_args() like that needs to look at how they code. That's some seriously ugly code.

But the general concept is completely right and should be used my most apps.

#################
So now that we got the caching idea into our heads what would be the most effective medium for storing your cache?
Is it filesystem? or could we maybe use something like SQlite for it?

memcached from http://www.danga.com/memcached/ is what you're looking for. It's amazing. I'm doing about a billion cache look-ups a day with it on some older hardware. I'm doing with one old server what previously took four nice big new servers.

####################
Interesting - but he's failed on one minor problem - that his caching algorithm uses the file system - a bigger problem than the one he is solving - the only way that this will work efficiently is if the blocks he is including a very complex (either in the SQL queries used OR in the processing that is performed on the output)

It is admittedly easy to produce and reuse the files - but there are problems with large directories if the site is getting into the sort of traffic that requires this sort of caching...

On a test version of our work server we were using file caching (and having to store temporary images) - we managed to break the file system by creating millions of files overnight - the systems team worked out that it would be quicker to reformat the system disk - and re-install the operating system that deleting all the files with rm (calculated time was somewhere near 32 days to remove files created in 12 hours!)