Note: This is a single entry from my online diary. Please note that I'm not always entirely serious and some entries probably won't make sense unless put in context with other entries.
This weekend's geekery was hacking on my want-ads site, Partalistinn.
The first thing I did was upgrade the search engine to use my shared in-memory database, and migrate the data over. This reduced the memory footprint of the system by about 60%, without any noticeable impact on speed: the average search is handled in about 30ms! Turns out keeping stuff in RAM is fast. :-)
Too bad the result quality isn't all that awesome; maybe I'll hack on that some other time...
After deploying those changes, I figured I should try load-testing the system, just to see what breaks. The end result of that was great: the stack (lighttpd/perl httpd/perl in-memory DB) can do about 90 queries per second. Pretty good for Perl, I'd say.
I used http_load to replay actual web-server logs, to get realistic results.
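For the curious, here is a minimal sketch of how an access log could be turned into the one-URL-per-line file that http_load reads. It is not the script I actually used: the log format (a quoted `"GET /path HTTP/1.x"` request field), the helper names and the base URL are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Turn one access-log line into a full URL.
# Returns nothing for lines that are not GET requests (http_load only does GETs).
# Assumes the request appears as a quoted "GET /path HTTP/1.x" field.
sub log_line_to_url {
    my ($line, $base) = @_;
    return unless $line =~ m{"GET (/\S*) HTTP/[\d.]+"};
    return $base . $1;
}

# Read a whole log from a filehandle and return the list of replayable URLs.
sub urls_from_log {
    my ($fh, $base) = @_;
    my @urls;
    while (my $line = <$fh>) {
        my $url = log_line_to_url($line, $base);
        push @urls, $url if defined $url;
    }
    return @urls;
}

# Example use (hypothetical file name and host):
#   open my $fh, '<', 'access.log' or die "cannot open log: $!";
#   print "$_\n" for urls_from_log($fh, 'http://localhost:8080');
```

The output, saved as something like urls.txt, could then be replayed with a command along the lines of `http_load -parallel 10 -seconds 60 urls.txt`.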
The interesting bit was, the first time I ran the test, it killed the site. It completely stopped responding, dozens of processes in memory, all apparently frozen. When I started poking around, I discovered that the database had crashed. Oops!
Under load, the web server (as designed) raises the number of processes in the worker pool. The back-end DB also likes to fork() now and then, to get long-running requests out of the main execution thread. All of this increases the overall memory usage, until the system runs out. Turns out I wasn't handling errors from fork() properly, so when fork() failed due to insufficient memory, the DB got confused: the main process thought it was a child and called exit(0) after handling one of those longer queries. Oops!
(Stupid Perl: fork() returns undef instead of -1 on error. Grr!)
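To make the trap concrete: in Perl, fork() returns undef on failure, 0 in the child and the child's PID in the parent, so a plain truth test lumps "failed" together with "I am the child". Below is a minimal sketch of handling all three cases. It is not the DB's real code; `fork_role` and `run_in_background` are made-up names, and running the job inline on failure is just one possible fallback.

```perl
use strict;
use warnings;

# Classify the return value of Perl's fork():
#   undef -> fork failed, 0 -> we are the child, anything else -> we are the parent.
sub fork_role {
    my ($pid) = @_;
    return 'error' unless defined $pid;
    return 'child' if $pid == 0;
    return 'parent';
}

# Hypothetical helper: run $job in a child process if possible,
# otherwise fall back to running it in the current process.
# (Reaping finished children is left out to keep the sketch short.)
sub run_in_background {
    my ($job) = @_;
    my $pid  = fork();
    my $role = fork_role($pid);

    if ($role eq 'error') {
        warn "fork failed ($!), running job in the main process\n";
        $job->();
        return 0;      # still the main process, so do NOT exit
    }
    if ($role eq 'child') {
        $job->();
        exit(0);       # only a real child exits
    }
    return $pid;       # parent: carry on serving requests
}
```

The important part is the `defined` check coming first: without it, undef and 0 are both false, and the main process ends up on the child's code path.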
That was easily fixed, but didn't solve the main problem: the system was using too much RAM. This time it was the kernel out-of-memory killer nuking my processes, eventually killing the DB and again halting the whole system. Ouch.
I reduced the HTTPD max concurrency and made the back-end DB stop trying to fork() when memory was low. That fixed the problem: the tests passed and performance was great. I will also be wrapping the DB in a restarter script, so if it does die, it will get restarted automatically.
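A restarter can be very small. The sketch below is only an illustration of the idea, not the script I'll end up deploying: `supervise` is a made-up name, and the run limit and delay parameters exist mainly so the loop can be tried out safely.

```perl
use strict;
use warnings;

# Run the command in @$cmd and start it again whenever it exits.
# $max_runs bounds the number of launches (undef means keep going forever);
# $delay is the number of seconds to wait between launches.
# Returns the number of times the command was run.
sub supervise {
    my ($cmd, $max_runs, $delay) = @_;
    my $runs = 0;

    while (!defined $max_runs || $runs < $max_runs) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;   # undef means failure

        if ($pid == 0) {
            # Child: replace this process with the real program.
            exec(@$cmd) or do {
                warn "exec failed: $!\n";
                exit(127);
            };
        }

        # Parent: block until the child goes away, then loop to restart it.
        waitpid($pid, 0);
        $runs++;
        warn sprintf("child %d exited with status %d\n", $pid, $? >> 8);
        sleep $delay if $delay;
    }
    return $runs;
}

# Example use (hypothetical command name), restarting forever
# with a two-second pause between launches:
#   supervise(['./run-db.pl'], undef, 2);
```

The pause between launches matters: if the DB dies immediately on startup, restarting it in a tight loop would just burn CPU and fill the logs.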
Overall, I'd say running the test was time well spent. And it was fun. :-)