Published by Fabian on 02 Oct 2007 at 12:01 am
Propel and PHP Garbage collector
This might be a complaint, or a rant, or something like that.
For various reasons, I have about 200 megabytes of data that I would like to import into the new web application I am writing.
I thought this would be easy with Propel. Well, it actually would be, if PHP did not have some strange issues. Perhaps they are Propel issues, but I guess PHP is also to blame.
I run a mysql_query against my old database and initialize a new Propel connection for the new one. I loop over all rows of that table: 63,000 rows. It failed with out-of-memory errors several times, until I found a memory setting that allowed the loop to complete. Hold tight: it is 1 GB. The data processed inside this loop weighs 50 MB.
I am not really doing complex things. I create new Propel objects, invoke a few setters and then save() them. Using the same connection for all inserts already improves performance a lot, but using the Propel objects leaks memory on each iteration. More interesting: if I just loop and comment out all the Propel stuff, I also leak memory. Or, more precisely, PHP does. Why does an empty loop over MySQL data leak memory? Is there an internal loop status object that keeps track of each iteration?
Okay, the bulk of the memory leak is down to Propel. It seems that the cross-referenced table, database and column maps never free their memory. PHP's garbage collector works by reference counting and cannot collect islands of cyclic references, so they stay in memory. There are some tickets open in the Propel trac, so perhaps this will get improved. The only thing I can do for now is to explicitly unset all variables I am using myself to limit memory usage, and to give PHP that gigabyte for the import. This will be a one-time operation, but I wonder if there might be a better way to reduce memory consumption. And yes, I would like to keep using the Propel objects.
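To make the cyclic reference problem concrete, here is a minimal sketch. It is not Propel's actual code: TableMap and ColumnMap below are hypothetical stand-ins that simply point at each other, and free() is a made-up helper name. Where gc_disable() exists, the sketch calls it to mimic an engine that only does reference counting, which is the situation described above.

```php
<?php
// Hypothetical stand-ins for cross-referenced map objects (NOT Propel's classes).
class TableMap
{
    public $columns = array();

    public function addColumn($name)
    {
        // The column keeps a reference back to its table: this creates a cycle.
        $column = new ColumnMap($name, $this);
        $this->columns[$name] = $column;
        return $column;
    }

    // Made-up helper: break the cycle by hand so reference counting can free it.
    public function free()
    {
        foreach ($this->columns as $column) {
            $column->table = null;
        }
        $this->columns = array();
    }
}

class ColumnMap
{
    public $name;
    public $table;

    public function __construct($name, $table)
    {
        $this->name = $name;
        $this->table = $table;
    }
}

// Build and drop many table/column islands; return memory growth in bytes.
function measureGrowth($breakCycles, $iterations)
{
    $before = memory_get_usage();
    for ($i = 0; $i < $iterations; $i++) {
        $table = new TableMap();
        $table->addColumn('id');
        $table->addColumn('title');
        if ($breakCycles) {
            $table->free();
        }
        unset($table); // with the cycle intact, unset() alone frees nothing
    }
    return memory_get_usage() - $before;
}

// Assumption: switch off any cycle collector to mimic pure reference counting.
if (function_exists('gc_disable')) {
    gc_disable();
}

$leaky = measureGrowth(false, 5000);
$clean = measureGrowth(true, 5000);
printf("cycles left intact: %d bytes, cycles broken by hand: %d bytes\n", $leaky, $clean);
```

The exact numbers depend on the PHP version, but the run that leaves the cycles intact should grow steadily while the run that breaks them by hand should stay roughly flat. That is why unset() on my own variables only helps for objects that are not part of such an island.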
Here is some leaking sample code. The style is not perfect; it's just a quick snippet. Note that I am using BasePeer::doInsert because I needed to preserve the ID here (which is removed by the save() method); it's also a bit faster.
mysql_connect("legacy-db", "olduser", "oldpw");
$data = mysql_query("select oid,title,text from db.table");
$max = mysql_num_rows($data);

$databaseManager = new sfDatabaseManager();
$databaseManager->initialize();
$con = Propel::getConnection(EntryPeer::DATABASE_NAME);

try {
    $con->begin();
    $i = 0;
    while ($row = mysql_fetch_row($data)) {
        $i++;
        if ($i % 100 == 0) echo ($i."/".$max."\n");
        $e = new Entry();
        $e->setId($row[0]);
        $e->setTitle(utf8_encode($row[1]));
        $e->setBody(utf8_encode($row[2]));
        $crit = $e->buildCriteria();
        $crit->setDbName(EntryPeer::DATABASE_NAME);
        BasePeer::doInsert($crit, $con);
        unset($e);
        unset($crit);
        unset($row);
    }
    $con->commit();
} catch (PropelException $e) {
    $con->rollback();
    throw $e;
}
mysql_free_result($data);
Stefan on 02 Oct 2007 at 9:15 am #
There are a lot of performance problems with Propel 1.2 that, as far as I’ve heard, have been fixed in Propel 1.3. So hopefully, once that is out, things will get easier.
But I’ve heard people say that you should use Propel for fast prototyping, but refactor it to use simple querying for your database once a system goes into production. I’m not a big fan of such an approach, especially since I want to use the Propel objects and be able to easily switch between different database types. I guess both PHP and Propel need to be fixed…
Kris Wallsmith on 02 Oct 2007 at 5:59 pm #
I typically disable logging for batch scripts in symfony, which significantly reduces the memory leak issues.
sfConfig::set('sf_logging_enabled', false);
Hope that helps!
Kris
Fabian on 02 Oct 2007 at 6:19 pm #
Hey, that's great, Kris!
It did not solve all of the leakage, but anything that improves performance and reduces leakage is appreciated.
I should have noticed that earlier. Now I am going to delete a huuuuuge log file.
Sam C on 02 Oct 2007 at 10:33 pm #
Memory leaks have never really been a big priority in PHP because the processes running the scripts are usually short lived. Mod_php pretty much throws away the interpreter every time a request is made. If you read the paper on scaling Flickr you’ll see that they originally started to write an ftp daemon in PHP and realized that it wasn’t going to work due to memory leaks.
Benjamin Bender on 03 Oct 2007 at 10:07 pm #
Afaik, you are absolutely right: it's an issue with Propel by design. There are circular references which (atm) will not be caught by PHP's garbage collector. In Propel 1.3 it's even worse because of its instance pooling, but that can be disabled. With pooling disabled, it's working quite nicely for me, even with big datasets.
References:
http://propel.phpdb.org/trac/ticket/379
http://propel.phpdb.org/trac/ticket/420
I’m using Propel 1.3 now and all the trouble is gone.
Also, the change from Creole to PDO is such a huge step forward…
Bert-Jan on 05 Oct 2007 at 11:56 am #
Maybe it’ll help if you don’t use a new Entry object every time, but build the Criteria directly and keep re-using the same Criteria object:
while (…) {
$crit->clear();
$crit->add(EntryPeer::ID, $row[0]);
$crit->add(EntryPeer::TITLE, utf8_encode($row[1]));
$crit->add(EntryPeer::BODY, utf8_encode($row[2]));
$crit->setDbName(EntryPeer::DATABASE_NAME);
EntryPeer::doInsert($crit);
}
Bert-Jan on 09 Jan 2008 at 10:56 am #
The trouble with Propel’s circular references is supposed to be solved in the upcoming PHP 5.3, which has an improved garbage collector that can detect them and clean them up, at a slight performance penalty.
Initial testing showed that scripts were a little slower but used on average 30% less memory. The performance hit was still being worked on.
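A small sketch of what that looks like from a script's point of view, assuming the gc_enable() and gc_collect_cycles() functions that the cycle collector exposes (available from PHP 5.3 on). The Node class and the makeCycle() helper are made up for illustration and have nothing to do with Propel.

```php
<?php
// Made-up class: two instances will reference each other.
class Node
{
    public $other = null;
}

// Create a two-object cycle and let both variables go out of scope.
function makeCycle()
{
    $a = new Node();
    $b = new Node();
    $a->other = $b;
    $b->other = $a;
    // On return, $a and $b are gone, but each object still holds the other.
}

gc_enable(); // make sure the cycle collector is switched on

for ($i = 0; $i < 1000; $i++) {
    makeCycle();
}

$beforeCollect = memory_get_usage();
$collected = gc_collect_cycles(); // force a collection run, returns the number of collected cycles
$afterCollect = memory_get_usage();

printf("collected: %d, memory before: %d, after: %d\n", $collected, $beforeCollect, $afterCollect);
```

In a long import loop one could call gc_collect_cycles() every few thousand rows instead of waiting for the collector to kick in by itself. Whether that pays off would have to be measured.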
Fabian on 09 Jan 2008 at 11:04 pm #
Hi Bert,
Actually, today I read that this feature was dropped from 5.3 and postponed to 6. Do you have any references?