Archive for September 15th, 2007

Published by Fabian on 15 Sep 2007

Integrating Lucene into Symfony – a wrap-up

I know, in my previous entry I argued that you should use Symfony plugins instead of reinventing the wheel. But the sfLucene plugin, by design, does not cover our requirements, and Lucene itself is the wheel not to reinvent, so all that was left for me was a small integration. Just a few lines of code.

I spent basically the whole afternoon on it today, so let's wrap it up briefly:

  • Lucene is an open-source search engine.
  • Zend created a PHP implementation called Zend_Search_Lucene (ZSL), part of the Zend Framework.
  • There are some good blog entries describing the integration:
    • Dave Dash provided the initial tutorial, based on an older ZSL version.
    • Peter van Garderen uses Dave's tutorial and adds some comments for newer versions.
    • Johannes Schmidt (blog in German) gave me the final hints to get UTF-8 working.

I took the latest ZSL release, 1.0.1, and only the search files. For 1.0.1, the files I dropped into my_app/lib are:

Zend/Loader.php
Zend/Exception.php
Zend/Search/*

The next problem was autoloading, as the Zend.php file no longer exists. I wanted a wrapper class that encapsulates opening the index and running searches anyway, so I created my own ZendSearchLucene.class.php in my_app/lib:

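<?php
// Load Zend_Search_Lucene through Zend_Loader, which maps class names to
// files (Zend_Search_Lucene -> Zend/Search/Lucene.php).
// Assumes my_app/lib is on the include_path.
require_once('Zend/Loader.php');
Zend_Loader::loadClass('Zend_Search_Lucene');
Zend_Loader::loadClass('Zend_Search_Exception');

// Treat query strings as UTF-8 and analyze text with the UTF-8 analyzer.
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding("UTF-8");
// setDefault() takes a single analyzer instance.
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
  new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num());

class ZendSearchLucene {
  const VERSION = '1.0.1';

  // One index directory per area (e.g. forum, blog) under data/search/.
  // A method instead of a constant, because PHP 5 before 5.6 does not
  // allow expressions in class constant definitions.
  private static function indexPath($area) {
    return SF_ROOT_DIR.DIRECTORY_SEPARATOR.'data'.DIRECTORY_SEPARATOR.
      'search'.DIRECTORY_SEPARATOR.$area;
  }

  // Remove any document whose 'myid' field equals $id, then add $doc.
  public static function addOrUpdate($doc, $id, $area) {
    try {
      $index = Zend_Search_Lucene::open(self::indexPath($area));
    } catch (Zend_Search_Lucene_Exception $e) {
      // No "create if not exists" option: create the index if opening fails.
      $index = Zend_Search_Lucene::create(self::indexPath($area));
    }
    $term = new Zend_Search_Lucene_Index_Term($id, 'myid');
    $query = new Zend_Search_Lucene_Search_Query_Term($term);
    foreach ($index->find($query) as $hit) {
      $index->delete($hit->id); // $hit->id is Lucene's internal document number
    }
    $index->addDocument($doc);
  }

  public static function find($query, $area) {
    $index = Zend_Search_Lucene::open(self::indexPath($area));
    // Lower-case the query the same way the indexed fields are lower-cased.
    return $index->find(mb_strtolower($query, "UTF-8"));
  }
}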

Looks complicated, but that is basically all I needed to do. The code combines ideas from all three tutorials, so let's give credit where it's due:

  • The autoloading at the beginning is my own creation.
  • The UTF-8 solution is very fragile. It never worked as described on other pages. The only thing that made it work was setting UTF-8 for both the query parser and the default analyzer, and nowhere else (so no UTF-8 parameter in the field creation). Thank you, Johannes, for the ideas. The UTF-8 mess took most of my time today.
  • Most of the addOrUpdate code comes from Dave, although I modified it. As Peter pointed out, the API has changed and now offers static open and create methods. Unfortunately, there is no "create if not exists" option, so I try to open the index and create it if opening throws an exception.
  • I also made the index location depend on an area parameter, so different areas (say, a forum and a blog) each get their own index and searches stay within one area.
  • Peter's instructions and some comments helped me resolve the issue with the id column (ZSL hits already have an id property of their own). Giving the field its own name (myid) was enough, so I wouldn't go as far as recommending that you never name a DB column ID.

And that is most of what needs to be done; it just takes some time to sort out. Final credit again goes to Dave for his talk about Zend_Search_Lucene, which inspired me to use it instead of MySQL LIKE '%xyz%' queries.

I hope this collection of notes and amendments helps :-)

PS: If you are a Java guy, it is very interesting to see how much effort PHP guys put into namespacing. Zend_Search_Lucene is a prime example: they even invented their own “classloader”, which maps the underscores in class names to directories :)
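The underscore-to-directory convention can be illustrated in a few lines. This is a simplified sketch of the idea, not Zend_Loader's actual source; the function name classNameToPath and the autoloader closure are hypothetical, and the code assumes PHP 7.1 or later (scalar and void type declarations).

```php
<?php
// Hypothetical helper (not part of Zend Framework): turn a PEAR/Zend-style
// class name into a relative file path by replacing each underscore with a
// directory separator, e.g. Zend_Search_Lucene -> Zend/Search/Lucene.php.
function classNameToPath(string $className, string $separator = DIRECTORY_SEPARATOR): string
{
    return str_replace('_', $separator, $className) . '.php';
}

// Hypothetical autoloader built on that mapping: when an unknown class is
// used, look the file up on the include_path and load it if it exists.
spl_autoload_register(function (string $className): void {
    $file = classNameToPath($className);
    if (stream_resolve_include_path($file) !== false) {
        require_once $file;
    }
});

echo classNameToPath('Zend_Search_Lucene_Index_Term', '/'), PHP_EOL;
// prints Zend/Search/Lucene/Index/Term.php
```

It is the same idea as a Java classloader resolving com.example.Foo to com/example/Foo.class, only with underscores standing in for package separators.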

Update2:
UTF-8 was still not working correctly, but now I have it. I had not noticed that calling strtolower on a UTF-8 string can corrupt it. In some cases it might work, but to be safe, always use mb_strtolower. My field generation now looks like this:

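// Build the document that is passed to ZendSearchLucene::addOrUpdate()
$doc = new Zend_Search_Lucene_Document();
$titleField = Zend_Search_Lucene_Field::Text('title', mb_strtolower($this->getTitle(), "UTF-8"), "UTF-8");
$titleField->boost = 1.5; // matches in the title weigh more in scoring
$doc->addField($titleField);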

So it seems that without this, it worked on my dev environment but not on the test system (most likely because the PHP locale, and thus the conversion magic, was different). I have also updated the code above.
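To keep the indexed text and the search input lower-cased the same way, the mb_strtolower call can live in one small helper used both when building fields and before calling find(). A minimal sketch, assuming PHP 7+ with the mbstring extension; normalizeForIndex is a hypothetical name, not part of ZSL.

```php
<?php
// Hypothetical helper: lower-case UTF-8 text character by character.
// Requires the mbstring extension. Unlike strtolower(), which works on
// single bytes (and, depending on PHP version and locale, may skip or
// mangle multibyte letters), mb_strtolower() handles characters like Ü.
function normalizeForIndex(string $text): string
{
    return mb_strtolower($text, 'UTF-8');
}

$title = 'Über Größe';
echo normalizeForIndex($title), PHP_EOL; // prints "über größe"

// The result is still valid UTF-8, so it can go straight into a field value.
var_dump(mb_check_encoding(normalizeForIndex($title), 'UTF-8')); // bool(true)
```

Using the same helper on the query string keeps indexing and searching consistent, independent of the server's locale settings.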

Published by Fabian on 15 Sep 2007

Symfony & Propel Behaviors

There is some discussion in the symfony community about which is the better ORM layer: Propel or Doctrine.

To be honest, I prefer Propel right now. It is simple, and you can easily override functionality (even if it gets ugly, like copy-pasting code just to force a certain PK to be set). But the best thing is the availability of behavior plugins from the community.

The plugin wiki currently hosts 17 Propel plugins, and the number grows day by day. You might say: why do I need to add a plugin just to be able to rate one of my domain objects?
The answer is: you don't. You can code it yourself. But why would you, if you can get a maintained, flexible, ready-made plugin for it? In fact, I switched from a home-grown rating solution to that plugin and contributed my ideas. It is more powerful, more robust and better structured than the solution I came up with.

Here are my top 3 Propel plugins:

  1. sfPropelActAsNestedSetBehaviorPlugin : Have you ever tried to store a tree in a relational database? It is a hell of a lot of work, and this plugin solves it like a charm. Many improvements are discussed on the mailing list and constantly integrated. This plugin saves you A LOT of work.
  2. sfPropelActAsRatableBehaviorPlugin : As described above, this helps you rate your domain objects. It includes a module for displaying a star rater.
  3. sfPropelActAsCommentableBehaviorPlugin : A pretty new plugin, similar to the ratable one, but for comments. As ratings and comments are two main features of modern Web 2.0 apps, this is a must too. Again: it is not complicated to do yourself, but a solution that is maintained, enhanced, and gains features you would have wanted but had not thought of yet is much better than reinventing the wheel.
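For readers wondering why trees in a relational database are so much work: the nested set model stores each node with a left and a right number, so a whole subtree can be read with one range query instead of recursive lookups. Below is a minimal sketch of that numbering in plain PHP 7+. It is not the plugin's code or API; the function names and the ['name', 'children'] input shape are hypothetical.

```php
<?php
// Hypothetical sketch of nested-set numbering (not the plugin's code).
// A depth-first walk hands out a "left" number when entering a node and a
// "right" number when leaving it, so every descendant's numbers lie strictly
// between its ancestor's left and right values.
function assignNestedSet(array $node, int &$counter = 1): array
{
    $row = ['name' => $node['name'], 'lft' => $counter++];
    $descendants = [];
    foreach ($node['children'] ?? [] as $child) {
        $descendants = array_merge($descendants, assignNestedSet($child, $counter));
    }
    $row['rgt'] = $counter++;
    return array_merge([$row], $descendants); // rows in pre-order
}

// Same condition as "WHERE lft > :lft AND rgt < :rgt" on a nested-set table.
function descendantsOf(array $rows, string $name): array
{
    $parent = null;
    foreach ($rows as $row) {
        if ($row['name'] === $name) {
            $parent = $row;
            break;
        }
    }
    if ($parent === null) {
        return [];
    }
    $result = [];
    foreach ($rows as $row) {
        if ($row['lft'] > $parent['lft'] && $row['rgt'] < $parent['rgt']) {
            $result[] = $row['name'];
        }
    }
    return $result;
}

$rows = assignNestedSet([
    'name' => 'Forum',
    'children' => [
        ['name' => 'General', 'children' => [['name' => 'Intros']]],
        ['name' => 'Support'],
    ],
]);
print_r(descendantsOf($rows, 'Forum')); // lists General, Intros and Support
```

Reading a subtree becomes a single range query; the price is that inserting or moving a node means renumbering the lft/rgt values of the nodes after it, which is exactly the tedious part a behavior plugin takes off your hands.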

And of course I'll take the chance to point you to sfPokaYokePlugin, which gives you gracefully degrading JavaScript validation using exactly the same validation files as your regular symfony server-side validation. No extra effort! Why am I mentioning that plugin? Because I contributed to it as well :)

Have a look at symfony plugins today. Don’t reinvent the wheel, but contribute to make existing wheels even rounder!

Update:

I just checked how you would do these behaviours in Doctrine. pookey pointed me to something Doctrine calls templates:
http://www.phpdoctrine.org/documentation/manual/trunk?one-page#class-templates

These should be pretty similar to behaviours, although no ready-made symfony plugins exist yet for common use cases. I hope that helps you Doctrine users. I am not (yet) one :-)