Development

Symfony2 + Propel 1.6 with Memcache

I was looking for a way to put the Propel 1.6 “Instance Pool” into Memcache for a Symfony2 project I am working on, and I have managed it. Here is what I did; it may help you, but all the usual caveats apply. (Your mileage may vary. Use only under the direct supervision of your doctor. Do not allow children to use unsupervised. Not to be taken internally.)

What you have:

I am assuming that you currently have Symfony2 set up with Propel 1.6 as your ORM. If not, there are guides (http://symfony.com/doc/current/book/installation.html, http://www.propelorm.org/cookbook/symfony2/working-with-symfony2.html) to help you. Do that first, then come back. I'll wait.

The setup I am using assumes you will have multiple servers behind a load balancer of some sort, along with a memcache server that is accessible to all of them.

[Diagram: a sample of such a setup – multiple app servers behind a load balancer, all sharing a single Memcache server.]

What you need in Symfony2:

Now that you have Symfony2 and Propel 1.6 set up, you need to create a new Symfony bundle to handle your Memcache connections. (You will see the need for this soon.)

What I have done is create a bundle which will contain all my shared utilities among all the apps built on this Symfony2 instance. For the sake of this article we will call the bundle “MyApp.” (I am not going to go into detail on how to create a Symfony2 bundle – the Documentation is your friend.)

Once we have created a bundle and registered it with app/AppKernel.php by adding

new MyApp\MyAppBundle\MyAppBundle(),

in the $bundles array under registerBundles(), we need to create a class to read configurations from the app/config directory (config_dev.yml, config_test.yml and config_prod.yml) so that our Memcache settings can be configured by environment. Note that we are not adding any routes for this bundle; it is used for utility code only, not for routable pages.

For this we will also need to make sure that we have MyAppExtension and Configuration in place in src/MyApp/MyAppBundle/DependencyInjection.

This gives us access to anything in the config_XXX.yml files under the “my_app” node.
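A minimal sketch of what those two classes might look like (the node names come from the config samples below; the defaults are assumptions, not the post's actual files):

<?php
// src/MyApp/MyAppBundle/DependencyInjection/Configuration.php
namespace MyApp\MyAppBundle\DependencyInjection;

use Symfony\Component\Config\Definition\Builder\TreeBuilder;
use Symfony\Component\Config\Definition\ConfigurationInterface;

class Configuration implements ConfigurationInterface
{
    public function getConfigTreeBuilder()
    {
        $treeBuilder = new TreeBuilder();
        $rootNode = $treeBuilder->root('my_app');

        $rootNode
            ->children()
                ->arrayNode('cache')
                    ->children()
                        ->scalarNode('server')->defaultValue('127.0.0.1')->end()
                        ->scalarNode('port')->defaultValue(11211)->end()
                        ->scalarNode('expire')->defaultValue(3600)->end()
                    ->end()
                ->end()
            ->end();

        return $treeBuilder;
    }
}

And the matching extension class, which turns those values into container parameters:

<?php
// src/MyApp/MyAppBundle/DependencyInjection/MyAppExtension.php
namespace MyApp\MyAppBundle\DependencyInjection;

use Symfony\Component\Config\Definition\Processor;
use Symfony\Component\DependencyInjection\ContainerBuilder;
use Symfony\Component\HttpKernel\DependencyInjection\Extension;

class MyAppExtension extends Extension
{
    public function load(array $configs, ContainerBuilder $container)
    {
        $processor = new Processor();
        $config = $processor->processConfiguration(new Configuration(), $configs);

        // expose the values as container parameters, e.g. my_app.cache.server
        $container->setParameter('my_app.cache.server', $config['cache']['server']);
        $container->setParameter('my_app.cache.port',   $config['cache']['port']);
        $container->setParameter('my_app.cache.expire', $config['cache']['expire']);
    }
}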

We will call the configuration class "MyAppConfiguration" and place it in the src/MyApp directory.

Now that we have a class that can read our own special configurations we need to add a class to handle the Memcache connection.

For this example we will create this class as Poolcache\CacheHandler. In order to do that we add a directory in src/MyApp called Poolcache and create the CacheHandler.php there. This means we can now set up the Memcache server locations based on the environment. For example, in config_dev.yml you might include:

my_app:
    cache:
        server: 127.0.0.1
        port:   11211
        expire: 120

While in config_prod.yml you might use:

my_app:
    cache:
        server: mymemcache.mydomain.dom
        port:   11211
        expire: 3600

(All the sample files can be downloaded here.)
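The full CacheHandler is in the sample files; a minimal sketch of the idea, assuming the pecl "memcached" extension and hard-coded defaults in place of the injected configuration, might look like this (the behavior below only relies on getInstance(), getPool(), setPool() and clearPool()):

<?php
// src/MyApp/Poolcache/CacheHandler.php -- a minimal sketch, not the full class
namespace MyApp\Poolcache;

class CacheHandler
{
    private static $instance;
    private $memcached;
    private $expire;

    private function __construct($server = '127.0.0.1', $port = 11211, $expire = 3600)
    {
        // assumes the pecl "memcached" extension is installed
        $this->memcached = new \Memcached();
        $this->memcached->addServer($server, $port);
        $this->expire = $expire;
    }

    public static function getInstance()
    {
        if (null === self::$instance) {
            // in the real bundle these values would come from the my_app config
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function getPool($name)
    {
        $pool = $this->memcached->get('pool_'.$name);
        return is_array($pool) ? $pool : array();
    }

    public function setPool($name, array $pool)
    {
        $this->memcached->set('pool_'.$name, $pool, $this->expire);
    }

    public function clearPool($name)
    {
        $this->memcached->delete('pool_'.$name);
    }
}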

Creating the Propel behavior:

Now that we have Symfony2 set up the way we need, we can add the Propel behavior. In the vendor/propel/generator/lib/behavior directory we will be adding MemcachedPoolBehavior.php.

This class uses the parser to replace some of the code in the generated classes.

<?php
require_once __DIR__.'/../util/PropelPHPParser.php';
class MemcachedPoolBehavior extends Behavior
{

    /**
     * Filter to add the CacheHandler class to the Peer objects so they
     * can use Memcached or whatever other cache you want to use
     */
    public function peerFilter(&$script)
    {
        $keyname = $this->getTable()->getPhpName();
        $newAddInstanceToPool = "
    public static function addInstanceToPool(\$obj, \$key = null)
    {
        if (Propel::isInstancePoolingEnabled()) {
            if (\$key === null) {
                \$key = (string) \$obj->getId();
            } // if key === null
            \$cache = \\MyApp\\Poolcache\\CacheHandler::getInstance();
            \$pool = \$cache->getPool('%s');
            \$pool[\$key] = \$obj;
            \$cache->setPool('%s', \$pool);
            /*self::\$instances[\$key] = \$obj;*/
        }
    }
    ";
        $newAddInstanceToPool = sprintf($newAddInstanceToPool, $keyname, $keyname);
        $parser = new PropelPHPParser($script, true);
        $parser->replaceMethod('addInstanceToPool', $newAddInstanceToPool);
        $script = $parser->getCode();

        $newRemoveInstanceFromPool = "
    public static function removeInstanceFromPool(\$value)
    {
        if (Propel::isInstancePoolingEnabled() && \$value !== null) {
            if (is_object(\$value) && \$value instanceof %s) {
                \$key = (string) \$value->getId();
            } elseif (is_scalar(\$value)) {
                // assume we've been passed a primary key
                \$key = (string) \$value;
            } else {
                \$e = new PropelException(\"Invalid value passed to removeInstanceFromPool().
                    Expected primary key or %s object; got \" .
                    (is_object(\$value) ? get_class(\$value) . ' object.' : var_export(\$value,true)));
                throw \$e;
            }
            \$cache = \\MyApp\\Poolcache\\CacheHandler::getInstance();
            \$pool = \$cache->getPool('%s');
            unset(\$pool[\$key]);
            \$cache->setPool('%s', \$pool);
            /*unset(self::\$instances[\$key]);*/
        }
    }
    ";

        $newRemoveInstanceFromPool = sprintf($newRemoveInstanceFromPool, $keyname, $keyname, $keyname, $keyname);
        //$parser = new PropelPHPParser($script, true);
        $parser->replaceMethod('removeInstanceFromPool', $newRemoveInstanceFromPool);
        $script = $parser->getCode();

        $newGetInstanceFromPool = "
    public static function getInstanceFromPool(\$key)
    {
        if (Propel::isInstancePoolingEnabled()) {
            \$cache = \\MyApp\\Poolcache\\CacheHandler::getInstance();
            \$pool = \$cache->getPool('%s');
            if (isset(\$pool[\$key])) {
                return \$pool[\$key];
            }
        }
        return null; // just to be explicit
    }
    ";

        $newGetInstanceFromPool = sprintf($newGetInstanceFromPool, $keyname);
        //$parser = new PropelPHPParser($script, true);
        $parser->replaceMethod('getInstanceFromPool', $newGetInstanceFromPool);
        $script = $parser->getCode();

        $newClearInstancePool = "
    public static function clearInstancePool()
    {
        \$cache = \\MyApp\\Poolcache\\CacheHandler::getInstance();
        \$cache->clearPool('%s');
        /*self::\$instances = array();*/
    }
    ";

        $newClearInstancePool = sprintf($newClearInstancePool, $keyname);
        //$parser = new PropelPHPParser($script, true);
        $parser->replaceMethod('clearInstancePool', $newClearInstancePool);
        $script = $parser->getCode();
    }

}

Notice that we are calling the static getInstance method on the cache handler class we created earlier, and using it to move the instance pool from the class's static properties into the Memcache server.

Final steps:

Back in Symfony2, add the following to app/config/propel.ini:

#memcaching instance pool
propel.behavior.memcachedpool.class = behavior.MemcachedPoolBehavior

And in your Symfony2 application bundles, when you set your Resources/config/schema.xml for Propel you can add <behavior name="memcachedpool" /> to place the instance pool into memcache. Like so:

<?xml version="1.0" encoding="UTF-8"?>
<database name="default" namespace="Acme\HelloBundle\Model" defaultIdMethod="native">

    <table name="book">
        <column name="id" type="integer" required="true" primaryKey="true" autoIncrement="true" />
        <column name="title" type="varchar" primaryString="1" size="100" />
        <column name="ISBN" type="varchar" size="20" />
        <column name="author_id" type="integer" />
        <foreign-key foreignTable="author">
            <reference local="author_id" foreign="id" />
        </foreign-key>
        <behavior name="memcachedpool" />
    </table>

    <table name="author">
        <column name="id" type="integer" required="true" primaryKey="true" autoIncrement="true" />
        <column name="first_name" type="varchar" size="100" />
        <column name="last_name" type="varchar" size="100" />
        <behavior name="memcachedpool" />
    </table>

</database>

This means you can pick and choose which classes are maintained in Memcache as well, so that not every class needs to be there.

Best Practices

How intranet software goes to hell

We have all seen it, many of us have tried to clean it up, and a few of us may have even been responsible for some of the worst written, non-documented, buggy, spaghetti-like code ever – “internal use only” apps. These are apps that are meant to simplify the jobs of your co-workers, meant to automate repetitive tasks and meant to be a means for managing the company’s business. So how do they end up so terrible? You’ve got the hottest, leanest, cleanest code on the public facing side, so you obviously have the talent in-house to make good software. (And let’s be honest, all of us think our own software is the best, because if we didn’t we would die of shame whenever anyone asked where we worked.)

So how does the software we build for ourselves go so wrong? Well, in my observations through many jobs over many years, I have come up with a formula for really lousy internal software.

Step one: Start small
By start small I don’t mean start with a single database with 4 or 5 tables and a couple views and a few report generation scripts. I mean start really, really small. Like “put a page on the intranet that lists all our vendors and their current status.”

By starting with such a small task it is easier to forgo any sort of documentation, architecture planning or requirements specifications. It's also easier to convince yourself that this is unimportant. After all, this is merely a convenience for your fellow workers and not an integral part of the revenue stream. This is the first step on the road to ruin.

Step two: Occasionally add a feature, not too much at once
It is important at this early stage in the gestation of your beastly code that you keep feature adds at least as small as the original task. By not having anything “worthy” of architecture or specification you can guarantee the continued growth of your new monster. These should be things like “Can we also show the vendor’s contact info on that list?” followed a month later by “can we filter the list to only show active vendors?” These changes should not only be small, but should be spaced far enough apart that the developer involved has forgotten about the changes that came before, or at least how many there were.

Step three: Repeat steps one and two, several times
Now that you have a minor little thing here, it is time to add some more. This time, let's do the same thing, but for, say, clients. Because you already have the basics it is a perfect time for some copy-and-paste development. Change the query, but don't bother with changing variable names or anything. After all, you already know it works; just use it as is with some text label changes on the output. Easy-peasy, and it took you about five minutes. At this rate, you could just as quickly add the same sort of thing for employees. And any other sort of list that comes up.

Step four: Time for a big change
Now it becomes time to turn all your "unrelated" (although code-copied) little, unimportant, non-revenue stream items into one full-fledged app. Since you are already convinced that none of this is very important, and most of it is already built and functioning, it is easy to convince yourself that turning this into one contact management app is a small enough task to not need architecture, requirements or even any real documentation. This is generally where the real shape of the beast starts to take form. Now your query and display scripts will need to be able to insert, update and delete, and your one display will need to be diversified into display, edit forms, and perhaps a login page to ensure the person using the forms has permissions to edit or delete.

If you really want to do it up right, instead of turning it into the obvious (in this case a contact management app) turn it into something close, but not quite the same. Say, an inventory and order management app. Hey, we already have the client and vendor info, we’re more than halfway there, right?

Step five: The final chapter
The last step is perhaps the easiest. Once the monstrosity is running on your intranet and working (however badly), ignore it. Requests for bug fixes go to the bottom of the queue as it is, after all, not part of your revenue stream. Developer time is better spent on your customer-facing apps and there really is no need to make it work completely, because “we got along fine without it before it was built.”

Of course, during this waiting period the app that horror built becomes a routine part of the workflow of those who use it regularly and they pretty well can’t do their job without it any more.

How to avoid it altogether, in one simple step:
In my experience, the simplest way to avoid these kinds of nightmare creatures of code is to require a full architecture, specification and documentation cycle for even the simplest little things. You are likely to find that even though you were only asked for a vendor list, what your co-workers really need is far beyond that. Of course, you will only bother with treating it like any other development cycle if you can see the project as an important part of your business, and as having an impact on the revenue stream. If it seems too small to bother with treating it like a full project, then either the requester has failed to make clear its importance, or it really is something that should not even be taken on.

Development

Simplified ANSI color term support in PHP

I was working on a script that needed some color terminal output, and while it wasn't particularly complicated, I found it was slowing me down. Flipping back and forth between a list of ANSI color codes and my work was frustrating. So, I did what I am often prone to do: a quick Google search for a PHP ANSI color terminal library. I found some things that were old, not maintained and not really fitting what I needed. So then I did what I always end up doing in that situation: I built one.

The ANSI class is a way to quickly create several different foreground and background color combos along with a few style effects (like underline, inverse, and if you really must, blink). Of course the style effects only work on the standard 16 ANSI VT-100 terminal colors (the normal and “bold” or “bright” versions of black, red, green, yellow, blue, purple, cyan and white.)

The simplest way to use it is to create a new ANSI object for each color combo you want. So if you want red text on a white background, underlined bright green text on a black background and blue text on a yellow background, you could create three objects like so:

include_once('ansi.class.php');
$red_white = new ANSI(ANSI::RED, ANSI::WHITE);
$bright_green_black = new ANSI(ANSI::GREEN, ANSI::BLACK, array(ANSI::BRIGHT, ANSI::UNDERLINE));
$blue_yellow = new ANSI('blue', 'yellow');

Notice that I used a couple of different ways of setting the colors: the class constant ints and strings. The effects are set in an array since you can chain multiple effects on a single color scheme (until you get into the extended color space – more on that in a minute). Once you have these objects, styling your terminal output is simple.

$red_white->p("This is red on a white background, and prints no newline.");
$bright_green_black->p("This is bright green on a black background and prints no newline.");
$blue_yellow->pline("This is blue on a yellow background and will print a newline character.");
$red_white->setInverse(true);
$red_white->pline("This is now white on a red background.");

The p() and pline() methods will spit out the correct escape sequence and color codes to style and color the text, then print the text, then spit out the correct escape sequence and color code to "reset" the term to its default. This means no running a script that displays a warning and then leaves your terminal with bright yellow text on a red background.
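Under the hood these are just standard ANSI SGR escape sequences; as a rough illustration (simplified, not the class's exact internals), red-on-white output boils down to something like:

// 31 = red foreground, 47 = white background, 0 = reset to terminal defaults
echo "\033[31;47m" . "red on white" . "\033[0m" . "\n";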

So now that the standard color space is taken care of, how about a little love for the xterm 256 color space? Simple enough. Any int value greater than 7 passed to the color arguments of the constructor will automatically invoke the 256 color space. The first 16 colors (0 – 15) are just the default terminal colors, of course, but colors 16 – 231 are the extended color space, with 24 greyscale values from colors 232 – 255. So how do we know what color is what? Well, we can either call one of the static functions to view the color space (ANSI::showForegroundColors(), ANSI::showBackgroundColors()) or we can pass in a value from the static ANSI::rgb($r, $g, $b) function, which takes, you guessed it, three integer values from 0 – 255. While ANSI::rgb() tries to get to the closest color in the color space, it still needs work. The very simplistic manner in which it is currently implemented is not the most accurate. It is on my to-do list somewhere, though.

$grey_gold = new ANSI(ANSI::rgb(31, 31, 31), ANSI::rgb(204, 153, 0));
$grey_gold->pline("This is grey text on a gold background.");
// effects don't work in the extended color space, except for inverse
$grey_gold->setInverse(true);
$grey_gold->pline("This is gold text on a grey background");

If you know of any way to apply the ANSI styles (underline, blink, inverse) in conjunction with the extended color space leave a comment to let me know. If you think the script could use some extra functionality do the same.

It is not incredibly clever or full-featured or any of those sorts of things, but it does what I needed it to do. If you would like to, you can download a copy of the ANSI class (ansi.class.php.zip) – it is released under the MIT license, and is free to use, copy, distribute, etc.

Development

PHP Singleton? Not really

If you couldn’t tell by the long silence, things around here have been not so quiet as I had hoped. However, while reading the PHP: Patterns page I came across a large number of implementations of the Singleton pattern.

I happen to like the Singleton pattern, and use it in Java and Python (where the VM maintains the "one and only one" instance), but not so much in PHP.

Why, you ask? It is simply this: you cannot create a true Singleton in a PHP web application. Since every page load is executed in a separate thread, every page load has its own instance of the class. Any hope of true Singleton behavior is lost.

As a way to illustrate this, here is a PHP “Singleton” class and an associated PHP page. Throw them up on a test server and hit the page.

Then try to increment the counter. See what happens, I’ll wait.

The class:

<?php
/**
 * test for PHP multi-threaded singleton
 */
class Singleton {
  private static $instance;
  private $counter;

  /**
   * Constructor is private
   */
  private function __construct() {
    $this->counter = 0;
  }

  /**
   * Entry point. Get the static instance of Singleton
   */
  public static function getInstance() {
    if (is_null(self::$instance)) {
      self::$instance = new Singleton();
    }
    return self::$instance;
  }

  public function __clone() {
    trigger_error('Clone not allowed for '.__CLASS__, E_USER_ERROR);
  }

  public function incrementCounter() {
    $this->counter++;
  }

  public function getCounter() {
    return $this->counter;
  }
}
?>

The page:

<?php
include_once('singletontest.php');
$s = Singleton::getInstance();
if (isset($_GET['inc'])) {
  $s->incrementCounter();
}
?>
<html>
<head><title>Multi-threading PHP Singleton? Not Likely</title></head>
<body>
<h3>Singleton test</h3>
<p>The counter is at <?php echo $s->getCounter(); ?></p>
<pre><?php var_dump($s); ?></pre>
<p><a href="<?php echo $_SERVER['PHP_SELF']?>?inc=1">Increment the counter</a></p>
</body>
</html>

In this first version, even within one browser the limitations are clear. The Singleton instance is recreated on every page load. So, what if we serialize our $counter variable to disk? Will that help? Let’s try it.

The modified class:

<?php
/**
 * test for PHP multi-threaded singleton
 */
class Singleton {
  private static $instance;
  private $counter;

  /**
   * Constructor is private
   */
  private function __construct() {
    $init = 0;
    if (file_exists('/tmp/singleton.ser')) {
      $str = file_get_contents('/tmp/singleton.ser');
      $init = unserialize($str);
    }
    $this->counter = $init;
  }

  /**
   * Entry point. Get the static instance of Singleton
   */
  public static function getInstance() {
    if (is_null(self::$instance)) {
      self::$instance = new Singleton();
    }
    return self::$instance;
  }

  public function __clone() {
    trigger_error('Clone not allowed for '.__CLASS__, E_USER_ERROR);
  }

  /**
   * Since PHP does not create "only one" instance globally, but by thread, we
   * need a way to store our instance variables so that each thread is getting
   * the same values.
   * Note that threads holding a version of this will have the old value until
   * they reload the Singleton (by a page refresh, etc).
   */
  public function incrementCounter() {
    // We need to update the serialized value
    $handle = fopen('/tmp/singleton.ser', 'w+');
    // Get an EXCLUSIVE lock on the file to block any other reads/writes while
    // we modify
    if (flock($handle, LOCK_EX)) {
      // Only update the instance variable's value AFTER we have a lock
      $this->counter++;
      // empty the file
      ftruncate($handle, 0);
      // write out the value
      fwrite($handle, serialize($this->counter));
      // and unlock so that everyone else can read the new value
      flock($handle, LOCK_UN);
    } else {
      // You would probably prefer to throw an Exception here
      echo "Couldn't get the lock!";
    }
    fclose($handle);
  }

  public function getCounter() {
    return $this->counter;
  }
}
?>

The modified page:

<?php
include_once('singletontest.php');
$s = Singleton::getInstance();
if (isset($_GET['inc'])) {
  $s->incrementCounter();
} else if (isset($_GET['ext'])) {
  $x = true;
}
?>
<html>
<head><title>Multi-threading PHP Singleton? Not Likely</title></head>
<body>
<h3>Singleton test</h3>
<p>The counter is at <?php echo $s->getCounter(); ?></p>
<pre><?php var_dump($s); ?></pre>
<?php if (isset($x)) for ($i = 0; $i < 1000; $i++) {
  $s = Singleton::getInstance();
  echo '<p>The counter is at '.$s->getCounter().'</p><p>';
  // wait
  for ($j = 0; $j < 10000; $j++) { echo '. '; }
  echo '</p>';
}
?>
<p><a href="<?php echo $_SERVER['PHP_SELF']?>?inc=1">Increment the counter</a></p>
<p><a href="<?php echo $_SERVER['PHP_SELF']?>?ext=1">Do a long list</a> 
fire this off in one browser and increment in another.</p>
</body>
</html>

Using the modified versions above, open two separate browsers. Point both at the page, increment in one, then reload the other. So far so good. Now set off the long list in one and increment in the other while it is still running. What happened? The Singleton pattern works within a given thread, so for as long as that thread runs, changes made to the Singleton's serialized data will not be available in another thread.

There is a possible work-around, which would be to read and unserialize the value every time getCounter() is called. At the expense of a little more overhead, the expected behavior in terms of object state can be obtained.

But back to the real question: is it a Singleton? Well no, not really in the sense that most of us think of a Singleton, which is system- or application-wide. But it is at least within its containing thread, which might make it more useful for command-line PHP in long-running scripts. (Like those report generation scripts that you are running in a daily cron job that join 18 tables and generate 500,000 line csv files … no? Just me?)
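For reference, that work-around amounts to replacing getCounter() in the modified class with something like this:

public function getCounter() {
    // Re-read the serialized value on every call so that changes made by
    // other requests are visible here, at the cost of an extra disk read.
    if (file_exists('/tmp/singleton.ser')) {
        $this->counter = unserialize(file_get_contents('/tmp/singleton.ser'));
    }
    return $this->counter;
}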

Apache

Apache and PHP HTTP PUT Voodoo

While trying to work out the details for a PHP REST utility I kept running into a wall when it came to using HTTP PUT (and HTTP DELETE) with Apache 2.2 and PHP 5. There are plenty of scattered tidbits of information relating to this on forums about the web, many old, and many more incomplete or even unhelpful. [As a side note: if someone on a forum you frequent is asking for help with getting HTTP PUT to work in Apache, telling them “Don’t use PUT it lets the hax0rs put files on your server! N00b! Use POST LOL!!11!” is not helping, nor does it make you look intelligent.]

The first hint I came across was putting Script PUT put.php in your httpd.conf in the <Directory> section. (That is, of course, assuming that your script for handling PUT requests is called put.php.)

I tried that and on restarting Apache got the error "Invalid command 'Script', perhaps misspelled or defined by a module not included in the server configuration" – which led to a short bit of research (thanks Google!) that pointed out that the Script directive requires mod_actions to be enabled in Apache. I did that and then tried to hit my script with a PUT request, to which I got a 405 error: "The requested method PUT is not allowed for the URL /test/put.php".

Well, that was certainly strange, so I added <Limit> and <LimitExcept> blocks to my <Directory> section, but to no avail. So I changed the <Directory> directive from <Directory /var/www/test> to <Directory /var/www/test/put.php>. It looked strange, but what the heck, worth a try. I could now do PUT requests, but only as long as the url was /test/put.php, and that is not what is wanted when putting together a RESTful application. Trying to do anything useful, like a PUT to /test/put.php/users/ resulted in more 405 errors, now saying “The requested method PUT is not allowed for the URL /test/put.php/users/”.

So, back to the httpd.conf to change the <Directory> back to the previous. And then on to the other method I saw in a few places, using mod_rewrite to forward PUT (and DELETE) requests to the script. Of course, everywhere I saw this listed it was claimed that this alone (without the Script directive) was enough to enable PUT. So, I commented out the Script directive and added some mod_rewrite statements to the .htaccess file (which is always preferable in development as you can make changes on the fly without reloading or restarting the server.) So I added a RewriteCond %{REQUEST_METHOD} (PUT|DELETE) and a RewriteRule .* put.php.

And, I went back to test it again and, big surprise, got a 405 error again. Now, even when pointing directly at /test/put.php I got a 405 error. So, I decided to try combining the two. I uncommented the lines in the httpd.conf and bumped the server and was pleasantly surprised that PUT (and DELETE) requests to the /test/ directory were properly handled by the script. Now I could do something useful, like add another mod_rewrite rule to send all traffic for /api/ to the /test/put.php and call /api/users/ with a PUT (or DELETE) request and it was properly handled!

So, putting it all together:

In Apache: enable mod_actions and mod_rewrite. In Gentoo: make sure the lines

LoadModule actions_module modules/mod_actions.so

and

LoadModule rewrite_module modules/mod_rewrite.so

in httpd.conf are not commented out. In Debian the commands

a2enmod actions

and

a2enmod rewrite

do the trick.

In the httpd.conf add the following:

<Directory /var/www/test>
    <Limit GET POST PUT DELETE HEAD OPTIONS>
        Order allow,deny
        # You might want something a little more secure here, this is a dev setup
        Allow from all
    </Limit>
    <LimitExcept GET POST PUT DELETE HEAD OPTIONS>
        Order deny,allow
        Deny from all
    </LimitExcept>
    Script PUT /var/www/test/put.php
    Script DELETE /var/www/test/put.php
</Directory>

And finally, in the .htaccess add the rewrite voodoo:

RewriteEngine On
RewriteBase /test
RewriteRule ^/?(api)/? put.php [NC]
RewriteCond %{REQUEST_METHOD} (PUT|DELETE)
RewriteRule .* put.php

Hopefully this works as well for you as it did for me. Now to get back to business of actually writing the code to deal with the request and dispatch it appropriately (which may be a post for another day, or you can have a look at how some others have done it.)
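As a starting point, a bare-bones put.php (a sketch, not the REST utility itself) might do little more than read the method and the raw request body:

<?php
// put.php - minimal sketch of a PUT/DELETE handler
$method = $_SERVER['REQUEST_METHOD'];
// the URI the client actually requested (e.g. /api/users/1),
// since mod_rewrite has already pointed the request at this script
$uri = $_SERVER['REQUEST_URI'];

if ($method == 'PUT') {
    // PHP does not populate a superglobal for PUT; read the raw body instead
    $body = file_get_contents('php://input');
    // ... parse $uri and $body, update the resource, build a response ...
    header('HTTP/1.1 200 OK');
    echo "PUT to $uri received " . strlen($body) . " bytes\n";
} elseif ($method == 'DELETE') {
    // ... parse $uri and remove the resource ...
    header('HTTP/1.1 200 OK');
    echo "DELETE to $uri\n";
} else {
    header('HTTP/1.1 405 Method Not Allowed');
}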

By the way, for testing I have found the Firefox plugin Poster to be immensely useful, as well as the Java based RESTClient.

Development

Comparing PHP array_shift to array_pop

I noticed a note in the PHP documentation about speed differences between array_shift() (pulling the first element off the array) and array_reverse() followed by array_pop() (resulting in the same data, but arrived at by pulling the last element off the array).

Since I was working on some code to convert URL pieces to program arguments (like turning /admin/users/1/edit into section=admin, module=users, id=1, action=edit – stuff we tend to do every day), I thought I would take a look at the speed differences, as I have always used array_shift() for this (after turning the string into an array via explode()).
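For context, that conversion is roughly this sort of thing:

<?php
// e.g. the path portion of the URL
$path  = '/admin/users/1/edit';
$parts = explode('/', trim($path, '/'));
$section = array_shift($parts); // 'admin'
$module  = array_shift($parts); // 'users'
$id      = array_shift($parts); // '1'
$action  = array_shift($parts); // 'edit'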

My initial tests showed that array_shift was much faster than array_reverse followed by array_pop, and I wondered why someone would say that in the first place. But then I thought about it for a bit. When using array_shift, the entire remaining array has to be re-indexed on every call. For a very short array (like the one I was using) this is negligible. When you start looking at much larger arrays, however, this overhead adds up quickly.

To find out roughly where the break-even point between these two methods lies, I whipped up a quick script to run with arrays sized from 10^1 values up to 10^5 values. What I found is that at less than 100 values you are not really gaining much (if anything) by using array_reverse and array_pop versus array_shift. Once you get to the 1000-value array size, however, the differences really add up (as you can see in the logarithmic scaling of the chart below).

[Chart: average timings of array_shift versus array_reverse + array_pop, plotted on a logarithmic scale (shift_vs_pop.jpg)]

The code I used to generate the numbers (which are shown in the chart as averages over 3 runs, rounded to the nearest millionth of a second) is:

<?php
$counts = array(10,100,1000,10000,100000);
foreach ($counts as $len)
{
	$m2 = $m1 = array();
	$x = 1;
	while ($x <= $len)
	{
		$m2[] = $m1[] = $x;
		$x++;
	}
	echo "Timing with array_shift() for $len items\n";
	echo "000000";
	$s1 = microtime(true);
	while (!empty($m1))
	{
		$tmp = array_shift($m1);
		if ($tmp % 10 == 0)
		{
			echo chr(8),chr(8),chr(8),chr(8),chr(8),chr(8);
			echo str_pad(''.$tmp,6,'0',STR_PAD_LEFT);
		}
	}
	$s2 = microtime(true);
	echo "\nTook ",$s2 - $s1," seconds\n";
	
	echo "Timing with array_reverse and array_pop() for $len items\n";
	$s1 = microtime(true);
	$m2 = array_reverse($m2);
	while (!empty($m2))
	{
		$tmp = array_pop($m2);
		if ($tmp % 10 == 0)
		{
			echo chr(8),chr(8),chr(8),chr(8),chr(8),chr(8);
			echo str_pad(''.$tmp,6,'0',STR_PAD_LEFT);
		}
	}
	$s2 = microtime(true);
	echo "\nTook ",$s2 - $s1," seconds\n";
	echo "\n";
}
?>

Development

Learning Perl – TIMTOWTDI

I have begun learning Perl. Shouldn't be difficult for someone with several years' experience in PHP, Java, Python and Bash, right? While the concepts are not that foreign (at least in terms of functional programming), it seems that the Perl philosophy of 'TIMTOWTDI' (There is more than one way to do it) does more to hamper the language than help it.

I am sure that there are plenty of Perl lovers out there who will disagree with me on this, but it seems to create more confusion, more work, and less in the way of readable documentation. I am using a combination of the O’Reilly book Programming Perl – Third Edition and various web sources including the documentation at CPAN. However, finding a repeatable, consistent way of doing a thing (which I prefer in most cases) is out of the question. I do have a feeling, however, that as I go I will find the ways of doing things that I am most comfortable with and probably stick with those (unless there is a very good reason not to, as in performance issues.)

I think that perhaps the biggest hurdle I am currently trying to overcome is the way in which Perl handles Object Oriented programming (or rather, doesn't). The disconnect is (in my opinion) caused by a combination of pseudo-OO concepts bolted on to a functional language, TIMTOWTDI (even for inheritance), and the differences in use and require – where require will let you select a Perl file to include by location (for example require '/var/development/perl/MyPerl.pl') but use can only call modules in the @INC path. Further, when modules are called by use they are evaluated at compile time, while modules called with require and then import are evaluated at run time. So if you have a piece you wish to be initialized and evaluated at compile time (i.e. when the script calling it is first loaded) then you either need to install the module to your standard path, or you need to add the module's directory to @INC in order to find it.

Personally, I think I actually prefer the second method (adding a module's directory to @INC) since it allows for picking up whole applications, dropping them into a system and having the primary controlling script find its cwd and add it and any module-containing sub-directories to @INC. Again, however, I can see the reasoning behind run-time evaluation in situations where there is no guarantee that a module will be needed, unless certain conditions arise. (Like choosing which version of a subclass to include based on configuration or program conditions.)

So, while I can understand the need for more than one way to do a thing, there does seem to be a point where you need to say: we have enough ways to do it, stop adding more! One thing I have not decided yet, however, is whether I will be porting SPDO to Perl any time soon. My initial feeling is not likely, but I may end up doing it as a learning exercise anyway.

Database

SPDO 1.0b Release Candidate 1

SPDO version 1.0 beta Release Candidate 1 is out the door. This version adds the following:

Three custom Exception classes: SPDOError, ConnectionFailedError and ForeignKeyViolationError. These are used (in order) when the program encounters an unknown SQL error, a failure to connect to the database, or a violation of a foreign key constraint. Which means, of course, that foreign key constraints are now part of SPDO.
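In practice that means calling code can catch errors by type. A hypothetical usage sketch (the execute() method name here is an assumption for illustration, not SPDO's documented API):

<?php
// Hypothetical PHP usage; assumes the authors/books tables from the example below.
try {
    $db = new SPDO('test');
    $db->execute("INSERT INTO books (author_id, title) VALUES (999, 'No such author')");
} catch (ForeignKeyViolationError $e) {
    echo 'Foreign key violation: ' . $e->getMessage();
} catch (ConnectionFailedError $e) {
    echo 'Could not connect: ' . $e->getMessage();
} catch (SPDOError $e) {
    echo 'Unknown SQL error: ' . $e->getMessage();
}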

This was accomplished through the new DbBuilder class. DbBuilder takes a multi-dimensional dict (Python) or array (PHP) of table and field data, builds the tables, creates triggers in MySQL and SQLite, and adds foreign keys in PostgreSQL. As an example, consider the following:

In Python:

structure = {
    'authors':{
        'id':['primary_auto'],
        'first_name':['varchar(50)'],
        'last_name':['varchar(50)', 'NONULL', 'INDEX']
        },
    'books':{
        'id':['primary_auto'],
        'author_id':['int', "FK_CASCADE('authors','id')", 'INDEX'],
        'title':['varchar(100)','NONULL', 'INDEX']
        }
    }
dbb = DbBuilder(structure)
dbb.create()

In PHP:

$structure = array(
        'authors'=>array(
        'id'=>array('primary_auto'),
        'first_name'=>array('varchar(50)'),
        'last_name'=>array('varchar(50)', 'NONULL', 'INDEX')
    ),
    'books'=>array(
        'id'=>array('primary_auto'),
        'author_id'=>array('int', "FK_CASCADE('authors','id')", 'INDEX'),
        'title'=>array('varchar(100)','NONULL', 'INDEX')
    )
);
$dbb = new DbBuilder($structure);
$dbb->create();

This results in the following queries being executed:

In PostgreSQL:

CREATE TABLE authors (
    first_name VARCHAR(50) ,
    last_name VARCHAR(50) NOT NULL,
    id SERIAL PRIMARY KEY 
 );
CREATE TABLE books (
    author_id INTEGER ,
    id SERIAL PRIMARY KEY ,
    title VARCHAR(100) NOT NULL
);
CREATE INDEX authors_last_name_idx ON authors(last_name);
CREATE INDEX books_author_id_idx ON books(author_id);
CREATE INDEX books_title_idx ON books(title);
ALTER TABLE books ADD FOREIGN KEY (author_id) REFERENCES authors(id) ON DELETE CASCADE;

In MySQL:

CREATE TABLE authors (
    id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50) NOT NULL
);
CREATE TABLE books (
    id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    author_id INTEGER NOT NULL,
    title VARCHAR(100) NOT NULL
);
CREATE TABLE fk_error_msg (
    error_msg VARCHAR(100) NOT NULL PRIMARY KEY
);
INSERT INTO fk_error_msg VALUES ('Foreign Key Constraint Violated!');
ALTER TABLE authors ADD INDEX(last_name);
ALTER TABLE books ADD INDEX(author_id);
ALTER TABLE books ADD INDEX(title);
CREATE TRIGGER fki_books_author_id
    BEFORE INSERT ON books
    FOR EACH ROW BEGIN
        IF
            0 = (SELECT COUNT(*) FROM authors WHERE id=new.author_id)
        THEN
            INSERT INTO fk_error_msg VALUES ('Foreign Key Constraint Violated!');
        END IF;
    END;
CREATE TRIGGER fku_books_author_id
    BEFORE UPDATE ON books
    FOR EACH ROW BEGIN 
        IF
            0 = ( SELECT COUNT(*) FROM authors WHERE id = new.author_id )
        THEN
            INSERT INTO fk_error_msg VALUES ('Foreign Key Constraint Violated!');
        END IF ;
    END;
CREATE TRIGGER fkdc_books_author_id
    BEFORE DELETE ON authors
    FOR EACH ROW BEGIN
        DELETE FROM books WHERE author_id=old.id;
    END;

In SQLite:

CREATE TABLE authors (
    id INTEGER NOT NULL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50) NOT NULL
);
CREATE TABLE books (
    id INTEGER NOT NULL PRIMARY KEY,
    author_id INTEGER NOT NULL,
    title VARCHAR(100) NOT NULL
);
CREATE INDEX authors_last_name_idx ON authors(last_name);
CREATE INDEX books_author_id_idx ON books(author_id);
CREATE INDEX books_title_idx ON books(title);
CREATE TRIGGER fki_books_author_id
    BEFORE INSERT ON [books]
        FOR EACH ROW BEGIN
            SELECT RAISE(ROLLBACK, 'insert on table "books" violates foreign key constraint "fki_books_author_id"')
            WHERE NEW.author_id IS NOT NULL AND (SELECT id FROM authors WHERE id = NEW.author_id) IS NULL;
        END;
CREATE TRIGGER fku_books_author_id
    BEFORE UPDATE ON [books]
        FOR EACH ROW BEGIN
            SELECT RAISE(ROLLBACK, 'update on table "books" violates foreign key constraint "fku_books_author_id"')
            WHERE NEW.author_id IS NOT NULL AND (SELECT id FROM authors WHERE id = NEW.author_id) IS NULL;
        END;
CREATE TRIGGER fkdc_books_author_id
    BEFORE DELETE ON authors
        FOR EACH ROW BEGIN
            DELETE FROM books WHERE books.author_id = OLD.id;
        END;

Be sure to check out the wiki, where the source code is all readable, or download it and enjoy. There are still a few edits that need to happen in the comments (versions need to be updated, changelogs added, and there are probably a few typos still lurking about).

Database

Foreign Key constraints in SPDO

While I have long advocated for doing referential integrity in the application space, I have also always made use of things like foreign key constraints in PostgreSQL. I know that MySQL has methods for enforcing foreign key constraints in InnoDB tables, but I typically don't use InnoDB. And until recently I never thought about using SQLite for anything really serious (you can take this to mean I am thinking about using SQLite for something serious). So I began to look into triggers for SQLite to mimic foreign key constraints. This got me thinking about using triggers in MySQL to do the same thing for MyISAM tables. A quick search turned up the answer. I had fully expected foreign key constraints in MySQL 6. Of course, this thinking comes from this phrase in the MySQL documentation for 4.1, 5.0, 5.1, and (sadly) now 6.0:

For storage engines other than InnoDB, MySQL Server parses the FOREIGN KEY syntax in CREATE TABLE statements, but does not use or store it. In the future, the implementation will be extended to store this information in the table specification file so that it may be retrieved by mysqldump and ODBC. At a later stage, foreign key constraints will be implemented for MyISAM tables as well.

Wishful thinking aside, I did get a request to add foreign key constraints to the SPDO implementation, so I think I will get started on adding the following:

  • A SPDOException class to throw back useful information (such as foreign key constraint violation)
  • A method to create tables
  • A method to add foreign key constraints to tables created by SPDO

While I don't want to get too convoluted with the structure of SPDO, I do want a well-encapsulated and easy-to-use PDO. In that vein, the table creation and addition of foreign key constraints may come about via a Table class that provides a few simple methods to do all these things. I tend to determine my data structure before I begin development and build all my tables first; however, if I am careful about how the Table class is constructed, I should be able to build all the tables in whatever RDBMS I happen to be using, with the proper triggers and constraints, entirely programmatically.

Best Practices

Portable code

When developing applications against a single RDBMS there is no need to worry about portability. However, when you are developing with the idea that the application may be deployed against any of a range of RDBMSes then you need to start thinking about how you formulate your queries, table structures, etc. While it is common practice to have separate classes for working with different databases, there are some things you can do to make that even easier.

ENUMs are EVIL

Developers who spend a lot of time developing against MySQL get into the habit of creating ENUM fields. (I will admit a certain amount of guilt here, too). This is not only not portable, but it doesn’t work quite the way you would expect. Even though the values in the ENUM are meant to be the only values you can insert into that field, MySQL does not enforce this quite the way you might think it should.

As an example:

CREATE TABLE enumevil (
    id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    evil ENUM('foo','bar','baz') NOT NULL DEFAULT 'foo'
);
Query OK, 0 rows affected (0.01 sec)
INSERT INTO enumevil (evil) VALUES ('bip');
Query OK, 1 row affected, 1 warning (0.02 sec)
SELECT * FROM enumevil;
+----+------+
| id | evil |
+----+------+
|  1 |      | 
+----+------+
1 row in set (0.00 sec)

Rather than spitting back an error, it quietly sets the value of the ENUM field to the empty string. That is not in the list of allowed values nor is it the default. So you are left with checking the values in code before inserting (which you should do anyway – see the next section on referential integrity.)

Instead of using an ENUM field consider this:

CREATE TABLE noevil_enum_choice (
    choice VARCHAR(3) NOT NULL PRIMARY KEY
);
INSERT INTO noevil_enum_choice VALUES ('foo');
INSERT INTO noevil_enum_choice VALUES ('bar');
INSERT INTO noevil_enum_choice VALUES ('baz');
CREATE TABLE noevil (
    id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    choice VARCHAR(3) NOT NULL REFERENCES noevil_enum_choice(choice)
);

Although MySQL does not enforce referential integrity, this is now something that can be easily converted to PostgreSQL as:

CREATE TABLE noevil_enum_choice (
    choice VARCHAR(3) NOT NULL PRIMARY KEY
);
INSERT INTO noevil_enum_choice VALUES ('foo');
INSERT INTO noevil_enum_choice VALUES ('bar');
INSERT INTO noevil_enum_choice VALUES ('baz');
CREATE TABLE noevil (
    id SERIAL PRIMARY KEY,
    choice VARCHAR(3) NOT NULL REFERENCES noevil_enum_choice(choice)
);

And PostgreSQL does enforce referential integrity. This also makes it easier to expand your choice list by just inserting a new row.

Referential Integrity

On the flip side of the coin, developers who spend a good deal of time developing against PostgreSQL come to rely on the referential integrity built in to that RDBMS. While there is nothing wrong with that, your code should enforce this as well if you want to be able to move from one RDBMS to another. In the example above we could rely on PostgreSQL's REFERENCES statement to kick back an error whenever a bad value (like 'bip') is inserted into the choice field of the noevil table. However, as soon as you move your application to MySQL or SQLite it will happily insert anything you like into that field (with MySQL truncating it to three characters).

This is why it is important for your applications to take into consideration their own referential integrity. Here's some Python to illustrate:

import string
from spdo import *
 
class noevil(object):
    def __init__(self, choice):
        self.db = SPDO('test')
        self.choices = []
        # load the allowed values from the lookup table
        mydict = self.db.select("SELECT * FROM noevil_enum_choice")
        for c in mydict:
            self.choices.append(c['choice'])
        if choice not in self.choices:
            ex = ["Invalid value for choice: ", choice, " Valid options are: "]
            for c in self.choices:
                ex.append(c)
                ex.append(" ")
            raise Exception(string.join(ex, ''))
        # continue with normal processing here

This is by no means the entirety of the topic, or even more than the merest tip of the iceberg, but it is a good place to start.