January 2009 – evardsson.com: stuff that w0rks

Development

SPDO Wiki and Updates

I updated SPDO some today – I added in MySQL support for the Python version, and changed the names of the objects and files to match the new moniker. Speaking of which, I have decided on “Simplified (Python|PHP) Database Object” – so yeah, SPDO is the official name now. (Of course I keep seeing it “speedo”).

I also started a Wiki which will be updated with documentation, samples, etc as I make the time. I am hoping to get a bunch done tomorrow afternoon and this weekend.

The version 0.9 files are spdo-py.tar.bz2 and spdo-php.tar.bz2

Development

New Project: SPDO

I have just posted the (embarrassingly empty) page for my new pet project: SPDO (Sjan’s PDO) – in two flavors: PHP and Python.

There are only about a thousand PDOs out there, and perhaps a dozen or so of them are really functional and (in a few cases) very polished pieces of work. So why am I messing with writing my own? A couple of reasons:

I like to have coding signatures that are consistent, whether I am working with PostgreSQL, MySQL or (heaven help us all) SQLite.
I like to have coding signatures that are (reasonably) consistent across different languages – in this case Python and PHP.
I wanted to take advantage of Prepared Statements where I could, even though the PHP implementations of those are pretty weak (especially in the case of MySQL).

Currently implemented in

Python:
- PostgreSQL (with prepared statements)
- SQLite (no prepared statements).
PHP
- PostgreSQL (with prepared statements)
- MySQL (with prepared statements)
- SQLite (no prepared statements)

Here’s an example of how they work (in most simplistic terms):

#in Python
from pyDB import *
db = pyDB('mysite')
newid = db.insert('INSERT INTO test (name, value) VALUES (?,?)',['foo','bar'])
update_count = db.update('UPDATE test SET value=? WHERE id=?',['baz',newid])
results = db.select('SELECT * FROM test')
for row in results:
    for i in row:
        print "\t", i,"\t", row[i]
delete_count = db.delete('DELETE FROM test WHERE id=?',[newid])

//in PHP
require_once('phpdb.php');
$db = new phpDB('test');
$newid = $db->insert('INSERT INTO test (name, value) VALUES (?,?)',array('foo','bar'));
$update_count = $db->update('UPDATE test SET value=? WHERE id=?',array('baz',newid));
$results = db->select('SELECT * FROM test');
foreach($results as $row)
{
    foreach ($row as $key=>$val)
    {
        print "\t$key\t$val";
    }
}
$delete_count = $db->delete('DELETE FROM test WHERE id=?',array($newid));

The page with links to the code is in the list up top, and everything is MIT license. Enjoy.

Database

UUID Why and How

Seeing the post from Aaron the other day on generating UUIDs in PHP got me to thinking about UUIDs in general.

The Why:

So why would we want to use UUIDs? The simple answer is that they are (almost) guaranteed to be a “Universally Unique” ID. So what? Every database table has its own row ids, why do they need to be unique from other database table row ids? Here’s an example:

Jo has three stores X, Y and Z. Each store has a local customer database. That database contains the following table structures:

persons:
  id
  name

phones:
  id
  person_id
  phone_number

(I didn’t say the tables were all that useful, but they work for the sake of discussion.)

Jo looks at the tables from all three stores and decides that it might be a good idea to combine them into one master customer database to share with all the stores. Here is where the trouble starts. When we compare the three databases we find that we cannot simply dump the data from any one of the databases into any of the others.

Store X

Store Y

Store Z

persons
id	name
1	Sam Smith
2	Terry Tyler
phones
id	person_id	phone_number
1	1	(555)555-1212
2	2	(555)555-1234
3	1	(555)555-4321

persons
id	name
1	Perry Paxson
2	Jeremiah Jones
phones
id	person_id	phone_number
1	1	(555)555-2121
2	2	(555)555-4412

persons
id	name
1	Edie Ellerts
2	Francis Fenimore
phones
id	person_id	phone_number
1	1	(555)555-3434
2	2	(555)555-7214
3	2	(555)555-9951

While this makes it simple in terms of looking up a customer’s phone number(s) on a single store, SELECT * FROM persons, phones WHERE phones.person_id = persons.person_id AND persons.name = 'Sam Smith', it leaves us in a tight spot where it comes to combining the databases. We either have to pull all the data programmatically and retain the references from phones to persons and re-insert the whole mess into a new set of tables, or we need to add another field to define which store database the record comes from and change the primary key from the single value (id) to a multiple-field key (id, store_name). This means, of course, that everywhere we reference these values in our code we need to change the SQL. Not a happy solution.

So what if we change the id field type from an integer to a UUID (which is a char(36))? Now our tables would look like:

Store X

persons
id	name
7d20cac1-d558-4a5c-98c0-ee332b554b29	Sam Smith
5fae5033-36e0-42f8-bb50-c00449c6cab0	Terry Tyler
phones
id	person_id	phone_number
82eec34b-897b-40af-9674-fa08e5537d12	7d20cac1-d558-4a5c-98c0-ee332b554b29	(555)555-1212
08400e86-9bd0-4713-bf6e-a5b8cd3e7e22	5fae5033-36e0-42f8-bb50-c00449c6cab0	(555)555-1234
ab64d8d7-0511-4b8e-ab57-9932c3c96b31	7d20cac1-d558-4a5c-98c0-ee332b554b29	(555)555-4321

Store Y

persons
id	name
9ca534e4-2dbd-4dc3-a7b3-98ab8b89fb70	Perry Paxson
4cc10585-b864-45bc-9f20-f9761af82e06	Jeremiah Jones
phones
id	person_id	phone_number
85302b40-c12b-4a05-8dbb-500f6f5cb26f	9ca534e4-2dbd-4dc3-a7b3-98ab8b89fb70	(555)555-2121
106a9524-800c-4e5a-ad4a-6ac9f601b370	4cc10585-b864-45bc-9f20-f9761af82e06	(555)555-4412

Store Z

persons
id	name
4da59fb9-ad00-4a5b-aa00-1bb503fe9c8f	Edie Ellerts
8a39f12c-94c4-4b70-ac70-3cff77be8b01	Francis Fenimore
phones
id	person_id	phone_number
47d0c49e-02df-4366-853b-a0502edb1de6	4da59fb9-ad00-4a5b-aa00-1bb503fe9c8f	(555)555-3434
cb868eca-822f-416a-ac7f-0df12504705f	8a39f12c-94c4-4b70-ac70-3cff77be8b01	(555)555-7214
e70bea84-66e8-407f-bb6c-b164864d22b1	8a39f12c-94c4-4b70-ac70-3cff77be8b01	(555)555-9951

We now have referential data that we can safely combine into a single database, and can even dump back into all the store local database copies.

The How:

When I need to generate my own UUIDs I certainly use the exact same code as Aaron, even cribbed from the exact same source. Of course Python has a UUID module which means you can create a version 1, 3, 4 or 5 UUID quickly and easily. I prefer the version 4 UUIDs myself, since version 1 contains identifiable data about the source, and versions 3 and 5 require other information to generate. (In case you are wondering, there is no version 2).

This is simple enough, but I actually prefer things a little simpler, and when I am using PostgreSQL they do get simpler as I can let the database do the UUID generation. (I am aware of the uuid() function that is built in to MySQL 5, however, that uses version 1 UUIDs which contain system identifiable data. If that is okay with you then by all means feel free to use it.) While there are at least a couple PostgreSQL plugins I have seen for generating UUIDs I actually make use of the CREATE FUNCTION capabilities of PostgreSQL to roll my own. (NOTE: To use this function you must have the PostgreSQL cryptographic libraries installed.)

So first I need to see what I am trying to generate.

From the RFC:

4.4. Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers

The version 4 UUID is meant for generating UUIDs from truly-random or
pseudo-random numbers.

The algorithm is as follows:

Set the two most significant bits (bits 6 and 7) of the clock_seq_hi_and_reserved to zero and one, respectively.

Set the four most significant bits (bits 12 through 15) of the time_hi_and_version field to the 4-bit version number from Section 4.1.3.

Set all the other bits to randomly (or pseudo-randomly) chosen values.

The format:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

[For more in-depth analysis of how a UUID is constructed refer to http://tools.ietf.org/html/rfc4122]

Creating a version 4 UUID requires a bunch of random hexadecimal data, with one hexadecimal character (x) (equivalent to a nibble – 4 bits for those of you who don’t recall) set to the value of 4 (binary 0100 from Section 4.1.3 of the RFC) and one hexadecimal character (x) set to one of 4 values: 8, 9, a or b; which is what the first nibble of a byte is when the two most significant bits are 10. For proof of this see below:

3	2	1	0	Hex	Dec
Bit
1	0	0	0	8	8
1	0	0	1	9	9
1	0	1	0	a	10
1	0	1	1	b	11

Now let’s look at what means we have for generating random data in hexadecimal format. PostgreSQL includes a random() function which generates a double precision number between 0 and 1. We also have a to_hex() function that will convert integer or binary string data to a hexadecimal number. This is a start, but we really don’t want to be calling random() more often than we need to, so is there a way to generate a bunch of hexadecimal data at one shot? Actually, there is, if we just add a sha1 function. So let’s start there:

CREATE OR REPLACE FUNCTION sha1(text)
  RETURNS character AS
$BODY$
BEGIN
RETURN ENCODE(DIGEST($1, 'sha1'), 'hex');
END;
$BODY$
  LANGUAGE 'plpgsql';

Now we have a quick way to generate lots of hexadecimal characters, all we need now is to randomize, and format according to what we have stated above.

CREATE OR REPLACE FUNCTION uuid()
  RETURNS character AS
$BODY$
SELECT SUBSTRING(x.my_rand FROM 1 FOR 8)||'-'||SUBSTRING(x.my_rand FROM 9 FOR 4)||'-4'||SUBSTRING(x.my_rand FROM 13 FOR 3)||'-'||x.clock_1||SUBSTRING(x.my_rand FROM 16 FOR 3)||'-'||SUBSTRING(x.my_rand FROM 19 FOR 12)
FROM  
(SELECT sha1(''||now()||random()) as my_rand, to_hex(8+(3*random())::int) as clock_1) as x;$BODY$
  LANGUAGE 'sql';

Notice that we are using now() to grab the current timestamp (including milliseconds), and concatenating that with the output of random to feed to sha1() to generate a hex digest. We are also generating a value of either 8, 9, a, or b for the first nibble in the clock_seq_hi field (the character highlighted in blue) by adding the value of 3 * random() cast as an integer to 8 and then returning that as a hex value. We then use SUBSTRING to pull the number of characters we need for each chunk and keep on moving left to right in the hash to grab the next values and concatenate them with dashes where needed and our version number.

For this and other PostgreSQL functions I use regularly you can download this SQL file.

Home

Job Hunting

Anyone who has read this blog for any length of time knows I don’t usually talk about work except to share things that I have learned or discovered while doing said work. This post, however, is a break from that tradition, as I am no longer employed. I was laid off in the latest round of RIF (reduction in force, a more polite way of saying downsized). I am back in the market and finding that there are a few things that make it tricky. As I pointed out in this post none of the code I have worked on recently belongs to me.

Which means that for those who desire code samples I need to do one of three things: either hand over old or unfinished code, write something entirely new, or get permission from my previous employer to share samples of some of the work where I was the only (or at least principal) developer. While the first is not appealing and the second will be ok once I have to time actually build something new or finish something I started back when, the third would be the simplest, at least if I felt that I would get the ok. But seeing how the last day went, I am not so sure. I walked some of the remaining developers through the code and documentation for a project I had headed up, and was on hand to put out fires. I was expecting to be answering questions right up to 5:00 PM. But at 3:30 PM my email account was disabled and shortly after that my Jabber account. While that was not unexpected, the lack of a final message from any of my current bosses was. No thanks, no sorry about the lay off, no good luck, nothing. I did get the farewells, sorry and thanks for everything you’ve done (as well a written LinkedIn reccomendation) from my former boss last week, and I appreciate that.

I don’t know, maybe I am just being overly sensitive. Or maybe I just expect too much from people. But would it really have been that hard to send an email to all the departing employees? Meh. I’ll get over it.