Category: Development

Because apps don’t happen on their own

Best Practices

Portable code

When developing applications against a single RDBMS there is no need to worry about portability. However, when you are developing with the idea that the application may be deployed against any of a range of RDBMSes then you need to start thinking about how you formulate your queries, table structures, etc. While it is common practice to have separate classes for working with different databases, there are some things you can do to make that even easier.

ENUMs are EVIL

Developers who spend a lot of time developing against MySQL get into the habit of creating ENUM fields. (I will admit a certain amount of guilt here, too). This is not only not portable, but it doesn’t work quite the way you would expect. Even though the values in the ENUM are meant to be the only values you can insert into that field, MySQL does not enforce this quite the way you might think it should.

As an example:

CREATE TABLE enumevil (
    id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    evil ENUM('foo','bar','baz') NOT NULL DEFAULT 'foo'
);
Query OK, 0 rows affected (0.01 sec)
INSERT INTO enumevil (evil) VALUES ('bip');
Query OK, 1 row affected, 1 warning (0.02 sec)
SELECT * FROM enumevil;
+----+------+
| id | evil |
+----+------+
|  1 |      | 
+----+------+
1 row IN SET (0.00 sec)

Rather than spitting back an error, it quietly sets the value of the ENUM field to the empty string. That is not in the list of allowed values nor is it the default. So you are left with checking the values in code before inserting (which you should do anyway – see the next section on referential integrity.)

Instead of using an ENUM field consider this:

CREATE TABLE noevil_enum_choice (
    choice VARCHAR(3) NOT NULL PRIMARY KEY
);
INSERT INTO noevil_enum_choice VALUES ('foo');
INSERT INTO noevil_enum_choice VALUES ('bar');
INSERT INTO noevil_enum_choice VALUES ('baz');
CREATE TABLE noevil (
    id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    choice VARCHAR(3) NOT NULL REFERENCES noevil_enum_choice.choice
);

Although MySQL does not enforce referential integrity, this is now something can be easily converted to PostgreSQL as:

CREATE TABLE noevil_enum_choice (
    choice VARCHAR(3) NOT NULL PRIMARY KEY
);
INSERT INTO noevil_enum_choice VALUES ('foo');
INSERT INTO noevil_enum_choice VALUES ('bar');
INSERT INTO noevil_enum_choice VALUES ('baz');
CREATE TABLE noevil (
    id SERIAL PRIMARY KEY,
    choice VARCHAR(3) NOT NULL REFERENCES noevil_enum_choice(choice)
);

And PostgreSQL does enforce referential integrity. This also makes it easier to expand your choice list by just inserting a new row.

Referential Integrity

On the flip-side of the coin, developers who spend a good deal of time developing against PostgreSQL come to rely on the referential integrity built in to that RDBMS. While there is nothing wrong with that, your code should enforce this as well, if you want to be able to move from one RDBMS to another. In the example above we could rely on PostgreSQL’s REFERENCES statement to kick back an error whenever a bad value (like ‘bip’) is inserted into the choice field of the noevil table. However, as soon as you move your application to MySQL or sqlite it will happily insert anything you like into that field (with MySQL truncating it to three characters).

This is why it is important for your applications to take into consideration their own referential integrity. Here’s some python to illustrate:

import string
from spdo import *
 
class noevil(object):
    __init__(self, choice):
        self.db = SPDO('test')
        self.choices = []
        mydict = self.db.select("SELECT * FROM noevil_enum_choice")
        for c in mydict:
            self.choices.append(c['choice'])
        if choice not in choices:
            ex[] = ["Invalid value for choice: ", choice, " Valid options are: "]
            for c in self.choices:
                ex.append[c]
                ex.append[" "]
            raise Exception(string.join(ex, ''))
        # continue with normal processing here

This is by no means the entirety of the topic, or even more than merest tip of the iceberg, but it is a good place to start.

Development

SPDO Wiki and Updates

I updated SPDO some today – I added in MySQL support for the Python version, and changed the names of the objects and files to match the new moniker. Speaking of which, I have decided on “Simplified (Python|PHP) Database Object” – so yeah, SPDO is the official name now. (Of course I keep seeing it “speedo”).

I also started a Wiki which will be updated with documentation, samples, etc as I make the time. I am hoping to get a bunch done tomorrow afternoon and this weekend.

The version 0.9 files are spdo-py.tar.bz2 and spdo-php.tar.bz2

Development

New Project: SPDO

I have just posted the (embarrassingly empty) page for my new pet project: SPDO (Sjan’s PDO) – in two flavors: PHP and Python.

There are only about a thousand PDOs out there, and perhaps a dozen or so of them are really functional and (in a few cases) very polished pieces of work. So why am I messing with writing my own? A couple of reasons:

I like to have coding signatures that are consistent, whether I am working with PostgreSQL, MySQL or (heaven help us all) SQLite.
I like to have coding signatures that are (reasonably) consistent across different languages – in this case Python and PHP.
I wanted to take advantage of Prepared Statements where I could, even though the PHP implementations of those are pretty weak (especially in the case of MySQL).

Currently implemented in

Python:
- PostgreSQL (with prepared statements)
- SQLite (no prepared statements).
PHP
- PostgreSQL (with prepared statements)
- MySQL (with prepared statements)
- SQLite (no prepared statements)

Here’s an example of how they work (in most simplistic terms):

#in Python
from pyDB import *
db = pyDB('mysite')
newid = db.insert('INSERT INTO test (name, value) VALUES (?,?)',['foo','bar'])
update_count = db.update('UPDATE test SET value=? WHERE id=?',['baz',newid])
results = db.select('SELECT * FROM test')
for row in results:
    for i in row:
        print "\t", i,"\t", row[i]
delete_count = db.delete('DELETE FROM test WHERE id=?',[newid])

//in PHP
require_once('phpdb.php');
$db = new phpDB('test');
$newid = $db->insert('INSERT INTO test (name, value) VALUES (?,?)',array('foo','bar'));
$update_count = $db->update('UPDATE test SET value=? WHERE id=?',array('baz',newid));
$results = db->select('SELECT * FROM test');
foreach($results as $row)
{
    foreach ($row as $key=>$val)
    {
        print "\t$key\t$val";
    }
}
$delete_count = $db->delete('DELETE FROM test WHERE id=?',array($newid));

The page with links to the code is in the list up top, and everything is MIT license. Enjoy.

Database

UUID Why and How

Seeing the post from Aaron the other day on generating UUIDs in PHP got me to thinking about UUIDs in general.

The Why:

So why would we want to use UUIDs? The simple answer is that they are (almost) guaranteed to be a “Universally Unique” ID. So what? Every database table has its own row ids, why do they need to be unique from other database table row ids? Here’s an example:

Jo has three stores X, Y and Z. Each store has a local customer database. That database contains the following table structures:

persons:
  id
  name

phones:
  id
  person_id
  phone_number

(I didn’t say the tables were all that useful, but they work for the sake of discussion.)

Jo looks at the tables from all three stores and decides that it might be a good idea to combine them into one master customer database to share with all the stores. Here is where the trouble starts. When we compare the three databases we find that we cannot simply dump the data from any one of the databases into any of the others.

Store X

Store Y

Store Z

persons
id	name
1	Sam Smith
2	Terry Tyler
phones
id	person_id	phone_number
1	1	(555)555-1212
2	2	(555)555-1234
3	1	(555)555-4321

persons
id	name
1	Perry Paxson
2	Jeremiah Jones
phones
id	person_id	phone_number
1	1	(555)555-2121
2	2	(555)555-4412

persons
id	name
1	Edie Ellerts
2	Francis Fenimore
phones
id	person_id	phone_number
1	1	(555)555-3434
2	2	(555)555-7214
3	2	(555)555-9951

While this makes it simple in terms of looking up a customer’s phone number(s) on a single store, SELECT * FROM persons, phones WHERE phones.person_id = persons.person_id AND persons.name = 'Sam Smith', it leaves us in a tight spot where it comes to combining the databases. We either have to pull all the data programmatically and retain the references from phones to persons and re-insert the whole mess into a new set of tables, or we need to add another field to define which store database the record comes from and change the primary key from the single value (id) to a multiple-field key (id, store_name). This means, of course, that everywhere we reference these values in our code we need to change the SQL. Not a happy solution.

So what if we change the id field type from an integer to a UUID (which is a char(36))? Now our tables would look like:

Store X

persons
id	name
7d20cac1-d558-4a5c-98c0-ee332b554b29	Sam Smith
5fae5033-36e0-42f8-bb50-c00449c6cab0	Terry Tyler
phones
id	person_id	phone_number
82eec34b-897b-40af-9674-fa08e5537d12	7d20cac1-d558-4a5c-98c0-ee332b554b29	(555)555-1212
08400e86-9bd0-4713-bf6e-a5b8cd3e7e22	5fae5033-36e0-42f8-bb50-c00449c6cab0	(555)555-1234
ab64d8d7-0511-4b8e-ab57-9932c3c96b31	7d20cac1-d558-4a5c-98c0-ee332b554b29	(555)555-4321

Store Y

persons
id	name
9ca534e4-2dbd-4dc3-a7b3-98ab8b89fb70	Perry Paxson
4cc10585-b864-45bc-9f20-f9761af82e06	Jeremiah Jones
phones
id	person_id	phone_number
85302b40-c12b-4a05-8dbb-500f6f5cb26f	9ca534e4-2dbd-4dc3-a7b3-98ab8b89fb70	(555)555-2121
106a9524-800c-4e5a-ad4a-6ac9f601b370	4cc10585-b864-45bc-9f20-f9761af82e06	(555)555-4412

Store Z

persons
id	name
4da59fb9-ad00-4a5b-aa00-1bb503fe9c8f	Edie Ellerts
8a39f12c-94c4-4b70-ac70-3cff77be8b01	Francis Fenimore
phones
id	person_id	phone_number
47d0c49e-02df-4366-853b-a0502edb1de6	4da59fb9-ad00-4a5b-aa00-1bb503fe9c8f	(555)555-3434
cb868eca-822f-416a-ac7f-0df12504705f	8a39f12c-94c4-4b70-ac70-3cff77be8b01	(555)555-7214
e70bea84-66e8-407f-bb6c-b164864d22b1	8a39f12c-94c4-4b70-ac70-3cff77be8b01	(555)555-9951

We now have referential data that we can safely combine into a single database, and can even dump back into all the store local database copies.

The How:

When I need to generate my own UUIDs I certainly use the exact same code as Aaron, even cribbed from the exact same source. Of course Python has a UUID module which means you can create a version 1, 3, 4 or 5 UUID quickly and easily. I prefer the version 4 UUIDs myself, since version 1 contains identifiable data about the source, and versions 3 and 5 require other information to generate. (In case you are wondering, there is no version 2).

This is simple enough, but I actually prefer things a little simpler, and when I am using PostgreSQL they do get simpler as I can let the database do the UUID generation. (I am aware of the uuid() function that is built in to MySQL 5, however, that uses version 1 UUIDs which contain system identifiable data. If that is okay with you then by all means feel free to use it.) While there are at least a couple PostgreSQL plugins I have seen for generating UUIDs I actually make use of the CREATE FUNCTION capabilities of PostgreSQL to roll my own. (NOTE: To use this function you must have the PostgreSQL cryptographic libraries installed.)

So first I need to see what I am trying to generate.

From the RFC:

4.4. Algorithms for Creating a UUID from Truly Random or Pseudo-Random Numbers

The version 4 UUID is meant for generating UUIDs from truly-random or
pseudo-random numbers.

The algorithm is as follows:

Set the two most significant bits (bits 6 and 7) of the clock_seq_hi_and_reserved to zero and one, respectively.

Set the four most significant bits (bits 12 through 15) of the time_hi_and_version field to the 4-bit version number from Section 4.1.3.

Set all the other bits to randomly (or pseudo-randomly) chosen values.

The format:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

[For more in-depth analysis of how a UUID is constructed refer to http://tools.ietf.org/html/rfc4122]

Creating a version 4 UUID requires a bunch of random hexadecimal data, with one hexadecimal character (x) (equivalent to a nibble – 4 bits for those of you who don’t recall) set to the value of 4 (binary 0100 from Section 4.1.3 of the RFC) and one hexadecimal character (x) set to one of 4 values: 8, 9, a or b; which is what the first nibble of a byte is when the two most significant bits are 10. For proof of this see below:

3	2	1	0	Hex	Dec
Bit
1	0	0	0	8	8
1	0	0	1	9	9
1	0	1	0	a	10
1	0	1	1	b	11

Now let’s look at what means we have for generating random data in hexadecimal format. PostgreSQL includes a random() function which generates a double precision number between 0 and 1. We also have a to_hex() function that will convert integer or binary string data to a hexadecimal number. This is a start, but we really don’t want to be calling random() more often than we need to, so is there a way to generate a bunch of hexadecimal data at one shot? Actually, there is, if we just add a sha1 function. So let’s start there:

CREATE OR REPLACE FUNCTION sha1(text)
  RETURNS character AS
$BODY$
BEGIN
RETURN ENCODE(DIGEST($1, 'sha1'), 'hex');
END;
$BODY$
  LANGUAGE 'plpgsql';

Now we have a quick way to generate lots of hexadecimal characters, all we need now is to randomize, and format according to what we have stated above.

CREATE OR REPLACE FUNCTION uuid()
  RETURNS character AS
$BODY$
SELECT SUBSTRING(x.my_rand FROM 1 FOR 8)||'-'||SUBSTRING(x.my_rand FROM 9 FOR 4)||'-4'||SUBSTRING(x.my_rand FROM 13 FOR 3)||'-'||x.clock_1||SUBSTRING(x.my_rand FROM 16 FOR 3)||'-'||SUBSTRING(x.my_rand FROM 19 FOR 12)
FROM  
(SELECT sha1(''||now()||random()) as my_rand, to_hex(8+(3*random())::int) as clock_1) as x;$BODY$
  LANGUAGE 'sql';

Notice that we are using now() to grab the current timestamp (including milliseconds), and concatenating that with the output of random to feed to sha1() to generate a hex digest. We are also generating a value of either 8, 9, a, or b for the first nibble in the clock_seq_hi field (the character highlighted in blue) by adding the value of 3 * random() cast as an integer to 8 and then returning that as a hex value. We then use SUBSTRING to pull the number of characters we need for each chunk and keep on moving left to right in the hash to grab the next values and concatenate them with dashes where needed and our version number.

For this and other PostgreSQL functions I use regularly you can download this SQL file.

Development

Learning Python

I was asked today “How do I learn python?” (I’m not making this up, that is a direct quote lifted right out of the chat log!)

Of course my first response was “start at python.org” (as it would be.) That is all well and good, but I started thinking about what specific things helped me? So, here are some specifics that might work well for you if, like me, you learn about half and half from reading (theory) and doing (practice.)

The first step, of course, is to download Python. You don’t need a fancy IDE to develop in Python, although if you have one you are comfortable with and use all the time, chances are that it has a Python plugin. (I know Eclipse does, and I believe NetBeans does, and jEdit has a Python client plugin that lets you run Python code right in the editor.) If you don’t want to muck about with an IDE, though, Python comes with it’s own lightweight editor, IDLE. Open it up, play around a bit.

Ok, so you’re looking at IDLE thinking “well, there isn’t much here is there?” You’re right – time for step two. Head over to the beginner’s guide and go for it! There are tutorials aimed at people who are new to Python and to programming in general, and people who are new to Python but not to programming. There is even a list of Python courses you can buy, and a fairly large list of introductory Python books for those who learn best that way. There isn’t much I can say that isn’t there already. I kind of wish that I had started there myself. When I started playing with Python I skipped over this and went straight for the documentation and googling for specific answers, as I was trying to build a script that I needed at the time to parse 1 -2 GB log files. Java was too much tool for the job, I didn’t have access to a beefy enough Linux box to do it in Bash and PHP was doing nothing but running out of memory. It made for a bit of a rough entry into Python, but I kind of like that sort of challenge.

Which brings us to step three – build something. No, not a “Hello World” app, but something useful, in the immediate sense. Something that will make you smile, or make your life simpler, or even something that will allow you to tick off an item on your to-do list.

So what (besides the end of my rambling in parenthetical asides) are you waiting for?

Development

Learning things the hard way

These are some things I learned the hard way over the past few weeks. Ignore at your peril!

When your setup requires a different database layout (meaning differently named and/or organized databases and/or servers) for development than for live deployment, you would do well to try to mimick the live layout in your development environment before beginning. (This one hurt, bad.) In fact, when you find a need for your application to test for current environment (development vs production), you may want to be extra certain that you really need to set up your development structure that differently.
When your test & deployment environment calls for a server external to the existing systems and database servers, make sure that you:
1. Have a clear understanding of what packages/versions that server will need.
2. Have a clear understanding of how long it will take to get the hardware in place.
3. Have a working copy in a VM on the same OS that the server will be deployed on before you say ok to anything.
4. Ask, beg, plead, whatever, for a copy of that server (even on desktop hardware) to be set up for the testing. If you cannot test against real machines in a real environment for any kind of multi-server deployment, all bets are off. (Um, yeah, again, ouch.)
While the occasional 12+ hour day for developers is inevitable, too many of them in a row will result in more mistakes and coding errors of the most simplistic sort.
When you discover that you need to add functionality to an integral part of the software (like, say, the primary db connection class) do everything within your power to ensure that it still “works the same” – that is, add any modifications in such a way that they do not break existing code. This is not always easy to tell, but usually a simple trip around the application will tell you quickly if you have broken something as basic as this. (No, I didn’t break it, at least not anywhere other than locally. I actually took this step, many times and saved myself great embarrassment.)
Finally, when switching between languages (PHP, Java, Bash, Javascript, etc etc) you may find yourself looking at code that should work, but doesn’t. Chaining calls in PHP seems to be iffy at best. For example:
- This does not work, although it seems it should. The $db is an instance of a legacy database connection class used for the project, do_query returns an array of results, and do_query with a second argument sets the value of the field named as the key for each row:list($account) = array_keys($db->do_query(“SELECT account_id FROM reports WHERE store_db=’$store’ LIMIT 1”, ‘account_id’));
- while this does, using the same class and method:$data = $db->do_query(“SELECT account_id FROM reports WHERE store_db=’$store’ LIMIT 1”, ‘account_id’);
  list($account) = array_keys($data);
  // this also works:
  // $account = $data[0][‘account_id’];
- Really, sometimes an extra line is worth not having the headache.

Development

iPhone icons for your site

Adding a shortcut to a favorite site on an iPhone (or iPod Touch) is as easy as tapping the “+” sign at the bottom of the Safari browser on the phone and selecting “Add to Homepage” – but the icon is not so appealing. In fact, the iPhone defaults to a tiny, cropped screenshot of the site unless it finds a 57×57 pixel png file in the site root. This is similar to the concept of the favicon.ico for web browsers. The file needs to be named apple-touch-icon.png.

The iPhone (or iPod Touch) will round the corners and overlay the glass look to make it blend in to the overall look and feel of the “Springboard” (their name for the desktop on these devices.)

Mine looks like this (before the modifications which take place on the device itself):

And this is what it looks like on the iPhone:

Development

Problems arising from PHP type casting in ==

While trying to work through the issues I mentioned in the last post I started doing some serious digging and testing. Here is what I have found.

PHP seems to use == for determining equivalance when performing array_search, in_array and switch, while using ~~either === or~~ strcmp when doing array_key_exists.

The result of this is that array_search and in_array will return improper results when used on an array with mixed string and integer values. (Another thing I found, that may or may not be related, is that array keys will be cast from strings to integers when those strings are valid integer values.)

array_search() with mixed types

$one = array (
  'abc',
  'abc1',
  111,
  '111b',
  2,
  '2xyz',
  '123a',
  123
);
$two = $one;
for ($i = 0; $i < count($one); $i++) {
  $xkey = array_search($one[$i], $two);
  if(strcmp(strval($one[$i]), strval($two[$xkey])) != 0) {
    // This should NEVER be reached, but it is, often!
    $eq = 'FALSE';
  } else {
    $eq = 'true';
  }
}

Row	$one	$two	Correct?	Found	Notes
0	abc	abc	true	0	abc == abc : array_search($one[0], $two) where $one[0] = string(3) “abc”
1	abc1	abc1	true	1	abc1 == abc1 : array_search($one[1], $two) where $one[1] = string(4) “abc1”
2	111	111	true	2	111 == 111 : array_search($one[2], $two) where $one[2] = int(111)
3	111b	111b	FALSE	2	111b == 111 : array_search($one[3], $two) where $one[3] = string(4) “111b”
4	2	2	true	4	2 == 2 : array_search($one[4], $two) where $one[4] = int(2)
5	2xyz	2xyz	FALSE	4	2xyz == 2 : array_search($one[5], $two) where $one[5] = string(4) “2xyz”
6	123a	123a	true	6	123a == 123a : array_search($one[6], $two) where $one[6] = string(4) “123a”
7	123	123	FALSE	6	123 == 123a : array_search($one[7], $two) where $one[7] = int(123)

array_search() with all strings

$one = array (
  'abc',
  'abc1',
  '111',
  '111b',
  '2',
  '2xyz',
  '123a',
  '123'
);
$two = $one;
for ($i = 0; $i < count($one); $i++) {
  $xkey = array_search($one[$i], $two);
  if(strcmp(strval($one[$i]), strval($two[$xkey])) != 0) {
    // This should NEVER be reached, and with all strings it isn't.
    $eq = 'FALSE';
  } else {
    $eq = 'true';
  }
}

Row	$one	$two	Correct?	Found	Notes
0	abc	abc	true	0	abc == abc : array_search($one[0], $two) where $one[0] = string(3) “abc”
1	abc1	abc1	true	1	abc1 == abc1 : array_search($one[1], $two) where $one[1] = string(4) “abc1”
2	111	111	true	2	111 == 111 : array_search($one[2], $two) where $one[2] = string(3) “111”
3	111b	111b	true	3	111b == 111b : array_search($one[3], $two) where $one[3] = string(4) “111b”
4	2	2	true	4	2 == 2 : array_search($one[4], $two) where $one[4] = string(1) “2”
5	2xyz	2xyz	true	5	2xyz == 2xyz : array_search($one[5], $two) where $one[5] = string(4) “2xyz”
6	123a	123a	true	6	123a == 123a : array_search($one[6], $two) where $one[6] = string(4) “123a”
7	123	123	true	7	123 == 123 : array_search($one[7], $two) where $one[7] = string(3) “123”

in_array() and array_key_exists()

$array = array('111'=>'111', '11b'=>'11b', '222b'=>'222b','2x22'=>'2x22');
$keys = array_keys($array);
$searches = array('111b',222,11,'222b',2);
foreach ($searches as $search) {
  $ia = (in_array($search, $array))?'true':'false';
  $ake = (array_key_exists($search, $array))?'true':'false';
  if ($search === '222b') {
    // This is the only place where either should return true
    $iaf = ($ia == 'true')?" class=\"$true\"":" class=\"$false\"";
    $akef = ($ake == 'true')?" class=\"$true\"":" class=\"$false\"";
    $notes = "** Both should be true **";
  } else {
    $iaf = ($ia == 'false')?" class=\"$true\"":" class=\"$false\"";
    $akef = ($ake == 'false')?" class=\"$true\"":" class=\"$false\"";
    $notes = "Both should be false";
  }
}

Notice how the array keys are cast to type int in both the original array and in array_keys.

$array	$keys
array(4) { [111]=> string(3) "111" ["11b"]=> string(3) "11b" ["222b"]=> string(4) "222b" ["2x22"]=> string(4) "2x22" }	array(4) { [0]=> int(111) [1]=> string(3) "11b" [2]=> string(4) "222b" [3]=> string(4) "2x22" }

$array

$keys

array(4) {
  [111]=>
  string(3) "111"
  ["11b"]=>
  string(3) "11b"
  ["222b"]=>
  string(4) "222b"
  ["2x22"]=>
  string(4) "2x22"
}

array(4) {
  [0]=>
  int(111)
  [1]=>
  string(3) "11b"
  [2]=>
  string(4) "222b"
  [3]=>
  string(4) "2x22"
}

Search Item	in_array	array_key_exists	Notes
111b	false	false	Both should be false
222	true	false	Both should be false
11	true	false	Both should be false
222b	true	true	Both should be true
2	true	false	Both should be false

So, it appears that array_key_exists() uses ~~either or ===~~ strcmp() while in_array() uses ==

NOTE: Calling array_key_exists() with a string value ‘111’ will return true for an item with a key of int 111. This is not the same behavior as ===, but is the same behavior as strcmp() which must be what is used internally for array_key_exists().

The difference between the 3 operations is clear:

$a	$b	$a == $b	$a === $b	strcmp(strval($a),strval($b))
int(123)	string(4) “123b”	bool(true)	bool(false)	int(-1)

Another area where this becomes an issue is in switch statements. Take the following, for example:

switch()

$array = array(111,'222b');
foreach($array as $val)
{
  $row = ($row == 'row')?'offset_row':'row';
  $false = ($false == 'false')?'offset_false':'false';
  $true = ($true == 'true')?'offset_true':'true';
  switch($val)
  {
    case '111b': // this displays
      $match = '111b';
      $f = " class=\"$false\"";
      $notes = "Incorrect: should have fallen through to next case";
      break;
    case 111: // never makes it here even tho this is correct
      $match = 111;
      $f = " class=\"$true\"";
      $notes = "** Correct **";
      break;
    case 222: // this displays
      $match = 222;
      $f = " class=\"$false\"";
      $notes = "Incorrect: should have fallen through to next case";
      break;
    case '222b': // never makes it here even tho this is correct
      $match = '222b';
      $f = " class=\"$true\"";
      $notes = "** Correct **";
      break;
    default:
      $match = 'no match';
      $f = " class=\"$false\"";
      $notes = "Incorrect: should have matched";
      break;
  }
}

Search Item	Match	Notes
111	111b	Incorrect: should have fallen through to next case
222b	222	Incorrect: should have fallen through to next case

Development

PHP array_search implicit cast of search term

There is an error in the values that array_search returns when searching on an array that has a mix of numeric values (123) and alpha-numeric mixed strings that start with the same and follow with alpha characters (‘123a’).

The results are actually kind of bizarre, but explainable by a bug in PHP’s equivalence test. When testing for equivalence (using ==) PHP determines that 123 == ‘123xyz’. PHP casts the string to an integer when doing the comparison (so ‘123xyz’ becomes 123). This is documented in bugs.php.net (http://bugs.php.net/bug.php?id=23110) – but this leads to problems: both switch and array_search use == for comparison.

So, using:

$one = array (
  'abc',
  'abc1',
  111,
  '111b',
  2,
  '2xyz',
  '123a',
  123
);
$two = $one;

foreach($one as $val)
{
  $key = array_search($val, $two);
  if ($key !== false) {
    echo "$val == {$two[$key]} \n";
    if (strcmp(strval($val), strval($two[$key])) == 0) {
      echo "strcmp returns true";
    } else {
      echo "strcmp returns false";
    }
  } else {
    echo "$val not found \n";
  }
}

results in:

abc == abc -- strcmp returns true
abc1 == abc1 -- strcmp returns true
111 == 111 -- strcmp returns true
111b == 111 -- strcmp returns false
2 == 2 -- strcmp returns true
2xyz == 2 -- strcmp returns false
123a == 123a -- strcmp returns true
123 == 123a -- strcmp returns false

This becomes a real problem when you can’t be sure that the values in an array are all of the same type. However, if you are sure that all the values in the array are of type string then array_search works flawlessly.

I am still unsure how to work around this, however, I think having a version of array_search that doesn’t do an implicit cast on the search value would be of great use.

Development

New Class of Exploits: Dangling Pointers

While dangling pointers are a common coding error (especially in C++) there has previously been no way known to exploit them. In fact, they were generally considered a quality control issue rather than a security issue. That is all set to change. According to an article today from SearchSecurity Jonathan Afek and Adi Sharabani of Watchfire Inc have uncovered a way to exploit generic dangling pointers to run shell code on a server in much the same fashion as buffer overflows. According to Danny Allen (also of Watchfire) this technique can be used on any application with dangling pointers.

Afek will be giving a presentation on the technique in August at the Black Hat Briefings in Las Vegas.

Technorati Tags: security, dangling pointers

evardsson.com: stuff that w0rks