Author: sjan

Python

Fix for firehol get-iana script

I have talked before about using firehol to configure iptables. I won’t go into all the details about how wonderful and awesome it is, but trust me, it makes configuring iptables a snap.

Firehol includes a script, get-iana.sh, that downloads the IPv4 address space list from IANA and populates a file called RESERVED_IPS that firehol uses when configuring iptables. Basically, any traffic from outside coming from any reserved or unallocated IP block is dropped automatically. As you can imagine, keeping this file updated regularly is important, as previously unallocated blocks are allocated for use. To this end, whenever firehol starts it checks the age of the RESERVED_IPS file and if it is older than 90 days warns you to update it by running the supplied get-iana.sh.
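That age check amounts to comparing the file's modification time against a 90-day cutoff. A rough Python sketch of the same logic (the function names are mine; firehol's actual check is shell code):

```python
import os
import time

def file_age_days(path):
    """Age of a file in whole days, based on its modification time."""
    return int((time.time() - os.path.getmtime(path)) // 86400)

def needs_update(path, max_age_days=90):
    """True if the file is missing or older than max_age_days."""
    if not os.path.exists(path):
        return True
    return file_age_days(path) > max_age_days
```

Something like `needs_update('/etc/firehol/RESERVED_IPS')` would tell you whether it is time to re-run the update script.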

However, there has recently been a change in how IANA formats the IPv4 address space file. There are lots of posts on plenty of forums with patches to make get-iana.sh accept the new plain-text format (the default format is now XML rather than plain text), and needless to say I tried every single one I could find. None of them worked, so what to do? How about a complete rewrite in Python? And while we’re at it, let’s use the XML format that IANA wants everyone to use.

So, one lunch hour of hacking and here it is, working like a charm. You can copy this, but I recommend downloading it to avoid whitespace issues.

#!/usr/bin/python

"""
file: get-iana.py

Replacement for get-iana.sh that ships with firehol and no longer seems to work.
This is less code, less confusing, uses the preferred XML format from IANA and works.

Copyright (c) 2010 Sjan Evardsson

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
"""

import urllib
import xml.dom.minidom
import os
# Fetch the current IPv4 address space registry in IANA's XML format.
urllib.urlretrieve('http://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.xml','address-space.xml')
results = []
x = xml.dom.minidom.parse('address-space.xml')
for i in x.childNodes:
    if i.localName == 'registry':
        for j in i.childNodes:
            if j.localName == 'record':
                # Reset per record so a missing element can't carry a
                # stale value over from the previous record.
                prefix = status = None
                for k in j.childNodes:
                    if k.localName == 'prefix':
                        prefix = k.firstChild.data
                    if k.localName == 'status':
                        status = k.firstChild.data
                # Keep only blocks that should never appear as source addresses.
                if prefix and status in ('RESERVED', 'UNALLOCATED'):
                    results.append(prefix)
# Each prefix looks like '014/8'; write it out as '14.0.0.0/8'.
outfile = open('iana-temp','w')
for r in results:
    hi = int(r.split('/')[0])
    outfile.write(str(hi)+'.0.0.0/8\n')
outfile.close()
os.remove('address-space.xml')
# Swap the new list into place, keeping the old one as a backup.
os.rename('/etc/firehol/RESERVED_IPS','/etc/firehol/RESERVED_IPS.old')
os.rename('iana-temp','/etc/firehol/RESERVED_IPS')
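If you want a quick sanity check on the generated file, every line should match the N.0.0.0/8 pattern the script writes. A small, hypothetical helper (not part of the script above):

```python
import re

LINE_RE = re.compile(r'^\d{1,3}\.0\.0\.0/8$')

def check_reserved_ips(lines):
    """Return any lines that do not look like 'N.0.0.0/8' entries."""
    return [ln for ln in lines if not LINE_RE.match(ln.strip())]
```

Run it over `open('/etc/firehol/RESERVED_IPS')`; an empty result means the file looks sane.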
Community

International ME/CFS Awareness Day

Today is (was?) International ME/CFS Awareness Day, and the Sock It 2 ME/CFS project is officially launched. Hoping to do for ME/CFS sufferers, research budgets and families what the AIDS quilt did for HIV, the “sock project” has the potential to open a lot of eyes.

Some quick info from the site:

What is Myalgic Encephalomyelitis/Chronic Fatigue Syndrome?

Myalgic Encephalomyelitis, or Chronic Fatigue Syndrome, as it’s known in the US, is a debilitating disease which has been classified by the World Health Organization (WHO) as an organic, infectious neuro-immune disorder since 1969. It can occur in both epidemic and sporadic forms; over 60 outbreaks of ME/CFS have been recorded worldwide since 1934.

ME/CFS …

  • causes more functional impairment than diabetes, heart failure or kidney disease.
  • creates a level of disability comparable to MS, chemotherapy or the final stages of AIDS.
  • strikes an estimated 17 to 20 million worldwide, impairing function and shortening lives.
  • like AIDS in the early days, gets inadequate funding due to widespread misunderstanding.
  • has only recently gained notice in blood banks internationally as an infectious disease concern.
Apache

    Apache and PHP HTTP PUT Voodoo

    While trying to work out the details for a PHP REST utility I kept running into a wall when it came to using HTTP PUT (and HTTP DELETE) with Apache 2.2 and PHP 5. There are plenty of scattered tidbits of information relating to this on forums about the web, many old, and many more incomplete or even unhelpful. [As a side note: if someone on a forum you frequent is asking for help with getting HTTP PUT to work in Apache, telling them “Don’t use PUT it lets the hax0rs put files on your server! N00b! Use POST LOL!!11!” is not helping, nor does it make you look intelligent.]

    The first hint I came across was putting Script PUT put.php in your httpd.conf in the <Directory> section. (That is, of course, assuming that your script for handling PUT requests is called put.php.)

    I tried that and on restarting Apache got the error “Invalid command ‘Script’, perhaps misspelled or defined by a module not included in the server configuration” – which led to a short bit of research (thanks Google!) that pointed out that the Script directive requires mod_actions to be enabled in Apache. I did that and then tried to hit my script with a PUT request, which got me a 405 error: “The requested method PUT is not allowed for the URL /test/put.php”.

    Well, that was certainly strange, so I added <Limit> and <LimitExcept> blocks to my <Directory> section, but to no avail. So I changed the <Directory> directive from <Directory /var/www/test> to <Directory /var/www/test/put.php>. It looked strange, but what the heck, it was worth a try. I could now make PUT requests, but only as long as the URL was exactly /test/put.php, and that is not what you want when putting together a RESTful application. Trying to do anything useful, like a PUT to /test/put.php/users/, resulted in more 405 errors, now saying “The requested method PUT is not allowed for the URL /test/put.php/users/”.

    So, back to the httpd.conf to change the <Directory> directive back to its previous form. Then on to the other method I saw in a few places: using mod_rewrite to forward PUT (and DELETE) requests to the script. Everywhere I saw this listed, it was claimed that this alone (without the Script directive) was enough to enable PUT. So I commented out the Script directive and added some mod_rewrite statements to the .htaccess file (which is always preferable in development, as you can make changes on the fly without reloading or restarting the server): a RewriteCond %{REQUEST_METHOD} (PUT|DELETE) and a RewriteRule .* put.php.

    And, I went back to test it again and, big surprise, got a 405 error again. Now, even when pointing directly at /test/put.php I got a 405 error. So, I decided to try combining the two. I uncommented the lines in the httpd.conf and bumped the server and was pleasantly surprised that PUT (and DELETE) requests to the /test/ directory were properly handled by the script. Now I could do something useful, like add another mod_rewrite rule to send all traffic for /api/ to the /test/put.php and call /api/users/ with a PUT (or DELETE) request and it was properly handled!

    So, putting it all together:

    In Apache: enable mod_actions and mod_rewrite. In Gentoo: make sure the lines

    LoadModule actions_module modules/mod_actions.so

    and

    LoadModule rewrite_module modules/mod_rewrite.so

    in httpd.conf are not commented out. In Debian the commands

    a2enmod actions

    and

    a2enmod rewrite

    do the trick.

    In the httpd.conf add the following:

    <Directory /var/www/test>
        <Limit GET POST PUT DELETE HEAD OPTIONS>
            Order allow,deny
            # You might want something a little more secure here, this is a dev setup
            Allow from all
        </Limit>
        <LimitExcept GET POST PUT DELETE HEAD OPTIONS>
            Order deny,allow
            Deny from all
        </LimitExcept>
        Script PUT /var/www/test/put.php
        Script DELETE /var/www/test/put.php
    </Directory>
    

    And finally, in the .htaccess add the rewrite voodoo:

    RewriteEngine On
    RewriteBase /test
    RewriteRule ^/?(api)/? put.php [NC]
    RewriteCond %{REQUEST_METHOD} (PUT|DELETE)
    RewriteRule .* put.php
    

    Hopefully this works as well for you as it did for me. Now to get back to business of actually writing the code to deal with the request and dispatch it appropriately (which may be a post for another day, or you can have a look at how some others have done it.)

    By the way, for testing I have found the Firefox plugin Poster to be immensely useful, as well as the Java based RESTClient.
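If you would rather script the testing, here is a self-contained Python sketch of a PUT round trip. The handler class is a stand-in for put.php (the names are made up for the demo), and it runs its own throwaway server so nothing here depends on the Apache setup above:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class PutHandler(BaseHTTPRequestHandler):
    """Stand-in for put.php: reads the PUT body and reports what it got."""
    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # the uploaded body
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"stored %d bytes at %s" % (length, self.path.encode()))

    def log_message(self, *args):  # keep the demo quiet
        pass

def send_put(host, port, path, body):
    """Issue an HTTP PUT and return (status_code, response_text)."""
    conn = http.client.HTTPConnection(host, port)
    conn.request("PUT", path, body, {"Content-Type": "text/plain"})
    resp = conn.getresponse()
    data = resp.read().decode()
    conn.close()
    return resp.status, data

# Spin up the throwaway server on a random free port and exercise it.
server = HTTPServer(("127.0.0.1", 0), PutHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
status, reply = send_put("127.0.0.1", port, "/api/users/1", "name=alice")
server.shutdown()
```

Pointing send_put at your own virtual host (instead of the built-in server) gives you a quick scripted check that the Apache configuration accepts PUT.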

    Home

    I need a break …

    I don’t usually talk much about my day-to-day life here, but that doesn’t mean I never do. This is one of those times. If you just want more tech talk check out the end of this post. The rest is all me whinging anyway. ;)

    I need a break. A real break. I mean, I am technically on a break right now from school but it doesn’t really feel that way. I finished out my first year of school a couple weeks ago (26 years after graduating from high school, no less) and I thought “wow, I have an entire month that I can use to rest, catch up on some personal stuff, maybe clean out the garage ….” Unfortunately it is not turning out that way.

    Instead I am writing this at 5:54 in the morning as this has been the first chance I have had to pay any attention at all to the blog. So what has been keeping me busy? Well, first, there is work. I did use a little of what would have been study time to modify the script I use to generate weekly work reports from Trac so that it now shows the amount of change for hours on each ticket (which is set in a “custom” field). And holy cow, I put in 58.5 hours last week. At least 8 of that doesn’t really count, though. I messed my back up and spent some time trying to work while under the influence of cyclobenzaprine which means that I wrote, scrapped and rewrote one class method at least 6 times before finally giving up. (Programming and drugs that make you stupid don’t mix!)

    Aside from work I have been putting some time into a project for a non-profit that is kicking off on May 12th. I’m not allowed to say too much about it ahead of launch, but I can say that it is about raising awareness about ME/CFS and how badly it has been mismanaged and patients marginalized for the past 25 years.

    Finally, I upgraded the WordPress plugin Shorten2Ping which I will continue to pimp as long as it keeps working so well. Of course I like my post tweets to have some hashtag love, so I do a little editing of the shorten2ping.php.

    Here is a diff:

    --- shorten2ping/shorten2ping.php       2010-04-12 10:22:34.000000000 -0700
    +++ shorten2ping.mine/shorten2ping.php  2010-04-19 06:47:58.000000000 -0700
    @@ -119,6 +119,15 @@
         $post_url = get_permalink($post_id);
         $post_title = strip_tags($post->post_title);
    
    +               // add some tag bits here
    +               $tags = wp_get_post_tags($post_id);
    +               $my_tag_list = '';
    +               if (is_array($tags)) {
    +                       foreach ($tags as $j=>$tag) {
    +                               $my_tag_list .= '#'.$tag->slug.' ';
    +                       }
    +               }
    +
         $short_url_exists = get_post_meta($post_id, 'short_url', true);
    
                  if (empty($short_url_exists)) {
    @@ -205,9 +214,19 @@
    
                 //get message from settings and process title and link
                 $message = $s2p_options['message'];
    +                                               $message_bare_char_count = strlen(str_replace(array('[title]','[link]','[tags]'), '', $message));
    +                                               $title_count = strlen($post_title);
    +                                               $link_count = strlen($short_url);
    +                                               $tag_count = strlen($my_tag_list);
    +                                               $over = $message_bare_char_count + $title_count + $link_count + $tag_count - 140;
    +                                               if ($over > 0 && $over <= $title_count/2) {
    +                                                       // if the overage is more than half the post title, skip this and let the tags get truncated instead
    +                                                       $post_title = substr($post_title, 0, $title_count - $over);
    +                                               }
                 $message = str_replace('[title]', $post_title, $message);
                                  $message = str_replace('[link]', $short_url, $message);
    -
    +                                               $message = str_replace('[tags]', $my_tag_list, $message);
    +
                 if ($s2p_options['ping_service'] == 'pingfm'){
    
                    send_pingfm($pingfm_user_key,$post_id,$message);
    

    (You can download the diff as well.)
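The budget arithmetic in that patch translates to something like the following Python sketch (the function name is mine; the real patch works on the [title]/[link]/[tags] template inside shorten2ping and leaves tag truncation to the ping service):

```python
def compose_status(template, title, link, tags, limit=140):
    """Fill in [title], [link] and [tags], trimming the title when the
    result would run past the limit (same idea as the patch above)."""
    bare = len(template.replace('[title]', '')
                       .replace('[link]', '')
                       .replace('[tags]', ''))
    over = bare + len(title) + len(link) + len(tags) - limit
    if over > 0 and over <= len(title) // 2:
        # small overage: shorten the title and keep all the tags
        title = title[:len(title) - over]
    # larger overage: leave the title alone and let the service cut the tags
    msg = template.replace('[title]', title)
    msg = msg.replace('[link]', link)
    return msg.replace('[tags]', tags)
```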

    Ooops! Draft Saved at 6:23:27 am. And it is now 8:10, and this would still be a draft if I wasn’t closing browser tabs.

    Gentoo

    Gentoo emerge conflicts: SQLite and dev-perl/DBD-SQLite

    I was having issues with my regular update schedule on my Gentoo server where I kept getting the following message:
    ('ebuild', '/', 'dev-db/sqlite-3.6.22-r2', 'merge') conflicts with
    =dev-db/sqlite-3.6.22[extensions] required by ('installed', '/', 'dev-perl/DBD-SQLite-1.29-r2', 'nomerge')

    Since I use SQLite fairly regularly and I like to keep it up to date I figured I would focus on getting that updated, then worry about the Perl SQLite. (Had I known that spamassassin relies on the Perl SQLite I may have been a little more hesitant, but it all worked out ok anyway.)

    Here is how I managed to update both SQLite and the Perl SQLite. I first unmerged dev-perl/DBD-SQLite with:
    emerge --unmerge dev-perl/DBD-SQLite

    I then updated SQLite with:
    emerge -u sqlite

    This changed the USE settings to “-extensions”, which meant that when I tried to emerge DBD-SQLite it failed due to the missing USE requirements. So I took a stab at it and did:
    USE="extensions" emerge sqlite
    which built cleanly without any problems, and after which a quick
    emerge dev-perl/DBD-SQLite
    worked great.

    So, in a quick and easy cut and paste format the work-around is:
    emerge --unmerge DBD-SQLite
    emerge -u sqlite
    USE="extensions" emerge sqlite
    emerge DBD-SQLite

    Why the work-around is required I don’t know at the moment, as I don’t have the time to dig through the ebuild files and figure out where the issue is, although I am sure that if I had waited a bit, updated ebuild files would have come down the pipeline to correct it. (Patience is a virtue, but I have never been all that virtuous.)

    Development

    Comparing PHP array_shift to array_pop

    I noticed a note in the PHP documentation about speed differences between array_shift() (pulling the first element off the array) and array_reverse() followed by array_pop() (which yields the same data, arrived at by pulling the last element off the reversed array).

    Since I was working on some code to convert URL pieces to program arguments (like turning /admin/users/1/edit into section=admin, module=users, id=1, action=edit – stuff we tend to do every day) I thought I would take a look at the speed differences since I have always used array_shift() for this (after turning the string into an array via explode()).

    My initial tests showed that array_shift was much faster than array_reverse followed by array_pop, and I wondered why someone would say that in the first place. But then I thought about it for a bit. When using array_shift the entire remaining array has to be re-indexed every call. For a very short array (like the one I was using) this is negligible. When you start looking at much larger arrays, however, this overhead adds up quickly.

    To find out roughly where the break-even point between these two methods lies, I whipped up a quick script to run with arrays sized from 10^1 values up to 10^5 values. What I found is that at less than 100 values you are not really gaining much (if anything) by using array_reverse and array_pop versus array_shift. Once you get to the 1000-value array size, however, the differences really add up (as you can see in the logarithmic scaling of the chart below).

    shift_vs_pop.jpg

    The code I used to generate the numbers (which are shown in the chart as averages over 3 runs, rounded to the nearest millionth of a second) is:

    <?php
    $counts = array(10,100,1000,10000,100000);
    foreach ($counts as $len)
    {
    	$m2 = $m1 = array();
    	$x = 1;
    	while ($x <= $len)
    	{
    		$m2[] = $m1[] = $x;
    		$x++;
    	}
    	echo "Timing with array_shift() for $len items\n";
    	echo "000000";
    	$s1 = microtime(true);
    	while (!empty($m1))
    	{
    		$tmp = array_shift($m1);
    		if ($tmp % 10 == 0)
    		{
    			echo chr(8),chr(8),chr(8),chr(8),chr(8),chr(8);
    			echo str_pad(''.$tmp,6,'0',STR_PAD_LEFT);
    		}
    	}
    	$s2 = microtime(true);
    	echo "\nTook ",$s2 - $s1," seconds\n";
    	
    	echo "Timing with array_reverse and array_pop() for $len items\n";
    	$s1 = microtime(true);
    	$m2 = array_reverse($m2);
    	while (!empty($m2))
    	{
    		$tmp = array_pop($m2);
    		if ($tmp % 10 == 0)
    		{
    			echo chr(8),chr(8),chr(8),chr(8),chr(8),chr(8);
    			echo str_pad(''.$tmp,6,'0',STR_PAD_LEFT);
    		}
    	}
    	$s2 = microtime(true);
    	echo "\nTook ",$s2 - $s1," seconds\n";
    	echo "\n";
    }
    ?>
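The same trade-off shows up in other languages. Here is a Python sketch of the two strategies (list.pop(0) re-indexes the remainder on every call, just like array_shift, while reverse-then-pop() only ever touches the tail); the function names are mine:

```python
import time

def drain_front(items):
    """Pop from the front: each pop re-indexes the remainder, O(n) per pop."""
    out = []
    while items:
        out.append(items.pop(0))
    return out

def drain_reversed(items):
    """Reverse once, then pop from the end: O(1) per pop."""
    items.reverse()
    out = []
    while items:
        out.append(items.pop())
    return out

def time_drain(fn, n):
    """Seconds taken to drain a list of n integers with the given strategy."""
    data = list(range(n))
    start = time.perf_counter()
    fn(data)
    return time.perf_counter() - start
```

Both functions produce the elements in the same order; on lists of tens of thousands of items the front-popping version falls well behind, mirroring the PHP numbers above.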
    
    Internet

    Cisco search patent: my concerns

    An article yesterday at bnet.com about Cisco’s patent filing for search has me concerned. Instead of relying on crawling links (and obeying robots.txt) like current search engines do (or at least should), Cisco’s idea is to look into packets at the network level and pull apart network traffic to discover HTTP requests. While that may not sound so terrible, I can see a need to change the way I do some business.

    I often have development work, intended for collaboration with clients, that is wholly undiscoverable via web crawling. It is not that there are any great secrets there (unless the client is particular about not letting anyone see what their new site will look like before it goes live), but it is not meant to be permanent, either. This means that unless you know the full URL to the documents in question you are not likely to find them. These URLs are emailed to the client so they can click on the link in their email and let me know which parts of the app work the way they want, what doesn’t work, what UI changes they would like to make, etc. With the standard web crawlers these pages will never show up in a search listing.

    If a layer three network device is picking those URLs out of traffic it is passing, however, those pages might be indexed, and once indexed, added to search. Now, a week later, when the directory x79q3_zz_rev2 is trashed, there are indexed searches pointing at what will return nothing but 404. Not good for me, not good for the client and not good for the individual doing the search.

    My second concern is one of bandwidth. Yes, I know, there is lots of bandwidth and “everybody is on broadband these days anyway” (I don’t know how many times I hear that). Be that as it may, the “everybody” that is on broadband is not actually everybody, and anything that adds more delay to packet routing only makes the situation worse. And what happens when user A sends a request through their ISP to get an HTTP resource? How many hops does it cross? And how many of those will be running Cisco devices? (Hint: most). How many of those Cisco devices are going to do introspection on that packet to pull out the URL? How long does that take? Now consider how many HTTP requests your browser actually makes when downloading a web page. The page itself, linked CSS files, linked JS and any images (and let’s please not even consider AJAX requests).

    While the idea is novel, I don’t think it is a good idea, and I would actually hope that Cisco gets the patent and sits on it and uses it merely to bludgeon anyone who actually tries to do this.

    OS X

    Custom Parallels VM icons

    I run a lot of VMs in Parallels. (Currently I am running 7, although not all at once, of course.) I end up with a bunch of red generic Parallels VM alias icons on my desktop. Which means that the usual visual quick clues (color, logos, etc) aren’t there and I have to look at the text underneath. Sometimes I am in a rush and start Windows Server 2008 instead of Windows 7 Pro, or Ubuntu Linux instead of Debian Linux (one is set up as a desktop and one as a server with no X).

    I really wanted some custom icons for those VMs. My solution, (as usual) when it doesn’t exist make it. So, I opened pvs.icns (contained in the Parallels Desktop.app bundle /Applications/Parallels Desktop.app/Contents/Resources/pvs.icns) in Icon Composer.app, selected the 512 x 512 version and copied it to the clipboard. I then pasted that into a new Photoshop document and began editing. I saved each new version as a 512 x 512 pixel png and then dropped them in img2icns.app which converted them to the icns files I needed to customize my VM launchers.

    Behold the glory:

    icon_anim.gif

    They aren’t perfect, especially the Windows Server 2008, but they are different enough that it is easy to select the right VM in a heartbeat.

    You can download the icns files from http://www.evardsson.com/files/parallels_icons.zip

    Parallels

    Try out Chrome OS in a VM – even Parallels!

    If you have been curious about trying out Google’s Chrome OS (or Chromium OS – they seem to call it both) there is a VMWare image available for download at gdgt.com. You will need to set up an account there if you don’t already have one, but it is painless. The VM image is zipped to around 300MB so downloading is not painful at all.

    If you are using VMWare Player or VirtualBox or VMWare Fusion (on Mac) there is nothing you need to do but open it up and go. If you are using Parallels, however, there are a couple steps to take.

    First you need to convert the vmdk to a raw disk image. To do this you will need to get Qemu (actually, qemu-img, a utility that comes with Qemu.) If you are on a Mac (as most Parallels users are) you can download and install Q, which is a very nice OS X port of Qemu with a GUI (which we won’t be using for this exercise).

    The command to convert the disk image is slightly different if you are using the default Qemu package or the one provided with Q. If you are using the default the following should work (assuming your install of Qemu is in /usr/bin/):

    /usr/bin/qemu-img convert chrome-os-0.4.22.8-gdgt.vmdk -O raw chrome.hdd

    If you are using Q, the version of qemu-img that is included does not quite handle the command line switches correctly. Luckily, it defaults to raw image output. The command if you have Q installed should look like:

    /Applications/Q.app/Contents/MacOS/qemu-img convert chrome-os-0.4.22.8-gdgt.vmdk chrome.hdd

    Now, start up Parallels, and add a new VM. For type, set it to Other Linux and when it asks whether to create a new disk image or use an existing one tell it to use the disk image you just created.

    Start the VM and enjoy(?) the browser as OS experience. Oh, and the login credentials? Your Google account.

    Best Practices

    Daylight Saving Time Headaches

    I have never been particularly fond of the concept of Daylight Saving Time (cutting a strip off one end of a blanket and sewing it to the other end does not make a longer blanket). This time around, though, I ran into an issue involving the perfect combination of a monthly cron job, a server set to local time, and the switch from Daylight Saving to Standard Time on the first of the month.

    At precisely 1:14 am on the first day of the month the cron job ran, as it does the first day of every month, and picked a raffle winner for one of our client’s monthly contests. At 2:00 am the time on the server rolled back to 1:00 am in accordance with the switch to Standard Time for the US. Fourteen minutes later the job ran again, and picked another winner.

    Whoops. Now our system has awarded two people a single prize. Telling the second person that they didn’t really win would not score us any points with the client, as their customer would be upset. Likewise, charging the client for the second prize is a non-starter, as it is, in fact, our fault. When I inherited these systems I looked through all the cron jobs to get a feel for what the system does and when. What didn’t occur to me, however, was that jobs scheduled at the wrong time of day could fall victim to Daylight Saving/Standard Time change-overs.

    Any daily job that runs between 1:00 am and 2:00 am will fail to run once a year (Standard -> Daylight Saving when clocks jump ahead an hour) and will run twice once a year (Daylight Saving -> Standard Time when clocks fall back from 2:00 am to 1:00 am). Weekly jobs that run between 1:00 am and 2:00 am on Sundays will likewise misbehave, while monthly jobs, regardless of day of the month, have a small chance of experiencing one of these issues. In this case, the job runs on the 1st, which happened to be the first Sunday in November, and bang: error.
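The double-fire is easy to demonstrate. A Python sketch using zoneinfo, assuming the US Pacific zone and the year 2009, when 1 November was the first Sunday (the server in question may of course have used a different zone):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/Los_Angeles")

# 1:14 am on 1 Nov 2009 (the first Sunday in November) happens twice:
first_pass = datetime(2009, 11, 1, 1, 14, tzinfo=tz)            # before fall-back (PDT)
second_pass = datetime(2009, 11, 1, 1, 14, fold=1, tzinfo=tz)   # after fall-back (PST)

# Same wall-clock time, two distinct instants an hour apart --
# a cron job scheduled for 1:14 am local time fires at both of them.
gap_hours = (second_pass.timestamp() - first_pass.timestamp()) / 3600
```

Running jobs on UTC, or simply outside the 1:00 am to 2:00 am window, sidesteps the ambiguity entirely.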

    Needless to say, we modified all the cron jobs to ensure that none of them start between 1:00 am and 2:00 am.