December 2009 – evardsson.com: stuff that w0rks

An article yesterday at bnet.com about Cisco’s patent filing for search has me concerned. Instead of relying on crawling links (and obeying robots.txt) like current search engines do (or at least should), Cisco’s idea is to look into packets at the network level and pull apart network traffic to discover HTTP requests. While that may not sound so terrible, I can see a need to change the way I do some business.

I often have development work, intended for collaboration with clients that is wholly not discoverable via web crawling. It is not that there are any great secrets there (unless the client is particular about not letting anyone know what their new site will look like before it goes live) but it is not meant to be permanent, either. This means that unless you know the full URL to the documents in question you are not likely to find them. These URLs are emailed to the client so they can click on the link in their email and let me know which parts of the app work the way they want, what doesn’t work, UI changes they would like to make, etc. With the standard web-crawlers these pages will never show up in a search listing.

If a layer three network device is picking those URLs out of traffic it is passing, however, those pages might be indexed, and once indexed, added to search. Now, a week later, when the directory x79q3_zz_rev2 is trashed, there are indexed searches pointing at what will return nothing but 404. Not good for me, not good for the client and not good for the individual doing the search.

My second concern is one of bandwidth. Yes, I know, there is lots of bandwidth and “everybody is on broadband these days anyway” (I don’t know how many times I hear that). Be that as it may, the “everybody” that is on broadband is not actually everybody, and anything that adds more delay to packet routing only makes the situation worse. And what happens when user A sends a request through their ISP to get an HTTP resource? How many hops does it cross? And how many of those will be running Cisco devices? (Hint: most). How many of those Cisco devices are going to do introspection on that packet to pull out the URL? How long does that take? Now consider how many HTTP requests your browser actually makes when downloading a web page. The page itself, linked CSS files, linked JS and any images (and let’s please not even consider AJAX requests).

While the idea is novel, I don’t think it is a good idea, and I would actually hope that Cisco gets the patent and sits on it and uses it merely to bludgeon anyone who actually tries to do this.

evardsson.com: stuff that w0rks

Month: December 2009

Cisco search patent: my concerns