Well, how did they get here?

Here are some of the searches that led people to this blog since I started paying attention to that.

  1. Pigurines (#1!)
  2. medieval terms of endearment for children
  3. “mail order alien bride”
  4. what kind of chickens have a afro
  5. innsmouth community college¹
  6. pictures of philippine contemporary literature
  7. philadelphia phillies sex toy
  8. sexy german ladies
  9. beelzebub hunks
  10. selling smoke damaged furniture²

That’s all part of our world tonight.

¹Go Sea Devils!
²Change your life, change into a nine year old Hindu boy, get rid of your wife.

¡Huevos!

There’s a sweet little desktop app for OSX called Huevos.

It’s tiny and free, and it’s a search helper. You pick a search site, type in your search, and your browser of choice fires up and searches. I recommend it!

You can drop in your own searches, so I took out the ones I didn’t need and put in some new ones. In case anyone is interested, my new ones were:

Amazon: http://www.amazon.com/s?index=blended&field-keywords=%@

Blinkx: http://www.blinkx.com/videos/%@

Google Video: http://video.google.com/videosearch?q=%@&sitesearch=

IMDB: http://imdb.com/find?s=all&q=%@&x=0&y=0

Powells: http://www.powells.com/s?kw=%@&x=0&y=0
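For the curious, here is roughly what I imagine happens with those templates. This is a minimal sketch in Python, not Huevos's actual code: I'm assuming the `%@` marker is simply swapped for the percent-encoded search text, and the names `TEMPLATES` and `build_search_url` are made up for illustration.

```python
from urllib.parse import quote

# The placeholder used in the search templates above.
PLACEHOLDER = "%@"

# A few of the templates from this post, keyed by site name.
TEMPLATES = {
    "Amazon": "http://www.amazon.com/s?index=blended&field-keywords=%@",
    "Blinkx": "http://www.blinkx.com/videos/%@",
    "IMDB": "http://imdb.com/find?s=all&q=%@&x=0&y=0",
    "Powells": "http://www.powells.com/s?kw=%@&x=0&y=0",
}


def build_search_url(template: str, query: str) -> str:
    """Fill a search template with a percent-encoded query.

    Assumption: the query is percent-encoded before substitution, so
    spaces and characters like '&' cannot break the URL. How Huevos
    itself encodes queries is not something I have checked.
    """
    encoded = quote(query, safe="")
    return template.replace(PLACEHOLDER, encoded)


if __name__ == "__main__":
    # Print the URL for each site; webbrowser.open(url) from the
    # standard library could hand it to the default browser instead.
    for site, template in TEMPLATES.items():
        print(site, build_search_url(template, "octopus people"))
```

Encoding the query first matters: a search for "salt & pepper" would otherwise inject a stray `&` into the query string and cut the search short.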

top search results for my website this month

unicode sliderule
octopus people
mopportunity
determined bush
who invented the zodiac signs and how long have they been around?

Okay, the unicode sliderule is something I put up for torgo_x. And the mopportunity is from a Leisuretown caption. I’m worried about the Octopus People and the Determined Bush. I think the Manimals know I’m on to them and don’t want me to tell the President about the danger.

LJ, blog searches, datamining

Google’s new blog search is pretty nifty if you either like searching through people’s weblogs or are an egotist who likes to kiboze. I’m both. Since I’ve always been a shameless self-promoter and I ping all available services, index myself in search engines, etc., this is just peachy.

The way LJ did it was to provide a large-scale XML data feed of Livejournal and Typepad blogs. The feed is explicitly intended for use by larger organizations who want to resyndicate or index this huge quantity of data. It’s not usable by end users; it’s an institutional service.

This is great if you’re Google, or AOL, or an MIT grad student doing a thesis on weblogging. However, if you’re an LJ user who checked the “please do not let search engines index me” button, it may be an unwelcome surprise. People who assumed a level of public presence that included friends and internet acquaintances, but not every coworker or family member who Googled them, have now discovered that the verb “to Google” includes a well-indexed stream of all their public entries since March.

I had a frustrating conversation about this with mendel yesterday (sorry I got ruffled there, Rich) in which I think we were both right about different things. He quite rightly pointed out that public LJ entries were subject to data mining and indexing in a number of ways already, and that the check box for blocking robots did not imply privacy to someone who understands the current state of the Internet. Certainly my personal expectation is that anything I post, even with the lock on it, could conceivably end up as the lead story on CNN, and I proceed with that risk in mind.

And of course many of the complaints received by Six Apart about this will be from people who are misinformed about technology or the law in various countries or any number of complicated issues. I actually have no idea what U.S. law would say about what a customer can reasonably expect in this situation, and since the technologies involved are about fifteen minutes old, it may be unknown anyway.

My concern was different. Providing a massive datastream only useful to large-scale operations is qualitatively different than allowing spidering, even. Marketers, credit agencies, insurance companies, and government agencies now have an optimized tool for data mining a huge chunk of weblogs. The amount of effort required to monitor and index all of LJ and Typepad just deflated tremendously.

I am reminded, for example, of FedEx providing a stream of their tracking information to the U.S. Department of Homeland Security, or of the supermarket loyalty card information being informally turned over to the government right after 9/11/01. A recent event I posted about in which auto repair records from dealers were aggregated and sold to Carfax comes to mind. I have been told by people in the email appliance business that spammers derive a good chunk of income these days by selling verified email addresses with names attached to insurers and credit reporting agencies as additional identifying information for their records (“appends”).

In short, Database Nation (Amazon link). To my mind these changes are inevitable, irresistible, and both exciting and frightening for different reasons.

But I also think that Six Apart failed their customers, at least in the customer satisfaction/PR department, by not providing a pre-launch opt-out or removing customers who checked that box from their institutional feed.