Switching from my own dedicated host to a shared host was much easier than I expected. It’s really nice to have a competent admin taking care of business, and not to have to worry about everything myself. Thanks to Jeremy Muhlich and the fdntech guys for a good machine.
I’ll probably keep everything on this server for now, and just put things on their own dedicated server if it’s really necessary.
I’m super happy to FINALLY be off of RHEL3 and its super old MySQL 3.23 builds. Dealing with such a limited feature set was a monstrous pain, and it definitely hampered development somewhat.
As for projects that are coming up soon: I’ve been easing back into productivity and have a few things I’ll be working on right off the bat. One is cleaning up a dataset that I’ve been personally really interested in: the EPA emissions records for cars. They provide this data in the public domain, but the format is pretty messy, and making sense of it requires figuring out how regional automotive regulations affect which cars are available in various states. With some more elbow grease, I’ll probably be able to put this into a sensible format, and I’m definitely going to do some analysis myself once that’s done. I may also consider providing some public query access via a Google AppEngine test, as it seems like a perfect way to offload queries on an interesting dataset of public record without having to incur high costs for expensive queries myself.
I’m curious whether there are existing models where people took data from the public domain, cleaned it, and then charged a fee for access instead of redistributing the data for free. It seems like a useful thing to be able to do, and I think there could be commercial demand for better data than public domain CSVs whose format only hints at a possible normalization and leaves all the work as an exercise for the reader. I’ll probably end up making the dataset free for nonprofit or personal use, as it came from the public domain anyway. For actual queries I’ll probably have to rely on something like AppEngine, and for commercial use I’ll probably ask for a fee. Any opinions?
fdntech — hey, you on soman too? Welcome to the neighbourhood!
Two thoughts:
My brother works for the EPA and might have some thoughts for you on the car emissions data (I don’t think he works with that specifically, but he does deal with some air quality data)
Strikeiron provides some interesting data sources via their web services “super data pack.” I’ve never used it to be honest, but I would guess some of that data comes from the public domain:
http://www.strikeiron.com/ProductDetail.aspx?p=257
This is definitely big business, Gordon. WestLaw and Lexis do this all the time. In fact, WestLaw was one of the pioneers of this practice, taking public court cases, adding a bit of taxonomy and headnotes, and then selling them. Without doubt WestLaw added value to the original dataset, but some folks feel pretty strongly that this isn’t acceptable because the data was originally in the public domain. You might read this for some other leads: http://radar.oreilly.com/archives/2007/08/carl-malamud-takes-on-westlaw.html
Look forward to hearing more about your project!
Your blog looks eerily like mine. ;]
Thanks for the comments. I’ve written a whole new post about the data republishing issues mentioned.
Justin: Thanks!
Roxan: Hah, yeah, I’m a biter, what can I say.