Archive for the ‘Tech’ Category

Spending Dinner at Work

April 23rd, 2008

Alley Insider does some back-of-the-napkin math and guesses that Google’s food budget is approximately $7.5k yearly per Googler. This is an interesting calculation to me, even if it’s a wild guess.

Google is one of the few places I’ve eaten where the cafeteria genuinely feels busy at dinnertime. The difference in atmosphere between Google and its competitors is astonishing. If you have a stint working as a programmer in Silicon Valley, do your best to visit other campuses and see how the culture and environment differ from company to company. I don’t believe I have seen a dinner crowd at any other large tech company that comes anywhere near the size of the one in Google’s Mountain View cafeterias every day.

From the anecdotal accounts I’ve received, Googlers don’t just eat and then leave for home. Frequently, dinner is a short respite in a long workday, and they go back to work afterward, sometimes after meeting with family for the meal. I would hazard a guess that no other large, established tech company gets such long work hours out of its employees as Google does.

I don’t even think you’d need to invest in free meals around the clock to get this kind of behavior. You could still charge for lunch, but offer a free early breakfast (before 8 or 9am) and a free late dinner (after 7pm) to encourage people to stay beyond regular hours, and still defray some of your lunchtime costs. If you got even a 15% boost in productive working hours from your employees, I think that would surely make up for the few thousand dollars’ worth of food you’re sending their way.
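To make that napkin math concrete, here is a rough sketch in Python. The $7.5k food figure and the 15% boost come from the text above; the yearly cost of an employee is a made-up placeholder, and treating extra productive hours as proportional to that cost is a simplifying assumption, not a measured result.

```python
# Back-of-the-napkin check: does a productivity boost cover the food budget?
FOOD_COST_PER_EMPLOYEE = 7_500   # USD per year, the Alley Insider guess quoted above
PRODUCTIVITY_BOOST = 0.15        # the 15% figure from the paragraph above
ASSUMED_EMPLOYEE_COST = 100_000  # USD per year, hypothetical placeholder


def value_of_boost(employee_cost: float, boost: float) -> float:
    """Extra value per year, assuming value scales with employee cost."""
    return employee_cost * boost


def break_even_boost(employee_cost: float, food_cost: float) -> float:
    """Fractional boost at which the extra value equals the food cost."""
    return food_cost / employee_cost


extra_value = value_of_boost(ASSUMED_EMPLOYEE_COST, PRODUCTIVITY_BOOST)
needed = break_even_boost(ASSUMED_EMPLOYEE_COST, FOOD_COST_PER_EMPLOYEE)
print(f"Value of a {PRODUCTIVITY_BOOST:.0%} boost: ${extra_value:,.0f}")
print(f"Boost needed to break even: {needed:.1%}")
```

Under those assumptions the break-even point sits well below 15%, but the answer moves with whatever employee cost you plug in.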

Tech

Ethics of Data Transformation and Republishing

April 15th, 2008

After the reaction of a few friends to my last post, I wanted to put up a separate post specifically about my current opinions regarding the ethics of taking public data, doing a lot of cleanup, then potentially charging for commercial use or download of the new (transformed) dataset. My planned project working with EPA data may not be the last time I do something related to data transformation, so I’m trying to understand the issues here.

Daniel Raffel pointed me to some pertinent info about WestLaw, one of the most well-known providers of information that originates in the public domain. WestLaw takes public legal information, incorporates it into its datastores, then charges a fee for legal professionals to use it to perform legal research. It provides proprietary interfaces, and also made several features that have apparently become indispensable to the legal profession, including a proprietary key-oriented classification of legal data.

Recently, Carl Malamud, who works for Public.Resource.Org, began a project with the aim of making all the primary sources of that legal data available on the Internet. It’s clear from his letter that he believes that making primary source data publicly available does not compete directly with the services and tools that WestLaw provides. However, it appears that WestLaw’s summary publications, such as the Federal Reporter, may be the only published form in which the primary public domain data is available.

Carl is essentially saying that since these summary documents may constitute information derived from data in the public domain, he will attempt to extract the public domain data from the documents that WestLaw produces commercially on behalf of the government. I am no expert on this sort of thing, but it seems to me that a reasonable person would conclude that if the original public domain data is ONLY made available in a useful form to a commercial vendor, who then transforms it into literature, then that vendor’s copyright should not cover the public domain data itself. How you go about extracting that data is a fuzzier area, but presumably, if you can show that you made an effort to respect those bounds, a reasonable person would say that reverse engineering is okay. Carl’s letter to WestLaw follows this reasoning to its natural conclusion, and even suggests that they save everyone some time and just release the entire text of their publications, free to download.

Anyone who spends any time around primary source data knows that not all data is created equal. If your intent is to provide tools and services around data, there is a good amount of time and effort that must go into transforming primary source data into a useful format for some specific purpose. I believe that an appropriate action on West Publishing’s part is to go ahead and publish all historical cases in text format (not the Federal Reporter, etc., itself), and let anyone else who wishes to transform that data into a useful format go ahead with their project. This wouldn’t exactly satisfy Carl’s request, but it would meet the standard suggested earlier that the primary source data be available in at least one form.

It’s important to note that I’m not arguing that WestLaw needs to put its final products (the Federal Reporter, etc.) into the public domain. By the standard suggested above, if the data is available in the public domain, I don’t believe there is a strong ethical basis for compelling a commercial venture to take a risk and release all of its products for free. This would preserve some of the economies of scale of data manipulation, while righting the “wrong” that public domain source data is not available to the public at all. Put more simply, if a competitor wished to create a research product similar to WestLaw’s, the cost of transforming the data into a useful information repository with competitive features would still remain. Any proprietary content inside WestLaw’s publications would remain non-free, and presumably anyone who would buy them for their convenience would still do so. The threat from Carl is that he has a very strong point if the only way to get the original primary data is to extract it from WestLaw’s commercial resource. If the decision makers at WestLaw decide to oppose his reverse engineering outright, I think that position would be politically very difficult to defend, and it could cause the government to step in and make the boundaries between public domain and private very clear. As it’s to WestLaw’s advantage to keep those boundaries murky, a compromise of providing just the case data in a text format seems the best solution for their interests.

Now, as to how this pertains to the situation I’ll be getting into with the EPA emissions datasets: all of that data is available via their website as more-or-less large CSV downloads. Each year has a somewhat different format. From what I recall from my earlier work on it, it’s kind of a pain to go through and clean up that source data, and it requires some knowledge about automotive industry emissions standards nationwide. Still, the original information is visible, even if it’s in a format that needs some work.

What I’m in the middle of doing is the standard drill - analyze the datasets, design a fairly acceptable standard schema to use as a blueprint for importing the data, then go set by set, programming transformations from the yearly data into the database. Then, an interface to perform queries can be created, as well as a set of useful services to offer on top of the transformed data. Long after all available data is imported, a maintainer might write new transformations yearly in order to keep the data current. This activity of doing work to transform data from a public domain resource into a different format is original work, and does take a lot of time and effort. Nearly all researchers deal with this sort of work on a regular basis.
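The drill described above could be sketched roughly like this in Python. Everything specific here is invented for illustration: the standard schema, the per-year column names, and the sample rows are hypothetical stand-ins, not the actual EPA file formats.

```python
import csv
import io

# One mapping per source year: standard field -> that year's column name.
# The hypothetical standard schema is (year, make, model, co2_g_per_mile);
# the source column names below are made up for this example.
YEAR_MAPPINGS = {
    2006: {"make": "MFR", "model": "CAR LINE", "co2_g_per_mile": "CO2"},
    2007: {"make": "Manufacturer", "model": "Model Name",
           "co2_g_per_mile": "CO2 (g/mi)"},
}


def normalize(year, csv_text):
    """Yield rows from one year's CSV reshaped into the standard schema."""
    mapping = YEAR_MAPPINGS[year]
    for raw in csv.DictReader(io.StringIO(csv_text)):
        yield {
            "year": year,
            "make": raw[mapping["make"]].strip().title(),
            "model": raw[mapping["model"]].strip(),
            "co2_g_per_mile": float(raw[mapping["co2_g_per_mile"]]),
        }


# Made-up sample rows standing in for two yearly downloads.
SAMPLE_2006 = "MFR,CAR LINE,CO2\nEXAMPLECO, Roadster ,412\n"
SAMPLE_2007 = "Manufacturer,Model Name,CO2 (g/mi)\nexampleco,Roadster,398.5\n"

rows = list(normalize(2006, SAMPLE_2006)) + list(normalize(2007, SAMPLE_2007))
for row in rows:
    print(row)
```

Keeping one small mapping per year means a maintainer adding next year's data would, in the simple case, only write a new mapping instead of a new import script. Real yearly files would likely need more cleanup than a column rename.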

What I’d like to suggest is that if data that is already in the public domain and available on the internet is transformed into a version that is more useful to commercial ventures or professionals, it is perfectly fine to charge a fee for access to tools or regular “dumps” of the transformed datasets. For one, the data’s already available, and it is not the original primary source data that is being offered for sale. The business product would be the combination of transforming the original data into a more useful form, and then offering either the transformed data directly, or simply services and tools on top of that data.

If anyone wanted to do that work, then re-open it up completely to the public domain, I believe that would be a gracious gesture, but I’m of the current opinion that it’s not ethically or morally necessary. Plenty of goodwill could be achieved by offering scholars, nonprofits, or individuals free access, and anyone who thought the cost is too high could attempt to achieve a lower cost by taking the original public domain sources and doing the work themselves.

That’s my current opinion about all this, but it does seem like there are a lot of strong opinions out there, maybe not as long-winded as mine. Feel free to use the comments to let me know what you think.

Server Move Complete

April 11th, 2008

It was much easier than I guessed it would be to switch from my own dedicated host to a shared host. It’s really nice to have a competent admin taking care of business, and not having to worry about everything myself. Thanks to Jeremy Muhlich and the fdntech guys for a good machine.

I’ll probably keep everything on this server for now, and just put things on their own dedicated server if it’s really necessary.

I’m super happy to FINALLY be off of RHEL3 and its super old MySQL 3.23 builds. It was a monstrous pain to deal with such a limited featureset, and it definitely hampered development somewhat.

Re: projects that are coming up soon - I’ve been easing back into productivity and have a few things I’ll be working on right off the bat. One is cleanup of a dataset that I’ve been personally really interested in - the EPA emissions records for cars. They provide this data in the public domain, but the format is pretty messy and it requires figuring out how regional automotive regulations impact the cars available in various states. With some more elbow grease, I’ll probably be able to put this into a sensible format, and I’m definitely going to do some analysis myself once that’s done. I may also consider providing some public query access via a Google AppEngine test, as it seems like a perfect way to offload queries on an interesting dataset of public record, where I won’t have to incur high costs for expensive queries myself.

I’m curious to see whether models exist where people took data from the public domain, cleaned it, then charged a fee for access instead of redistributing the data for free. It seems like a useful thing to be able to accomplish, and I think there would potentially be commercial use for better data than public domain CSVs whose format just gives you hints at potential normalization, and leaves all the work as an exercise for the reader. I’ll probably end up making the dataset free for nonprofit or personal use, as it came from the public domain anyway… for actual queries I’ll probably have to rely on something like AppEngine, and for commercial use I’ll probably ask for a fee. Any opinions?

Starting to tidy things up…

April 2nd, 2008

I constantly live in fear that somewhere in my email handling code lies a header / MIME injection vulnerability. I’ve been caught by them in the past, and each time I get snagged, it’s extremely unpleasant to deal with. Although there are published source solutions, I just never feel completely comfortable with them.

I switched my contact forms today so that no form input makes it into any mail that gets sent to me. It’ll ping me with a notification, and I’ll go check it.

It feels like a pretty bad solution, but I’ve noticed recently that more and more web applications are adopting this style. In public applications, one of the advantages is that you can send out notifications very quickly, but also have a short grace period to check for and catch abusive spamming. Additionally, you don’t need to worry as much about email exploits that attempt to co-opt your web applications to run as open mail relays. The obvious downside is that your users can’t treat your software as an extension of their email inbox.
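A minimal sketch of that notification-only style, in Python with the standard library’s email package. The addresses, subject line, and function names are hypothetical; this illustrates the idea and is not the code behind my actual contact forms.

```python
from email.message import EmailMessage


def has_header_injection(value: str) -> bool:
    """True if the value contains a CR or LF, which could start a new header."""
    return "\r" in value or "\n" in value


def build_notification(submission_id: int) -> EmailMessage:
    """Build a notice that carries no user-supplied text at all."""
    msg = EmailMessage()
    msg["From"] = "forms@example.com"  # hypothetical addresses
    msg["To"] = "me@example.com"
    msg["Subject"] = "New contact form submission"
    # Only a server-generated integer id is interpolated; the submitted
    # text stays on the site, to be read there.
    msg.set_content(f"Submission #{int(submission_id)} is waiting for review.")
    return msg


notice = build_notification(42)
print(notice["Subject"])
print(has_header_injection("Hi\r\nBcc: victim@example.com"))
```

The check for CR and LF is only needed if form input ever has to go into a header again; with the notification-only approach, nothing a visitor types reaches the message at all.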

The pure psychological penalty of receiving an email notification from a trusted web application that contains spam may be somewhat worse than receiving a notification, then going to a website where the content is spam (where you have “report abuse” links, typically) and where the content may no longer exist (if post-moderation occurred).

I believe that this may be a necessary evil in any future web projects that I write. Either that, or all participants may need to pay a fee to enter.

How to have a smug laugh at the financial news.

March 31st, 2008

s/investor/gambler/g

Last few days at Yahoo…

March 13th, 2008

Yup, this is my goodbye post. My last day is tomorrow, Friday the 14th. How did I get here?

Eleven years ago, when I first got to USC, I sidled up to a gadget-lugging fellow headed in the same direction I was, and asked if he was also going to the honors engineering retreat. That’s how I met Leonard Lin, who still amazes me with flights of technological fancy that I’m sure to hear about two years from whenever he finds something interesting.

Then, about four years ago, Leonard Lin introduced me to Andy Baio. Andy was looking for a “perl programmer” to come to work for him at a mutual fund company in Santa Monica. Not just any perl programmer, but one who would be able to wear a suit daily. Although I had another option that seemed far more lucrative (talk about finding a local maximum!), Andy seemed to be growing a really groundbreaking and nerdy team, especially for a mutual fund company. Since one of my great priorities in young life was to get to know the financial world better, and also because I liked the looks of Andy’s team, I accepted that position. As it turns out, that’s one of the best decisions I’ve ever had the chance to make.

After getting called out by Jon Udell when he was at InfoWorld, Andy was insistent that he’d have Jon’s suggested changes incorporated into Upcoming (then Upcoming.org), his pet people-powered events calendar project, within a week. At that point, I had been practicing development of quick web apps and open source libraries for over a year, and we’d built a great working relationship. I offered to help integrate Freetag into Upcoming, and assist with the rest of the work as well. That’s how I got started working on Upcoming, and ever since then, I’ve had the fortune of working for one of the greatest online communities on the web.

So it’s with a tinge of sadness that I decided to leave Yahoo! a few weeks ago. It’s been a great run, and I can’t begin to enumerate the ways in which I’ve become a better developer, leader, and person through this experience. It’s just the right time for me to move on.

What about Upcoming?

If you actually know that I work on Upcoming (a small crowd indeed), you’ll also know that I’m the kind of person who likes to be responsible in my work. I’m happy to say that I’ve been passing on as much of my knowledge as possible, and that the new generation of Upcoming is looking strong. I have faith that they’ll continue to make decisions with respect for the existing community, and most likely will push out features faster than we had a chance to in the past year or so.

What about BravoNation?

This is a trickier one. It’s certainly an experimental project that I’m thrilled to have been able to take from Hack Day to private beta within the context of Brickhouse. It’s also an experiment whose future has not yet been completely decided. I leave a well-documented and fairly mature platform in Brickhouse’s hands, and I hope that those who have used it were intrigued by the idea of combining peer-to-peer recognition with an open network award platform for integration. The possibilities behind the core BravoNation idea are really nothing original; it’s simply the combination of concepts from the video gaming world and the social media web that arose from my experiences investigating the gaming side of SXSW 2007.

I’ve been working on something internal lately which should help see my work on BravoNation live on in a very helpful way. With a little luck, it will be a great legacy. If it sees the light of day, I’ll see if I can score an interview about it.

What’s next?

Well, first, I think I need a little break. I don’t think I’ve ever had a serious break - the last time I was out of work for a serious bit of time, I spent it building a collaboration tool for designers, whose needs I got to know intimately in a previous job. Actually, that tool was the entire reason why Andy decided to hire me in the first place.

So, it’s safe to say that I’ll be working on personal projects like that old one once again. I just need some time to refresh myself, get out of my current mindset, and get used to being on my own. In actuality, the freedom to work on new things is what I’m most excited about. I’ll probably be developing some toys, some tools, some more artistic abilities, and maybe even a few waffles. Maybe I will even blog more. In any case, if you’re ever in the Los Angeles area, look me up and I’ll be happy to show you around town.

I’d like to thank all the Yahoos who made my time at the big Y! enjoyable, productive, and fruitful. Stewart and Caterina, for introducing us to Y! Local. Paul Levine, for believing in us and giving us the freedom we needed. Vince Maniago, Neil Kandalgaonkar, Kelsey Parker, and Shawn Shen, for being a powerful but small team that helped us get stuff done. The new Upcoming generation, for carrying on the torch! Kevin Cheng, Ernie Hsiung, Nikhil Bobb, Ray McClure, Jeffery Bennett, Salim Ismail, and the rest of the Brickhouse team, for putting up with the neuroses of mine that got BravoNation out the door. Edward Ho, for all the Mario Kart, car talk, and inspiration. Kevin Krawez, for being my navigator in the scary world of ops. Bradley Horowitz, for creating the environment inside Yahoo! that made us feel comfortable in the first place. Anand, Ronny, Van, Peng, Eric, Don, Ganesh, and all the Local folks for not killing us over noise problems. Chad Dickerson, for creating the Hack program, often imitated, never recreated. Tara Kirchner, for changing my mind about PR people. Eric Wu, for caring so much about Y!, and welcoming me to the valley. Legal and the paranoids, for saving us from ourselves. Daniel Raffel, for seeding knowledge I used to build BN. Jay Janssen, for all the crazy MySQL knowledge. A certain team inside Y! for powering so much of Upcoming and being my favorite technology. And, of course, Andy and Leonard, for being two of the most unique and interesting personalities I’ve had the fortune to know so well. Apologies if I’ve forgotten anyone, it’s just that the full list runs a mile long, and I fear that it’ll turn into the entire content of this blog post. Be certain that I won’t forget the way you helped make my experience at Yahoo! better.

Is the IE8 Standards Mode a Result of Politics?

March 4th, 2008

Peter Bright at Ars Technica posted a story this morning about Microsoft’s about-face to make standards-compliant rendering the “default” mode of the new Internet Explorer 8. This is in contrast to its earlier position of attempting to render in an IE7-compatible mode by default, switching to “standards” mode only if the website opts in.

While I don’t have anything really interesting to say about the technical decision, other than “hooray,” I did find that the latter half of the article tried to explain the change as a hive mind reacting protectively to an external political influence. My experience at a large company leads me to believe that this was not the case.

Microsoft is citing its new interoperability initiative as the impetus behind the change. This move, designed primarily to stave off further EU intervention, emphasizes support and promotion of open standards in a way that the company hasn’t previously done. This move should also help to fend off Opera’s antitrust complaint, which argues that the EU should force IE into better standards compliance.[...]
If the company honestly believed that its approach was, from a technical perspective, the best one—and the software giant certainly put quite some effort into designing and defending it—then it should be of some concern that politics should have caused it to switch. Don’t get me wrong—I’m glad that they’re going to make “standards” mode standard. I just wish they were doing so for the right reasons.

To me, this is a prejudice rooted in the way that the outside world prefers to think of large companies — giant, monolithic entities of a single mind. This occurs time and time again in journalism. The preference to characterize corporate behavior as if the organization were a single, albeit giant, individual is almost always inaccurate. It also does the reader a disservice, as people who follow tech then follow the journalist’s lead, and this becomes the common way to think about and understand what large companies do.

First of all, all large companies are comprised of individual actors. Each of them has their own goals, preferences, and opinions about the company’s decisions. Within Microsoft, undoubtedly there are proponents of interoperability and web standards. There are certainly prominent ones in the public eye, but I’d bet that there are many pockets of culture that live, breathe, and prefer open standards. For example: the engineering staff building IE8. These people must make their case for use of open standards to their management, and must consider the company and team’s objectives and stated goals when constructing arguments for spending dev time on open standards.

Here’s my guess as to what happened at Microsoft.

Imagine that you are an influential lead in a team of developers working on the world’s incumbent majority web browser, and you know that the work that you do impacts an enormous number of people. However, the company that you work for has a history of prioritizing backwards compatibility, especially with prior work that involves the company’s own products and closed standards. In this environment, it is difficult to make the case for prioritizing adherence to open standards over compatibility with earlier products (in this case, IE7-style rendering). You understand that, and so you consent, perhaps against your better judgement or personal feelings, to go with backwards compatibility.

Suddenly, there’s a cultural shift that comes down from the top, stating that the company is now prioritizing interoperability and open standards. Undoubtedly, this is a strategic shift, and one can imagine that it developed as a result of some parts politics, some parts market environment, and some parts executive staff composition. Since you’re on the IE8 team, and you’ve always bought into making open standards the default mode for your browser for the good of the Internet, now’s your chance to really push your case.

You can suddenly frame your argument in the context of new, important, shiny corporate objectives. Your powerpoint decks make their way up the chain, as managers above consider how they might use the opportunity to prove their “leadership” in following new corporate objectives. This is how the management chain can now evaluate the decision in a completely different internal political light. All of them agree that the decision makes sense. The go-ahead is given.

The GM can then post a blog announcement on the IE8 blog that explains the technical details without a modicum of hype or fuss — but it is likely that many people within that team feel very validated at the moment.


So that’s my theory of what might have caused this. As you can see, I do think that there were most likely politics involved in the decision, but only partially in the way that Peter supposed. Most corporate decisions or changes have their start as external market forces bearing down on the executive staff, which then cause a shift in stated objectives or behavior. Whether these things are communicated to the employees internally (memos) or externally (through PR), the message is sent. Especially in a tech company, the individual workers and engineers then tend to rethink and reframe their decisions in light of the new information. So, although the chain of events may frequently have its basis in external political events and relationships, decisions eventually get made through internal politics and relationships.

That’s my understanding, and if you don’t believe me, feel free to go work at a large company for a couple years, and see how things really happen for yourself.

Andy Baio to do Daily Blogging; Disdain for the Echo Chamber

February 3rd, 2008

From Andy Baio’s latest entry:

Very few weblogs do any kind of original research on a daily basis. Most either spend their time repurposing (or just linking to) original research from mainstream media or other sources, or they do commentary and analysis. Their most important role is as information filters, distilling everything going on in the world relevant to their audience and presenting only the good stuff. That’s definitely valuable, but at the end of the day, have they created anything new?

Andy’s going to have a hard time doing original net journalism with so many competitors already in this space. I mean, daily blogging is good, but what you’d really want is at least two or three authors to keep the churn up. It’s going to be difficult for his little Oregon-based reporting startup to get his series A, especially with only a pure advertising business model.

I think Digg has shown that original content is not important; the most important thing is having a big network of friends all pointing to the same stuff as you, all day long. Sure, it means that we’re all little more than glorified filters, but that’s how you make the big bucks nowadays.

Good luck on that, buddy! You’ll need it.

The Bravo Book

January 29th, 2008

Just a HUGE shout-out to Jeffery Bennett for helping us out and creating the cool Bravo Book. You can go get your own if you’re a Bravoteer. I’ll spend some time soon on a blog redesign so I can incorporate this in the base template.


Why call BravoNation a Rough Draft?

January 2nd, 2008

In response to some tongue-in-cheek criticism of polluting the “alpha/beta” project status namespace with yet another term, I’d like to explain.

My experience is that outside the valley, alpha/beta never makes sense to non-developers. One could even argue that it’s been misused, as many products have been released as “beta” even though they are hardly feature-complete. I believe that the original permanent beta tags on Flickr were somewhat of an in-joke (which the step up to gamma sort of shows). On the other hand, pretty much everyone in the States has gone through phased writing exercises, where a rough draft might touch on all the fundamentals of an argument in an unpolished, rough, and sometimes erroneous way. The later drafts reflect a more polished work, and a high-quality final draft implies a high level of editing.

This is much closer to what we are trying to experiment with at BravoNation than alpha/beta. We are targeting more than just developers or people from the Bay Area, so we are also more comfortable reinventing the idiosyncrasies of the Valley when they don’t work for us. Our goal with the Rough Draft label is to build the expectation among non-technical AND technical audiences that we will be editing, modifying, and re-architecting the entire thing according to the usage that we see. This is the type of project that could gain strong momentum early in its life, but needs a chance to grow quickly while making small mistakes before hitting a huge, mainstream user base.

Anyway, it’s important to note that this was, in major ways, our project. Yahoo! believes in our team to the point that we were given a high degree of latitude in the way we’ve developed, launched, and branded BravoNation, while supporting us and getting us prepared for the types of issues that we’ll need to deal with as a Yahoo! project. So when we call it a Rough Draft, that’s the four of us in the main project team, not the mothership, making that call.
