Archive for November, 2004

Internationalization in Perl

November 30th, 2004

http://aut.dyndns.org//webl10n/webl10n.html

An article by the author of Locale::Maketext::Lexicon. I had been wondering about internationalization in Perl, and this article looks pretty comprehensive and practical.

Uncategorized

Decent Explanations of mod_perl / Mason sessions

November 23rd, 2004

Because I think the Mason Book’s explanation of sessions is pretty crufty, here are some descriptions that are, in my opinion, better explanations.

http://www.lerner.co.il/atf/atf_76 – Looks like Plone! Doh, the listings don’t work, but the explanation is still pretty good. See also: man Apache::Session

The Black Cards

November 17th, 2004

I’d heard rumors of superyuppies carrying around exclusive ‘black’ credit cards. At first, I thought these cards would carry a permanent 2% APR in the hope of capturing very expensive impulse purchase habits, but apparently I was wrong.

  • http://money.msn.co.uk/Bank_Plan/Cards/SpecFeat/TakeTheCredit/Update1/default.asp
  • The Amex Centurion epinions review
  • The Amex Centurion Urban Legends Page (What’s the opposite of debunk?)
  • The Beyond Black from Quintessentially

I can see why heavy travelers can make use of free traveler’s upgrades, but honestly, is it a race to see who can have the credit card with the highest fees? Admittedly, it might be fun to pay for a Bentley on a piece of plastic… but to me, these are clearly not the realm of the self-made millionaire.

Did Clinton Gut the Military?

November 13th, 2004

I’m kind of sick of hearing this from various sources, so I did some research into government-supplied budgetary data. I threw it into Excel and made some charts. Have a look for yourself. I plan on adding more research there to investigate the various claims I hear from time to time.

One thing I did learn from this process is that budgeting and policy are pretty complex stuff. You might think that our leaders would have to back up their claims with better data than a simple dollar figure spent on National Defense. You might also think that a single number couldn’t fully capture the changing nature of defense and military structure.

As it happens, you don’t even need a single number to make a good sound bite. You can get away with a generalization, and as long as you repeat it often enough, those who hear it will believe it.

Summarizations of Applying Access Control to Search

November 8th, 2004

One of the issues with providing search functionality over corporate knowledge is that access control is in full effect; you can’t simply do full indexing of everything available, because each user has a different set of data available to them. It also seems like everyone writes their own access control system (including yours truly), which complicates matters. I’ll give an overview of some of the interesting stuff going on and then list some ideas for implementation in open source.

My Summary:

The critical issue is whether to integrate your access control system into your indexing process, or to modularize it into its own component and provide an interface to the search application. If you modularize, you end up making a roundtrip check to the security module for every result as you iterate over the result set, which can take forever. If you integrate, you have to inflate your indexes considerably, which probably doesn’t scale too well. Not to mention that you now probably have a mini (or full) access control system running in parallel to your main access control system, which you must maintain and keep in sync.
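To make the trade-off a bit more concrete, here is a minimal Python sketch of both approaches. Everything in it (the `DOCS` corpus, the `ACL` table, `can_read`, and the function names) is made up for illustration; a real system would talk to an actual search engine and an actual access control service.

```python
# Toy corpus and permissions; all names and data here are hypothetical.
DOCS = {
    1: "quarterly budget report",
    2: "budget planning notes",
    3: "holiday party budget",
}
ACL = {1: {"finance"}, 2: {"finance", "staff"}, 3: {"staff"}}

stats = {"checks": 0}  # counts roundtrips to the pretend security module


def can_read(group, doc_id):
    """Stand-in for a roundtrip to a separate access control module."""
    stats["checks"] += 1
    return group in ACL[doc_id]


def search_modular(term, group):
    """Modular: search everything, then check each hit one at a time."""
    hits = [d for d, text in DOCS.items() if term in text.split()]
    return [d for d in hits if can_read(group, d)]


def build_integrated_index():
    """Integrated: copy the permissions into the index at build time."""
    index = {}
    for doc_id, text in DOCS.items():
        for word in text.split():
            for group in ACL[doc_id]:
                index.setdefault((word, group), set()).add(doc_id)
    return index


def search_integrated(index, term, group):
    """One lookup, no roundtrips, but the index carries a copy of the ACL."""
    return sorted(index.get((term, group), set()))
```

In this toy version the modular search pays one `can_read` call per hit, while the integrated index answers in a single lookup but has to be rebuilt whenever `ACL` changes, which is the maintenance burden described above.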

Either way, it seems like a tricky problem. Here are my thoughts about practical implementation:

  • If you really want to keep the modularization, it might be possible to create some sort of batch access control check. Instead of iterating one check at a time, bundle up a chunk of your result set, send it over the wire to the access control system, and get back a matrix of results. Might work a little better and would probably incur less network overhead, even though it’s still the primitive solution.
  • If your set of credentials and content is manageable (and if you don’t mind being a jackass), you can try an unscalable solution of performing an exhaustive indexing operation at scheduled intervals for each credential over the entire set of content, cached at the search application level. This is also a primitive solution but probably would result in fast queries.
  • If your access control system does caching, that will help, but the first time is still going to be quite a nasty hit, and why would you search twice on the same terms in the same session?
  • This is kind of a stupid idea, but what if you could decouple the search application from your standard web idea of a search application, and treat it more like a P2P network search? In P2P applications, you enter terms, hit search, and alt-tab or go away and come back when it’s had more time to look around. This probably isn’t acceptable in web application UX unless the user understands that secure web applications with private content require special handling. Good luck on that one. If they’re in their browser, they probably expect Google.
  • For a more metadata-oriented access control solution, it might be possible to run and maintain multiple indexes, partitioned by metadata property, that basically consider themselves static content sets. Then, when you search over a user credential, you can leverage parallel checks to multiple indexes for each metadata property the user has access to. Then, you’ve probably got some set mathematics to perform on the parallel result sets that are returned, based upon the relationships between the metadata. This is some limited integration with the access control system, and is a pretty heinous idea, but it might work better in heavy or complex data sets if the processing power is available.
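The last idea in the list, one index per metadata property with set math on the results, might look something like this in Python. The partition names, the document ids, and the rule that a user simply sees the union of their partitions are all assumptions for the sake of the sketch; real relationships between metadata properties may call for intersections or differences instead.

```python
from concurrent.futures import ThreadPoolExecutor

# One small inverted index per metadata property (hypothetical data).
PARTITIONS = {
    "dept:finance": {"budget": {1, 2}, "audit": {2}},
    "dept:legal": {"audit": {3}, "contract": {4}},
    "level:public": {"budget": {5}},
}


def search_partition(name, term):
    """Look up a term in a single partition's index."""
    return PARTITIONS[name].get(term, set())


def search(term, user_partitions):
    """Query every partition the user can see in parallel, then union."""
    names = [n for n in user_partitions if n in PARTITIONS]
    if not names:
        return set()
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        results = pool.map(lambda n: search_partition(n, term), names)
        return set().union(*results)
```

Each partition stays a static content set with no permission data inside it; the only access control knowledge the search side needs is which partitions a given credential maps to.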

Research References:

There are very few providers or researchers that I’ve found doing work in this area. It even seems like nobody’s coined a proper term for such functionality in search, so I list some terms you can google for at the bottom.

Netegrity / Inktomi – SiteMinder

Netegrity, collaborating with Inktomi, has apparently abstracted RBAC out into Netegrity’s SiteMinder software. It connects to LDAP on the user management side, and integrates with Inktomi’s Enterprise Search Security Module to basically do a last-step check on each search result returned. It’s the primitive solution, and it has a host of performance issues that come with abstracting the permissions system out of the search component. This is apparently the only commercial solution to the problem that I could find. They even say they’re the only vendor inside their PDF! If it’s in the PDF, it must be true.

  • If your search wants 100 results, do you just use that as a parameter for the initial grab of results? Or do you use that as a goal, and continue checking results until it all adds up to 100?
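Here is a rough Python sketch of the "goal" reading of that question, combined with batched permission checks rather than one roundtrip per result. The `check_batch` function and its even-numbers-only rule are pure invention standing in for a real security module, and the batch size is an arbitrary choice.

```python
from itertools import islice


def check_batch(user, doc_ids):
    """Hypothetical batch call: one roundtrip for many documents.

    Toy rule for illustration only: users may read even-numbered documents.
    """
    return {doc_id: doc_id % 2 == 0 for doc_id in doc_ids}


def permitted_results(ranked_hits, user, goal=100, batch_size=50):
    """Keep checking batches of hits until `goal` readable ones are found."""
    hits = iter(ranked_hits)
    allowed = []
    roundtrips = 0
    while len(allowed) < goal:
        batch = list(islice(hits, batch_size))
        if not batch:
            break  # ran out of hits before reaching the goal
        verdicts = check_batch(user, batch)
        roundtrips += 1
        allowed.extend(d for d in batch if verdicts[d])
    return allowed[:goal], roundtrips
```

Treating 100 as a parameter for the initial grab would mean one batch of 100 hits and however many survive the check; treating it as a goal means looping like this, at the cost of extra roundtrips when many hits are off-limits to the user.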

http://www.netegrity.com/partners/related/InktomiDatasheet.pdf

XenIntranet

There’s a reference in the changelogs for XenIntranet to adding access control to search. It looks like they use a custom ACL solution, and probably integrated it directly. See comments far below.

http://www.xenintranet.com/changelogs.php

Stanford Peers

This paper from Stanford people Mayank Bawa, Roberto Bayardo Jr., and Rakesh Agrawal describes a Privacy-Preserving Index. It also complains about the lack of private information search technology, but the solution it posits seems to be more about preventing reverse engineering of data availability through special algorithms for building distributed indexes. The PowerPoint below has animations describing the techniques.

*** Update – Mayank Bawa was kind enough to write me and point me to the original powerpoint slides for the presentation, so I changed the link below and removed my snarky comment about Stanford (full disclosure: I went to USC). Thanks, Mayank!

http://www-db.stanford.edu/~bawa/Pub/ppi.ppt

They do list a couple of interesting links.

The Stanford Peers P2P homepage. It’s interesting that resource discovery over P2P networks may have a lot to do with access-controlled search. This page lists a lot of resources for reading on P2P network topics, but it’s a little stuffy in there.

IBM’s YouServ, a distributed personal webserver in use within IBM for web hosting / file sharing.

Chris Weider

This early paper (‘96!) from Chris Weider seems to touch briefly upon some of these issues in the second-to-last paragraph. It seems to be more concerned with exposing for-pay content to public users via normal search tools. Some of the solutions it describes are indexing proxies, which might index for-pay content and expose it only via search tools. Think of a9 searching through book content for keywords.

http://www.isoc.org/isoc/whatis/conferences/inet/96/proceedings/a2/a2_1.htm

MIT Computation Structure Group

This bunch of people from MIT seem to be barking up the right tree. However, they like to use a bunch of big words. I think that when you’re dealing with a subject as complex as access control mapped onto search, you need to give your reader a bit of a break when it comes to academic huffing and puffing. Anywho, to summarize, they also complain about the performance implications of the approach Netegrity and Inktomi adopted of completely modularizing access control, and are working on integrating ACLs into the Intentional Naming System.

http://www.csg.lcs.mit.edu/pubs/reports/search3.pdf, referenced in the MIT Computation Structure Group’s Search Project

If you want to do more research

Here are some of the terms I searched with that turned up goodies:

“permission-based search” “access-controlled search”

Possible outlets for implementation

  • Integrating a Lucene port with a standardized access control system – I might end up doing this with a customized access control system.

Anyway, if you’re researching or doing development on anything like these things, I would love to hear from you. gluk AT padtie dot com.

Blogware comparison chart

November 7th, 2004

http://www.asymptomatic.net/blogbreakdown.htm

This looks to be a very thorough and nicely formatted comparison chart of various blogging software.

Lost America – photos of abandoned desert America.

November 6th, 2004

http://www.lostamerica.com/lostframe.html

A compelling long-exposure look at abandoned desert American lifestyles.

Hard Drivin

November 5th, 2004