Thoughts on data organization

Bennett Todd bet at rahul.net
Wed Dec 31 08:42:36 PST 2003


2003-12-31T11:34:08 Justin Knierim:
> First, I wish everyone a happy new year!

Yes, indeed!

> My question is, what are your thoughts, and what methods of
> organization do you all use?

I'm not into multimedia, so my problems are limited to plain text.

I've got my source code archives, organized along software package
lines; I've got a few dozen projects of stuff I'm working on; and
I've got a big honkin' email archive.

I get the problem you're describing with my email archive in
particular.

For that one, I automatically file mailing list traffic into folders
named after the list, using procmail; various filters pull out
various sorts of dreck into special folders (e.g. spamassassin,
clamav, bogofilter); and individual correspondence is filed by the
email addr of the correspondent (mutt makes that easy).
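
The filing rules above can be sketched in a few lines. This is not the
procmail setup described in the post; it is a Python illustration of the
same decision order, and the function name folder_for and the folder
names it returns are hypothetical. It assumes list traffic carries a
List-Id header and that SpamAssassin has tagged spam with X-Spam-Flag.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def folder_for(raw_message: str) -> str:
    """Return a folder name for one message (hypothetical naming scheme)."""
    msg = message_from_string(raw_message)

    # Filter verdicts first. X-Spam-Flag is the header SpamAssassin sets;
    # other filters would need their own checks here.
    if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        return "spamassassin"

    # Mailing list traffic: name the folder after the List-Id (RFC 2919),
    # e.g. "LFS chat <lfs-chat.linuxfromscratch.org>" gives "lfs-chat".
    list_id = msg.get("List-Id")
    if list_id:
        match = re.search(r"<([^>]+)>", list_id)
        ident = match.group(1) if match else list_id.strip()
        return ident.split(".")[0]

    # Everything else: file by the correspondent's address.
    _, addr = parseaddr(msg.get("From", ""))
    return addr.lower() or "unknown"
```

The checks run from most specific to least, so a spam message that
arrives via a list still lands in the spam folder.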

While that organization works for most things, sometimes I want a
different view. For a while I used a full-text indexer; I forget the
name, but it was associated with agrep, and I believe it came out of
the project that produced squid. That indexer sorta went away, though:
it wasn't open source, and after a while it failed to track new OSes
and wouldn't compile any more. I occasionally look for a replacement
but haven't found one that suits, so lately I just run
find | xargs egrep over the archive. Either way, since I archive in
Maildir, the output is a list of files containing matches; I link
those into a tmp folder and view that with mutt. mutt's limit command
in particular is a joy for further refining the view.
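
A rough sketch of that search-and-link step, written in Python rather
than as the find/xargs/egrep pipeline itself. The function name
link_matches and its arguments are hypothetical, symlinks are an
assumption (the post only says "link"), and Python regular expressions
are not identical to egrep's syntax.

```python
import os
import re
from pathlib import Path


def link_matches(archive: Path, pattern: str, view: Path) -> int:
    """Symlink every archived message matching `pattern` into a scratch
    Maildir at `view`; return the number of new links created."""
    archive = archive.resolve()
    view = view.resolve()
    regex = re.compile(pattern.encode())  # search raw bytes, like grep

    # A Maildir is a directory with cur/, new/ and tmp/ inside it.
    for sub in ("cur", "new", "tmp"):
        (view / sub).mkdir(parents=True, exist_ok=True)

    count = 0
    for path in archive.rglob("*"):
        # Delivered messages live in cur/ and new/; skip everything else.
        if not path.is_file() or path.parent.name not in ("cur", "new"):
            continue
        # Do not rescan the view if it happens to sit inside the archive.
        if view in path.parents:
            continue
        if regex.search(path.read_bytes()):
            target = view / path.parent.name / path.name
            if not (target.is_symlink() or target.exists()):
                os.symlink(path, target)
                count += 1
    return count
```

Pointing mutt at the view directory (for example with its -f option)
then gives the usual folder interface over just the matches.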

If I had the problem you described, my first inclination would be to
archive the material on the simplest structural basis --- one
archive for each major media type, organized however made the best
sense to me below that --- then produce the cross-media views (e.g.
everything in every medium associated with a particular artist)
using automatically-constructed symlink-populated directories. The
"automatically-constructed" bit is key, because I'd have a cronjob
rebuild these alternate-view link trees daily.
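
One way such a rebuild could look, as a minimal sketch. The directory
layout is an assumption (archive/<media type>/<artist>/...), as are the
function name rebuild_artist_view and the choice to discard and
regenerate the whole view on each run; the view directory is assumed to
live outside the archive.

```python
import shutil
from pathlib import Path


def rebuild_artist_view(archive: Path, view: Path) -> None:
    """Regenerate view/<artist>/<media type> symlinks from
    archive/<media type>/<artist> directories (assumed layout)."""
    archive = archive.resolve()

    # Start from scratch so renamed or deleted artists disappear too.
    # rmtree removes the symlinks themselves, not what they point at.
    if view.exists():
        shutil.rmtree(view)
    view.mkdir(parents=True)

    for media_dir in sorted(p for p in archive.iterdir() if p.is_dir()):
        for artist_dir in sorted(p for p in media_dir.iterdir() if p.is_dir()):
            dest = view / artist_dir.name
            dest.mkdir(exist_ok=True)
            # e.g. view/SomeArtist/audio -> archive/audio/SomeArtist
            (dest / media_dir.name).symlink_to(artist_dir,
                                               target_is_directory=True)
```

Because the view holds nothing but links, throwing it away and
rebuilding it from a daily cron job is safe and keeps it in step with
the real archives.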

Hope this is helpful,

-Bennett


More information about the lfs-chat mailing list