Homebrewing Knowledge-Base from HBD Archives?

Uh-oh!

Started thinking again. This time about a way to repurpose messages on the HomeBrew Digest into a kind of database of brewing knowledge. I can just see it. It’d be ah-some!

Does anybody know how to transform email messages from well-structured digests into database entries? Seems to me that it should be a trivial task, especially for someone well-versed in Perl and/or PHP. But what do I know?

That venerable HBD mailing-list contains a wealth of information about pretty much every single dimension of beer homebrewing. For a large number of reasons, content from the HBD.org site turns up quite often in Web searches for brewing terms.

One issue with the HBD, though, is that it’s a bit hard to search. There used to be a custom-built search feature on the site but we now need to rely on Google and AltaVista. This wouldn’t be too much of an issue if not for the fact that those engines search complete digests instead of individual messages. So the co-occurrence of two terms in the same digest can be due to two messages on completely different subjects.

Another issue with the HBD (as with many other mailing-lists) is the relatively high redundancy in message content. Some topics come up cyclically on the mailing-list, and though some kind souls have been gracious enough to respond to the same queries over and over again, the list often looks like an outlet for FAQs. Among HBD “perennials” (or cyclical topics) are discussions of the effects of HSA (hot-side aeration), decoction mashing, and batch sparging, to name but a few technical issues.

Unfortunately, it looks like the HBD might need to be retired at some point in the not-so-distant future, if only for lack of sponsorship. Also, Pat Babcock, the digest’s “janitor,” recently asked for mirror space and announced the retrieval of some of the older digests (from the late 1980s).

Of course, there are lots of other brewing resources out there. So many, in fact, that it can be overwhelming to the newbie brewer. One impact of having so much information so easily available about homebrewing (and commercial brewing, for that matter) is a “democratization of beer knowledge.” Unlike the brewing guilds of medieval times, brew groups are open and free. Yet a side-effect of this is that there isn’t a centralized authority to prevent disinformation. Also, because the accumulated knowledge is difficult to peruse, people tend to “reinvent the wheel.”

In Internet terms, the HBD is the closest equivalent to a historical source. Few other mailing-lists have been running continuously since 1986.

Luckily, all the digests since October 1988 are available as HTML files. And the digest format has remained almost unchanged since that time.
All of the content is in plain ASCII. Messages never exceed a certain length. IIRC, line length is also controlled. And HTML was officially not admitted. Apparently, some messages did contain a bit of HTML code, but that shouldn’t be an issue.

Here’s what I imagine could be done:

  1. “Burst” out digests into individual messages (with each message containing digest information)
  2. Put all the individual messages (350MB worth) into a Content Management System
  3. Host the archived messages in the form of a knowledge-base
  4. Process those entries for things like absolute links and line breaks
  5. Collect messages in threads
  6. Add relevant del.icio.us-like tags and slashdot- or digg-like ratings
  7. Use this knowledge-base for wiki-like collaborative editing
  8. Assess some key issues to be taken up by brewing communities
  9. Add to the brewing knowledge-base
  10. Build profiles for major contributors and major groups
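
To make steps 1, 2 and 5 a bit more concrete, here is a minimal sketch in Python (rather than Perl or PHP). It rests on assumptions I haven't checked against actual HBD digests: that messages are separated by a line made only of 30 or more hyphens (as in RFC 1153-style digests), and that each message starts with Date:, From: and Subject: headers followed by a blank line. The function names (`burst_digest`, `thread_key`, `store_messages`) and the SQLite table are hypothetical, and threading by subject line is a crude stand-in for real threading.

```python
import re
import sqlite3
from email.parser import Parser

# ASSUMPTION: messages are separated by a line made only of 30 or more
# hyphens (RFC 1153 style). Verify against real digests before relying on it.
SEPARATOR = re.compile(r"^-{30,}[ \t]*$", re.MULTILINE)


def thread_key(subject):
    """Crude thread key: the subject, lower-cased, without leading 'Re:'s."""
    stripped = re.sub(r"^(\s*re:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()


def burst_digest(digest_text, digest_id):
    """Split one digest into message dicts, each carrying its digest id."""
    messages = []
    for chunk in SEPARATOR.split(digest_text):
        msg = Parser().parsestr(chunk.strip())
        # The preamble and trailer have no Subject header, so skip them.
        if msg["Subject"] is None:
            continue
        messages.append({
            "digest_id": digest_id,
            "position": len(messages) + 1,  # order within the digest
            "sender": msg["From"],
            "date": msg["Date"],
            "subject": msg["Subject"],
            "thread": thread_key(msg["Subject"]),
            "body": msg.get_payload(),
        })
    return messages


def store_messages(conn, messages):
    """Insert burst messages into a (hypothetical) SQLite table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY,
               digest_id TEXT, position INTEGER,
               sender TEXT, date TEXT,
               subject TEXT, thread TEXT, body TEXT)"""
    )
    conn.executemany(
        """INSERT INTO messages
               (digest_id, position, sender, date, subject, thread, body)
           VALUES
               (:digest_id, :position, :sender, :date,
                :subject, :thread, :body)""",
        messages,
    )
    conn.commit()
```

One known weakness: a message whose body happens to contain a long line of hyphens would be split in the wrong place, and the headerless half would be dropped. A real burster would need to guard against that.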

Because I couldn’t help it, I started writing down some potential tags I might use to label messages on the HBD. It could be part “folksonomy,” part taxonomy. For one thing, it’d be useful to distinguish messages based on “type” (general queries about a brewing technique vs. recipe posted after a competition) since many of the same terms and tags would be found in radically different messages.
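
Just to illustrate the “part folksonomy, part taxonomy” idea, here is a rough Python sketch. The message types form a small fixed list (the taxonomy part) while the tags are free-form (the folksonomy part). Everything in it is hypothetical: the type names, the `TaggedMessage` class and the `find` helper are only there to show how the same tag could be filtered by message type.

```python
from dataclasses import dataclass, field
from enum import Enum


class MessageType(Enum):
    """Fixed list of message types (hypothetical, for illustration)."""
    QUERY = "query"
    ANSWER = "answer"
    RECIPE = "recipe"
    ANNOUNCEMENT = "announcement"


@dataclass
class TaggedMessage:
    subject: str
    kind: MessageType                       # taxonomy: one type per message
    tags: set = field(default_factory=set)  # folksonomy: free-form tags

    def __post_init__(self):
        # Store tags lower-cased so "Decoction" and "decoction" match.
        self.tags = {tag.lower() for tag in self.tags}


def find(messages, tag, kind=None):
    """Return messages carrying `tag`, optionally restricted to one type."""
    tag = tag.lower()
    return [
        m for m in messages
        if tag in m.tags and (kind is None or m.kind is kind)
    ]
```

With this, a search for the tag “decoction” could be narrowed to recipes only, leaving out the general queries that use the same term.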