Æmail: Personal Archival Email for Unix

(Version 1.4.1)

Copyright © 1998 FlaterCo, Inc.
Copyright © 2003 David Flater.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program.  If not, see <http://www.gnu.org/licenses/>.

THE ÆMAIL SOFTWARE DISTRIBUTION IS AVAILABLE FROM: ftp://ftp.flaterco.com/aemail/

Contents

  1. Overview
  2. Installation and Configuration
  3. Retrieving Email
  4. Recovery
  5. Support
  6. Changelog

Overview

A message received in email is no less important than a message received on paper.  Unfortunately, most offices are less equipped to maintain file copies of email messages than they are to keep file copies of paper memos.  As a result, the email equivalent of a "paper trail" is frequently in shambles.

Æmail is the email equivalent of a file cabinet.  Every email message is automatically archived and indexed for future retrieval.  This frees the user from the mail archiving task and allows him or her to concentrate on mail reading.

Æmail is distributed only as source code.  It is not "plug and play" software.  However, anyone who is familiar with the routine of compiling and installing Unix software should be able to install Æmail.

Æmail depends on two commonly installed Unix freeware packages, PostgreSQL and Procmail.  More information on these is provided below.

Æmail is designed for use by individuals.  It is not an "enterprise" solution.


Installation and Configuration

Required:  PostgreSQL

A compatible version of PostgreSQL must be installed and configured for access by the Æmail user(s).  PostgreSQL is freeware that may be obtained from http://www.postgresql.org/.  The most recent version tested with Æmail will be mirrored in ftp://ftp.flaterco.com/aemail/.  Some help in installing Postgres is available with this documentation.

Required:  libdstr

Libdstr is a small library that can be obtained from http://www.flaterco.com/util/index.html.

Getting Æmail

The Æmail software distribution is available from:  ftp://ftp.flaterco.com/aemail/

Compiling Æmail

The normal route:

bash-3.1$ ./configure
bash-3.1$ make
bash-3.1$ su
bash-3.1# make install

Æmail is packaged with the popular and portable GNU automake, so all the usual GNU tricks should work.  Help on configuration options can be found in the INSTALL file or obtained by entering ./configure --help.

If your PostgreSQL or libdstr installation is in a nonstandard location, you will need to set CPPFLAGS and LDFLAGS; e.g.,

bash-3.1$ ./configure CPPFLAGS="-I/usr/local/pgsql/include" LDFLAGS="-L/usr/local/pgsql/lib"
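
If the pg_config utility that ships with PostgreSQL is on your PATH, it can report the include and library directories so that they need not be typed by hand.  The following is only a sketch: configure_with_pg is a hypothetical helper, not part of Æmail, and it covers PostgreSQL only (a libdstr in a nonstandard location would still need its own -I and -L entries).

```shell
# Hypothetical helper, not part of the Æmail distribution.
# Asks pg_config where the PostgreSQL headers and libraries live and
# passes those directories to ./configure in the current directory.
# Any extra arguments (e.g. --prefix=...) are forwarded unchanged.
configure_with_pg() {
    inc=$(pg_config --includedir) || return 1
    lib=$(pg_config --libdir) || return 1
    ./configure CPPFLAGS="-I$inc" LDFLAGS="-L$lib" "$@"
}
```

From the top of the unpacked source tree, one would then enter configure_with_pg in place of ./configure.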

Creating a database for Æmail

Although Æmail can store its tables in any Postgres database, automatic recovery and migration to new versions of Æmail require that a separate database be created for Æmail.  Use the createdb command to do this:

$ createdb aemail

Starting an email archive

The freeware mail processing program Procmail should be used to build the archive.  Procmail is a stable, well-documented program that can act as the system-wide mail delivery agent or just the mail processor for a single user.  Source code and binary distributions are widely available on the Internet, and many systems already use Procmail as the default mail delivery agent.

To create a single-user archive:

  1. Make sure that Procmail is installed and activated.
  2. Create an empty directory with the desired permissions:
     mkdir $HOME/aemail; chmod 700 $HOME/aemail 
  3. Add a rule similar to the following at the top of $HOME/.procmailrc:
    # Archival email
    :0c:
    $HOME/aemail
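
Steps 2 and 3 above can be scripted.  The sketch below is one way to do so, under stated assumptions: setup_aemail_archive is a hypothetical helper, not part of Æmail; it writes the archive path literally instead of using $HOME; and it treats the comment line "# Archival email" as the sign that the rule is already present.

```shell
# Hypothetical helper, not part of the Æmail distribution.
# $1: archive directory (e.g. $HOME/aemail)
# $2: Procmail rc file  (e.g. $HOME/.procmailrc)
setup_aemail_archive() {
    arcdir=$1
    rcfile=$2

    # Step 2: create the directory, accessible only to its owner.
    mkdir -p "$arcdir" || return 1
    chmod 700 "$arcdir" || return 1

    # Step 3: nothing to do if the rule is already in the rc file.
    if [ -f "$rcfile" ] && grep -q '^# Archival email$' "$rcfile"; then
        return 0
    fi

    # Otherwise write the rule first, then the old contents, so that
    # the rule ends up at the top of the file.
    tmp="$rcfile.tmp.$$"
    {
        printf '# Archival email\n:0c:\n%s\n\n' "$arcdir"
        if [ -f "$rcfile" ]; then
            cat "$rcfile"
        fi
    } > "$tmp" || return 1
    mv "$tmp" "$rcfile"
}
```

A typical call would be setup_aemail_archive "$HOME/aemail" "$HOME/.procmailrc".  Running it a second time leaves the rc file unchanged.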
    

Archive maintenance

Although messages will begin to accumulate in the archive as soon as Procmail is configured, they will not be available through the query interface until they have been processed by aecron.  aecron eliminates duplicate copies of messages and then updates the Postgres database of email metadata.

As its name suggests, aecron is intended to be run as a nightly cron job.  (See the Unix man page on crontab for more information about cron jobs.)

To perform this task every night at midnight, add a line like the following to your crontab file:

0 0 * * * LD_LIBRARY_PATH=/lib:/usr/lib:/usr/local/pgsql/lib; export LD_LIBRARY_PATH; /home/myacct/bin/aecron /home/myacct/aemail aemail

The first command-line argument to aecron is the directory where you are archiving email.  The second is the name of the Postgres database to use.

On its first run, aecron will create the tables needed to store the metadata for the email archive.  This will result in diagnostics like the following:

Creating table AESCHEMAVERSION.
Creating table AEMESSAGES.
Creating index AEMESSAGES_SUBJECT_INDEX.
Creating index AEMESSAGES_MSGDATE_INDEX.
Creating index AEMESSAGES_ENVDATE_INDEX.
Creating table AEFROM.
Creating index AEFROM_MSGID_INDEX.
Creating index AEFROM_ADDR_INDEX.
Creating table AETO.
Creating index AETO_MSGID_INDEX.
Creating index AETO_ADDR_INDEX.
Creating table AEREFERENCES.
Creating index AEREFERENCES_MSGID_INDEX.
Creating index AEREFERENCES_REF_INDEX.
Creating table AETHREAD.

You should not receive these diagnostics any more once the tables have been created.

NOTE:  To avoid conflicts with Procmail, aecron does not process messages that are less than two hours old.
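
To get a rough idea of how many messages aecron would skip if it ran right now, one can count the files in the archive directory that were modified within the last two hours.  This is only an approximation: it assumes that each message is stored as a separate file and that the file modification time is what aecron goes by, neither of which is confirmed here.  count_recent_messages is a hypothetical helper, and the -mmin test is an extension found in GNU and BSD find rather than a POSIX feature.

```shell
# Hypothetical helper, not part of the Æmail distribution.
# Prints the number of regular files under directory $1 whose
# modification time is less than 120 minutes in the past.
count_recent_messages() {
    find "$1" -type f -mmin -120 | wc -l | tr -d ' '
}
```

For example, count_recent_messages /home/myacct/aemail prints a single number.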

Environment

Usage of the query interface aemail will be simplified if the environment variables AEARC and AEDB are defined in the user's rc file (~/.bashrc, ~/.cshrc, or other similar file).  AEARC should be set to the aemail archiving directory; AEDB should be set to the name of the Postgres database used by Æmail.  In ~/.bashrc, one would enter:

export AEARC=/home/myacct/aemail
export AEDB=aemail

In ~/.cshrc, the syntax would be:

setenv AEARC /home/myacct/aemail
setenv AEDB aemail

Optionally, AEMBOX can be set to override the default mailbox for saving retrieved mail (normally ~/aemailbox):

export AEMBOX=/tmp/tempmail
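
A quick sanity check of these settings can save a confusing error later.  The following sketch uses a hypothetical helper, check_ae_env, which is not part of Æmail; it only verifies that both variables are set and that AEARC names an existing directory, and it does not contact the database.

```shell
# Hypothetical helper, not part of the Æmail distribution.
# Returns 0 if AEARC and AEDB are both set and AEARC is a directory;
# otherwise prints a message to stderr and returns 1.
check_ae_env() {
    if [ -z "${AEARC:-}" ] || [ -z "${AEDB:-}" ]; then
        echo "AEARC and AEDB must both be set" >&2
        return 1
    fi
    if [ ! -d "$AEARC" ]; then
        echo "AEARC does not name a directory: $AEARC" >&2
        return 1
    fi
    return 0
}
```

One possible use is check_ae_env && aemail, which starts the query interface only when the check passes.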

Retrieving Email

Email is retrieved from the archive using the program aemail.  If the environment variables AEARC and AEDB have been defined as described above, no command-line arguments are needed to run aemail.  Otherwise, the aemail archiving directory and the database name can be provided on the command line as described by the on-line help:

Usage:  aemail [arcpath dbname] [-n]
  arcpath:  the aemail archiving directory
  dbname:  the name of the Postgres database
  -n:  retrieve new messages that have not been indexed yet
If not specified on the command line, arcpath and dbname must be provided in
the environment variables AEARC and AEDB.

Selecting messages

When invoked, aemail shows the number of messages in the database and then provides a simple query interface to support retrieval of messages based on message attributes.

The user is first prompted whether a regular query or a group query is desired.  A regular query retrieves messages where the sender matches a given address or where the sender does not matter.  The results can be further restricted based on the recipients.  A group query retrieves messages where EITHER the sender or a recipient appears in a given list of addresses.  Both kinds of query also allow you to restrict the results by date, subject, and content.

Once regular or group query is chosen, the user is prompted for the various attributes.  In each case, simply hitting the Enter key to accept the default value avoids putting any conditions on the attribute in question.  Thus, accepting the default for every attribute results in all available messages being selected.

Select messages
  From ? [default anyone]: 
  To ? [default anyone]: 
  Where date >= ? [default -infinity]: 
  And date <= ? [default infinity]: 
  And subject ~* ? [default anything]: 
  And message text contains ? [default anything]: 

The usage of each attribute is described in detail below.

Regular queries

From ? [default anyone]:

If a pattern is entered at this prompt, only messages having a matching address in the From, From:, Return-Path:, Sender:, X-Sender:, Resent-From:, or Resent-Sender: fields will be selected.

In its simplest usage, the pattern is just a username or a real name, such as "myacct" or "My Account."  However, advanced users may wish to use regular expressions.  The match is performed using the ~* Postgres operator, which is a case-insensitive regular expression comparison operator.  Please see the Postgres documentation for regular expression syntax.

To ? [default anyone]:

If a pattern is entered at this prompt, only messages having a matching address in the To:, X-To:, Cc:, X-Cc:, Bcc:, Resent-To:, Resent-Cc:, or Resent-Bcc: fields will be selected.  You will be prompted for additional recipients as follows:

  Also to ? [default done]:

Just hit Enter to accept the default "done" when you have entered enough recipients.  All of the specified recipients must appear in a message in order for that message to be selected.  The pattern matching is the same as for the From attribute.

Group queries

From or To ? [default anyone]:

If a pattern is entered at this prompt, messages having a matching address in the From, From:, Return-Path:, Sender:, X-Sender:, Resent-From:, Resent-Sender:, To:, X-To:, Cc:, X-Cc:, Bcc:, Resent-To:, Resent-Cc:, or Resent-Bcc: fields will be selected.  You will be prompted for additional addresses as follows:

  OR From or To ? [default done]:

Just hit Enter to accept the default "done" when you have entered enough addresses.  A message will be selected if any of the specified addresses appear as the sender or as a recipient.  The pattern matching is the same as for the From attribute.

All queries

Where date >= ? [default -infinity]:
And date <= ? [default infinity]:

If dates are supplied for one or both of the above prompts, only messages that were either sent or received within the specified interval will be selected.  Although many different syntaxes for the date can be parsed, the following syntax is recommended to be unambiguous: YYYY-MM-DD HH:MM.  For example, the beginning of the day of June 1, 1999 in your local time zone would be specified as 1999-06-01 00:00.  Optionally, the time zone can be specified explicitly with the syntax YYYY-MM-DD HH:MM[+-]HH:MM.  For example, the same date in the time zone two hours ahead of GMT would be 1999-06-01 00:00+02:00.
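
The recommended syntax is easy to produce and to check from the shell.  The helpers below are hypothetical and not part of Æmail.  The check looks only at the shape of the string (digits and punctuation in the right places); it does not verify that the date exists, and it says nothing about the other syntaxes that can also be parsed.

```shell
# Hypothetical helpers, not part of the Æmail distribution.

# Succeeds if $1 has the form YYYY-MM-DD HH:MM, optionally followed
# by a time zone offset of the form +HH:MM or -HH:MM.
is_recommended_ae_date() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}([+-][0-9]{2}:[0-9]{2})?$'
}

# Prints the current local time in the recommended form.
ae_date_now() {
    date '+%Y-%m-%d %H:%M'
}
```

For instance, is_recommended_ae_date "1999-06-01 00:00+02:00" succeeds, while is_recommended_ae_date "June 1, 1999" fails.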

And subject ~* ? [default anything]:

If a pattern is entered at this prompt, only messages having matching text in the Subject: field will be selected.  The pattern matching is the same as for the From attribute.

And message text contains ? [default anything]:

If a text string is entered at this prompt, only messages whose full text includes the specified string will be selected.  Regular expression matching is not done in this case; however, the matching is case-insensitive.  You will be prompted for additional strings as follows:

  And also contains ? [default done]:

Just hit Enter to accept the default "done" when you have entered enough strings.  All of the specified strings must appear in a message in order for that message to be selected.

Selecting entire threads

After the query is executed, the number of matching messages will be shown and you will be given the option of "exploding" the message list to include all related messages:

Do you want to expand the list to include the entire threads in which these
messages appear? [default n]: 

If you answer this prompt with "y," several iterations of expansion will occur as aemail chases down all messages that are mentioned in References: and In-Reply-To: fields of selected messages, or that likewise mention any selected messages.  As this process continues, the increasing count of selected messages will be printed until the process completes.

Saving and reading messages

Once the set of selected messages has been finalized, you will have the option of saving these messages to a standard mailbox for browsing with your favorite mail reader.  If you do not wish to save the messages, you can enter "q" at the following prompt:

Enter filename, or q to abort [default /home/myacct/aemailbox]: 

The format of the saved mailbox is identical to that in which mail is originally delivered.  Most mail reading programs should give you a way to browse and manipulate the mailbox.  For example, under the VM mail reader, simply "visit" the mailbox to view the results.
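
Without opening a mail reader, the number of messages in a saved mailbox can be estimated from the shell.  This sketch assumes the mailbox is in the traditional Unix mbox format, in which each message begins with a line starting with "From " (the word followed by a space); count_mbox_messages is a hypothetical helper, not part of Æmail.  If a message body contains an unquoted line beginning with "From ", the count will be too high.

```shell
# Hypothetical helper, not part of the Æmail distribution.
# Prints the number of lines in file $1 that start with "From ",
# which in mbox format is the number of messages.
count_mbox_messages() {
    grep -c '^From ' "$1"
}
```

For example, count_mbox_messages /home/myacct/aemailbox prints the count for the default mailbox.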

Accessing new messages

If aecron is run nightly, then messages received on the same day will not yet be searchable.  However, these messages can still be retrieved from the archive using the -n command-line switch with aemail.  This option does not permit searching by attributes, but simply saves all of the as-yet unindexed messages to a mailbox.


Recovery

Æmail puts nothing in the database that cannot be regenerated from the email archive.  Should the database become corrupt, simply destroy it and re-create it empty per the example below.  aecron will repopulate the database on its next run.

$ dropdb aemail
$ createdb aemail

Support

Any questions, problems, or bug reports for Æmail should be directed to dave@flaterco.com.


Changelog

Version 1.4.1, 2008-04-13

  1. (Legal) Relicensed as GPLv3.
  2. (Cleanup) Fixed compiler warnings from GCC 4.3.0.
  3. (Cleanup) Removed AM_MAINTAINER_MODE from configure.ac; made 'make dist-bzip2' do the right thing; fixed distcheck failures.
  4. (Nit) Removed a comment from ae.hh.
  5. (Nit) Get version from configure.ac.
  6. (Documentation) Tested with and updated documentation for PostgreSQL 8.3.1.

Version 1.4, 2007-02-19

  1. (Cleanup) Modernized code to use STL and libdstr.
  2. (Usability) Repackaged with Automake.
  3. (Code rot) Updated for PostgreSQL 8.2.3.

Version 1.3, 2005-12-26

  1. (Code rot) Installed latest Dstr.
  2. (Robustness) Tweaked parsing of header lines to reduce likelihood of incorrect matches.
  3. (Robustness) Added X-Sender, X-Cc and X-To to the list of fields that are indexed.  This works around certain mailing list software that obfuscates the original headers.  The database must be rebuilt for this to work on old messages.
  4. (Robustness) Declared all message-ids longer than 175 characters to be invalid to prevent failure "rename: File name too long."

Version 1.2.2, 2005-06-16

  1. (Cleanup) Installed canonicalized Dstr.
  2. (Note) Tested with PostgreSQL 8.0.1.
  3. (Legal) Relicensed under GPL (i.e. gave up trying to sell it).

Version 1.2.1, 2004-09-04

(Nit) Cosmetic changes to screen verbiage.

Version 1.2, 2004-09-03

  1. (Robustness) Do the right thing unmangling Message-ID headers that are (improperly) folded across lines.
  2. (Feature) Added group query capability.

Version 1.1.4, 2004-05-20

  1. (Code rot) Made compatible with PostgreSQL 7.4.2.  Required schema changes (now version 2).
  2. (Robustness) Tolerate invalid header Date-warning.

Version 1.1.3, 2003-04-28

  1. (Bug) Fixed problem saving messages with null date.
  2. (Code rot) Expunged obsolete C++ streams.
  3. (Robustness) Assign arbitrary IDs to messages with missing or invalid Message-IDs instead of rejecting them.

Version 1.1.2, 2001-08-20

(Nit) Quashed more needless "no data found" messages from aemail.

Version 1.1.1, 2001-08-04

  1. (Bug) Fixed bug that caused messages with bogus date lines to appear as extraneous hits on queries over specific ranges of dates.
  2. (Code rot) Updated recognition of errors caused by bad timestamps in Date: field to work with newer Postgres.
  3. (Workaround) Removed vacuum from aecron to avoid the new Postgres error 'ERROR: VACUUM cannot run inside a BEGIN/END block'.
  4. (Nit) Fixed a typo in Dstr.cc that caused free() to be invoked on null pointers.
  5. (Nit) Quashed a needless "no data found" message from aemail.
  6. (Note) This version is tested with PostgreSQL 7.1.2.

Version 1.1, 1999-07-28

  1. (Code rot) Updated binding from PostgreSQL 6.3.2 to PostgreSQL 6.5.1 + ECPG 2.6.1 / libecpg 3.0.1 (a.k.a. postgresql-6.5.1-modified).
  2. (Nit) Always set permissions on retrieved mail to private instead of using default perms.
  3. (Feature) Added AEMBOX environment variable to select default mailbox for retrieved messages.
  4. (Note) Portability has increased with the new version of Postgres.

Version 1.0.4, 1999-05-16

(Robustness) Ignore trailing garbage and/or RFC 822 compliant comments on the Message-ID line instead of rejecting the message.

Version 1.0.3, 1999-02-16

(Red Hat workaround) Changes to Makefile only: added a workaround to help users of the Red Hat Postgres RPM avoid the error 'Cannot open include file sqlca' from ecpg.  (If Postgres is installed as suggested in its documentation, ecpg can find its include files without trouble.)

Version 1.0.2, 1999-01-29

(Bug) Fixed aecron problem with message IDs that contain slashes (some X400 mailer did this).

Version 1.0.1, 1999-01-07

  1. (Cosmetic) Quenched redundant error messages from query interface when unable to write the result mailbox to disk.
  2. (Robustness) Fixed bug in aecron that caused it to quit without processing all messages if Postgres rejected some particular message.  Added helpful diagnostics to explain what happens in these cases.
  3. (Nit) Changed printing of Postgres diagnostics to go to stderr instead of stdout.
  4. (Robustness) If the date in a message is not parseable, null it out instead of throwing out the whole message.

Version 1.0, 1998-11-23

Schema version 1, Postgres 6.3.2.  There were definite problems with Postgres 6.4.
