
$Id: mb2md.pl,v 1.26 2004/03/28 00:09:46 juri Exp $

mb2md-3.20.pl      Converts Mbox mailboxes to Maildir format.

Public domain.

currently maintained by:
Juri Haberland <juri@koschikode.com>
originally written by:
Robin Whittle

This script's web abode is http://batleth.sapienti-sat.org/projects/mb2md/ .
For a changelog see http://batleth.sapienti-sat.org/projects/mb2md/changelog.txt

The Mbox -> Maildir inner loop is based on qmail's script mbox2maildir, which
was kludged by Ivan Kohler in 1997 from convertandcreate (public domain)
by Russell Nelson.  Both these convert a single mailspool file.

The qmail distribution has a maildir2mbox.c program.

What it does:
=============

Reads a directory full of Mbox format mailboxes and creates a set of
Maildir format mailboxes.  Some details of this are to suit Courier
IMAP's naming conventions for Maildir mailboxes.

  http://www.inter7.com/courierimap/

This is intended to automate the conversion of the old
/var/spool/mail/blah file with one call of this script, and to
convert one or more mailboxes in a specified directory with separate
calls using other command line arguments.

Run this as the user - in these examples "blah".

This version supports conversion of:

   Date    The date-time in the "From " line of the message in the
           Mbox format is the date when the message was *received*.
           This is transformed into the date-time of the file which
           contains the message in the Maildir mailbox.

           This relies on the Date::Parse perl module and the utime
           perl function.

           The script tries to cope with errant forms of the
           Mbox "From " line which it may encounter, but if
           there is something really screwy in a From line,
           then perhaps the script will fail when "touch"
           is given an invalid date.  Please report the
           exact nature of any such "From " line!
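
The date handling described above can be sketched roughly as follows.
This is an illustration in Python, not the script's own Perl code.  It
assumes the common asctime-style "From " line, treats the time as UTC
because the line carries no time zone, and uses the hypothetical helper
names from_line_timestamp and stamp_message_file.  A line that cannot be
parsed yields None rather than an invalid date.

```python
import calendar
import os
import time


def from_line_timestamp(from_line):
    """Return the Unix time encoded in an mbox "From " line, or None."""
    parts = from_line.split()
    # Expected shape: From <sender> <weekday> <month> <day> <hh:mm:ss> <year>
    if len(parts) < 7 or parts[0] != "From":
        return None
    date_text = " ".join(parts[2:7])
    try:
        parsed = time.strptime(date_text, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None  # errant date: let the caller decide what to do
    # Assumption: interpret the time as UTC, since no zone is given.
    return calendar.timegm(parsed)


def stamp_message_file(path, from_line):
    """Set the access and modification time of path from the "From " line."""
    stamp = from_line_timestamp(from_line)
    if stamp is None:
        return False
    os.utime(path, (stamp, stamp))
    return True
```

Returning False for an unparseable line lets a caller keep the file's
current time instead of failing on a bad date.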


  Flagged
  Replied
  Read = Seen
  Tagged for Deletion

           In the Mbox message, flags for these are found in the
           "Status: N" or "X-Status: N" headers, where "N" is 0
           or more of the following characters in the left column.

           They are converted to characters in the right column,
           which become the last characters of the file name,
           following the ":2," which indicates IMAP message status.


               F -> F      Flagged
               A -> R      Replied
               R -> S      Read = Seen
               D -> T      Tagged for Deletion (Trash)

           This is based on the work of Philip Mak who wrote a
           completely separate Mbox -> Maildir converter called
           perfect_maildir and posted it to the Mutt-users mailing
           list on 25 December 2001:

              http://www.mail-archive.com/mutt-users@mutt.org/msg21872.html

           Michael Best originally integrated those changes into mb2md.
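
The flag translation in the table above can be sketched like this.  It
is a Python illustration, not the code of mb2md or perfect_maildir, and
the names MBOX_TO_MAILDIR and maildir_flags are hypothetical.  Sorting
the result follows the Maildir convention that flags appear in ASCII
order.

```python
# Mbox status character -> Maildir flag, as listed in the table above.
MBOX_TO_MAILDIR = {"F": "F", "A": "R", "R": "S", "D": "T"}


def maildir_flags(status="", x_status=""):
    """Translate Status/X-Status characters into a Maildir flag string."""
    flags = {MBOX_TO_MAILDIR[c] for c in status + x_status
             if c in MBOX_TO_MAILDIR}
    # Characters without a mapping (such as "O") are ignored.
    return "".join(sorted(flags))
```

For a message with "Status: RO" and "X-Status: AF" this gives "FRS".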


  In addition, the names of the message files in the Maildir are of a
  regular length and are of the form:

      7654321.000123.mbox:2,xxx

  Where "7654321" is the Unix time in seconds when the script was
  run and "000123" is the message number, zero-padded to six digits,
  counted as messages are converted from the Mbox file.  "xxx"
  represents zero or more of the above flags F, R, S or T.
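
A minimal sketch of how such a file name could be assembled, in Python
and with the hypothetical function name maildir_file_name (the script
itself may build the name differently):

```python
def maildir_file_name(run_time, message_number, flags=""):
    """Build "<run time>.<six-digit number>.mbox:2,<flags>"."""
    return "%d.%06d.mbox:2,%s" % (run_time, message_number, flags)
```

For example, maildir_file_name(7654321, 123, "FS") gives
"7654321.000123.mbox:2,FS".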


---------------------------------------------------------------------


USAGE
=====

Run this as the user of the mailboxes, not as root.


mb2md -h
mb2md [-c] -m [-d destdir]
mb2md [-c] -s sourcefile [-d destdir]
mb2md [-c] -s sourcedir [-l wu-mailboxlist] [-R|-f somefolder] [-d destdir] [-r strip_extension]

 -c            use the Content-Length: headers (if present) to find the
               beginning of the next message
               Use with caution! Results may be unreliable. I recommend
               doing a run without "-c" first and using it only if you are
               certain that the mbox in question really needs the "-c" option
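
To show why "-c" can matter, here is a simplified sketch of both ways
of finding message boundaries.  It is written in Python and is not the
script's implementation.  The function name split_mbox is hypothetical,
and the sketch counts characters where Content-Length really counts
bytes.

```python
def split_mbox(text, use_content_length=False):
    """Split mbox text into messages (the "From " lines are dropped)."""
    messages = []
    current = None     # lines of the message being collected
    in_headers = False
    declared = 0       # Content-Length announced by the current message
    body_left = 0      # body characters still covered by that length
    for line in text.splitlines(keepends=True):
        if body_left <= 0 and line.startswith("From "):
            if current is not None:
                messages.append("".join(current))
            current, in_headers, declared, body_left = [], True, 0, 0
            continue
        if current is None:
            continue   # ignore anything before the first "From " line
        current.append(line)
        if not in_headers:
            body_left -= len(line)
        elif line.strip() == "":
            in_headers = False          # blank line: the body starts here
            body_left = declared
        elif use_content_length:
            name, _, value = line.partition(":")
            if name.lower() == "content-length" and value.strip().isdigit():
                declared = int(value.strip())
    if current is not None:
        messages.append("".join(current))
    return messages
```

Without the option, an unquoted body line starting with "From " is
taken as the start of a new message.  With it, such a line stays in the
body as long as the declared length covers it, which is also why a
wrong Content-Length header gives unreliable results.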

 -m            If this is used then the source will
               be the single mailbox at /var/spool/mail/blah for
               user blah and the destination mailbox will be the
               "destdir" mailbox itself.


 -s source     Directory or file relative to the user's home directory,
               which is where the "somefolder" directories are located.
               Or if starting with a "/" it is taken as an
               absolute path, e.g. /mnt/oldmail/user

               or

               A single mbox file which will be converted to
               the destdir.

 -R            If defined, do not skip directories found in a mailbox
               directory, but recurse into each of them, creating all
               wanted folders in Maildir.
               Incompatible with '-f'

 -f somefolder Directories, relative to "sourcedir" where the Mbox files
               are. All mailboxes in the "sourcedir"
               directory will be converted and placed in the
               "destdir" directory.  (Typically the Inbox directory
               which in this instance is also functioning as a
               folder for other mailboxes.)

               The "somefolder" directory
               name will be encoded into the new mailboxes' names.
               See the examples below.

               This does not save the UW IMAP dummy message
               at the start of the Mbox file.  Small changes
               in the code could adapt it for looking for
               other distinctive patterns of dummy messages too.

               Don't let the source directory you give as "somefolder"
               contain any "."s in its name, unless you want to
               create subfolders from the IMAP user's point of
               view.  See the example below.

               Incompatible with '-R'


 -d destdir    Directory where the Maildir format directories will be created.
               If not given, then the destination will be ~/Maildir .
               Typically, this is what the IMAP server sees as the
               Inbox and the folder for all user mailboxes.
               If this begins with a '/' the path is considered to be
               absolute, otherwise it is relative to the user's
               home directory.

 -r strip_ext  If defined this extension will be stripped from
               the original mailbox file name before creating
               the corresponding maildir. The extension must be
               given without the leading dot ("."). See the example below.

 -l WU-file    File containing the list of subscribed folders.  If
               migrating from UW IMAP the list of subscribed folders will
               be found in the file called .mailboxlist in the user's
               home directory.  This will convert all subscribed folders
               for a single user:
               /bin/mb2md -s mail -l .mailboxlist -R -d Maildir
               and for all users in a directory as root you can do the
               following:
               for i in *; do echo "$i"; su - "$i" -c "/bin/mb2md -s mail -l .mailboxlist -R -d Maildir"; done


Example
=======

We have a bunch of directories of Mbox mailboxes located at
/home/blah/oldmail/

    /home/blah/oldmail/fffff
    /home/blah/oldmail/ggggg
    /home/blah/oldmail/xxx/aaaa
    /home/blah/oldmail/xxx/bbbb
    /home/blah/oldmail/xxx/cccc
    /home/blah/oldmail/xxx/dddd
    /home/blah/oldmail/yyyy/huey
    /home/blah/oldmail/yyyy/duey
    /home/blah/oldmail/yyyy/louie

With the UW IMAP server, fffff and ggggg would have appeared in the root
of this mail server, along with the Inbox.  aaaa, bbbb etc, would have
appeared in a folder called xxx from that root, and xxx was just a folder
not a mailbox for storing messages.

We also have the mailspool Inbox at:

    /var/spool/mail/blah


To convert these, as user blah, we give the first command:

   mb2md -m

The main Maildir directory will be created if it does not exist.
(This is true of any argument options, not just "-m".)

   /home/blah/Maildir/

It has the following subdirectories:

   /home/blah/Maildir/tmp/
   /home/blah/Maildir/new/
   /home/blah/Maildir/cur/

Then the /var/spool/mail/blah file is read, split into individual files and
written into /home/blah/Maildir/cur/ .
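
The directory layout above can be produced with a few lines.  This is
a Python sketch with the hypothetical function name create_maildir and
is not taken from the script:

```python
import os


def create_maildir(path):
    """Create path with the tmp, new and cur subdirectories."""
    for sub in ("tmp", "new", "cur"):
        # exist_ok makes the call harmless if the Maildir already exists.
        os.makedirs(os.path.join(path, sub), exist_ok=True)
    return path
```

Calling it on an existing Maildir changes nothing, which matches the
"will be created if it does not exist" behaviour described above.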

Now we give the second command:

   mb2md  -s oldmail -R

This recursively reads all Mbox mailboxes and creates:

   /home/blah/Maildir/.fffff/
   /home/blah/Maildir/.ggggg/
   /home/blah/Maildir/.xxx/
   /home/blah/Maildir/.xxx.aaaa/
   /home/blah/Maildir/.xxx.bbbb/
   /home/blah/Maildir/.xxx.cccc/
   /home/blah/Maildir/.xxx.dddd/
   /home/blah/Maildir/.yyyy/
   /home/blah/Maildir/.yyyy.huey/
   /home/blah/Maildir/.yyyy.duey/
   /home/blah/Maildir/.yyyy.louie/
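
The naming pattern in this listing can be expressed as a small
function.  This is a Python sketch inferred from the listing, with the
hypothetical name maildir_folder_name, and it is not the script's own
code.  Each path is taken relative to the source directory.

```python
def maildir_folder_name(relative_path):
    """Map "xxx/aaaa" to ".xxx.aaaa" in the Courier IMAP style."""
    parts = [part for part in relative_path.split("/") if part]
    # A leading dot marks a folder; further dots separate the levels.
    return "." + ".".join(parts)
```

For example, maildir_folder_name("yyyy/louie") gives ".yyyy.louie".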

The result, from the IMAP client's point of view is:

   Inbox -----------------
       |
       | fffff -----------
       | ggggg -----------
       |
       - xxx -------------
       |   | aaaa --------
       |   | bbbb --------
       |   | cccc --------
       |   | dddd --------
       |
       - yyyy ------------
            | huey -------
            | duey -------
            | louie ------

Note that although ~/Maildir/.xxx/ and ~/Maildir/.yyyy/ may appear
as folders to the IMAP client the above commands do not generate
any Maildir folders of these names.  These are simply elements
of the names of other Maildir directories. (If you used '-R', they
will be able to act as normal folders, containing messages AND folders.)

With a separate run of this script, using just the "-s" option
with neither "-f" nor "-R", it would be possible to create mailboxes which
appear at the same location as far as the IMAP client is
concerned.  Given Mbox mailboxes in some directory
~/oldmail/nn/ of the form:

    /home/blah/oldmail/nn/xxx
    /home/blah/oldmail/nn/yyyy

then the command:

  mb2md -s oldmail/nn

will create two new Maildirs:

   /home/blah/Maildir/.xxx/
   /home/blah/Maildir/.yyyy/

Then what used to be the xxx and yyyy folders now function as
mailboxes too.  Netscape 4.77 needed to be put to sleep and given ECT
to recognise this - deleting the contents of (Win2k example):

   C:\Program Files\Netscape\Users\uu\ImapMail\aaa.bbb.ccc\

where "uu" is the user and "aaa.bbb.ccc" is the IMAP server.

I often find that deleting all this directory's contents, except
"rules.dat", forces Netscape back to reality after its IMAP innards
have become twisted.  Then maybe use File > Subscribe - but this
seems incapable of subscribing to folders.

For Outlook Express, select the mail server, then click the
"IMAP Folders" button and use "Reset list".  In the "All"
window, select the mailboxes you want to see in normal
usage.


This script did not recurse into subdirectories or delete old mailboxes
before the '-R' parameter was added :)

Be sure not to access the Mbox mailboxes while running this
script.  It does not attempt to lock them.  Likewise, don't run two
copies of this script at the same time.


Trickier usage . . .
====================

If you have a bunch of mailboxes in a directory ~/oldmail/doors/
and you want them to appear in folders such as:

~/Maildir/.music.bands.doors.Jim
~/Maildir/.music.bands.doors.John

etc. so they appear in an IMAP folder:

   Inbox -----------------
       | music
             | bands
                   | doors
                         | Jim
                         | John
                         | Robbie
                         | Ray

Then you could rename the source directory to:

 ~/oldmail/music.bands.doors/

then use:

  mb2md -s oldmail -f music.bands.doors


Or simply use the '-R' switch:
  mb2md -s oldmail -R
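
To check in advance how a Maildir name will be shown, the dots can be
split back into folder levels.  This is a Python sketch with the
hypothetical function name folder_levels, based only on the naming
convention described in this section:

```python
def folder_levels(maildir_name):
    """Return the folder levels an IMAP client would show, top first."""
    return [level for level in maildir_name.lstrip(".").split(".") if level]
```

For example, folder_levels(".music.bands.doors.Jim") gives
["music", "bands", "doors", "Jim"].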


Stripping mailbox extensions:
============================= 

If you want to convert mailboxes that came, for example, from
a Windows box then you might want to strip the extension of
the mailbox name so that it won't create a subfolder in your
mail client's view.

Example:
You have several mailboxes named Trash.mbx, Sent.mbx, Drafts.mbx
If you don't strip the extension "mbx" you will get the following
hierarchy:

Inbox
     |
      - Trash 
     |       | mbx
     |
      - Sent 
     |       | mbx
     |
      - Drafts 
             | mbx

This is more than ugly!
Just use:
  mb2md -s oldmail -r mbx

Note: don't specify the dot! It will be stripped off
automagically ;)
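
The effect of "-r" on a mailbox name can be sketched as follows.  This
is Python, not the script's code, and the function name strip_extension
is hypothetical.  Tolerating a leading dot in the argument is an
assumption of this sketch, not documented behaviour.

```python
def strip_extension(mailbox_name, extension):
    """Remove ".<extension>" from the end of mailbox_name, if present."""
    extension = extension.lstrip(".")   # assumption: tolerate ".mbx" too
    suffix = "." + extension
    if (extension and mailbox_name.endswith(suffix)
            and len(mailbox_name) > len(suffix)):
        return mailbox_name[:-len(suffix)]
    return mailbox_name
```

With the extension "mbx", "Trash.mbx" becomes "Trash", so the Maildir
is named .Trash rather than .Trash.mbx.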

