This document defines the dotmail file format, intended to replace the mbox file format, which has a number of well-known problems. A dotmail file stores multiple Internet mail messages in a single text file, using a mechanism from SMTP.

RFC 822 (and successors) specifies the format of a mail message sent over the Internet, but doesn't include a way to mark the end of the message. That's done by SMTP (RFC 821 and successors). SMTP's DATA command -- the one that transfers a message -- follows the last line of the message with a line containing one character, '.', a period. Of course, the message itself could also contain such a line, so the DATA command escapes any message line that starts with '.' by preceding it with another '.' (this is called dot-stuffing). That dot-stuffing is easily undone by the receiver, restoring the original message. See full details below, in an excerpt from RFC 5321.

A dotmail file stores a sequence of RFC 822 messages in a text file, using SMTP's mechanism for end of message. Each message, including the last, is followed by a line containing only "."; message lines that start with '.' are dot-stuffed. Software that reads the file takes a line containing only "." as end of message, and undoes the dot-stuffing of message lines. This is the algorithm from RFC 5321.

A dotmail file is a valid text file on whatever system it's on. That means the end-of-line characters vary per system (CR LF, LF alone, CR alone, or possibly some entirely different mechanism). When a dotmail file is transferred, say, from a Unix system to a Microsoft system, the file has to be converted from one text file format to the other. (This is different from RFC 822 and SMTP, which are concerned only with sending mail across the Internet, not storing it in files. The network form always uses CR LF.)

A message in a dotmail file is NOT preceded by a line like:

    From address@hidden Thu Jan 1 00:00:00 1970

and is not followed by a blank line.
Those two lines are a frame added by mbox; they are not part of the message.

Messages in a dotmail file should not contain the non-standard header fields Lines: and Content-Length:. Those fields are attempts to work around mbox's problems; they should not exist outside mbox files. If either is present in a message in a dotmail file, it MUST NOT be used to find the end of the message. Only the dot line may be used for that.

=================================================================

Below are a sample RFC 822 message, that message in dotmail format, and that message in mbox format. The lines of hyphens are not part of any of those; they're to show the boundaries of the examples within this document.

-----------------------------------------
From: address@hidden
To: address@hidden
Message-ID: <0>
Date: Thu, 01 Jan 1970 00:00:00 +0000

From the beginning of time, mbox has been lame.
dog
.dog
.
-----------------------------------------
From: address@hidden
To: address@hidden
Message-ID: <0>
Date: Thu, 01 Jan 1970 00:00:00 +0000

From the beginning of time, mbox has been lame.
dog
..dog
..
.
-----------------------------------------
From address@hidden Thu Jan 1 00:02:17 1970
From: address@hidden
To: address@hidden
Message-ID: <0>
Date: Thu, 01 Jan 1970 00:00:00 +0000

>From the beginning of time, mbox has been lame.
dog
.dog
.

-----------------------------------------

=================================================================

From RFC 5321 (SMTP, successor to RFC 821):

4.5.2. Transparency

   Without some provision for data transparency, the character sequence "<CRLF>.<CRLF>" ends the mail text and cannot be sent by the user. In general, users are not aware of such "forbidden" sequences. To allow all user composed text to be transmitted transparently, the following procedures are used:

   o  Before sending a line of mail text, the SMTP client checks the first character of the line. If it is a period, one additional period is inserted at the beginning of the line.

   o  When a line of mail text is received by the SMTP server, it checks the line. If the line is composed of a single period, it is treated as the end of mail indicator. If the first character is a period and there are other characters on the line, the first character is deleted.

   The mail data may contain any of the 128 ASCII characters. All characters are to be delivered to the recipient's mailbox, including spaces, vertical and horizontal tabs, and other control characters. If the transmission channel provides an 8-bit byte (octet) data stream, the 7-bit ASCII codes are transmitted, right justified, in the octets, with the high-order bits cleared to zero. See Section 3.6 for special treatment of these conditions in SMTP systems serving a relay function.

   In some systems, it may be necessary to transform the data as it is received and stored. This may be necessary for hosts that use a different character set than ASCII as their local character set, that store data in records rather than strings, or which use special character sequences as delimiters inside mailboxes. If such transformations are necessary, they MUST be reversible, especially if they are applied to mail being relayed.
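
=================================================================

The writing and reading rules for a dotmail file are small enough to sketch in code. The Python below is only an illustration, not part of the format definition. The function names stuff_message and split_messages are made up for this example, lines are handled as strings with the end-of-line already removed, and raising an error when the last message lacks its dot line is this sketch's own choice (the text above doesn't say what a reader should do with a truncated file).

```python
from typing import Iterable, Iterator, List


def stuff_message(lines: Iterable[str]) -> List[str]:
    """Return one message in dotmail form: dot-stuffed lines plus the dot line."""
    out = []
    for line in lines:
        # Any message line that starts with "." gets one more "." in front.
        out.append("." + line if line.startswith(".") else line)
    out.append(".")  # every message, including the last, is followed by "."
    return out


def split_messages(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each message of a dotmail file as a list of un-stuffed lines."""
    current: List[str] = []
    for line in lines:
        if line == ".":
            # A line containing only "." ends the message. Header fields such
            # as Lines: or Content-Length: are never consulted for this.
            yield current
            current = []
        elif line.startswith("."):
            current.append(line[1:])  # undo the dot-stuffing
        else:
            current.append(line)
    if current:
        # Assumption of this sketch: treat a missing final dot line as an error.
        raise ValueError("last message is not followed by a dot line")


# The sample message from this document, one string per line.
sample = [
    "From: address@hidden",
    "To: address@hidden",
    "Message-ID: <0>",
    "Date: Thu, 01 Jan 1970 00:00:00 +0000",
    "",
    "From the beginning of time, mbox has been lame.",
    "dog",
    ".dog",
    ".",
]
second = ["Message-ID: <1>", "", "hello"]  # hypothetical second message

stored = stuff_message(sample) + stuff_message(second)
assert list(split_messages(stored)) == [sample, second]
```

To read a real file, the lines could come from something like (l.rstrip("\n") for l in open(path)); Python's default text mode translates CR LF, LF alone, and CR alone into "\n", which matches the idea that a dotmail file is an ordinary text file on whatever system holds it.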