[Nmh-workers] m_getfld() and Friends.

From:	Ralph Corderoy
Subject:	[Nmh-workers] m_getfld() and Friends.
Date:	Tue, 23 May 2017 18:51:59 +0100
Hi,

I've been poking m_getfld() a bit, trying to get a firm understanding of
what all its callers demand.  Thought I'd pick the list's brains.

    int m_getfld(m_getfld_state_t *gstate,
        char name[NAMESZ],
        char *buf, int *bufsz,
        FILE *iob)

On entry, *bufsz is the size of buf.  I've been using buf[7] just to
have it be too small quite easily.  m_getfld() returns one of

    #define LENERR  (-2)    /* Name too long error from getfld  */
    #define FMTERR  (-3)    /* Message Format error             */
    #define FLD      0      /* Field returned                   */
    #define FLDPLUS  1      /* Field returned with more to come */
    #define BODY     3      /* Body  returned with more to come */
    #define FILEEOF  5      /* Reached end of input file        */

to indicate the type of buf's contents.
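
As a rough sketch of how a test harness might turn those return values
into the labels used in the trace further down:  state_label() is a
made-up name, not part of nmh, and the labels for LENERR and FMTERR are
assumptions since neither turns up in the trace.  The #defines are
repeated so the fragment stands alone.

```c
#include <assert.h>
#include <string.h>

/* Return values, repeated from above so this fragment stands alone. */
#define LENERR  (-2)    /* Name too long error from getfld  */
#define FMTERR  (-3)    /* Message Format error             */
#define FLD      0      /* Field returned                   */
#define FLDPLUS  1      /* Field returned with more to come */
#define BODY     3      /* Body  returned with more to come */
#define FILEEOF  5      /* Reached end of input file        */

/* Hypothetical helper, not part of nmh:  label a return value. */
const char *state_label(int state)
{
    switch (state) {
    case FLD:     return "field";
    case FLDPLUS: return "field-plus";
    case BODY:    return "body";
    case FILEEOF: return "eof";
    case LENERR:  return "lenerr";      /* label is an assumption */
    case FMTERR:  return "fmterr";      /* label is an assumption */
    default:      return "unknown";
    }
}
```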

I temporarily modified uip/scan.c so it just loops until FILEEOF is
returned.  Here's the small test email I use.

    $ wc -c email
    75 email
    $
    $ cat email
    a: A
    ab: A
    abc: A
    abcd: A
    f: ABCDEFGHIJKLMNOPQRSTUVWXYZ

    body1
    body2
    body3
    $

And the output.

    state: field       read:  5   0- 75  name: 'a'       buf: ' A\n\0' =4
    state: field       read:  6  75- 75  name: 'ab'      buf: ' A\n\0' =4
    state: field       read:  7  75- 75  name: 'abc'     buf: ' A\n\0' =4
    state: field       read:  8  75- 75  name: 'abcd'    buf: ' A\n\0' =4
    state: field-plus  read:  7  75- 75  name: 'f'       buf: ' ABCDE\0' =7
    state: field-plus  read:  6  75- 75  name: 'f'       buf: 'FGHIJK\0' =7
    state: field-plus  read:  6  75- 75  name: 'f'       buf: 'LMNOPQ\0' =7
    state: field-plus  read:  6  75- 75  name: 'f'       buf: 'RSTUVW\0' =7
    state: field       read:  5  75- 75  name: 'f'       buf: 'XYZ\n\0' =5
    state: body        read:  7  75- 75  name: ''        buf: 'body1\n\0' =7
    state: body        read:  6  75- 75  name: ''        buf: 'body2\n\0' =7
    state: body        read:  6  75- 75  name: ''        buf: 'body3\n\0' =7
    state: eof         read:  0  75- 75  name: ''        buf: '\0' =1

`read' is the value of *bufsz after the call.  It seems to be telling me
how many bytes of input have been processed?

The `0- 75' is ftello(3)'s result before and after m_getfld().  This
email is small enough that m_getfld() reads the whole file on the first
call into a buffer of its own;  buf[7] is too small to hold it.

For field and field-plus state return values, `name' is the header's
name.  A sequence of field-plus is terminated by a field.

I print buf[]'s contents until the NUL, the `=4' is how many bytes were
printed.  Note, it does not tally with `read';  that's fine.
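
A minimal sketch of that printing, assuming a helper along these lines;
show_buf() is a hypothetical name and not the code that produced the
trace.  It spells newline as \n and NUL as \0, and returns the count
shown after the `='.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not the code actually used for the trace:
 * copy buf[] up to and including its NUL into out[], spelling
 * newline as \n and NUL as \0.  Output that does not fit is dropped.
 * Returns the number of bytes of buf consumed, counting the NUL,
 * which is the `=N' figure.
 */
size_t show_buf(const char *buf, char *out, size_t outsz)
{
    size_t n = 0, o = 0;

    assert(outsz > 0);
    for (;;) {
        char c = buf[n++];
        const char *esc = c == '\n' ? "\\n" : c == '\0' ? "\\0" : NULL;

        if (esc) {
            /* Two characters of escape, leaving room for out's NUL. */
            if (o + 2 < outsz) {
                out[o++] = esc[0];
                out[o++] = esc[1];
            }
        } else if (o + 1 < outsz) {
            out[o++] = c;
        }
        if (c == '\0')
            break;
    }
    out[o] = '\0';
    return n;
}
```

Given " A\n" it returns 4, the three characters plus the NUL, which is
why `=4' is one more than strlen(3) would say.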

The sum of read's `5 6 7 8 7 6 6 6 5 7 6 6 0' is 75, matching wc(1)
above.

Unlike field, the body state has no `plus' variation, despite the
`with more to come' part of the comment.

    #define FLDPLUS  1      /* Field returned with more to come */
    #define BODY     3      /* Body  returned with more to come */

Looking more closely at the read values,

    state: field       read:  5   0- 75  name: 'a'       buf: ' A\n\0' =4
    state: field       read:  6  75- 75  name: 'ab'      buf: ' A\n\0' =4
    state: field       read:  7  75- 75  name: 'abc'     buf: ' A\n\0' =4
    state: field       read:  8  75- 75  name: 'abcd'    buf: ' A\n\0' =4

The 5 is `a: A\n'.  6, 7, and 8 are similar with growing header names.
So far, 5 6 7 8 sum to 26, and that checks out.

    $ od -Ad -cN26 email
    0000000   a   :       A  \n   a   b   :       A  \n   a   b   c   :
    0000016   A  \n   a   b   c   d   :       A  \n

Next,

    state: field-plus  read:  7  75- 75  name: 'f'       buf: ' ABCDE\0' =7

`f: ABCDE' is eight, but read is 7.

    $ od -Ad -cj26 -N7 email
    0000026   f   :       A   B   C   D

sizeof buf is 7 so ' ABCDE\0' =7 above is correct;  buf has been fully
utilised.  Should read be 8, or have I misunderstood its intent?

    state: field-plus  read:  6  75- 75  name: 'f'       buf: 'FGHIJK\0' =7
    state: field-plus  read:  6  75- 75  name: 'f'       buf: 'LMNOPQ\0' =7
    state: field-plus  read:  6  75- 75  name: 'f'       buf: 'RSTUVW\0' =7

Next three field-plus are back on track;  six read each time.

    state: field       read:  5  75- 75  name: 'f'       buf: 'XYZ\n\0' =5

The end of the `f' header, `XYZ\n' is four read, but I think read=5
because it's including the `\n' that ends the headers' section?  Let's
assume that.

The sum of 5 6 7 8 7 6 6 6 5 is 56.

    $ od -Ad -cN56 email
    0000000   a   :       A  \n   a   b   :       A  \n   a   b   c   :
    0000016   A  \n   a   b   c   d   :       A  \n   f   :       A   B   C
    0000032   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S
    0000048   T   U   V   W   X   Y   Z  \n

Still one adrift after the earlier problem:  the 56 bytes stop before
the `\n' that ends the headers' section.

    state: body        read:  7  75- 75  name: ''        buf: 'body1\n\0' =7

`body1\n' is only six but read=7, so is this also counting the `\n' that
never ends up in buf, but precedes the body?  That means it features
twice in `read' as an extra, but never in buf.

The double counting here "fixes" the shortage earlier at the first
field-plus.  reads of 5 6 7 8 7 6 6 6 5 7 sum to 63.

    $ od -Ad -cN63 email
    0000000   a   :       A  \n   a   b   :       A  \n   a   b   c   :
    0000016   A  \n   a   b   c   d   :       A  \n   f   :       A   B   C
    0000032   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S
    0000048   T   U   V   W   X   Y   Z  \n  \n   b   o   d   y   1  \n

We're back in sync!

    state: body        read:  6  75- 75  name: ''        buf: 'body2\n\0' =7
    state: body        read:  6  75- 75  name: ''        buf: 'body3\n\0' =7

These correctly have a read of 6.

    state: eof         read:  0  75- 75  name: ''        buf: '\0' =1

eof state neatly says read=0 and makes sure nothing is in buf.

m_getfld() has another mode where it keeps the FILE's position set.
With that, ftello(3) before and after show different positions.  Nothing
else changes, including the `read's.

    state: field       read:  5   0-  5  name: 'a'       buf: ' A\n\0' =4
    state: field       read:  6   5- 11  name: 'ab'      buf: ' A\n\0' =4
    state: field       read:  7  11- 18  name: 'abc'     buf: ' A\n\0' =4
    state: field       read:  8  18- 26  name: 'abcd'    buf: ' A\n\0' =4
    state: field-plus  read:  7  26- 33  name: 'f'       buf: ' ABCDE\0' =7
    state: field-plus  read:  6  33- 39  name: 'f'       buf: 'FGHIJK\0' =7
    state: field-plus  read:  6  39- 45  name: 'f'       buf: 'LMNOPQ\0' =7
    state: field-plus  read:  6  45- 51  name: 'f'       buf: 'RSTUVW\0' =7
    state: field       read:  5  51- 56  name: 'f'       buf: 'XYZ\n\0' =5
    state: body        read:  7  56- 63  name: ''        buf: 'body1\n\0' =7
    state: body        read:  6  63- 69  name: ''        buf: 'body2\n\0' =7
    state: body        read:  6  69- 75  name: ''        buf: 'body3\n\0' =7
    state: eof         read:  0  75- 75  name: ''        buf: '\0' =1

The cumulation of the `read's matches the `after' file position.

    $ dc <<<'0 5+p 6+p 7+p 8+p 7+p 6+p 6+p 6+p 5+p 7+p 6+p 6+p 0+p' | fmt
    5 11 18 26 33 39 45 51 56 63 69 75 75
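
The same check can be sketched in C, assuming the `read' values and the
`after' positions have been collected into arrays.  first_mismatch() is
a hypothetical name, not anything in nmh.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical check, not part of nmh:  do the running totals of the
 * `read' values equal the file position seen after each call?
 * Returns the index of the first mismatch, or n if all agree.
 */
size_t first_mismatch(const int *reads, const long *after, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(reads[i] >= 0);
        total += reads[i];
        if (total != after[i])
            break;
    }
    return i;
}
```

Fed the thirteen reads and `after' positions from the trace above it
returns 13, that is, no mismatch.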

Reading those ranges of positions gives

    0000000   a   :       A  \n
    0000005   a   b   :       A  \n
    0000011   a   b   c   :       A  \n
    0000018   a   b   c   d   :       A  \n
    0000026   f   :       A   B   C   D
    0000033   E   F   G   H   I   J
    0000039   K   L   M   N   O   P
    0000045   Q   R   S   T   U   V
    0000051   W   X   Y   Z  \n
    0000056  \n   b   o   d   y   1  \n
    0000063   b   o   d   y   2  \n
    0000069   b   o   d   y   3  \n
    0000075

This matches the above account;  out of sync at the `E'.  The separating
`\n' is in the range for the first `body'.

Questions:
Should the file position always be just after what's returned in buf?
And cumulative `read's to that point match the position?
buf should never have the separating `\n', but the `read' that skipped
it for the first `body' return will be one higher to keep the cumulation
in sync.

I think that makes the desired output

    state: field       read:  5   0-  5  name: 'a'       buf: ' A\n\0' =4
    state: field       read:  6   5- 11  name: 'ab'      buf: ' A\n\0' =4
    state: field       read:  7  11- 18  name: 'abc'     buf: ' A\n\0' =4
    state: field       read:  8  18- 26  name: 'abcd'    buf: ' A\n\0' =4
    state: field-plus  read:  8¹ 26- 34  name: 'f'       buf: ' ABCDE\0' =7
    state: field-plus  read:  6  34- 40  name: 'f'       buf: 'FGHIJK\0' =7
    state: field-plus  read:  6  40- 46  name: 'f'       buf: 'LMNOPQ\0' =7
    state: field-plus  read:  6  46- 52  name: 'f'       buf: 'RSTUVW\0' =7
    state: field       read:  4² 52- 56  name: 'f'       buf: 'XYZ\n\0' =5
    state: body        read:  7³ 56- 63  name: ''        buf: 'body1\n\0' =7
    state: body        read:  6  63- 69  name: ''        buf: 'body2\n\0' =7
    state: body        read:  6  69- 75  name: ''        buf: 'body3\n\0' =7
    state: eof         read:  0  75- 75  name: ''        buf: '\0' =1

where

    1.  read=8 not 7 to include the `E' in buf.
    2.  read=4 not 5 to exclude the separating `\n' not in buf.
        state returned is `field', not `field-last', so I don't think
        read should deviate.
    3.  read=7 still to cumulate the separating '\n' not in buf.
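
Put as a rule, the desired `read' is the bytes placed in buf, not
counting the NUL, plus any input skipped to get there:  the header's
`name:' on its first chunk, or the separating `\n' on the first body
chunk.  A minimal model of that, with desired_read() being a
hypothetical name and not nmh code:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical model of the proposed rule, not nmh code:  `read' is
 * the bytes placed in buf (excluding the NUL) plus any input bytes
 * skipped to get there, i.e. the header's `name:' or the blank line
 * that separates headers from body.
 */
size_t desired_read(size_t skipped, const char *buf)
{
    assert(buf != NULL);
    return skipped + strlen(buf);
}
```

For the `f' header the first chunk skips `f:', two bytes, and returns
' ABCDE', giving 8.  The last chunk, 'XYZ\n', skips nothing and gives 4.
The first body chunk skips the one separating `\n' and gives 7.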

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy