From: Neil Jerram
Subject: Re: documentation for (web ...)
Date: Fri, 21 Jan 2011 23:05:44 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)
Andy Wingo <address@hidden> writes:

> Heya Neil,

Hi Andy,

I've properly read all your responses on this now, and basically agree
with them all. I've just added a few further comments on specific
points below.

I've also looked at the updated docs, and think they're great. I
noticed a few minor glitches in the doc text, and a patch for those is
attached. I've also attached patches for updating your (tekuti
mod-lisp) to the latest (web ...) API.

Thanks for working on this area; it's great to have this functionality
in the Guile core.

> So this default port stuff is a poor-man's scheme-specific
> normalization, to not display a port component in a serialization, if
> the port is the default for the scheme.

Thanks, I see that now.

>>> -- Function: uri-decode str [#:charset]
>>>     Percent-decode the given STR, according to CHARSET.
>>
>> So the return value is a bytevector if CHARSET is #f, and a string
>> if not?
>
> Yes.

I'm still not completely sure here. What if STR contains normal
characters as well as possible %XX sequences? If I call uri-decode
with #:charset #f, how is each normal character mapped into the
resulting bytevector?

>>> `parser'
>>>      A procedure which takes a string and returns a parsed value.
>>>
>>> `validator'
>>>      A predicate, returning `#t' iff the value is valid for this
>>>      header.
>>
>> Maybe say something here about the validator function often being
>> very similar to the parsing function?
>
> They are not quite the same. A parser takes a string and produces a
> Scheme value, and a validator takes a Scheme value and returns #t iff
> the value is valid for that header. I will add an example.

Thanks, that makes better sense. I was previously thinking that both
the validator and the parser acted on the raw header value. I think
there's still a glitch in the doc text, and have proposed an update in
the attached patch.

>> A possibly important point: what is the scope of the space in which
>> these header declarations are made?
>> My reason for asking is that this infrastructure looks applicable
>> for other HTTP-like protocols too, such as SIP. But the detailed
>> rules for a given header in SIP may be different from those for a
>> header with the same name in HTTP, and hence different header-decl
>> objects would be needed. Therefore, even though we claim no other
>> protocol support right now, perhaps we should anticipate that by
>> enhancing declare-header! so as to distinguish between HTTP-space
>> and other-protocol-spaces.
>
> It's a good question. HTTP is deliberately MIME-like, but specifies a
> number of important differences (see appendix 19.4 of RFC 2616).
>
> For now, the scope is limited to HTTP headers.

OK.

>> [After reading all through, I remain confused about exactly how
>> general this server infrastructure is intended to be]
>
> The ultimate intention is to allow the "web handler" stuff I
> mentioned at the end of the section, and to allow the web app author
> to not care very much what server is being used. To do this, we have
> to allow all kinds of "server" implementations -- CGI, direct HTTP to
> a socket, zeromq messages, etc. Regardless of how the request comes
> and the response goes, we need to be able to recognize and parse HTTP
> headers into their various appropriate data types -- and (web http)
> is really the middle, here. The (web server) stuff is a higher-level
> abstraction -- not a necessary abstraction, but helpful, if you can
> use it.

Thanks. After playing with the mod-lisp code, I think I've finally
understood this. The `web' in `(web ...)' means requests and responses
with the HTTP-defined structure - even if they might be delivered to
the application via something like mod-lisp or CGI. Whereas the `http'
in `(web server http)' means delivery directly from/to a socket in
HTTP wire format. Which is fine. I daresay there might be a useful
future extension to something like SIP, but there's absolutely no need
to try to engineer that in now.
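[As an illustrative aside on the uri-decode question earlier in this
message: the usual RFC 3986 semantics - which I assume is what Guile
follows - are that each %XX escape contributes the single byte 0xXX,
and every other character contributes its own byte. With a charset,
that byte sequence is then decoded to a string. A rough Python sketch
of just those semantics, not Guile's actual Scheme implementation:]

```python
def percent_decode(s, charset=None):
    """Sketch of uri-decode semantics: ordinary characters map to
    their own bytes, and each %XX sequence maps to the byte 0xXX.
    With charset=None (cf. #:charset #f), return the raw bytes;
    otherwise decode those bytes to a string."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == '%':
            out.append(int(s[i+1:i+3], 16))   # %XX -> the byte 0xXX
            i += 3
        else:
            out.extend(s[i].encode('ascii'))  # ordinary (ASCII) char -> its own byte
            i += 1
    return out.decode(charset) if charset else bytes(out)
```

[So with charset #f each normal character would simply become its
ASCII byte - e.g. `percent_decode('a%20b')` gives the three bytes
`a`, 0x20, `b`.]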
>>> -- Function: lookup-header-decl name
>>>     Return the HEADER-DECL object registered for the given NAME.
>>>
>>>     NAME may be a symbol or a string. Strings are mapped to headers
>>>     in a case-insensitive fashion.
>>>
>>> -- Function: valid-header? sym val
>>>     Returns a true value iff VAL is a valid Scheme value for the
>>>     header with name SYM.
>>
>> Note slight inconsistency in the two above deffns: "Return" vs
>> "Returns".
>
> Which is the right one? "Return"? Will change.

I very much doubt that Guile is globally consistent on this; but it
was quite noticeable here.

>>> Now that we have a generic interface for reading and writing
>>> headers, we do just that.
>>>
>>> -- Function: read-header port
>>>     Reads one HTTP header from PORT. Returns two values: the
>>>     header name and the parsed Scheme value.
>>
>> As multiple values? Is that more helpful than as a cons?
>
> Yes, as multiple values. The advantage is that returning multiple
> values from a Scheme procedure does not cause any allocation.

Ah, OK.

>>> The `(web http)' module defines parsers and unparsers for all
>>> headers defined in the HTTP/1.1 standard. This section describes
>>> the parsed format of the various headers.
>>>
>>> We cannot describe the function of all of these headers, however,
>>> in sufficient detail.
>>
>> I don't get the point here.
>
> Do you mean that the reason is not apparent at this point in the
> document? I don't think the intro is worded very well, and indeed it
> appears to be a bit of buildup without knowing where you go... Maybe
> an example in the beginning would be apropos?

I meant that I didn't understand why "cannot" - rather than, say,
"don't" or "don't want to" - and the meaning of "sufficient detail" -
i.e. sufficient for what? I think that the text now, "For full details
on the meanings of all of these headers, see the HTTP 1.1 standard,
RFC 2616.", is better, and covers these points.
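[To make the parser/validator split and the read-header interface
above concrete, here is a hypothetical Python analogue - the names
mirror the Guile procedures discussed, but this is only a sketch, not
the (web http) code:]

```python
# A hypothetical registry mirroring declare-header!'s parser/validator
# split: the parser maps wire string -> value, the validator maps
# value -> bool.
HEADERS = {}

def declare_header(name, parser, validator):
    HEADERS[name] = (parser, validator)

# Content-Length as the worked example Andy promises to add.
declare_header('content-length', int,
               lambda v: isinstance(v, int) and v >= 0)

DEFAULT = (str, lambda v: True)   # unknown headers stay raw strings

def read_header(line):
    """Like read-header: split one 'Name: value' line, map the name
    case-insensitively, and return two values (name, parsed value)."""
    name, _, raw = line.partition(':')
    name = name.strip().lower()
    parser, _validator = HEADERS.get(name, DEFAULT)
    return name, parser(raw.strip())

def valid_header(name, value):
    """Like valid-header?: true iff VALUE is valid for header NAME."""
    _parser, validator = HEADERS.get(name, DEFAULT)
    return validator(value)
```

[Note how the validator never sees the wire string: `valid_header
('content-length', '42')` is false, because the parsed value for that
header is an integer, not a string.]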
> Or should we give brief descriptions of the meanings of all of these
> headers as well? That might be a good idea too.

No, I don't think that's needed.

>>> `transfer-encoding'
>>>      A param list of transfer codings. `chunked' is the only known
>>>      key.
>>
>> OK, why a param list rather than key-value? How are elements in the
>> second key-value list, say, different from elements in the first
>> key-value list?
>
> Well, some of these headers are quite unfortunate in their
> construction. In this case:
>
>   Transfer-Encoding = "Transfer-Encoding" ":" 1#transfer-coding
>
> So really, this is a list. But:
>
>   transfer-coding    = "chunked" | transfer-extension
>   transfer-extension = token *( ";" parameter )
>   parameter          = attribute "=" value
>   attribute          = token
>   value              = token | quoted-string
>
> Given that a transfer-extension is really a token with a number of
> parameters, the thing gets complicated. You could have:
>
>   Transfer-Encoding: chunked,abcd,newthing;foo="bar, baz; qux";xyzzy
>
> which is hard to parse if you do it ad hoc. (web http) parses it as:
>
>   (transfer-encoding . ((chunked) ("abcd")
>                         ("newthing" ("foo" . "bar, baz; qux") "xyzzy")))
>
> Still complicated, but more uniform at least. Saying that `chunked'
> is the only known key means that it's the only one that's translated
> to a symbol; i.e. `abcd' is parsed to a string. (This is to prevent
> attacks that intern a bunch of symbols; though symbols can be gc'd in
> Guile.)
>
> Does that help? I'll see about replacing usages of "param list" with
> "list of key-value lists", as it's probably clearer, and we can save
> ourselves a definition.

Hmm. I still don't feel I completely understand this; but on the other
hand it's too fiddly for me to want to go into more now. I think I'll
wait until I actually have to process something with these structures.

>>> `www-authenticate'
>>>      A string.
>>
>> Obviously there's lots of substructure there (in WWW-Authenticate)
>> that we just don't support yet.
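[Going back to the Transfer-Encoding example above: the "list of
key-value lists" shape can be made concrete with a rough Python
sketch. This is a hypothetical parser written from the grammar Andy
quotes, not (web http)'s code - the quoted-string handling is exactly
the part that makes ad hoc parsing hard, since a quoted value may
contain "," and ";":]

```python
def parse_param_list(value):
    """Parse a 1#(token *(";" parameter)) header value, e.g.
    Transfer-Encoding, into a list of [item, (attr, val), ...] lists.
    Bare parameters get val=None; quoted-strings keep , and ; intact."""
    items, i, n = [], 0, len(value)

    def read_token():
        nonlocal i
        start = i
        while i < n and value[i] not in ',;=':
            i += 1
        return value[start:i].strip()

    def read_value():
        nonlocal i
        if i < n and value[i] == '"':      # quoted-string
            i += 1
            start = i
            while i < n and value[i] != '"':
                i += 1
            s = value[start:i]
            i += 1                          # skip closing quote
            return s
        return read_token()

    while i < n:
        item = [read_token()]
        while i < n and value[i] == ';':    # parameters of this coding
            i += 1
            attr = read_token()
            if i < n and value[i] == '=':
                i += 1
                item.append((attr, read_value()))
            else:
                item.append((attr, None))
        items.append(item)
        if i < n and value[i] == ',':
            i += 1
    return items
```

[On Andy's example this yields `[['chunked'], ['abcd'], ['newthing',
('foo', 'bar, baz; qux'), ('xyzzy', None)]]` - the same shape as the
Scheme alist, modulo symbols vs. strings.]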
>> Is there a clear compatibility story for if/when Guile is enhanced
>> to parse that out?
>>
>> I guess yes; calling code will just need something like
>>
>>   (if (string? val)
>>       ;; An older Guile that doesn't parse authentication fully.
>>       (do-application-own-parsing)
>>       ;; A newer Guile that does parse authentication.
>>       (use-the-parsed-authentication-object))
>
> That's a very good question. The problem is that if we change the
> parsed representation, then old code breaks. That's why I put in the
> effort to give (hopefully) good representations for most headers, to
> avoid that situation -- though you appear to have caught one laziness
> on my part here, and in Authorization, Proxy-Authenticate, and
> Proxy-Authorization.

I don't think I could ever think that you are lazy!

> So maybe the right thing to do here is just to bite the bullet, parse
> as the RFC says we should, and avoid this situation.

As long as it's a bounded problem, fine. (And I see that the modules
do now crack out the authorization and authenticate headers.)

>> Hmm, I think the provision of this data type needs a bit more
>> motivation. It doesn't appear to offer much additional value,
>> compared with reading or writing the components of a request
>> individually, and on the other hand it appears to bake in
>> assumptions about charsets and content length that might not always
>> be true.
>
> I probably didn't explain it very well then. A request record holds
> the data from a request -- the method, uri, headers, etc.
> Additionally it can be read or written. It does not actually bake in
> assumptions about character sets or the like. It's simply that the
> HTTP protocol is flawed in this respect, in that it mixes textual and
> binary data. We want to be able to read and parse requests,
> responses, and their headers using string and char routines, and
> that's fine, as the character set for HTTP messages is restricted to
> a subset of the lower ASCII set.
> But then the body of an HTTP message is fundamentally binary -- the
> content-length is specified in bytes, not characters.

Thanks. I think I see the usefulness of request and response objects
now, given the "overall picture" above.

> So the right way to read off a body is as a bytevector of the
> specified length (potentially with chunked transfer encoding of
> course, though we don't do that yet). Then if you want text, you
> decode using the character set specified in the request. If you are
> particularly lucky and it is a textual type and the charset is not
> specified, you can read it as a latin-1 string directly; otherwise
> you convert. Or you can deal with the binary data as a string.
>
> Setting the charset on the port is a bit of a hack, but it is the
> right thing to do if you are reading HTTP. And it doesn't matter what
> the charset is when you read the body, as it's specified in bytes
> anyway and should be read in bytes (and then, possibly, decoded).
>
> Some more organized discussion should go in the manual... but what do
> you think?

The discussion in `An Important Note on Character Sets' looks good to
me.

>>> -- Function: adapt-response-version response version
>>>     Adapt the given response to a different HTTP version. Returns
>>>     a new HTTP response.
>>
>> Interesting, and adds more value to the idea of the response object.
>> Why not for the request object too - are you assuming that Guile
>> will usually be acting as the HTTP server? (Which I'm sure is
>> correct, but "usually" is not "always".)
>
> The thing is that the request initiates the transaction -- so it's
> the requestor that makes the version decision.

Oh yes, of course.

>>> 2. The `read' hook is called, to read a request from a new client.
>>>    The `read' hook takes one argument, the server socket. It
>>>    should
>>
>> It feels surprising for the infrastructure to pass the server socket
>> to the read hook.
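[An aside on the body-reading recipe above - read content-length
bytes as binary, then decode only if a charset is known, with latin-1
as the textual fallback. A minimal Python sketch of that recipe, with
hypothetical names; this is not Guile's code, and real code would
also handle chunked transfer coding:]

```python
def read_body(port, headers):
    """Read exactly Content-Length bytes as binary; decode to text
    only for textual types, using the declared charset if any and
    falling back to latin-1 (which maps bytes 1:1 to characters)."""
    n = int(headers.get('content-length', 0))
    body = port.read(n)                      # bytes, not characters
    ctype = headers.get('content-type', '')
    if ctype.startswith('text/'):
        charset = 'latin-1'                  # safe default, per the discussion
        for param in ctype.split(';')[1:]:
            k, _, v = param.strip().partition('=')
            if k == 'charset':
                charset = v.strip('"')
        return body.decode(charset)
    return body                              # binary types stay bytes
```

[The point being that the length is always counted in bytes, so the
read step is charset-independent; decoding is a separate, optional
step afterwards.]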
>> I'd expect the infrastructure to do the `accept' itself and pass
>> the client socket to the read hook.
>
> It's the opaque "server socket object". Doing it this way has two
> advantages:
>
> 1) Works with other socket architectures (zeromq, particularly).

I'm not familiar with that, but will take a look.

> 2) Allows the server to make its own implementation of keepalive
>    (or not).
>
> Particularly the latter is interesting -- the http implementation
> makes a big pollset (from (ice-9 poll), not yet documented), and
> polls on the server socket and all the keepalive sockets.

As does (tekuti mod-lisp) - so the duplication is a slight shame. But
I agree that there's no reason why the work to unduplicate that should
be in (web server).

Regards,
        Neil
0001-Fix-typos-in-web-.-doc.patch
Description: Text Data
0001-Export-server-impl-so-that-applications-can-use-it.patch
Description: Text Data
0002-Update-body-related-calls-to-new-API.patch
Description: Text Data
0003-Avoid-using-lookup-header-decl-which-isn-t-exported.patch
Description: Text Data
0004-Update-to-new-parse-header-signature-which-only-retu.patch
Description: Text Data
0005-Update-to-new-build-request-signature.patch
Description: Text Data
0006-Don-t-validate-headers-as-we-get-an-apparently-inval.patch
Description: Text Data