From: Neil Jerram
Subject: Re: documentation for (web ...)
Date: Fri, 21 Jan 2011 23:05:44 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)
Andy Wingo <address@hidden> writes:

> Heya Neil,

Hi Andy,

I've properly read all your responses on this now, and basically agree
with them all. I've just added a few further comments on specific
points below.

I've also looked at the updated docs, and think they're great. I
noticed a few minor glitches in the doc text, and a patch for those is
attached. I've also attached patches for updating your (tekuti
mod-lisp) to the latest (web ...) API.

Thanks for working on this area; it's great to have this functionality
in the Guile core.

> So this default port stuff is a poor-man's scheme-specific
> normalization, to not display a port component in a serialization, if
> the port is the default for the scheme.

Thanks, I see that now.

>>> -- Function: uri-decode str [#:charset]
>>>     Percent-decode the given STR, according to CHARSET.
>>
>> So the return value is a bytevector if CHARSET is #f, and a string
>> if not?
>
> Yes.

I'm still not completely sure here. What if STR contains normal
characters as well as possible %XX sequences? If I call uri-decode
with #:charset #f, how is each normal character mapped into the
resulting bytevector?

>>> `parser'
>>>      A procedure which takes a string and returns a parsed value.
>>>
>>> `validator'
>>>      A predicate, returning `#t' iff the value is valid for this
>>>      header.
>>
>> Maybe say something here about the validator function often being
>> very similar to the parsing function?
>
> They are not quite the same. A parser takes a string and produces a
> Scheme value, and a validator takes a Scheme value and returns #t iff
> the value is valid for that header. I will add an example.

Thanks, that makes better sense. I was previously thinking that both
the validator and the parser acted on the raw header value. I think
there's still a glitch in the doc text, and have proposed an update in
the attached patch.

>> A possibly important point: what is the scope of the space in which
>> these header declarations are made?
>> My reason for asking is that this infrastructure looks applicable
>> for other HTTP-like protocols too, such as SIP. But the detailed
>> rules for a given header in SIP may be different from those for a
>> header with the same name in HTTP, and hence different header-decl
>> objects would be needed. Therefore, even though we claim no other
>> protocol support right now, perhaps we should anticipate that by
>> enhancing declare-header! so as to distinguish between HTTP-space
>> and other-protocol-spaces.
>
> It's a good question. HTTP is deliberately MIME-like, but specifies a
> number of important differences (see appendix 19.4 of RFC 2616).
>
> For now, the scope is limited to HTTP headers.

OK.

>> [After reading all through, I remain confused about exactly how
>> general this server infrastructure is intended to be]
>
> The ultimate intention is to allow the "web handler" stuff I
> mentioned at the end of the section, and to allow the web app author
> to not care very much what server is being used. To do this, we have
> to allow all kinds of "server" implementations -- CGI, direct HTTP to
> a socket, zeromq messages, etc. Regardless of how the request comes
> and the response goes, we need to be able to recognize and parse HTTP
> headers into their various appropriate data types -- and (web http)
> is really the middle, here. The (web server) stuff is a higher-level
> abstraction -- not a necessary abstraction, but helpful, if you can
> use it.

Thanks. After playing with the mod-lisp code, I think I've finally
understood this. The `web' in `(web ...)' means requests and responses
with the HTTP-defined structure - even if they might be delivered to
the application via something like mod-lisp or CGI. Whereas the `http'
in `(web server http)' means delivery directly from/to a socket in
HTTP wire format. Which is fine. I daresay there might be a useful
future extension to something like SIP, but there's absolutely no need
to try to engineer that in now.
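[As an illustrative aside on the uri-decode question earlier in this
message: the usual RFC 3986 semantics - which I assume is what Guile
follows - are that each %XX escape contributes the single byte 0xXX,
and every other character contributes its own byte. With a charset,
that byte sequence is then decoded to a string. A rough Python sketch
of just those semantics, not Guile's actual Scheme implementation:]

```python
def percent_decode(s, charset=None):
    """Sketch of uri-decode semantics: ordinary characters map to
    their own bytes, and each %XX sequence maps to the byte 0xXX.
    With charset=None (cf. #:charset #f), return the raw bytes;
    otherwise decode those bytes to a string."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == '%':
            out.append(int(s[i+1:i+3], 16))   # %XX -> the byte 0xXX
            i += 3
        else:
            out.extend(s[i].encode('ascii'))  # ordinary (ASCII) char -> its own byte
            i += 1
    return out.decode(charset) if charset else bytes(out)
```

[So with charset #f each normal character would simply become its
ASCII byte - e.g. `percent_decode('a%20b')` gives the three bytes
`a`, 0x20, `b`.]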
>>> -- Function: lookup-header-decl name
>>>     Return the HEADER-DECL object registered for the given NAME.
>>>
>>>     NAME may be a symbol or a string. Strings are mapped to headers
>>>     in a case-insensitive fashion.
>>>
>>> -- Function: valid-header? sym val
>>>     Returns a true value iff VAL is a valid Scheme value for the
>>>     header with name SYM.
>>
>> Note slight inconsistency in the two above deffns: "Return" vs
>> "Returns".
>
> Which is the right one? "Return"? Will change.

I very much doubt that Guile is globally consistent on this; but it
was quite noticeable here.

>>> Now that we have a generic interface for reading and writing
>>> headers, we do just that.
>>>
>>> -- Function: read-header port
>>>     Reads one HTTP header from PORT. Returns two values: the
>>>     header name and the parsed Scheme value.
>>
>> As multiple values? Is that more helpful than as a cons?
>
> Yes, as multiple values. The advantage is that returning multiple
> values from a Scheme procedure does not cause any allocation.

Ah, OK.

>>> The `(web http)' module defines parsers and unparsers for all
>>> headers defined in the HTTP/1.1 standard. This section describes
>>> the parsed format of the various headers.
>>>
>>> We cannot describe the function of all of these headers, however,
>>> in sufficient detail.
>>
>> I don't get the point here.
>
> Do you mean that the reason is not apparent at this point in the
> document? I don't think the intro is worded very well, and indeed it
> appears to be a bit of buildup without knowing where you go... Maybe
> an example in the beginning would be apropos?

I meant that I didn't understand why "cannot" - rather than, say,
"don't" or "don't want to" - and the meaning of "sufficient detail" -
i.e. sufficient for what? I think that the text now, "For full details
on the meanings of all of these headers, see the HTTP 1.1 standard,
RFC 2616.", is better, and covers these points.
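[To make the parser/validator split and the read-header interface
above concrete, here is a hypothetical Python analogue - the names
mirror the Guile procedures discussed, but this is only a sketch, not
the (web http) code:]

```python
# A hypothetical registry mirroring declare-header!'s parser/validator
# split: the parser maps wire string -> value, the validator maps
# value -> bool.
HEADERS = {}

def declare_header(name, parser, validator):
    HEADERS[name] = (parser, validator)

# Content-Length as the worked example Andy promises to add.
declare_header('content-length', int,
               lambda v: isinstance(v, int) and v >= 0)

DEFAULT = (str, lambda v: True)   # unknown headers stay raw strings

def read_header(line):
    """Like read-header: split one 'Name: value' line, map the name
    case-insensitively, and return two values (name, parsed value)."""
    name, _, raw = line.partition(':')
    name = name.strip().lower()
    parser, _validator = HEADERS.get(name, DEFAULT)
    return name, parser(raw.strip())

def valid_header(name, value):
    """Like valid-header?: true iff VALUE is valid for header NAME."""
    _parser, validator = HEADERS.get(name, DEFAULT)
    return validator(value)
```

[Note how the validator never sees the wire string: `valid_header
('content-length', '42')` is false, because the parsed value for that
header is an integer, not a string.]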
> Or should we give brief descriptions of the meanings of all of these
> headers as well? That might be a good idea too.

No, I don't think that's needed.

>>> `transfer-encoding'
>>>      A param list of transfer codings. `chunked' is the only known
>>>      key.
>>
>> OK, why a param list rather than key-value? How are elements in the
>> second key-value list, say, different from elements in the first
>> key-value list?
>
> Well, some of these headers are quite unfortunate in their
> construction. In this case:
>
>   Transfer-Encoding = "Transfer-Encoding" ":" 1#transfer-coding
>
> So really, this is a list. But:
>
>   transfer-coding    = "chunked" | transfer-extension
>   transfer-extension = token *( ";" parameter )
>   parameter          = attribute "=" value
>   attribute          = token
>   value              = token | quoted-string
>
> Given that a transfer-extension is really a token with a number of
> parameters, the thing gets complicated. You could have:
>
>   Transfer-Encoding: chunked,abcd,newthing;foo="bar, baz; qux";xyzzy
>
> which is hard to parse if you do it ad hoc. (web http) parses it as:
>
>   (transfer-encoding . ((chunked) ("abcd")
>                         ("newthing" ("foo" . "bar, baz; qux") "xyzzy")))
>
> Still complicated, but more uniform at least. Saying that `chunked'
> is the only known key means that it's the only one that's translated
> to a symbol; i.e. `abcd' is parsed to a string. (This is to prevent
> attacks that intern a bunch of symbols; though symbols can be gc'd in
> Guile.)
>
> Does that help? I'll see about replacing usages of "param list" with
> "list of key-value lists", as it's probably clearer, and we can save
> ourselves a definition.

Hmm. I still don't feel I completely understand this; but on the other
hand it's too fiddly for me to want to go into more now. I think I'll
wait until I actually have to process something with these structures.

>>> `www-authenticate'
>>>      A string.
>>
>> Obviously there's lots of substructure there (in WWW-Authenticate)
>> that we just don't support yet.
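[Going back to the Transfer-Encoding example above: the "list of
key-value lists" shape can be made concrete with a rough Python
sketch. This is a hypothetical parser written from the grammar Andy
quotes, not (web http)'s code - the quoted-string handling is exactly
the part that makes ad hoc parsing hard, since a quoted value may
contain "," and ";":]

```python
def parse_param_list(value):
    """Parse a 1#(token *(";" parameter)) header value, e.g.
    Transfer-Encoding, into a list of [item, (attr, val), ...] lists.
    Bare parameters get val=None; quoted-strings keep , and ; intact."""
    items, i, n = [], 0, len(value)

    def read_token():
        nonlocal i
        start = i
        while i < n and value[i] not in ',;=':
            i += 1
        return value[start:i].strip()

    def read_value():
        nonlocal i
        if i < n and value[i] == '"':      # quoted-string
            i += 1
            start = i
            while i < n and value[i] != '"':
                i += 1
            s = value[start:i]
            i += 1                          # skip closing quote
            return s
        return read_token()

    while i < n:
        item = [read_token()]
        while i < n and value[i] == ';':    # parameters of this coding
            i += 1
            attr = read_token()
            if i < n and value[i] == '=':
                i += 1
                item.append((attr, read_value()))
            else:
                item.append((attr, None))
        items.append(item)
        if i < n and value[i] == ',':
            i += 1
    return items
```

[On Andy's example this yields `[['chunked'], ['abcd'], ['newthing',
('foo', 'bar, baz; qux'), ('xyzzy', None)]]` - the same shape as the
Scheme alist, modulo symbols vs. strings.]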
>> Is there a clear compatibility story for if/when Guile is enhanced
>> to parse that out?
>>
>> I guess yes; calling code will just need something like
>>
>>   (if (string? val)
>>       ;; An older Guile that doesn't parse authentication fully.
>>       (do-application-own-parsing)
>>       ;; A newer Guile that does parse authentication.
>>       (use-the-parsed-authentication-object))
>
> That's a very good question. The problem is that if we change the
> parsed representation, then old code breaks. That's why I put in the
> effort to give (hopefully) good representations for most headers, to
> avoid that situation -- though you appear to have caught one laziness
> on my part here, and in Authorization, Proxy-Authenticate, and
> Proxy-Authorization.

I don't think I could ever think that you are lazy!

> So maybe the right thing to do here is just to bite the bullet, parse
> as the RFC says we should, and avoid this situation.

As long as it's a bounded problem, fine. (And I see that the modules
do now crack out the authorization and authenticate headers.)

>> Hmm, I think the provision of this data type needs a bit more
>> motivation. It doesn't appear to offer much additional value,
>> compared with reading or writing the components of a request
>> individually, and on the other hand it appears to bake in
>> assumptions about charsets and content length that might not always
>> be true.
>
> I probably didn't explain it very well then. A request record holds
> the data from a request -- the method, uri, headers, etc.
> Additionally it can be read or written. It does not actually bake in
> assumptions about character sets or the like. It's simply that the
> HTTP protocol is flawed in this respect, in that it mixes textual and
> binary data. We want to be able to read and parse requests,
> responses, and their headers using string and char routines, and
> that's fine, as the character set for HTTP messages is restricted to
> a subset of the lower ASCII set.
> But then the body of an HTTP message is fundamentally binary -- the
> content-length is specified in bytes, not characters.

Thanks. I think I see the usefulness of request and response objects
now, given the "overall picture" above.

> So the right way to read off a body is as a bytevector of the
> specified length (potentially with chunked transfer encoding of
> course, though we don't do that yet). Then if you want text, you
> decode using the character set specified in the request. If you are
> particularly lucky and it is a textual type and the charset is not
> specified, you can read it as a latin-1 string directly; otherwise
> you convert. Or you can deal with the binary data as a string.
>
> Setting the charset on the port is a bit of a hack, but it is the
> right thing to do if you are reading HTTP. And it doesn't matter what
> the charset is when you read the body, as it's specified in bytes
> anyway and should be read in bytes (and then, possibly, decoded).
>
> Some more organized discussion should go in the manual... but what do
> you think?

The discussion in `An Important Note on Character Sets' looks good to
me.

>>> -- Function: adapt-response-version response version
>>>     Adapt the given response to a different HTTP version. Returns
>>>     a new HTTP response.
>>
>> Interesting, and adds more value to the idea of the response object.
>> Why not for the request object too - are you assuming that Guile
>> will usually be acting as the HTTP server? (Which I'm sure is
>> correct, but "usually" is not "always".)
>
> The thing is that the request initiates the transaction -- so it's
> the requestor that makes the version decision.

Oh yes, of course.

>>> 2. The `read' hook is called, to read a request from a new client.
>>>    The `read' hook takes one argument, the server socket. It
>>>    should
>>
>> It feels surprising for the infrastructure to pass the server socket
>> to the read hook.
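[An aside on the body-reading recipe above - read content-length
bytes as binary, then decode only if a charset is known, with latin-1
as the textual fallback. A minimal Python sketch of that recipe, with
hypothetical names; this is not Guile's code, and real code would
also handle chunked transfer coding:]

```python
def read_body(port, headers):
    """Read exactly Content-Length bytes as binary; decode to text
    only for textual types, using the declared charset if any and
    falling back to latin-1 (which maps bytes 1:1 to characters)."""
    n = int(headers.get('content-length', 0))
    body = port.read(n)                      # bytes, not characters
    ctype = headers.get('content-type', '')
    if ctype.startswith('text/'):
        charset = 'latin-1'                  # safe default, per the discussion
        for param in ctype.split(';')[1:]:
            k, _, v = param.strip().partition('=')
            if k == 'charset':
                charset = v.strip('"')
        return body.decode(charset)
    return body                              # binary types stay bytes
```

[The point being that the length is always counted in bytes, so the
read step is charset-independent; decoding is a separate, optional
step afterwards.]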
>> I'd expect the infrastructure to do the `accept' itself and pass
>> the client socket to the read hook.
>
> It's the opaque "server socket object". Doing it this way has two
> advantages:
>
> 1) Works with other socket architectures (zeromq, particularly).

I'm not familiar with that, but will take a look.

> 2) Allows the server to make its own implementation of keepalive
>    (or not).
>
> Particularly the latter is interesting -- the http implementation
> makes a big pollset (from (ice-9 poll), not yet documented), and
> polls on the server socket and all the keepalive sockets.

As does (tekuti mod-lisp) - so the duplication is a slight shame. But
I agree that there's no reason why the work to unduplicate that should
be in (web server).

Regards,
        Neil
0001-Fix-typos-in-web-.-doc.patch
Description: Text Data
0001-Export-server-impl-so-that-applications-can-use-it.patch
Description: Text Data
0002-Update-body-related-calls-to-new-API.patch
Description: Text Data
0003-Avoid-using-lookup-header-decl-which-isn-t-exported.patch
Description: Text Data
0004-Update-to-new-parse-header-signature-which-only-retu.patch
Description: Text Data
0005-Update-to-new-build-request-signature.patch
Description: Text Data
0006-Don-t-validate-headers-as-we-get-an-apparently-inval.patch
Description: Text Data