trans-coord-devel

Re: Non-ascii characters in the original articles


From: Kaloian Doganov
Subject: Re: Non-ascii characters in the original articles
Date: Thu, 07 Feb 2008 09:23:32 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.50 (gNewSense gnu/linux)

Yavor Doganov <address@hidden> writes:

    The proper fix here is just to use &eacute; in
    software-literary-patents.html, right?

In order to satisfy the current GNUN implementation -- yes.  But
technically, the article includes server/header.html, which declares
UTF-8.  So browsers correctly interpret any UTF-8 characters in it.

    We could specify UTF-8, but I bet some articles are in ISO-8859-1 or
    -15...

In fact all files that include header.html are encoded in UTF-8, but I
don't know which of them are counted as articles.  You can generate a
list of all HTML files that do not include header.html with the
following command, executed in www as current directory:

find . | grep "[.]s\?html$" | grep -v "[.]\(..\|zh-..\)[.]s\?html$" \
       | grep -v "^[.]/www\(in\|es\)" | grep -v "^[.]/spanish" \
       | grep -v "^[.]/japan" | grep -v "^[.]/chinese" \
       | xargs grep -L '#include virtual="/server/header.html"'

Also, there are some articles that do not include header.html, but are
declaring UTF-8 by themselves (philosophy/software-patents.html, for
example).

They can be excluded from the generated list above with one additional
pipe stage:

       | xargs grep -L 'charset=utf-8'

Which outputs over 1500 filenames, mostly in doc/, prep/, server/ and
software/, but there are others too, like philosophy/amazonpatent.html.
I guess these files should be reworked using the standard templates,
before placing them under GNUN management.
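
As a rough sketch, the two filtering steps above can be wrapped in one
shell function.  The name list_undeclared is hypothetical, GNU grep and
GNU xargs are assumed (for \?, \| and -r), and matching the charset
declaration case-insensitively with -i is my own assumption, not
something the commands above do:

```shell
# Hypothetical helper: list original .html/.shtml files under directory $1
# that neither include /server/header.html nor declare charset=utf-8.
list_undeclared() {
    (
        # The exclusion patterns are anchored at "./", so run find from
        # inside the tree; the subshell keeps the caller's directory intact.
        cd "$1" &&
        find . -type f |
            grep '[.]s\?html$' |
            grep -v '[.]\(..\|zh-..\)[.]s\?html$' |
            grep -v '^[.]/www\(in\|es\)' |
            grep -v '^[.]/\(spanish\|japan\|chinese\)' |
            xargs -r grep -L '#include virtual="/server/header.html"' |
            # Assumption: -i also accepts "charset=UTF-8" as a declaration.
            xargs -r grep -L -i 'charset=utf-8'
    )
}
```

Calling it as "list_undeclared ." from www should produce a list similar
to the one described above.  Like the original pipeline, it does not
cope with file names that contain whitespace.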

So, I guess it is reasonable to require all original files that enter
GNUN to be encoded in UTF-8.  (Let me remind you that US-ASCII is a
subset of UTF-8, so all files that stick to US-ASCII are fine with
this requirement.)
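
A requirement like this could be checked mechanically.  The following
is only a sketch, assuming iconv is available and exits non-zero on
input that is not valid UTF-8 (as glibc's iconv does); the function
names is_utf8 and list_non_utf8 are hypothetical:

```shell
# Hypothetical helper: succeed if the file decodes cleanly as UTF-8.
# Pure US-ASCII files pass too, since US-ASCII is a subset of UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# Print every .html/.shtml file under directory $1 that is not valid UTF-8.
list_non_utf8() {
    find "$1" -type f \( -name '*.html' -o -name '*.shtml' \) -print |
    while IFS= read -r f; do
        is_utf8 "$f" || printf '%s\n' "$f"
    done
}
```

Running "list_non_utf8 ." in www would then list the files that still
need to be converted.  Note that this checks only the byte sequences,
not whether the declared charset matches them.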



