[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Character literals for Unicode (control) characters
From: |
Paul Eggert |
Subject: |
Re: Character literals for Unicode (control) characters |
Date: |
Mon, 14 Mar 2016 13:03:38 -0700 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 |
Thanks, here's a detailed low level review.
Subject: [PATCH 4/4] Use `ucs-names'.
Summary lines like "Use `ucs-names'." should not end with "." and should
be as informative as possible within a 50-char limit.
+#include <stdnoreturn.h>
This include reportedly doesn't work well with Microsoft compilers. Omit
it and use _Noreturn instead of noreturn.
+/* Signals an `invalid-read-syntax' error indicating that the
+ character name in an \N{...} literal is invalid. */
Use active voice "Signal an" rather than a non-sentence. Don't use grave
quoting in comments (no quoting needed here anyway).
+static noreturn void invalid_character_name (Lisp_Object name)
Put "static _Noreturn void" on the first line, and the rest on the next
line; that's the usual GNU style.
+/* Checks that CODE is a valid Unicode scalar value, and returns its
+ value. CODE should be parsed from the character name given by
+ NAME. NAME is used for error messages. */
Active voice: "Checks" -> "Check".
+static int check_scalar_value (Lisp_Object code, Lisp_Object name)
"static int" in a separate line.
+{
+ if (! RANGED_INTEGERP (0, code, MAX_UNICODE_CHAR) ||
+ /* Don't allow surrogates. */
+ RANGED_INTEGERP (0xD800, code, 0xDFFF))
+ invalid_character_name (name);
+ return XINT (code);
+}
RANGED_INTEGERP implies two tests for integer. Better would be an
explicit NUMBERP check, followed by an XINT, followed by C-language
range checks. Just use <= or < in range checks (not >= or >).
Also, don't put operators like || at the end of a line; put them at the
start of the next line instead.
+/* If NAME starts with PREFIX, interpret the rest as a hexadecimal
+ number and return its value. Raises `invalid-read-syntax' if the
+ number is not a valid scalar value. Returns -1 if NAME doesn't
+ start with PREFIX. */
Active voice. No need for grave quoting.
+static int
+parse_code_after_prefix (Lisp_Object name, const char* prefix)
"char* x" -> "char *x" in GNU style.
+ if (name_len > prefix_len && name_len <= prefix_len + 8
Just use < or <= for range checks.
+ Lisp_Object code = string_to_number (SDATA (name) + prefix_len,
16, false);
+ if (! NILP (code))
+ return check_scalar_value (code, name);
Why is nil treated differently from other invalid values (e.g.,
floating-point numbers)? They're all invalid character names, right?
+ /* Various ranges of CJK characters; see UnicodeData.txt. */
+ if ((code >= 0x3400 && code <= 0x4DB5) ||
+ (code >= 0x4E00 && code <= 0x9FD5) ||
+ (code >= 0x20000 && code <= 0x2A6D6) ||
+ (code >= 0x2A700 && code <= 0x2B734) ||
+ (code >= 0x2B740 && code <= 0x2B81D) ||
+ (code >= 0x2B820 && code <= 0x2CEA1))
+ return code;
Use only <= here, and put || at the start of lines. What's the
likelihood that the numbers in the above test will change?
+ if (! CONSP (names))
+ invalid_syntax ("Unicode character name database not loaded");
This test is not needed, as ucs-names always returns a cons, and anyway
even if it didn't then Fassoc would do the right thing.
+ /* 200 characters is hopefully long enough. Increase if
+ not. */
+ char name[200];
Give a name to this constant, e.g.,
/* Bound on the length of a Unicode character name.
As of Unicode 9.0.0 the maximum is 83, so this should be safe. */
enum { UNICODE_CHARACTER_NAME_LENGTH_BOUND = 199 };
...
char name[UNICODE_CHARACTER_NAME_LENGTH_BOUND + 1];
- Re: Character literals for Unicode (control) characters, (continued)
- Re: Character literals for Unicode (control) characters, John Wiegley, 2016/03/05
- Re: Character literals for Unicode (control) characters, Philipp Stephani, 2016/03/06
- Re: Character literals for Unicode (control) characters, Eli Zaretskii, 2016/03/06
- Re: Character literals for Unicode (control) characters, Philipp Stephani, 2016/03/06
- Re: Character literals for Unicode (control) characters, Paul Eggert, 2016/03/06
- Re: Character literals for Unicode (control) characters, Philipp Stephani, 2016/03/06
- Re: Character literals for Unicode (control) characters, Paul Eggert, 2016/03/06
- Re: Character literals for Unicode (control) characters, Philipp Stephani, 2016/03/06
- Re: Character literals for Unicode (control) characters, Eli Zaretskii, 2016/03/06
- Re: Character literals for Unicode (control) characters, Philipp Stephani, 2016/03/13
- Re: Character literals for Unicode (control) characters,
Paul Eggert <=
- Re: Character literals for Unicode (control) characters, Eli Zaretskii, 2016/03/14
- Re: Character literals for Unicode (control) characters, Nikolai Weibull, 2016/03/15
- Re: Character literals for Unicode (control) characters, Eli Zaretskii, 2016/03/15
- Re: Character literals for Unicode (control) characters, Nikolai Weibull, 2016/03/16
- Re: Character literals for Unicode (control) characters, Clément Pit--Claudel, 2016/03/14
- Re: Character literals for Unicode (control) characters, Paul Eggert, 2016/03/14
- Re: Character literals for Unicode (control) characters, Philipp Stephani, 2016/03/19
- Re: Character literals for Unicode (control) characters, Paul Eggert, 2016/03/20
- Re: Character literals for Unicode (control) characters, Philipp Stephani, 2016/03/20
- Re: Character literals for Unicode (control) characters, Philipp Stephani, 2016/03/25