Re: Character error not reported
From: Hans Åberg
Subject: Re: Character error not reported
Date: Tue, 18 Jun 2019 18:09:08 +0200
> On 17 Jun 2019, at 18:06, Akim Demaille <address@hidden> wrote:
>
> Hi Hans,
Hi,
>> Le 17 juin 2019 à 15:12, Hans Åberg <address@hidden> a écrit :
>>
>> When the lexer returns a byte with the high bit set that is not used in the
>> grammar, the parser generated by Bison 3.4.1 does not report an error. An
>> error is reported only if the high bit is not set.
>
> This is hard to believe. I suspect your problem is elsewhere.
>
>> This occurs if one sets a Flex default rule
>> . { return yytext[0]; }
>> and the lexer finds a stray UTF-8 byte.
>
> I would say that here, you return a char (yytext[0]) with "a high bit set",
> on an architecture where char is signed, so you are actually returning a
> negative int (when the 8th bit is set). And for Bison, any negative token
> number stands for end-of-file.
Indeed, that is likely the case.
> You should actually write:
>
> . { return (unsigned char) yytext[0]; }
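
The difference between the two return statements can be sketched in plain C, outside of Flex. This is only an illustration under stated assumptions: the helper names below are hypothetical and not part of Flex or Bison, `text` stands in for `yytext`, and `looks_like_end_of_input` merely restates Bison's documented convention that a zero or negative value from `yylex` signifies end of input. Whether plain `char` is signed depends on the platform, so the negative case is only checked when `CHAR_MIN < 0`.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for `return yytext[0];`:
   plain char is promoted to int and keeps its sign. */
static int token_from_plain_char(const char *text)
{
    return text[0];
}

/* Hypothetical stand-in for `return (unsigned char) yytext[0];`:
   the result is always in the range 0..UCHAR_MAX. */
static int token_from_unsigned_char(const char *text)
{
    return (unsigned char) text[0];
}

/* Restates Bison's convention: a zero or negative token code
   returned by yylex is taken as end of input. */
static int looks_like_end_of_input(int token)
{
    return token <= 0;
}

/* Walks through the case from this thread: a stray UTF-8 lead byte. */
static void check_examples(void)
{
    const char stray[] = "\xC3";

    /* With the cast, the byte is a positive token code. */
    assert(token_from_unsigned_char(stray) == 0xC3);
    assert(!looks_like_end_of_input(token_from_unsigned_char(stray)));

    /* Without the cast, and only where plain char is signed, the byte
       becomes negative and is indistinguishable from end of input. */
    if (CHAR_MIN < 0) {
        assert(token_from_plain_char(stray) < 0);
        assert(looks_like_end_of_input(token_from_plain_char(stray)));
    }
}
```

Calling `check_examples()` from a small test program should pass on platforms where `char` is signed as well as on those where it is unsigned, since the sign-dependent assertions are guarded.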
As 8-bit character tokens are not useful with UTF-8, I have replaced it with:
%token token_error "token error"
. { return my_parser::token::token_error; }
Please let me know if there is a better way to generate a parser error.