bug-global

Re: gtags bug report: issue with S-JIS encoding files


From: Shigio YAMAGUCHI
Subject: Re: gtags bug report: issue with S-JIS encoding files
Date: Fri, 17 Nov 2023 16:38:33 +0900

Hello,
This cannot be considered a bug, because Global does not
support multi-byte character code sets.

[/usr/local/share/gtags/FAQ]
--------------------------------------------------------------
Q10. Does Global support multi-byte code set?
     Which character code set is supported?

A10. Global doesn't support multi-byte character code set yet.
     Global supports only ASCII and ASCII super-sets.
--------------------------------------------------------------

In Shift-JIS, the string "機能" (including its quotes) consists of the following bytes:

0x22    "
0x8b    (binary)
0x40    @
0x94    (binary)
0x5c    \
0x22    "

Since 0x5c ('\') escapes the following 0x22 ('"'), the parser treats the
rest of the source code as one long string. The parser cannot recognize
this as a failure, because it is processing the bytes correctly as ASCII.

Regards,
Shigio

On Fri, Nov 17, 2023 at 11:46 AM Johnny Cheng <itainan.101@gmail.com> wrote:
Hi,

I found that if a file contains a specific sequence of CJK characters, the parser seems to stop parsing the rest of the file.

See the following example source file, say `test.c`, encoded in Shift-JIS (cp932).

extern void printf(char * msg, ...);

void Foo() {
    char msg[] = "機能";
    printf(msg);
}

void Hello() {
    return;
}

(In case the Kanji turn into mojibake because of encoding issues, screenshots are also provided below.)

  • What happened? (as is)

If you run the `gtags` command in the same folder, followed by `global -f test.c`, you get only one tag, `Foo`, although `Hello` should also be found.

  • What did you expect?

Both `Foo` and `Hello` should be listed. If I modify the source slightly, the tag `Hello` is found. See the variations I tried in the table below.


Cases Table

Bad case
  Source code screenshot: image001.png (encoding: cp932, i.e. Shift-JIS)
  Output of `global -f test.c`:
    Foo                 4 test.cpp         void Foo() {

Good cases
  Source code screenshots:
    <image001.png> (encoding: UTF-8)
    image002.png (encoding: cp932, i.e. Shift-JIS)
    image003.png (encoding: cp932, i.e. Shift-JIS)
  Output of `global -f test.c`:
    Foo                 4 test.cpp         void Foo() {
    Hello               9 test.cpp         void Hello() {


My environment

OS:
  Windows 11 Enterprise 22H2 64bit Build 22621.2428

gtags --version:
  gtags (Global) 6.6.9
  Powered by Berkeley DB 1.85.
  Copyright (c) 1996-2022 Tama Communications Corporation
  License GPLv3+: GNU GPL version 3 or later http://www.gnu.org/licenses/gpl.html
  This is free software; you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.


Possible Solutions

  • Add a command-line encoding option so that files are read correctly.
  • Find out why such a file cannot be fully parsed, skip this particular error, and continue parsing.

Also, when this happens, Global should at least print an error message to inform the user that some files were not fully parsed.

Johnny Cheng



--
Shigio YAMAGUCHI <shigio@gnu.org>
PGP fingerprint: 
26F6 31B4 3D62 4A92 7E6F  1C33 969C 3BE3 89DD A6EB
