From: Shigio YAMAGUCHI
Subject: Re: Binary recognition is to narrow [new suggestion]
Date: Sat, 21 Nov 2009 15:42:11 +0900
> Instead of counting characters over 127 the only test is that the first
> 511 bytes don't contain any of the control characters 0-8, 14-31. No
> normal textfile would contain these.
>
> Assuming that binary data is random, the probability of an incorrectly
> tagged binary would be
>
> ((256-8-18)/256)^511=.00000000000000000000000170726
>
> Testing just 127 bytes would be a bit too little:
>
> ((256-8-18)/256)^127=.00000123868
This is a very interesting idea.
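A minimal sketch of the proposed test and of the probability estimate quoted above. The function names are hypothetical and this is not GLOBAL's actual code. Note that the range 0-8 covers nine byte values, so 27 values are excluded in total; the quoted formula subtracts 26, so the sketch lets both counts be compared.

```python
# Hypothetical sketch of the proposed binary test; not GLOBAL's implementation.
# Control bytes 0-8 and 14-31 are rejected; 9-13 (TAB, LF, VT, FF, CR) are allowed.
CONTROL_BYTES = frozenset(range(0, 9)) | frozenset(range(14, 32))


def looks_binary(data: bytes, size: int = 512) -> bool:
    """Return True if any of the first `size` bytes is a control byte."""
    return any(b in CONTROL_BYTES for b in data[:size])


def false_text_probability(n: int, excluded: int = len(CONTROL_BYTES)) -> float:
    """Chance that n uniformly random bytes contain none of the excluded values."""
    return ((256 - excluded) / 256) ** n


print(looks_binary(b"int main(void) { return 0; }\n"))  # plain C source
print(looks_binary(b"\x7fELF\x02\x01\x01\x00"))         # start of an ELF header
print(false_text_probability(511, excluded=26))         # count used in the quoted formula
print(false_text_probability(511))                      # all 27 control values excluded
print(false_text_probability(127, excluded=26))
```

The estimate assumes uniformly random bytes, which real binaries are not, so it is only a rough guide to how many bytes need to be examined.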
> One of the benefits is that this will correctly tag files in Unicode as
> text as well, since those control characters never appear in Unicode
> either.
This is a big merit.
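As a rough check of this point, the sketch below encodes one sample line in several encodings and looks for the control bytes 0-8 and 14-31. The helper name and the sample text are assumptions made for illustration. It suggests that byte-oriented encodings such as UTF-8, EUC-JP and Shift_JIS pass, while UTF-16 (NUL bytes in ASCII characters) and ISO-2022-JP (ESC sequences) would still need separate handling.

```python
# Hypothetical helper: scans a whole byte string for control bytes 0-8 and 14-31.
CONTROL_BYTES = frozenset(range(0, 9)) | frozenset(range(14, 32))


def has_control_byte(data: bytes) -> bool:
    """Return True if the data contains any byte the proposed test rejects."""
    return any(b in CONTROL_BYTES for b in data)


# A source line with a Japanese comment, encoded several ways.
sample = "int x; /* 日本語のコメント */\n"
for codec in ("utf-8", "euc_jp", "shift_jis", "utf-16-le", "iso2022_jp"):
    print(codec, has_control_byte(sample.encode(codec)))
```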
Most other multi-byte character sets are surely designed that way too.
I would also like to make the 512 a customizable variable.
$ gtags         ... use conventional test

[File gtags.conf]
+----------------------------
|...
|       :binarytest_size=512:... ------------------+
|                                                  |
                                                   v
$ gtags         ... use new test using the first n=512 bytes
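The switch in the diagram could be sketched as follows. This only illustrates the idea: the function name is hypothetical, the `:name=value:` form simply follows the notation used in the diagram, and nothing here reflects how gtags actually reads gtags.conf.

```python
import re
from typing import Optional


def binarytest_size(entry: str) -> Optional[int]:
    """Return N from ':binarytest_size=N:' in a config entry, or None if absent."""
    match = re.search(r":binarytest_size=(\d+)(?::|$)", entry)
    return int(match.group(1)) if match else None


# Hypothetical config entries, with and without the proposed variable.
with_var = "default:\\\n\t:binarytest_size=512:"
without_var = "default:\\\n\t:tc=native:"

for entry in (with_var, without_var):
    size = binarytest_size(entry)
    if size is None:
        print("use conventional test")
    else:
        print(f"use new test on the first {size} bytes")
```

Leaving the variable unset keeps the conventional behavior, so the new test can be tried without affecting existing setups.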
After testing it for a while, we can decide what to do.
Thank you for your valuable suggestion.
--
Shigio YAMAGUCHI <address@hidden>
PGP fingerprint: D1CB 0B89 B346 4AB6 5663 C4B6 3CA5 BBB3 57BE DDA3