bug-global

Binary recognition is too narrow.


From: Erik Jonsson
Subject: Binary recognition is too narrow.
Date: Wed, 18 Nov 2009 17:46:01 +0100 (CET)
User-agent: SquirrelMail/1.4.9a

Hi all,

The function is_binary is a bit naive and therefore tags too much content
as binary. Names often appear in the first 32 bytes of a file, and those
names often contain unusual characters, i.e. bytes with values above 127.
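To make the "above 127" point concrete, here is a small sketch that counts such bytes in a header line containing the name "Björn". The helper count_high_bytes() and the two sample lines are hypothetical and not part of GLOBAL. In UTF-8 the letter ö takes two bytes (0xC3 0xB6); in ISO 8859-1 it takes one (0xF6). Either way, a check that rejects every byte above 127 would classify the line as binary.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the bytes with values above 127 in the
 * first n bytes of buf. Non-ASCII letters fall in this range in both
 * UTF-8 (multi-byte sequences) and ISO 8859-1 (single bytes).
 */
static int
count_high_bytes(const char *buf, size_t n)
{
        size_t i;
        int count = 0;

        assert(buf != NULL || n == 0);
        for (i = 0; i < n; i++)
                if ((unsigned char)buf[i] > 127)
                        count++;
        return count;
}

/* Hypothetical first lines of a source file with a non-ASCII name. */
static const char sample_utf8[] = "/* Author: Bj\xC3\xB6rn */\n";   /* UTF-8 */
static const char sample_latin1[] = "/* Author: Bj\xF6rn */\n";     /* ISO 8859-1 */
```

With these definitions, count_high_bytes(sample_utf8, sizeof(sample_utf8) - 1) returns 2 and the Latin-1 sample gives 1, so only a small fraction of the bytes are non-ASCII.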

I have rewritten the function slightly, and I think you should incorporate
this fix, or one of your own, in a future release.

Here is my version:

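#include <fcntl.h>      /* open(), O_RDONLY */
#include <unistd.h>     /* read(), close() */

/*
 * Return 1 if the file at 'path' looks binary, 0 if it looks like text.
 * Only the first 32 bytes are examined. die(), locatestring() and
 * MATCH_AT_FIRST come from GLOBAL's own libutil.
 */
static int
is_binary(const char *path)
{
        int ip;
        char buf[32];
        int i, c, size;
        int strange = 0;        /* number of bytes with values above 127 */

        ip = open(path, O_RDONLY);
        if (ip < 0)
                die("cannot open file '%s' in read mode.", path);
        size = read(ip, buf, sizeof(buf));
        close(ip);
        if (size < 0)
                return 1;       /* read error: treat as binary */
        if (size == 0)
                return 0;       /* empty file: treat as text, avoid 0/0 below */
        if (size >= 7 && locatestring(buf, "!<arch>", MATCH_AT_FIRST))
                return 1;       /* ar(1) archive */
        for (i = 0; i < size; i++) {
                c = (unsigned char)buf[i];
                /* Control characters, except TAB, LF, VT, FF and CR (9-13). */
                if (c <= 8)
                        return 1;
                if (c >= 14 && c < 32)
                        return 1;
                /* Non-ASCII bytes are common in names, so only count them. */
                if (c > 127)
                        strange++;
        }
        /* Binary only if more than 30% of the bytes examined are non-ASCII. */
        if ((float)strange / size > 0.3f)
                return 1;

        return 0;
}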
