poke-devel

Re: Automatic Poke


From: Philippe Marchesseault
Subject: Re: Automatic Poke
Date: Tue, 14 Mar 2023 14:55:58 -0400

Thank you, Mohammad-Reza and Jose, for the pointers; I am looking into them.

If we focus on the libPoke side, I have some questions:
1- My tool will need to open many IO spaces that may not be backed directly by files. The data could come from a file, a hexdump, a base64 snippet, etc. I started with the libPoke API, but I cannot find a call to populate the content of an IO space with my bytes (from my C side). How should I go about that? I could set each byte with a Poke statement like byte @ 0x00#B = 0x34, but that looks wrong.
2- The search space might get very big quickly. Is there any chance libPoke can work in a multithreaded program?
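
To make the first question concrete, here is a minimal sketch of the normalization step that comes before any loading: turning a file, a hex string, or a base64 snippet into one raw byte buffer. It is written in Python for brevity and does not use libPoke at all; the function name `load_blob` and the `kind` labels are hypothetical, and a real hexdump with offsets or an ASCII column would need extra parsing.

```python
import base64
from pathlib import Path


def load_blob(kind: str, source: str) -> bytes:
    """Return the raw bytes of a blob, whatever its packaging.

    `kind` is a hypothetical label: "file", "hex" or "base64".
    """
    if kind == "file":
        # `source` is a path on disk.
        return Path(source).read_bytes()
    if kind == "hex":
        # Plain hex digits, optionally separated by whitespace ("34 12 ff").
        return bytes.fromhex(source)
    if kind == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(source, validate=True)
    raise ValueError(f"unknown blob kind: {kind!r}")
```

Once every blob is a plain byte buffer, the remaining question is only how to hand that buffer to an IO space.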

If I understand your email correctly, you recommend using the Poke language directly to implement the permutations? I will look into that option, but I rely on other analysis libraries. I am also considering using poked instead of libPoke. I was thinking of using the Poke language only to describe the known structures, and the placeholders where the search/permutations should occur. I like your idea of using the constraints to detect problems.

- The data structures I am working with vary in length from a few bytes to a few megabytes. I need to manage 10 to 100 different blobs that are mapped into Poke IO spaces.
- The content usually follows a combination of lists, structs and primitives.
- The primitives I work with most are IEEE 754 floating-point values. They are a good source of information because, in most applications, they represent a value with few decimals (think latitudes/longitudes) or even whole numbers. When you have a list of floats, most of them will have the same precision.
- There are also various integer encodings: big/little endian, variable length, fixed length, indexed, ...
- Various fixed decimal encodings
- Endianness sometimes varies within the blob (!)
- They may contain embedded files that I will identify by their magic headers (think JPEG or PNG files). These embedded files can remain opaque for my purpose.
- Blobs that are not byte aligned are rare, but they do happen (CAN bus messages).
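
The float heuristic from the list above can be sketched as follows, in plain Python rather than Poke: decode the same offset in every blob as a 32-bit or 64-bit float in either byte order, and count how often the result is a "short" decimal. The names `short_decimal_digits` and `score_float_guesses`, the cap of 7 decimals, and the magnitude bound of 1e6 are all assumptions made for illustration. The bound is there because very large floats are always whole numbers and would otherwise pass trivially.

```python
import math
import struct
from typing import Dict, Optional, Sequence

CANDIDATE_FORMATS = ("<f", ">f", "<d", ">d")  # little/big endian, 32/64 bit


def short_decimal_digits(raw: bytes, fmt: str, max_decimals: int = 7,
                         max_magnitude: float = 1e6) -> Optional[int]:
    """Fewest decimals that reproduce `raw` exactly, or None if implausible."""
    (value,) = struct.unpack(fmt, raw)
    if not math.isfinite(value) or abs(value) > max_magnitude:
        return None
    for decimals in range(max_decimals + 1):
        try:
            # Re-encode the rounded value; identical bytes mean the stored
            # float is the closest representation of that short decimal.
            if struct.pack(fmt, round(value, decimals)) == raw:
                return decimals
        except OverflowError:
            return None
    return None


def score_float_guesses(blobs: Sequence[bytes], offset: int) -> Dict[str, float]:
    """Fraction of blobs whose bytes at `offset` look like a short decimal."""
    scores = {}
    for fmt in CANDIDATE_FORMATS:
        size = struct.calcsize(fmt)
        hits = 0
        for blob in blobs:
            raw = blob[offset:offset + size]
            if len(raw) == size and short_decimal_digits(raw, fmt) is not None:
                hits += 1
        scores[fmt] = hits / len(blobs) if blobs else 0.0
    return scores
```

A field of all-zero bytes decodes to 0.0 under every format, so this score is only informative when the blobs carry varied values.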

Thank you again for your feedback. And congratulations on creating a great tool!

Philippe


On Sun, Mar 12, 2023 at 8:06 AM Jose E. Marchesi <jemarch@gnu.org> wrote:

Hi Philippe.

> I am starting a new project that will make use of libPoke. It is a tool
> that will try to automatically reverse engineer the structure of a
> collection of opaque blobs. The idea is that if you have enough blobs with
> the same structure, you can have the computer try grammar permutations that
> fit the blob collection. The tool will generate a portion of Poke grammar
> for you to modify and improve on.
>
> The workflow I have in mind is this:
> 1-Organize your blobs by grammar.
> 2-Run the tool, it will generate grammar. Maybe with different choices?
> 3-Edit the generated grammar
> 4-Repeat step 2
>
> Nice Features:
> - User provides hints of data you know is in the blob (from log files,
> visual inspection, ...)
> - Automatically detect embedded files with magic headers
> -...
>
> I hope to generate the grammars in the Poke DSL, and use the libPoke VM to
> interpret and apply the grammars to multiple blobs and score how they
> perform. How should I go about this? Is this even a good idea? Does it make
> sense?
>
> Thank you for your feedback!

Interesting domain.

Using libpoke you can evaluate any Poke code.  This means that you
should be able to, for example, easily permute fields of different
types with different constraints in them and rely on poke's data
integrity checking to determine whether the guess worked.
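
The guess-and-check loop described here can be illustrated outside of Poke with a small Python sketch. Each guess is an ordered list of field kinds, every kind carries a constraint, and a guess "works" on a blob when it consumes the blob exactly and all constraints hold. In the real tool the constraints would live in Poke struct types and poke's integrity checking would play the role of `guess_fits`; the field vocabulary, thresholds, and function names below are hypothetical.

```python
import itertools
import struct

# Hypothetical field vocabulary: kind -> (struct format code, constraint).
FIELD_KINDS = {
    "u8": ("B", lambda v: True),
    "u16_small": ("H", lambda v: v < 1000),
    "u32_small": ("I", lambda v: v < 100_000),
}


def guess_fits(blob, layout, byte_order):
    """True if `layout` consumes the whole blob and every constraint holds."""
    fmt = byte_order + "".join(FIELD_KINDS[kind][0] for kind in layout)
    if struct.calcsize(fmt) != len(blob):
        return False
    values = struct.unpack(fmt, blob)
    return all(FIELD_KINDS[kind][1](v) for kind, v in zip(layout, values))


def rank_guesses(blobs, n_fields):
    """Try every ordering of field kinds in both byte orders, best first."""
    ranked = []
    for byte_order in ("<", ">"):
        for layout in itertools.product(FIELD_KINDS, repeat=n_fields):
            hits = sum(guess_fits(blob, layout, byte_order) for blob in blobs)
            ranked.append((hits / len(blobs), byte_order, layout))
    # Score is the fraction of blobs on which the guess held.
    ranked.sort(key=lambda entry: entry[0], reverse=True)
    return ranked


if __name__ == "__main__":
    sample = [struct.pack("<BHI", 1, 500, 70000),
              struct.pack("<BHI", 2, 999, 5)]
    for score, byte_order, layout in rank_guesses(sample, 3)[:3]:
        print(score, byte_order, layout)
```

The exhaustive product grows exponentially with the number of fields, so anything beyond a handful of fields would need pruning, for example by discarding a partial layout as soon as one of its constraints fails.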

Additionally, you could also achieve run-time permutation by using
unions and labels.

I find the scoring part and the goal-oriented process very interesting.
We could maybe write a pickle with useful utilities for that kind of
thing (discover.pk?).

Do you have some particular example of the kind of data structures of
these blobs?

