Re: Idiom for guarded byte streams
From: Jose E. Marchesi
Subject: Re: Idiom for guarded byte streams
Date: Thu, 23 Dec 2021 21:55:23 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

> Hi people!
>
> So today I was writing a pickle for the BER encoding of ASN.1 and came
> up with an interesting problem: how best to denote a sequence of bytes
> terminated by two consecutive 0UB bytes.
>
> Sounds simple enough... but I really had to think for a while before
> coming to this:
>
> type GuardedBytes =
>   struct
>   {
>     type Datum =
>       union
>       {
>         byte[2] pair : pair[1] != 0UB;
>         byte single : single != 0UB;
>       };
>
>     Datum[] data;
>     byte[2] end : end == [0UB,0UB];
>   };
>
> It is easy to get an array with the bytes (at the cost of copying them)
> with a function or a method like this:
>
>   method get_bytes = byte[]:
>   {
>     var bytes = byte[]();
>     for (d in data)
>       try bytes += d.pair;
>       catch if E_elem { bytes += [d.single]; }
>     return bytes;
>   }
>
> And given a mapped GuardedBytes, it is easy to map other stuff in its
> contents using data'offset and data'size.
>
> If the end-of-data marker were three bytes instead of two, I guess I
> would need to add a `triplet' alternative to the Datum union. And so
> on...
>
> Can you think of a better solution for this kind of structure? If so,
> please share, I'm very interested :)

Hm, actually the `end' field constraint can use the shorter form:

type GuardedBytes =
  struct
  {
    type Datum =
      union
      {
        byte[2] pair : pair[1] != 0UB;
        byte single : single != 0UB;
      };

    Datum[] data;
    byte[2] end == [0UB,0UB];
  };

This is what the structure looks like:
(poke) .mem
(poke) byte[] @ 0#B = ['a','b','c','d','e']
(poke) GuardedBytes @ 0#B
GuardedBytes {
data=[Datum {
pair=[0x61UB,0x62UB]
},Datum {
pair=[0x63UB,0x64UB]
},Datum {
single=0x65UB
}],
end=[0x0UB,0x0UB]
}
And using the method:
(poke) (GuardedBytes @ 0#B).get_bytes
[0x61UB,0x62UB,0x63UB,0x64UB,0x65UB]
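
The quoted message notes that data'offset and data'size make it easy to map other things over the contents of a mapped GuardedBytes. A minimal sketch of that, assuming the GuardedBytes type above is loaded and the IO space holds the same bytes as in the session above. `gb' and `contents' are hypothetical names, and the sketch assumes that a field of a mapped struct reports its absolute offset through 'offset (here the data field starts at 0#B either way):

```poke
/* Hypothetical names: `gb' and `contents'.  Assumes GuardedBytes is
   defined as above and the current IO space contains 'a'..'e'
   followed by two zero bytes, as in the (poke) session above.  */
var gb = GuardedBytes @ 0#B;

/* Map a plain byte array over the payload.  The array is bounded by
   the size of the `data' field (an offset) rather than by an element
   count, and it is mapped at the offset where `data' starts.  */
var contents = byte[gb.data'size] @ gb.data'offset;
```

Unlike get_bytes, this does not build a fresh array by walking the Datum elements: the result is a mapped array covering exactly the bytes before the terminator.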