
Re: Getting GPS data into stream


From: Marcus Müller
Subject: Re: Getting GPS data into stream
Date: Thu, 4 May 2023 09:39:53 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

Hey Marcus,

as you say, for a lot of science you don't get high rates – so I'm really less worried about that. More worried about Excel interpreting some singular data point as a date; or, as soon as we involve textual data, all the fun with encodings, quoting/delimiting/escaping… (not to mention that an Excel set to German might interpret different things as numbers than a North American one).

I wish there were just one good CSV standard that tools adhered to. Alas, that's not the case, and Excel in particular has a habit of autoconverting input and losing data in the process. So, I'm looking for an alternative that has these well-defined constraints and isn't as focused on hierarchical data (JSON, YAML, XML), nor far too verbose (XML – though it is excellent to query with command line tools), nor completely impossible for human or parser to handle correctly in its full beauty (YAML)… Just some tabular data notation that's textual, appendable, and not a guessing game for the reading tool.
We could just canonicalize calling all our files

marcusdata.utf8.textalwaysquoted.iso8601.headerspecifies_fieldname_parentheses_type.csv

but even that wouldn't solve the issue of Excel seeing an unquoted 12.2021 and deciding the field is about Christmases past.

So, maybe we just do some rootless JSON format that starts with a SigMF object describing the file and its columns, and then is basically just a sequence of JSON arrays, one per record:

[ 1.212e-1, 0, "Müller", 24712388823 ]
[ 1.444e-2, 1, "📡🔭  \"👽\"!", 11111111111 ]
[ 2.0115e-1, 0, "Cygnus-B", 0 ]

(I'm not even sure that's not valid JSON; gut feeling tells me we should be putting [] around the whole document, but we don't want that for streaming purposes. ECMA-404 doesn't seem to *forbid* it.)
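
To make that concrete, here's a minimal Python sketch of what writing and reading such a file could look like – effectively the "one JSON value per line" convention – where the function names, the header keys and the column names are all made up for illustration, not actual SigMF fields:

import json

def write_header(path, header):
    # first line: a single object describing the file and its columns
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps(header, ensure_ascii=False) + "\n")

def append_record(path, row):
    # every further line is one JSON array; append-only, no rewriting needed
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

def read_records(path):
    # first line back to a dict, remaining non-empty lines back to lists
    with open(path, encoding="utf-8") as f:
        header = json.loads(next(f))
        rows = [json.loads(line) for line in f if line.strip()]
    return header, rows

# hypothetical usage with made-up column names
write_header("obs.jsonl", {"columns": ["power", "flag", "source", "sample_index"]})
append_record("obs.jsonl", [1.212e-1, 0, "Müller", 24712388823])
header, rows = read_records("obs.jsonl")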

That way, we get the metadata in a format that's easy to skip by simpler tools, but trivial to parse with the right tools (I've grown to like `jq`), and the data into a well-defined format. Sure, you still can't dump that straight into Excel, but you know what, if it comes down to it, we can have a Python script that takes these files and actually converts them to valid XLSX without the misconversion footguns, and that same tool could also be run in a browser for those having a hard time executing Python on their machines.
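
Just to sketch how small such a converter could be – assuming the format above and the third-party openpyxl package; file and function names here are again just placeholders:

import json
from openpyxl import Workbook  # third-party: pip install openpyxl

def to_xlsx(src, dst):
    wb = Workbook()
    ws = wb.active
    with open(src, encoding="utf-8") as f:
        header = json.loads(next(f))
        ws.append(header.get("columns", []))  # header row, taken verbatim
        for line in f:
            if line.strip():
                ws.append(json.loads(line))   # numbers stay numbers, strings stay strings
    wb.save(dst)

to_xlsx("obs.jsonl", "obs.xlsx")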

Cheers,
Marcus
On 03.05.23 23:05, Marcus D. Leech wrote:
On 03/05/2023 16:51, Marcus Müller wrote:

Do agree, but I really don't like CSV: too underspecified a format, too many ways it comes back to bite you (aside from a thousand SDR users writing emails that their PC can't keep up with writing a few MS/s of CSV…)
I like CSV because you can hand your data files to someone who doesn't have a complete suite of astrophysics tools, and they can slurp it into Excel and play with it.


How important is plain-textness in your applications?
I (and many others in my community) tend to throw ad-hoc tools at data from ad-hoc experiments. In the past, I used a lot of AWK to post-process data, and these days I use a lot of Python. Text-based formats lend themselves well to this kind of processing. Rates are quite low, typically: logging an integrated power spectrum a few times a minute, for example.

There are other observing modes where text-based formats aren't quite so obvious – like pulsar observations, where filterbank outputs might be recorded at tens of kHz and then post-processed with any of a number of pulsar tools.

In all of this, part of the "science" is extracted in "real-time" and part in post-processing.



Best,
Marcus




