Re: Additional "in" operator for fields being lists of strings

bug-recutils

From:	Jose E. Marchesi
Subject:	Re: Additional "in" operator for fields being lists of strings
Date:	Wed, 05 Aug 2020 17:15:24 +0200
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

>> Marcin Szewczyk <marcin.szewczyk@wodny.org> wrote:
>> > One question comes to mind. Should the != operator mean:
>> > 1. at least one enum value different than  or
>> > 2. none of enum tokens may be equal to the specified value.
>> > [...]
>> > Should a normalization step be taken?
>> > Like:
>> >
>> >     Device: plumbus
>> >     Tag: plubus
>> >     Tag: dinglebop fleeb
>> >     Tag: grumbo
>> >
>> > to (only for enum fields):
>> >
>> >     Device: plumbus
>> >     Tag: plubus dinglebop fleeb grumbo
>> 
>> I would say we clearly want 2. for the semantics of != when applied to
>> enumerated fields.  Normalizing is indeed necessary.
>> 
>> > Currently, I cannot see any exclusion operator. For multi-field strings
>> > neither 'Y!="y3"' nor '!(Y="y3")' will exclude a record if there is any
>> > Y field that matches these conditions. So the second semantic variant
>> > would give something new and interesting but also incompatible with the
>> > current string semantics. [...]
>> 
>> Hmm, I don't think we need to keep the existing string semantics for
>> enums, because in properly conformed data each Tag is restricted to
>> having only one of the valid values, i.e.:
>> 
>> --- foo.rec ---
>> %rec: Device
>> %type: Tag enum dinglebop fleeb plubus grumbo
>> 
>> Device: plumbus
>> Tag: dinglebop fleeb plubus grumbo
>> --- end of foo.rec ---
>> 
>> $ recfix foo.rec
>> foo.rec:5: error: invalid enum value.
>
> But if the user has always used the properly structured variant
> (accepted by recfix), ie.:
>
> --- foo.rec ---
> %rec: Device
> %type: Tag enum dinglebop fleeb plubus grumbo
>
> Device: plumbus
> Tag: dinglebop
> Tag: fleeb
> Tag: plubus
> Tag: grumbo
> --- end of foo.rec ---
>
> executing `recsel -e 'Tag != "fleeb"' foo.rec` would change output from
> returning the record to returning nothing (assuming that both expanded
> and non-expanded forms should mean the same thing).

Hm, I see what you mean.  Yes, the semantics of these operators would
change in that case.
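To make the difference concrete, here is a minimal C sketch of the two candidate meanings of `!=` over a record that holds several values for the same field.  It is not recutils code: `ne_any` and `ne_none` are hypothetical names, and the values are assumed to be available as plain C strings, one per field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Semantics 1 (hypothetical helper): true if at least one of the n
   values differs from `value`.  Per the thread, this is how != behaves
   today when a record has several fields with the same name. */
bool ne_any(const char *const values[], size_t n, const char *value)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(values[i], value) != 0)
            return true;
    return false;
}

/* Semantics 2 (hypothetical helper): true only if none of the n
   values equals `value`. */
bool ne_none(const char *const values[], size_t n, const char *value)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(values[i], value) == 0)
            return false;
    return true;
}
```

With the four Tag values of foo.rec and the value "fleeb", `ne_any` returns true (the record is selected, as today) while `ne_none` returns false (the record is excluded), which is the change in output described above.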

> Which form of using multiple enum values should be canonical:
> - SFMV: single field with multiple values (non-expanded) or
> - MFSV: multiple fields with single values (expanded)?
>
> Implementing SFMV would probably be quite easy and based on strtok() in
> the ops switch.
>
> The MFSV form would probably require a serious change in `rec_sex_eval`
> implementation[1] to give semantics 2. of the `!=` operator.

Ideally both forms should be allowed, and neither form shall be
preferred to the other.

> Do you think that normalization should be:
> - explicit and applied permanently eg. by recfix or
> - implicit and calculated just for SEX evaluation?

I would say implicit.  It is important for recutils to let users write
the data in whatever way is comfortable for them, since the data is
often edited by hand.

Also, the form of the written data should not change.  That's why things
like comments, or lines continued with a trailing \, are preserved.

> Or maybe a trick should be implemented:
> - official representation of serialized records should be MFSV (explicit
>   normalization) but
> - for ease of implementation internal representation used for SEX
>   evaluation should be SFMV (implicit normalization)?

The internal representation can be massaged in order to ease processing,
but in any case it should be able to write the data back the way it was
read.  (See above.)
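A minimal C sketch of the implicit approach, assuming field values are available as plain C strings and that enum values are separated by whitespace (this is not the recutils API; `field_has_token` and `enum_ne` are hypothetical names).  Tokens are scanned in place on every evaluation, so the stored text is never rewritten, and the single-field (SFMV), multi-field (MFSV) and mixed forms all give the same answer.  It scans by hand rather than calling strtok(), because strtok() writes NUL bytes into the string it tokenizes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the whitespace-separated list `field`
   contain the token `value`?  The field text is only read, never
   modified, so the record can still be written back as it was read. */
static bool field_has_token(const char *field, const char *value)
{
    size_t vlen = strlen(value);
    const char *p = field;

    while (*p != '\0') {
        /* Skip separators before the next token. */
        while (*p != '\0' && isspace((unsigned char) *p))
            p++;
        const char *start = p;
        /* Advance to the end of the token. */
        while (*p != '\0' && !isspace((unsigned char) *p))
            p++;
        size_t len = (size_t) (p - start);
        if (len > 0 && len == vlen && memcmp(start, value, len) == 0)
            return true;
    }
    return false;
}

/* Semantics 2 of != on an enum field, applied to the implicitly
   normalized token list: true only if no token in any of the n
   fields equals `value`. */
bool enum_ne(const char *const fields[], size_t n, const char *value)
{
    for (size_t i = 0; i < n; i++)
        if (field_has_token(fields[i], value))
            return false;
    return true;
}
```

For example, `{"dinglebop", "fleeb", "plubus", "grumbo"}` (four fields) and `{"dinglebop fleeb plubus grumbo"}` (one field) both make `enum_ne(..., "fleeb")` false, and the field strings are left exactly as they were.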

> One more thing that comes to mind is: should access to enum tokens
> (multiple values per field) be implemented (if allowed) over
> `rec_record_get_field_by_name` or should `rec_record_get_field_by_name`
> itself be capable of indexing in the following implicit normalization
> manner:
>
>     Tag: plubus
>     Tag: dinglebop fleeb
>     Tag: grumbo
>
>     Tag[0] = plubus
>     Tag[1] = dinglebop
>     Tag[2] = fleeb
>     Tag[3] = grumbo

I think the implicit normalization makes sense.  In the record above,
there are four instances of Tag values, not three.

Ditto for indexing: that is useful, but not when it comes to counting
fields (i.e. the record still has three fields, not four).
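Indexing such as Tag[2] can walk the tokens of all Tag fields in order while the field count stays untouched.  A sketch, assuming field values are plain C strings with whitespace-separated enum values; `struct token` and `token_at` are hypothetical and not part of the recutils library.  Each token is a pointer/length view into the original field text, so nothing is copied or rewritten.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* A token is a view into an existing field value: no copy is made. */
struct token {
    const char *start;
    size_t len;
};

/* Hypothetical helper: find token number `index` (0-based) across the
   n fields, in order, as if the fields had been normalized into one
   list.  Returns true and fills *out on success, false if `index` is
   past the last token. */
bool token_at(const char *const fields[], size_t n, size_t index,
              struct token *out)
{
    size_t seen = 0;

    for (size_t i = 0; i < n; i++) {
        const char *p = fields[i];
        while (*p != '\0') {
            /* Skip separators, then measure the next token. */
            while (*p != '\0' && isspace((unsigned char) *p))
                p++;
            const char *start = p;
            while (*p != '\0' && !isspace((unsigned char) *p))
                p++;
            if (p > start) {
                if (seen == index) {
                    out->start = start;
                    out->len = (size_t) (p - start);
                    return true;
                }
                seen++;
            }
        }
    }
    return false;
}
```

For the three fields `plubus`, `dinglebop fleeb` and `grumbo`, the caller still has n == 3 fields, while `token_at` succeeds for indexes 0 to 3 (plubus, dinglebop, fleeb, grumbo) and fails for index 4.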
