bug-global

Tony.RE: GNU Global Parsing Suffixless Files Patch


From: Cooper, Anthony
Subject: Tony.RE: GNU Global Parsing Suffixless Files Patch
Date: Wed, 5 Oct 2016 16:28:14 +0100

SECURITY CLASSIFICATION: OFFICIAL




> -----Original Message-----
> From: address@hidden [mailto:address@hidden On Behalf Of
> Shigio YAMAGUCHI
> Sent: 05 October 2016 02:56
> To: Cooper, Anthony
> Cc: address@hidden
> Subject: Re: GNU Global Parsing Suffixless Files Patch
>
> > Q: I'm assuming any glob patterns would implicitly be anchored to
> > the end of the path string (as they are in bash)?
>
> Yes. In ctags, '(<pattern>)' matches against file names, not path names,
> just like '.c.h'.

:-)

>
> > Yes I know... In fact after originally looking at global and ctags I
> > thought how potentially dangerous ctags's --language-force option
> > was and that's why I called my extension suffixless_langmap.
> > My intention was that this option wouldn't force anything but
> > instead provide a default language when there wasn't a file suffix.
> >
> > For example, in project include directories you quite often get
> > other artefacts like .c, .texi, .html (I know that these get
> > excluded) and .inc files (MSVS). If the --force-language override
> > option is used on those include directories then files with a suffix
> > don't automatically get handled the way they should. Instead you'd
> > possibly have to put in additional, more specific --force-language
> > overrides to reinstate default behaviour for certain extensions. E.g.:
>
> You are right. It is an important point. You should be able to control it finely.
>
> How about using a 'file list' instead of a direct path?
>
> --language-force=<lang>:<file list>
>
> A file list is a file which lists file names.
>
> e.g.
> [cppfiles]
> +-----------------------------
> |include/c++/4.8/algorithm
> |include/c++/4.8/bits/stl_algo.h
> |include/c++/5.1/algorithm
>
> $ gtags --language-force=cpp:cppfiles
>
> You can use find(1) command to make a file list.
> This will satisfy your request too, because find(1) has both glob and
> regex. :)
>
> New priority:
> [high]
> 1. --language-force=<lang>:<file list>
> 2. langmap=<lang>:<suffix or glob pattern list>
> [low]
>
> What do you think?

An interesting idea :-). Upon reflection I'm actually quite happy with what you 
proposed yesterday - sorry, perhaps I should have been clearer at the time...

At one stage I thought of extending the gtags file format to include an 
optional language override, which is similar to your file list idea... However, 
as I used global more I started to shy away from that, as it's high maintenance 
and would break automatic recursive update on file addition.

For example: if you're working on a project that has non-standard file naming 
conventions and/or has particular file types in odd places (like my texi/inc 
example), then with a file list/type approach you'd need to update that list 
each time you added another suffixless header file. However, with your 
path/specific glob approach and priority scheme (let's call this prio-path-glob):

        --force-language=cpp:include --force-language=c:.c --force-language=makefile:([Mm]akefile) ...

This does the job quite nicely. You wouldn't need to update any config unless 
there was a new file type that needed to be excluded (unlikely within an 
existing project). You could just run global -u and update as normal. If given 
the file list feature I would avoid using it because of the need to maintain 
it. A couple of the really cool things about gtags are that you just type gtags 
and it does it all for you (unless you have non-standard stuff), and that 
global -u picks up updates and new files.

The only `upsides' my `explicitly select the overridden files with RE' approach 
had over yours were:
        1) RE patterns are more powerful and succinct - they would deal with
           cases we haven't thought of.
        2) You're explicitly selecting what you want to override.
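
To make point 1 concrete, here is a minimal Python sketch. It is not part of the patch, and the function names are made up for illustration. It contrasts a regex modelled on the one discussed earlier in this thread (written here with a `+' quantifier so that the last path component must be non-empty and free of periods) with the closest plain glob. Python's fnmatch stands in for glob matching, which is an assumption: its `*' also matches `/'.

```python
import fnmatch
import re

# Regex: a path below some include/ directory whose final component
# contains no period, i.e. a suffixless file.
SUFFIXLESS_IN_INCLUDE = re.compile(r"/include/(.*/)*[^/.]+$")


def regex_selects(path):
    """Hypothetical helper: True if the regex picks this path."""
    return SUFFIXLESS_IN_INCLUDE.search(path) is not None


def glob_selects(path, pattern="*/include/*"):
    """Hypothetical helper: True if a plain glob picks this path.

    A glob can say 'anything below include/', but it has no way of
    saying 'and the file name contains no period'.
    """
    return fnmatch.fnmatchcase(path, pattern)


if __name__ == "__main__":
    for p in ("/usr/include/c++/4.8/algorithm",
              "/usr/include/c++/4.8/bits/stl_algo.h"):
        print(p, regex_selects(p), glob_selects(p))
```

The regex picks only the suffixless header, whereas the glob picks both files, so the glob approach needs the extra per-suffix overrides to put things right.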

So 1 is overkill as agreed (the prio-path-glob approach will meet all the 
requirements we can think of), so that's gone. As for 2, if prio-path-glob 
were used instead you'd probably only need a couple of file type override 
directives in there, as the skip list will weed out most exceptions anyway. 
So upon reflection I feel that a file type list would add extra complexity 
that isn't needed. If you have a specific requirement for it yourself, then 
could we have it in addition to what you proposed yesterday, please?

So as I understand it we would have --language-force=<Language>:<Specifier> 
where <Specifier> would be one of:
        *x - Existing langmap-style extension list, e.g. `.c.h'.
        *x - File-only glob pattern, e.g. `([Mm]akefile)'.
        *x - A mixture of the above two, e.g. `.c.h([Mm]akefile)(*.inc)'.
         x - A dumb path substring match (possibly with the caveat that it
             must start with ./ or / to distinguish it from the above?),
             e.g. `/include/'.
         ? - A bare name of a file list in the config, e.g. `cppfiles'?
Those entries marked with * would also apply to langmap config entries. 
Those entries marked with x meet my requirements/wishlist.
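
A rough Python sketch of how such a <Specifier> might be taken apart and matched, purely to pin down the semantics listed above. Nothing here is gtags code: `Specifier', `parse_specifier' and `matches' are hypothetical names, the sketch assumes the `must start with ./ or /' caveat for path substrings, and it leaves the bare file-list name (the `?' entry) unhandled.

```python
import fnmatch
import posixpath
import re
from typing import NamedTuple


class Specifier(NamedTuple):
    suffixes: tuple        # e.g. ('.c', '.h')
    globs: tuple           # e.g. ('[Mm]akefile', '*.inc')
    path_substring: str    # empty when the specifier is name-based


# One token is either '(<glob>)' or a '.suffix'.
_TOKEN = re.compile(r"\(([^)]*)\)|(\.[^.()/]+)")


def parse_specifier(text):
    """Split a specifier string into suffixes, globs or a path substring."""
    if text.startswith(("/", "./")):
        # Assumed caveat: path substrings are marked by a leading / or ./
        return Specifier((), (), text)
    tokens = _TOKEN.findall(text)
    globs = tuple(g for g, _ in tokens if g)
    suffixes = tuple(s for _, s in tokens if s)
    return Specifier(suffixes, globs, "")


def matches(spec, path):
    """Suffixes and globs look at the file name only; a path substring
    is a dumb, unanchored match on the whole path."""
    if spec.path_substring:
        return spec.path_substring in path
    name = posixpath.basename(path)
    if any(fnmatch.fnmatchcase(name, g) for g in spec.globs):
        return True
    return any(name.endswith(s) for s in spec.suffixes)


if __name__ == "__main__":
    mixed = parse_specifier(".c.h([Mm]akefile)(*.inc)")
    print(mixed)
    print(matches(mixed, "src/Makefile"))
    print(matches(parse_specifier("/include/"),
                  "./project/include/release/test.php"))
```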

With those additional features marked with x and your proposed priority list as 
detailed yesterday I would say that would give maximum benefit without too much 
extra cost (famous last words!).

One additional feature/thought: one could have a language type of `auto', 
meaning `do normal file type detection'. Thus the above example would read:

        --force-language=cpp:include --force-language=auto:.c --force-language=auto:([Mm]akefile) ...

This is similar to ctags's use of the `auto' language designator. Only an idea 
that just occurred to me, I'm not deliberately trying to add more work - honest!
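
A small Python sketch of how the `auto' idea could sit in the lookup, under my reading of the priority scheme: file-name specifiers (suffix lists and globs) are tried first, then path substrings, then the langmap. All names are hypothetical, the langmap is simplified to a list of (language, specifier) pairs, and any specifier not starting with `.' or `(' is treated as a path substring, as in the example above.

```python
import fnmatch
import posixpath
import re

# One token is either '(<glob>)' or a '.suffix'.
_TOKEN = re.compile(r"\(([^)]*)\)|(\.[^.()/]+)")


def is_name_spec(spec):
    """Suffix lists and '(glob)' patterns apply to the file name only."""
    return spec.startswith("(") or (
        spec.startswith(".") and not spec.startswith("./"))


def name_matches(spec, path):
    name = posixpath.basename(path)
    for glob, suffix in _TOKEN.findall(spec):
        if glob and fnmatch.fnmatchcase(name, glob):
            return True
        if suffix and name.endswith(suffix):
            return True
    return False


def from_langmap(path, langmap):
    """Normal detection: first langmap entry whose specifier matches."""
    for lang, spec in langmap:
        if name_matches(spec, path):
            return lang
    return None


def decide_language(path, forced, langmap):
    """Assumed priority: name-level overrides, then path substrings,
    then the langmap. 'auto' hands the file back to normal detection."""
    ordered = ([f for f in forced if is_name_spec(f[1])] +
               [f for f in forced if not is_name_spec(f[1])])
    for lang, spec in ordered:
        hit = name_matches(spec, path) if is_name_spec(spec) else spec in path
        if hit:
            return from_langmap(path, langmap) if lang == "auto" else lang
    return from_langmap(path, langmap)


if __name__ == "__main__":
    forced = [("cpp", "include"), ("auto", ".c"), ("auto", "([Mm]akefile)")]
    langmap = [("c", ".c.h"), ("make", "([Mm]akefile)")]
    for p in ("include/c++/4.8/algorithm", "include/util.c",
              "include/Makefile", "src/README"):
        print(p, decide_language(p, forced, langmap))
```

With `auto' the override list no longer has to repeat what the langmap already knows; it only says `leave these alone'.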

On a completely different note...

Do you have that patch for ctags to give out references? And if applied could 
gtags make use of them? If so would you be happy to send that to me? I 
understand if it's awkward etc.

BTW I was fiddling around mixing parsers today in the same run (using internal 
gtags and then falling back on ctags for unsupported files). Very easy to do 
and it works so well :-). Many thanks.

Regards,

Tony.
>
> > If/when someone comes to work on this, my patch is probably still
> > worth a look as 70-80% of it is done with respect to the proposal above.
> > Either way some of it may be of use.
>
> Thank you so much.
>
> Regards,
> Shigio
>
>
> 2016-10-05 4:09 GMT+09:00 Cooper, Anthony
> <address@hidden>:
>
>
>
>       Good morning :-)
>
>       > -----Original Message-----
>       > From: address@hidden [mailto:address@hidden On Behalf Of
>       > Shigio YAMAGUCHI
>       > Sent: 04 October 2016 01:19
>       > To: Cooper, Anthony
>       > Cc: address@hidden
>       > Subject: Re: GNU Global Parsing Suffixless Files Patch
>       >
>       > Good morning :)
>       > I understood regex version of --language-force is very powerful.
>       > However, it seems too powerful for us to manage it completely.
>       >
>       > How about releasing the real path version and '()' syntax first?
>       > It's simple and easy to understand, and is similar to ctags.
>       > At the stage now, no one can judge whether regex version is needed,
>       > because no one has used even the real path version.
>       >
>       > >        E.g. If I had:
>       > >        Default: \
>       > >        :GTAGS_OPTIONS=--force-language=yacc\:(sys\$): \
>       > >            --force-language='cpp\:(^\\./Microsoft Visual)':
>       > >
>       > > Then this would say match all files ending in sys and treat them as
>       > > yacc and any suffixless files with a path starting with `./Microsoft
>       > > Visual' are to be treated as cpp files.
>       >
>       > Using the real path version and '()' syntax, that is easily
>       > realized like this:
>       >
>       >         [gtags.conf]
>       >         :langmap=yacc\:(*sys):
>       >
>       >         $ gtags --force-language='cpp:Microsoft Visual'
>       >
>
>       A very minor point: the `Microsoft Visual' examples are different, as
>       my RE only matches at the head of the path.
>
>       I guess I get nervous putting more limited matching mechanisms
>       inside an option that is designed to override the normal default/sane
>       behaviour; I would like to be as precise as possible in my overrides.
>       Also, most would use the simple substring match, but regexes are there
>       for edge cases that we haven't thought of. Most devs are comfortable
>       with REs.
>
>       Q: I'm assuming any glob patterns would implicitly be anchored to the
>       end of the path string (as they are in bash)?
>
>       > > One thing to note, made in the man page and help text, is that
>       > > this switch won't affect any files with a suffix, which some
>       > > people might expect with `force' in the name of the switch.
>       >
>       > In ctags, --language-force option ignores suffixes. I'd like to follow
>       > ctags method.
>
>       Yes I know... In fact after originally looking at global and ctags I
>       thought how potentially dangerous ctags's --language-force option was
>       and that's why I called my extension suffixless_langmap. My intention
>       was that this option wouldn't force anything but instead provide a
>       default language when there wasn't a file suffix.
>
>       For example, in project include directories you quite often get other
>       artefacts like .c, .texi, .html (I know that these get excluded) and
>       .inc files (MSVS). If the --force-language override option is used on
>       those include directories then files with a suffix don't automatically
>       get handled the way they should. Instead you'd possibly have to put in
>       additional, more specific --force-language overrides to reinstate
>       default behaviour for certain extensions. E.g.:
>
>               --force-language=cpp:include --force-language=c:.c --force-language=makefile:([Mm]akefile) ...
>
>       However with REs you could be more selective in your initial
>       --force-language setting and avoid the subsequent detailed extension
>       overrides.
>
>               --force-language='cpp:(/include/(.*/)*[^/.]+$)'
>
>       In a glob pattern as far as I'm aware there's no way of saying
>       `select files not containing a period' :-(.
>
>       >
>       > $ ctags --language-force=c test.php # test.php is treated as a C source file
>       >
>       > How about setting the following priority?
>       > (This --language-force is the real path version)
>       >
>       > [high]
>       > 1. --language-force=<lang>:<file>
>       > 2. --language-force=<lang>:<directory>
>       > 3. langmap=<lang>:<suffix or glob pattern list>
>       > [low]
>       >
>       > e.g.
>       > [gtags.conf]
>       > :langmap=c\:.x([Mm]ake):
>       >
>       > $ gtags --language-force=perl:dir1 --language-force=php:php.x
>       >
>       > ./
>       >  |-dir1/
>       >  |  |-test.x    => perl by --language-force=perl:dir1
>       >  |  |-Make      => perl by --language-force=perl:dir1
>       >  |  |-php.x     => php by --language-force=php:php.x
>       >  |-dir2/
>       >     |-test.x    => c by langmap=c\:.x([Mm]ake):
>       >     |-Make      => c by langmap=c\:.x([Mm]ake):
>       >
>
>       The priorities look fine to me.
>
>       Whilst I think it's a _bit_ of a pity not to have REs for the reasons
>       pointed out above, none of the issues are insurmountable with a glob
>       implementation, just possibly less obvious? But more consistent, as you
>       say, with ctags. So as you say, start off with globs and see :-).
>
>       Many thanks for being so helpful and constructive, it is appreciated,
>       as is Global.
>
>       If/when someone comes to work on this, my patch is probably still
>       worth a look as 70-80% of it is done with respect to the proposal above.
>       Either way some of it may be of use.
>
>       Regards,
>
>       Tony.
>
>
>       > > Did you correctly receive the new patch for 6.5.5?
>       >
>       > Sorry but I did not read that at all. I would like to discuss
>       > the specification, not the implementation.
>       >
>       > Regards,
>       > Shigio
>       >
>       >
>       > 2016-10-03 21:34 GMT+09:00 Cooper, Anthony
>       > <address@hidden>:
>       >
>       >
>       >
>       >       Good morning :-) (See comments below)
>       >
>       >       > -----Original Message-----
>       >       > From: address@hidden [mailto:address@hidden On Behalf Of
>       >       > Shigio YAMAGUCHI
>       >       > Sent: 01 October 2016 00:17
>       >       > To: Cooper, Anthony
>       >       > Cc: address@hidden
>       >       > Subject: Re: GNU Global Parsing Suffixless Files Patch
>       >       >
>       >       > Before implementation, I would like to make clear the
>       >       > specification.
>       >       >
>       >       > > Assorted projects I've come across have include and Include
>       >       > > (the example below is a trivial but real one relating to
>       >       > > MS-Windows) and some even have include dirs named XInclude or
>       >       > > something similar (can't remember the project now, wasn't X11
>       >       > > but probably an X client).
>       >       >
>       >       > Let me ask a couple of questions, please.
>       >       >
>       >       >
>       >       > Q1: Are the following (1) and (2) equivalent?
>       >       >
>       >       >         (1) --language-force='cpp:([Ii]nclude)'
>       >       >         (2) --language-force='cpp:include' --language-force='cpp:Include'
>       >       >
>       >       >     If so, do you think that (1) is better than (2) since it is shorter?
>       >
>       >       Yes precisely. Although perhaps I gave a rather weak example. A
>       >       stronger case would be when differentiating between say:
>       >               /usr/include/C++/4.8/algorithm
>       >               /usr/include/C++/5.1/algorithm
>       >               /usr/include/C++/..../algorithm
>       >       And:
>       >               ./project/helper-programs/algorithm/sort/qsort  <- script or binary
>       >
>       >       Or to match:
>       >               .../include/sys
>       >       But not:
>       >               .../include/system_errors
>       >
>       >       If I wanted to catch the first set of files in both examples without
>       >       tripping up over the second then I could do
>       >       --language-force=cpp:(algorithm\$) and --language-force=cpp:(sys\$).
>       >
>       >       >
>       >       > Q2: Does (1) above match the following?
>       >       >
>       >       >         ./XXXincludeYYY/
>       >       >         ./XXXincludeYYY.php
>       >       >         ./project/include/release/
>       >       >         ./project/include/release/test.php
>       >
>       >       Yes. The matching is a dumb substring or regex match on the path
>       >       string available around where decide_lang() is called. No anchoring
>       >       by default.
>       >
>       >       >
>       >       > Q3: Are regex '^' and '$' available? If so, what do they mean?
>       >
>       >       Yes they are. `^' would mean start matching at the beginning of the
>       >       path and `$' would mean match the end of the path (particularly useful
>       >       for just picking up matches against a file name, as directories in
>       >       themselves aren't processed beyond traversal). File globbing doesn't
>       >       make ^ and $ available and I have come across other
>       >       programs/situations where I have been frustrated by this for want
>       >       of a regex. E.g. If I had:
>       >               Default: \
>       >               :GTAGS_OPTIONS=--force-language=yacc\:(sys\$): \
>       >                   --force-language='cpp\:(^\\./Microsoft Visual)':
>       >       Then this would say match all files ending in sys and treat them as
>       >       yacc, and any suffixless files with a path starting with `./Microsoft
>       >       Visual' are to be treated as cpp files.
>       >
>       >       One thing to note, made in the man page and help text, is that this
>       >       switch won't affect any files with a suffix, which some people might
>       >       expect with `force' in the name of the switch.
>       >
>       >       Did you correctly receive the new patch for 6.5.5?
>       >
>       >       Many thanks once again :-).
>       >
>       >       Regards Tony.
>       >       >
>       >       > Regards,
>       >       > Shigio
>       >       >
>       >       > --
>       >       >
>       >       > Shigio YAMAGUCHI <address@hidden>
>       >       > PGP fingerprint: D1CB 0B89 B346 4AB6 5663  C4B6 3CA5 BBB3 57BE DDA3
>       >       >
>       >       >
>       >       >
>       >       > ______________________________________________________________________
>       >       > This email has been scanned by the Symantec Email Security.cloud service.
>       >       > For more information please visit http://www.symanteccloud.com
>       >       > ______________________________________________________________________
>       >
>       >
>       >
>       >       ****************************************************************************
>       >       Communications with GCHQ may be monitored and/or recorded
>       >       for system efficiency and other lawful purposes. Any views or
>       >       opinions expressed in this e-mail do not necessarily reflect GCHQ
>       >       policy.  This email, and any attachments, is intended for the
>       >       attention of the addressee(s) only. Its unauthorised use,
>       >       disclosure, storage or copying is not permitted.  If you are not the
>       >       intended recipient, please notify address@hidden
>       >
>       >       This information is exempt from disclosure under the Freedom of
>       >       Information Act 2000 and may be subject to exemption under
>       >       other UK information legislation. Refer disclosure requests to
>       >       GCHQ on 01242 221491 ext 30306 (non-secure) or email
>       >       address@hidden
>       >
>       >       ****************************************************************************
>       >

