lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Validating paths with std::filesystem


From: Greg Chicares
Subject: Re: [lmi] Validating paths with std::filesystem
Date: Thu, 8 Oct 2020 17:49:50 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0

On 2020-10-08 13:22, Vadim Zeitlin wrote:
> 
>  Sorry for yet another distracting question, but this is relatively
> important and, after just making 14d582d49 (Trap exceptions from filesystem
> library, 2020-10-07), you might already have the answer to this question in
> your L1 cache, so I'd like to ask it now, before it's evicted from it:

Sleeping pushed it out to L3; but I had meant to look back today
at the "/opt/lmi/src/lmi/Z:/etc/opt/lmi/default.ill" thing, anyway.

>  Do we need to validate paths at all when using std::filesystem?

TL;DR: your option 0 below.

There are some special cases where we want various kinds of validation,
but lmi's validate_directory() and validate_filepath() seem adequate
for those cases.

>  Currently using invalid characters in the path components results in an
> exception, as you've seen with the commit above (although I'm not sure how
> exactly can this exception be reproduced and if you could explain what do
> we need to do to see it, i.e. what should be in configurable_settings.xml,
> it would be useful for testing it with the new std::filesystem branch). But
> with std::filesystem there are no such exceptions at all, i.e. it simply
> doesn't do any validation of the path components, unlike Boost.Filesystem.

To reproduce:

$vim -p /opt/lmi/data/configurable_settings.xml
Cause the <print_directory> element to be as follows:
  <print_directory>/opt/lmi/src/lmi/Z:/etc/opt/lmi/default.ill</print_directory>

/opt/lmi/bin[0]$wine ./lmi_wx_shared --ash_nazg --data_path=/opt/lmi/data
boost::filesystem::path: invalid name "Z:" in path: 
"/opt/lmi/src/lmi/Z:/etc/opt/lmi/default.ill"
Warning: If possible, data directory 'Z:/opt/lmi/data' will be used for print 
files instead.
[configurable_settings.cpp : 178]

That reproduction still works today. Yesterday, lmi summarily closed,
whereas today it lets you continue; but the exception still prints.

>  The question is whether it's really a problem? In principle, constructing
> an invalid path in memory doesn't do any harm, as long as any attempt to
> actually use will result in an error -- as it will, under MSW. But this is
> different from the current behaviour and it could be argued that detecting
> the problem earlier is better than detecting it later.

The issue reproduced above is a special case. I'm not aware of any general
problem (else we'd have trapped boost::filesystem exceptions elsewhere).

In this case, I looked back at
  $git show 090d19a80e6a
and remembered the rationale. We made the "print" directory configurable.
Then we asked ourselves "what if it's configured invalidly?", and decided
to modify it forcibly, in class configurable_settings's ctor, displaying
a messagebox announcing the problem and explaining the resolution. That
seemed perfect at the time.

Shortly thereafter, we found an imperfection. If you entered an invalid
directory in the GUI, you'd get that messagebox the next time you opened
lmi...and the next time...and the next...but, unless you edited the xml
file by hand, its <print_directory> element would remain forever invalid.
(Except that a savvy GUI user could change it to anything valid, then
reload lmi and reset it to the same value that lmi forcibly inserted).
You couldn't change it in the "Ctrl-F" GUI because the xml contents
could never be seen there...because we had already forcibly corrected
the invalid path in the configurable_settings ctor. So we decided not
only to alter the data forcibly when reading it into memory, but also to
alter it on disk--which is why we have a ctor that calls save(), when
normally one would expect a ctor to load() but not save(). That, again,
seemed perfect at the time.

But what if the forcible correction doesn't correct the problem? That
never seemed to happen until I started using pc-linux-gnu as well as
mingw-w64 binaries in the same installation, which somehow caused the
weird string above.

Maybe the right answer is to step back and design something better.
Yesterday, I contented myself with making the error non-fatal.

> So do you think we
> should implement our own path validation on top of std::filesystem::path,
> just as we've already done with our own formatting, used instead of the
> default behaviour of the standard class?

I keep asking myself "to what end?", and I'm not finding any answer
to that question.

>  And if you do think it's worth doing, should we faithfully the partial
> path validation done by Boost.Filesystem version we currently use or should
> we actually perform the full validation according to MSW rules described at
> https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file and
> which I'll quote here in case this URL changes:
[...snip elaborate specifics...]
>         - Any other character that the target file system does not allow.

That last bullet point makes the grammar incomplete anyway, as you note...

> We could implement all of these rules with the exception of the
> filesystem-dependent one while Boost.Filesystem doesn't check for all of
> them (e.g. it makes no attempt to check for any of the reserved file
> names) and if we do implement validation, I think it would be better to be
> complete.

And we could also, alternatively, impose POSIX rules for POSIX platforms.

However, wise users don't violate pathname rules. And if unwise users
try to save an lmi input file as "¯\_(ツ)_/¯.cns" or "LPT1.ill", I'd
ask them to save a spreadsheet as "LPT1.xlsx" and see what happens.

>  So, as usual, there are several choices:
> 
> 0. Don't implement any validation at all.

Yes, that's my preference, too.

> 1. Implement minimal validation exactly reproducing the current behaviour.
> 2. Implement [as] full validation [as possible].
> 
>  I think (0) should actually be fine, but if not, then I'd rather do (2)
> than (1).

There's at least one other option:

  3. Do what they should have done, and transform their input into
something sensible. That's what lmi's orthodox_filename() tries to do;
we needed that to translate user filenames to something 'java' would
consider valid, for XSL-FO. But D.W.T.S.H.D. is difficult to get right
(e.g., the "/opt/lmi/src/lmi/Z:/etc/opt/lmi/default.ill" example above).



reply via email to

[Prev in Thread] Current Thread [Next in Thread]