Re: [PATCH] no_proxy domain matching
From: Tim Rühsen
Subject: Re: [PATCH] no_proxy domain matching
Date: Wed, 27 Nov 2019 10:50:54 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2
On 11/26/19 4:00 PM, Tomas Hozza wrote:
>
>
> On 20. 11. 2019 18:47, Tim Rühsen wrote:
>> On 20.11.19 12:41, Tomas Hozza wrote:
>>> On 7. 11. 2019 21:30, Tim Rühsen wrote:
>>>> On 07.11.19 15:21, Tomas Hozza wrote:
>>>>> Hi.
>>>>>
>>>>> In RHEL-8, we ship a wget version that suffers from a bug fixed by [1]. The
>>>>> fix resolved an issue with matching subdomains when the no_proxy domain
>>>>> definition was prefixed with a dot, e.g. "no_proxy=.mit.edu". As part of
>>>>> backporting the fix to RHEL, I wanted to create an upstream test for the
>>>>> no_proxy functionality. However, I found that there is still one corner
>>>>> case which is not handled by the current upstream code, and honestly I'm
>>>>> not sure what the intended domain-matching behavior in that case is. The
>>>>> man page is also not very specific in this regard.
>>>>>
>>>>> The corner case is as follows:
>>>>> - no_proxy=.mit.edu
>>>>> - download URL is e.g. "http://mit.edu/file1"
>>>>>
>>>>> In this case the proxy settings are used, because the domains don't match
>>>>> due to the leftmost dot in the no_proxy domain definition. This is either
>>>>> intended, or a corner case that was not considered. One could argue that if
>>>>> no_proxy is set to ".mit.edu", the leftmost dot means that the proxy
>>>>> exclusion should apply only to subdomains of "mit.edu", while the proxy
>>>>> settings should still apply to the "mit.edu" domain itself. From my point of
>>>>> view, after reading the wget man page, I don't think that the leftmost dot
>>>>> in the no_proxy definition has any special meaning.
>>>>
>>>> Hello Tomas,
>>>>
>>>> hard to decide how to handle this. I personally would like to see a
>>>> match with curl's behavior (see https://github.com/curl/curl/issues/1208).
>>>>
>>>> Given the docs from GNU emacs, you are right. "no_proxy=.mit.edu" means
>>>> "mit.edu and subdomains" are excluded from proxy settings.
>>>> (see https://www.gnu.org/software/emacs/manual/html_node/url/Proxies.html)
>>>>
>>>> The caveat with emacs' behavior is that you cannot exclude just all
>>>> subdomains of mit.edu without mit.edu itself. Effectively, that creates
>>>> a corner case that can't be handled at all. (but if curl also does it
>>>> that way, let's go for it).
>>>>
>>>> Maybe you can find out about the current no_proxy behavior of typical
>>>> and widespread tools (regarding the leftmost dot)? Once we have that
>>>> information, we can make a confident decision.
>>>>
>>>> Regards, Tim
>>>
>>> Hi Tim.
>>>
>>> It took me some time to go through the current situation and, to be honest,
>>> it is kind of a mess. While each tool handles the no_proxy env variable a
>>> little bit differently, there are some similarities. Nevertheless, I was
>>> not able to find any standard.
>>>
>>> curl's behavior:
>>> - "no_proxy=.mit.edu"
>>> - will match the domain and subdomains e.g. "www.mit.edu" or
>>> "www.subdomain.mit.edu"
>>> - will match the host "mit.edu"
>>> - "no_proxy=mit.edu"
>>> - will match the domain and subdomains e.g. "www.mit.edu" or
>>> "www.subdomain.mit.edu"
>>> - will match the host "mit.edu"
>>> - downside: cannot match only the host; cannot match only the domain and
>>> subdomains
>>>
>>> current wget's behavior:
>>> - "no_proxy=.mit.edu"
>>> - will match the domain and subdomains e.g. "www.mit.edu" or
>>> "www.subdomain.mit.edu"
>>> - will NOT match the host "mit.edu"
>>> - "no_proxy=mit.edu"
>>> - will match the domain and subdomains e.g. "www.mit.edu" or
>>> "www.subdomain.mit.edu"
>>> - will match the host "mit.edu"
>>> - downside: cannot match only the host
>>>
>>> wget's behavior with proposed patch:
>>> - "no_proxy=.mit.edu"
>>> - will match the domain and subdomains e.g. "www.mit.edu" or
>>> "www.subdomain.mit.edu"
>>> - will match the host "mit.edu"
>>> - "no_proxy=mit.edu"
>>> - will match the domain and subdomains e.g. "www.mit.edu" or
>>> "www.subdomain.mit.edu"
>>> - will match the host "mit.edu"
>>> - downside: cannot match only the host; cannot match only the domain and
>>> subdomains
>>> - it would be consistent with curl's behavior
>>>
>>> emacs's behavior:
>>> - "no_proxy=.mit.edu"
>>> - will match the domain and subdomains e.g. "www.mit.edu" or
>>> "www.subdomain.mit.edu"
>>> - will match the host "mit.edu"
>>> - "no_proxy=mit.edu"
>>> - will NOT match the domain and subdomains e.g. "www.mit.edu" or
>>> "www.subdomain.mit.edu"
>>> - will match the host "mit.edu"
>>> - downside: cannot match only subdomains
>>>
>>> python httplib2's behavior:
>>> - "no_proxy=.mit.edu"
>>> - will match the domain and subdomains e.g. "www.mit.edu" or
>>> "www.subdomain.mit.edu"
>>> - will match the host "mit.edu"
>>> - "no_proxy=mit.edu"
>>> - will NOT match the domain and subdomains e.g. "www.mit.edu" or
>>> "www.subdomain.mit.edu"
>>> - will match the host "mit.edu"
>>> - downside: cannot match only subdomains
>>>
>>> To sum it up: each approach has some downsides. With the change that I
>>> provided, wget's behavior would be consistent with curl's behavior. However,
>>> it will have more downsides than it currently has; specifically, it will
>>> lose the ability to match only the domain and subdomains without matching
>>> the host itself. Emacs's behavior is similar to Python httplib2's behavior
>>> regarding the leftmost dot.
>>>
>>> Honestly, I have a soft preference for keeping wget's current behavior.
>>> But I admit that making the behavior consistent with curl's makes sense.
>>> Please let me know how you would like to proceed.
>>>
>>> To make the behavior consistent with curl, the previously attached changes
>>> should be OK. If you find the new conditions too complicated, I can try
>>> to rethink them, but I already tried to keep them as simple as possible
>>> while not completely rewriting the function.
>>>
>>> If you decide to keep the current behavior, I'll modify the test that I
>>> added to match that behavior.
>>
>> Great work, Tomas!
>>
>> Wow, I didn't think it was so messed up :-(
>>
>> We should definitely document your results, e.g. in the wget manual.
>>
>> If we keep the current behavior, we could adjust it with a new option or
>> a new env variable 'WGET_NO_PROXY_MODE', which could take well-defined
>> values like 'curl', 'emacs', 'wget' (the default), and maybe a new one
>> ('strict') with none of the detected downsides.
>>
>> It looks a bit over-engineered, but it means that wget can easily adapt to
>> existing environments. And the code seems pretty straightforward.
>>
>> Let's see if some more opinions come in.
>>
>> Regards, Tim
>
> Yes, 'WGET_NO_PROXY_MODE' is probably the safest option with regard to
> backward compatibility. And if needed, the default could later change. One
> downside of allowing values like 'curl' or 'emacs' is that these would
> probably also require handling an asterisk "*" in hostnames, as those tools
> do.
>
> Nevertheless, for now I have at least modified the new test to cover wget's
> current behavior. There were also minor changes to the test framework in
> order to make the test possible. The patches are attached. Please let me
> know if they need any changes.
Please add your test to testenv/Makefile.am (best directly before adding
$(METALINK_TESTS)). The test currently doesn't work for me, since the host
`working1.localhost` can't be resolved.
Maybe you can add a `testenv/certs/wgethosts`, similar to
`tests/certs/wgethosts`, but with working1.localhost and
working2.localhost included. You have to set the env variable HOSTALIASES
to that file name (a glibc feature).
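The suggested setup could look roughly like this; the file path and the two alias names come from the thread, while the file contents follow glibc's HOSTALIASES "alias canonical-name" format (see hostname(7)) and are an assumption here:

```shell
# Sketch: create the alias file the test would use.
mkdir -p testenv/certs
cat > testenv/certs/wgethosts <<'EOF'
working1.localhost localhost
working2.localhost localhost
EOF
# Point glibc's resolver at the alias file before running the test suite:
export HOSTALIASES="$PWD/testenv/certs/wgethosts"
```

Note that HOSTALIASES is only honored by glibc's resolver and is ignored for setuid programs, so this keeps the aliasing local to the test run.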
In patch 0001 there is a typo 'retuns'.
Please don't make your lines in the commit messages longer than 79 chars.
I'm not sure if and when I'll implement WGET_NO_PROXY_MODE. We try to make
no more changes to wget 1.x except bug fixes. All new stuff should only go
into wget2.
Regards, Tim