wget-dev
wget | wget incorrectly checks filename length when mirroring files. (#6)


From: Paul Ferrell (@pflarr)
Subject: wget | wget incorrectly checks filename length when mirroring files. (#6)
Date: Tue, 19 Oct 2021 15:33:18 +0000


Paul Ferrell created an issue: https://gitlab.com/gnuwget/wget/-/issues/6



From url.c:
```c
/* Calculate the length of the output string.  e-b is the input
   string length.  Each quoted char introduces two additional
   characters in the string, hence 2*quoted.  */
outlen = (e - b) + (2 * quoted);
# ifdef WINDOWS
  max_length = MAX_PATH;
# else
  max_length = get_max_length(dest->base, dest->tail, _PC_NAME_MAX);
# endif
  max_length -= CHOMP_BUFFER;
  if (max_length > 0 && outlen > max_length)
    {
      logprintf (LOG_NOTQUIET, "The destination name is too long (%d), reducing to %d\n", outlen, max_length);

      outlen = max_length;
    }
```

When mirroring (and possibly in other situations) the output path is a relative 
path, not a single file name. `get_max_length` uses `pathconf` to get the max 
length of what can be placed at the given location. Unfortunately, there are 
two distinct limits that need to be checked, not one. 

1. The length of the overall relative path, which can be checked with 
`_PC_PATH_MAX`. 
2. The length of each component of the path, which can be checked with 
`_PC_NAME_MAX`. 

By checking the whole relative path against just `_PC_NAME_MAX`, the code 
limits the entire relative path to the length allowed for a single component 
of that path on the given system. For example, on a typical x86_64 Ubuntu box 
with an XFS filesystem, `_PC_NAME_MAX` is 255 bytes, but `_PC_PATH_MAX` is 
4096 bytes. 
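
To illustrate the two checks, here is a minimal sketch, not a patch against 
wget. The helpers `path_fits` and `path_fits_in_dir` are hypothetical names 
I made up for this example; only `pathconf`, `_PC_NAME_MAX` and 
`_PC_PATH_MAX` are real. It assumes that a negative `pathconf` result can be 
treated as "no limit", and it conservatively counts the terminating null 
byte against the path limit.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of wget.  Returns 1 if rel_path
   satisfies both limits, 0 otherwise.  A non-positive limit is
   treated as "no limit" (assumption for this sketch).  */
int
path_fits (const char *rel_path, long name_max, long path_max)
{
  size_t total = strlen (rel_path);
  const char *p = rel_path;

  /* Limit 1: the whole relative path, plus its terminating null.  */
  if (path_max > 0 && total + 1 > (size_t) path_max)
    return 0;

  /* Limit 2: every '/'-separated component on its own.  */
  while (*p)
    {
      size_t len = strcspn (p, "/");    /* length of this component */
      if (name_max > 0 && len > (size_t) name_max)
        return 0;
      p += len;
      while (*p == '/')                 /* skip separator(s) */
        p++;
    }
  return 1;
}

/* Hypothetical wrapper: ask the filesystem holding DIR for its limits.  */
int
path_fits_in_dir (const char *dir, const char *rel_path)
{
  long name_max = pathconf (dir, _PC_NAME_MAX);
  long path_max = pathconf (dir, _PC_PATH_MAX);
  return path_fits (rel_path, name_max, path_max);
}
```

With limits of 255 and 4096, a relative path made of ten 100-byte 
components passes this check, while the current code would truncate it 
because its total length exceeds `_PC_NAME_MAX`.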

I ran into this when recursively mirroring a site with some particularly long 
filenames in a deep tree. Mirroring the same tree with wget2 doesn't seem to 
have any issues.
