[Bug-wget] Segfault with WARC + CDX
From: Gijs van Tulder
Subject: [Bug-wget] Segfault with WARC + CDX
Date: Wed, 30 May 2012 23:13:54 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1

Hi,
There's a bug in the warc_find_duplicate_cdx_record function. If you
provide a file with CDX records, Wget can segfault if a record is not
found in the CDX file. In fact, the deduplication now only works if
*every* new record can be found in the CDX index.
The segmentation fault is generated on these lines in src/warc.c:
  hash_table_get_pair (warc_cdx_dedup_table, sha1_digest_payload, &key,
                       &rec_existing);
  if (rec_existing != NULL && strcmp (rec_existing->url, url) == 0)
Contrary to what the code expects, hash_table_get_pair does not set
rec_existing to NULL if no record is found. So instead of checking for
NULL, the function should check whether the return value of
hash_table_get_pair is non-zero:
  int found = hash_table_get_pair (warc_cdx_dedup_table,
                                   sha1_digest_payload,
                                   &key, &rec_existing);
  if (found && strcmp (rec_existing->url, url) == 0)
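
To illustrate the pattern outside of Wget, here is a minimal sketch. All
names in it (toy_record, toy_get_pair, toy_is_duplicate) are hypothetical
stand-ins, not Wget code. It only assumes what is described above: the
lookup returns non-zero on a hit and leaves its output pointer untouched on
a miss, so the caller has to test the return value before dereferencing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a CDX record; only the url field matters here. */
struct toy_record
{
  const char *url;
};

/* Hypothetical one-entry "table".  Returns non-zero on a hit and stores the
   record in *rec; on a miss it returns 0 and does NOT modify *rec. */
static int
toy_get_pair (const char *stored_key, struct toy_record *stored_rec,
              const char *key, struct toy_record **rec)
{
  if (strcmp (stored_key, key) == 0)
    {
      *rec = stored_rec;
      return 1;
    }
  return 0;
}

/* Returns 1 if key is present and its record has the same url, else 0.
   rec_existing is left uninitialized on purpose (for illustration), so it
   is only safe to dereference after checking the return value. */
static int
toy_is_duplicate (const char *stored_key, struct toy_record *stored_rec,
                  const char *key, const char *url)
{
  struct toy_record *rec_existing;
  int found = toy_get_pair (stored_key, stored_rec, key, &rec_existing);

  return found && strcmp (rec_existing->url, url) == 0;
}
```

Because && short-circuits, rec_existing->url is never evaluated when the
lookup misses, which is the property the fix relies on.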
The attached patch makes this change. With it, deduplication also works
when some records cannot be found in the CDX index.
Regards,
Gijs
0001-warc-Fix-segfault-if-CDX-record-is-not-found.patch
Description: Text Data