bug-gnu-emacs

bug#36431: Crash in marker.c:337


From: Stefan Monnier
Subject: bug#36431: Crash in marker.c:337
Date: Tue, 02 Jul 2019 15:44:07 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

>> - we insert the new bytes at the beginning of the gap, in order to have
>>   room to grow if there are more bytes than expected, and also in case
>>   there are fewer bytes than expected (in which case we'd otherwise
>>   have to move the bytes we just read so they properly end at the end
>>   of the gap).
>
> Also, you will see in insert-file-contents that it supports quitting
> while reading a huge file, and also the REPLACE argument, where we
> detect the same contents at beginning and end of the file and the
> buffer.

Right, tho the end result is the same (e.g. when we quit, we can either
abort the whole operation and throw away the bytes we read, or we can
keep going with the bytes we did read, which is simply another case of
reading less than expected).

>> - decode_coding_gap wants the new input bytes to be at the end of the
>>   gap, so that we can put the decoded chars at the beginning of the gap
>>   and as one grows the other shrinks, so we don't need space for "IN +
>>   OUT" bytes but only for "OUT" bytes.  Is that right (I'm trying to
>>   find some comment or other evidence that this is the case, but
>>   haven't found it yet).
>
> That's right.  The comment you are looking for (well, at least part of
> it) is in the commentary before decode_coding, where it explains the
> semantics of CODING->src_pos.  You will see at the beginning of
> decode_coding_gap how it sets things up according to that hairy
> protocol.

IIUC you're referring to this comment:

    Decode the data at CODING->src_object into CODING->dst_object.
    CODING->src_object is a buffer, a string, or nil.
    CODING->dst_object is a buffer.
 
    If CODING->src_object is a buffer, it must be the current buffer.
    In this case, if CODING->src_pos is positive, it is a position of
    the source text in the buffer, otherwise, the source text is in the
    gap area of the buffer, and CODING->src_pos specifies the offset of
    the text from GPT (which must be the same as PT).  If this is the
    same buffer as CODING->dst_object, CODING->src_pos must be
    negative.
 
    If CODING->src_object is a string, CODING->src_pos is an index to
    that string.
 
    If CODING->src_object is nil, CODING->source must already point to
    the non-relocatable memory area.  In this case, CODING->src_pos is
    an offset from CODING->source.
 
    The decoded data is inserted at the current point of the buffer
    CODING->dst_object.

but this doesn't say if the bytes are to be found originally at the
beginning of the gap or its end, nor whether they finish at the beginning or
the end, nor what happens in the middle and why it's been designed this way.
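
To make the layout in question concrete, here is a toy model of it.  It is
only a sketch and not the code in coding.c: the function name
decode_in_gap is made up, the "gap" is a plain array, and Latin-1 to
UTF-8 stands in for a real coding system.  The input sits at the end of
the gap, the output is produced from the beginning, and the check shows
when the output would run into input that has not been read yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, NOT Emacs code.  IN_LEN Latin-1 bytes sit at the END of a
   gap of GAP_SIZE bytes; decode them to UTF-8 written from the START
   of the gap.  Return the number of bytes produced, or (size_t) -1 if
   the gap is too small for the output.  */
static size_t
decode_in_gap (unsigned char *gap, size_t gap_size, size_t in_len)
{
  if (in_len > gap_size)
    return (size_t) -1;

  size_t src = gap_size - in_len;   /* next unread input byte */
  size_t dst = 0;                   /* next free output byte */

  while (src < gap_size)
    {
      unsigned char c = gap[src++]; /* consume before producing */
      size_t need = c < 0x80 ? 1 : 2;

      /* Everything below SRC has been consumed, so it is free for
         output; going past SRC would clobber unread input.  */
      if (dst + need > src)
        return (size_t) -1;

      if (c < 0x80)
        gap[dst++] = c;
      else
        {
          gap[dst++] = 0xC0 | (c >> 6);
          gap[dst++] = 0x80 | (c & 0x3F);
        }
    }
  return dst;
}
```

In this toy case the output never shrinks relative to the input, so a gap
of OUT bytes is enough: 4 input bytes that decode to 6 bytes fit in a gap
of 6, whereas a separate destination would need 4 + 6.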

Is the patch below correct?

>> IOW, it should be possible to optimize the common case by reading the
>> new bytes into the end of the gap to avoid moving everything in the
>> common case (if the number of bytes read is different from originally
>> expected, we'll have to do extra work, but for the common case where we
>> know the file size upfront and it doesn't change while we read it, this
>> will save us some work).
>> 
>> But the effort is probably not worth the trouble: a memmove of a few
>> gigabytes costs relatively little compared to the cost of actually
>> decoding those same gigabytes.
>
> Right.  Also, there are the other subtle issues with quitting, the
> REPLACE argument, special files, etc.
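
For reference, the move being discussed amounts to something like the
sketch below.  It is a toy model under the same assumptions as the rest of
this thread (bytes read at the beginning of the gap, decoder wanting them
at the end); move_to_gap_end is a hypothetical helper, not a function in
fileio.c or coding.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT Emacs code.  NREAD bytes were read into
   gap[0..NREAD); shift them so that they end exactly at the end of the
   gap.  Return the offset of the first byte after the move.  */
static size_t
move_to_gap_end (unsigned char *gap, size_t gap_size, size_t nread)
{
  assert (nread <= gap_size);
  size_t start = gap_size - nread;
  /* Source and destination may overlap, hence memmove, not memcpy.  */
  memmove (gap + start, gap, nread);
  return start;
}
```

Reading straight into the end of the gap would skip this call when the
number of bytes read matches the expected size, and would need a similar
move in the other direction when it does not.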

I think the crash-example I sent can probably be made less esoteric by
making it use "quit" instead of catch/throw.  I'm beginning to think
that when we quit (or signal an error) from within
set-auto-coding-function, we simply shouldn't revert the buffer
to multibyte.


        Stefan


diff --git a/src/coding.c b/src/coding.c
index 5b9bfa17dd..218d69e2e7 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -7322,11 +7322,16 @@ produce_annotation (struct coding_system *coding, ptrdiff_t pos)
 
    If CODING->src_object is a buffer, it must be the current buffer.
    In this case, if CODING->src_pos is positive, it is a position of
-   the source text in the buffer, otherwise, the source text is in the
-   gap area of the buffer, and CODING->src_pos specifies the offset of
-   the text from GPT (which must be the same as PT).  If this is the
-   same buffer as CODING->dst_object, CODING->src_pos must be
-   negative.
+   the source text in the buffer, otherwise, the source text is at the
+   end of the gap area of the buffer, and CODING->src_pos specifies the
+   offset of the text from the end of the gap (which must be at PT).
+   If this is the same buffer as CODING->dst_object, CODING->src_pos must
+   be negative.
+   
+   When the text is taken from the gap, it needs to be at the end of
+   the gap so that we can produce the decoded text at the beginning of
+   the gap: this way, as the output grows, the input shrinks, so we only
+   need to allocate enough space for `max(IN, OUT)` instead of `IN + OUT`.
 
    If CODING->src_object is a string, CODING->src_pos is an index to
    that string.





