bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#11519: "Wrong type argument: characterp" building custom-deps while


From: Eli Zaretskii
Subject: bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping
Date: Thu, 24 May 2012 19:22:46 +0300

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rms@gnu.org,  handa@gnu.org,  schwab@linux-m68k.org,  lekktu@gmail.com,  
> 11519@debbugs.gnu.org
> Date: Wed, 23 May 2012 16:07:05 -0400
> 
> >> > Which other places use C pointers to buffer text and call functions
> >> > that can allocate memory?
> >> IIUC any place that uses STRING_CHAR_AND_LENGTH on buffer text is
> >> vulnerable to the problem.
> > That's not true.  As long as you access buffer text through character
> > position, you are safe.
> 
> Right, some of those uses might be safe, indeed.  Of course it's not
> only STRING_CHAR_AND_LENGTH but STRING_CHAR_ADVANCE as well, together
> with FETCH_* macros which use those, etc...

No, FETCH_* macros are safe, because they accept buffer positions,
fetch only one character at a time, and call STRING_CHAR_* _after_
they access the buffer.

> Grepping for those macros shows they're used at *many* places, and I'd
> be amazed if re_search is the only place where we don't go through the
> BYTE_POS_ADDR rigmarole.
> 
> Let's see ...hmmm... yup, set-buffer-multibyte is another example,
> find_charsets_in_text yet another, and I'm not even trying hard.
> Just grep for "STRING_CHAR_" and see for yourself.

I didn't mean STRING_CHAR_*.  I agree that they should be fixed not to
have such unexpected side effect.  They should be read-only operations.
As a temporary band-aid for Emacs 24.1, I suggest the change below.

What I meant is the code besides STRING_CHAR_* macros.  I don't think
you will find code that manipulates C pointers to buffer text and
calls functions that can allocate memory.

> >> But on other platforms where we use mmap, we do suffer from this
> >> fragmentation, and yet it doesn't seem to be a real source of problem.
> > I don't know enough about mmap to answer that.  I vaguely recollect
> > that mmap avoids such fragmentation as an inherent feature, but I may
> > be wrong.
> 
> No, fragmentation is a property of the address space, so without
> relocation you can't avoid it.

I asked Gerd Möllmann, who wrote the mmap-based buffer allocation
code, about this.  That code originally resided on ralloc.c and was
meant to replace the sbrk-based implementation.  So I would expect
that the issue of relocation was considered back then, and I hope Gerd
will remember why the mmap-based code was considered good enough to
replace ralloc.c.

> > I find it hard to believe that going through system malloc on
> > MS-Windows will let us use buffers as large as 1.5 GB (on a 32-bit
> > machine).  To achieve this today, we reserve a 2GB contiguous chunk of
> > address space at startup, and then commit and uncommit parts of it as
> > needed (see w32heap.c).  ralloc.c has an important part in this
> > arrangement.
> 
> You mean that Windows's system malloc library has a memory that's too
> fragmented to be able to allocate a single 1.5G chunk?  Why?

This has nothing to do with Windows APIs, so you are well equipped to
reason about this ;-)

You said "malloc", so I took an issue with the MS C runtime
implementation of malloc.  Since all the other implementations suffer
from fragmentation, there's no reason to believe that the MS
implementation avoids that danger.  A general-purpose function that
cannot move buffers behind the scenes cannot possibly avoid that.
Doing better was the original reason for writing ralloc.c.

If you meant to bypass malloc and use the Windows memory-allocation
APIs, such as VirtualAlloc, directly, then that's what we already do
in w32heap.c, which implements an emulation of sbrk that is good
enough for Emacs.  The fact that gmalloc and ralloc are used on top of
that are simply to avoid reinventing the wheel.

We could easily turn off buffer relocation in ralloc.c for good, by
fixing the value of use_relocatable_buffers at zero.  But I'm worried
that this would cause Emacs on Windows run out of memory (or act as if
it were) faster.  For example, in an Emacs session that runs for 2
weeks and has a 200MB working set, I just visited a 1.3GB file, went
to its middle and typed "C-u 30000 d" to insert 30K characters.  Emacs
had no problems enlarging the buffer, although it has only 1.9GB of
reserved memory space on that machine, so if it needed to allocate
another 1.3GB+30KB buffer (due to fragmentation, which is a certainty
after such a long time), it would have failed, I presume.

Yet another alternative is to emulate mmap on Windows using the
equivalent Windows API.  But that requires a research comparing mmap
features we need and use on Posix platforms with the features offered
by Windows, to make sure this is at all feasible.  Such a research
would need to be done by someone who knows enough about mmap and is
willing to do the job.  Do we have such a person on board?  And then
there's the implementation and testing.  Doesn't sound like an
efficient use of our time and energy.

Are there other alternatives?

Here's the band-aid I propose for emacs-24, to make the STRING_CHAR_*
macros safe:

=== modified file 'src/charset.c'
--- src/charset.c       2012-01-19 07:21:25 +0000
+++ src/charset.c       2012-05-24 16:14:05 +0000
@@ -1641,6 +1641,12 @@ maybe_unify_char (int c, Lisp_Object val
     return c;
 
   CHECK_CHARSET_GET_CHARSET (val, charset);
+#ifdef REL_ALLOC
+  /* The call to load_charset below can allocate memory, whcih screws
+     callers of this function through STRING_CHAR_* macros that hold C
+     pointers to buffer text, if REL_ALLOC is used.  */
+  r_alloc_inhibit_buffer_relocation (1);
+#endif
   load_charset (charset, 1);
   if (! inhibit_load_charset_map)
     {
@@ -1656,6 +1662,9 @@ maybe_unify_char (int c, Lisp_Object val
       if (unified > 0)
        c = unified;
     }
+#ifdef REL_ALLOC
+  r_alloc_inhibit_buffer_relocation (0);
+#endif
   return c;
 }
 

=== modified file 'src/ralloc.c'
--- src/ralloc.c        2012-05-23 17:32:28 +0000
+++ src/ralloc.c        2012-05-24 16:16:14 +0000
@@ -1204,7 +1204,12 @@ r_alloc_reset_variable (POINTER *old, PO
 void
 r_alloc_inhibit_buffer_relocation (int inhibit)
 {
-  use_relocatable_buffers = !inhibit;
+  if (use_relocatable_buffers < 0)
+    use_relocatable_buffers = 0;
+  if (inhibit)
+    use_relocatable_buffers++;
+  else if (use_relocatable_buffers > 0)
+    use_relocatable_buffers--;
 }
 
 







reply via email to

[Prev in Thread] Current Thread [Next in Thread]