Re: [PATCH v2 4/9] bsd-user/syscall: Replace alloca() by g

On Thu, May 6, 2021 at 9:12 AM Eric Blake <eblake@redhat.com> wrote:

On 5/6/21 9:55 AM, Warner Losh wrote:

>>> Where is this freed? Also, alloca just moves a stack pointer, where
>> malloc
>>> has complex interactions. Are you sure that's a safe change here?
>>
>> It's freed any time the g_autofree variable goes out of scope (that's
>> what the g_autofree macro is for). Yes, the change is safe, although
>> you are right that switching to malloc is going to be a bit more
>> heavyweight than what alloca used. What's more, it adds safety: if
>> count was under user control, a user could pass a value that could cause
>> alloca to allocate more than 4k and accidentally mess up stack guard
>> pages, while malloc() uses the heap and therefore cannot cause stack bugs.
>>
>
> I'm not sure I understand that argument, since we're not compiling bsd-user
> with the stack-smash-protection stuff enabled, so there's no guard pages
> involved. The stack can grow quite large and the unmapped page at
> the end of the stack would catch any overflows. Since these allocations
> are on the top of the stack, they won't overflow into any other frames
> and subsequent calls are made with them already in place.

With alloca() on user-controlled size, the user can set up the size to
be larger than the unmapped guard page, at which point you CANNOT catch
the stack overflow because the alloca can skip the guard page and wrap
into other valid memory. Compiling with stack-smash-protection stuff
enabled will catch such a bad alloca(); but the issue at hand here is
not when stack-smash-protection is enabled, but the more common case
when it is disabled (at which point the only protection you have is the
guard page, but improper use of alloca() can bypass the guard page).
Not all alloca() arguments are under user control, but it is easier as a
matter of policy to blindly avoid alloca() than it is to audit which
calls have safe sizes and therefore will not risk user control bypassing
stack guards.

I'd forgotten that the gap pages are only a single page and things

may be mapped beyond that. For bsd-user, though, any issues that

this cause will be mitigated by the fact that it's running as the user

that we're emulating, so there's no privilege escalations that couldn't

already be hand-coded w/o bsd-user in the loop.

>
> malloc, on the other hand, involves taking out a number of mutexes
> and similar things to obtain the memory, which may not necessarily
> be safe in all the contexts system calls can be called from. System
> calls are, typically, async safe and can be called from signal handlers.
> alloca() is async safe, while malloc() is not. So changing the calls
> from alloca to malloc makes calls to system calls in signal handlers
> unsafe and potentially introducing buggy behavior as a result.

Correct, use of malloc() is not safe within signal handlers. But these
calls are not within signal handlers - or am _I_ missing something? Is
the point of *-user code to emulate syscalls that are reachable from
code installed in a signal handler, at which point introducing an
async-unsafe call to malloc in our emulation is indeed putting the
application at risk of a malloc deadlock?

My concern is that the user emulation has some elements that are

based on catching signals and interpreting them to control execution

in the guest code. There's a number of places where signals are queued

instead of run right away as well. I've not had time to dive into the

complex implementation to make sure that we'll not wind up in a situation

where we would be trying to call malloc from a signal handler directly.

I'll try to find an answer to that as well.

Ultimately, we're trading one maintenance headache (determining which
alloca() size calls might be under user control) for another
(determining that malloca() calls are not in a signal context), but the
latter is far more common such that we can use existing tooling to make
that conversion safely (both in the fact that the compiler has flags to
warn about alloca() usage, and in the fact that Coverity is good at
flagging improper uses of malloc() such as within a function reachable
from something installed in a signal handler). But I'm not familiar
enough with the bsd/linux-user code to know if your point about having
to use only async-safe functionalities is a valid limitation on our
emulation.

Yea, my faith in Coverity flagging these when there's trips between,

say, arm binary code and user code isn't quite as high as yours.

But for the real answer, I need to contact the original authors of

this part of the code (they are no longer involved day-to-day in

the bsd-user efforts) to see if this scenario is possible or not. If

it's easy to find out that way, we can either know this is safe to

do, or if effort is needed to make it safe. At present, I've seen

enough and chatted enough with others to be concerned that

the change would break proper emulation.

Warner

From:	Warner Losh
Subject:	Re: [PATCH v2 4/9] bsd-user/syscall: Replace alloca() by g_new()
Date:	Thu, 6 May 2021 09:30:49 -0600

Re: [PATCH v2 4/9] bsd-user/syscall: Replace alloca() by g_new()