Message-ID: <10e453df-6911-e40d-8758-66caf9c59dbe@redhat.com>
Date: Wed, 19 Apr 2023 12:00:26 -0400
From: Joe Mario <jmario@...hat.com>
To: Matthew Wilcox <willy@...radead.org>,
Waiman Long <longman@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Barry Marson <bmarson@...hat.com>,
Rafael Aquini <aquini@...hat.com>
Subject: Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK
On 4/19/23 11:09 AM, Matthew Wilcox wrote:
> On Wed, Apr 19, 2023 at 11:07:04AM -0400, Waiman Long wrote:
>> On 4/18/23 23:46, Matthew Wilcox wrote:
>>> On Tue, Apr 18, 2023 at 09:16:37PM -0400, Waiman Long wrote:
>>>> 1) App runs creating lots of threads.
>>>> 2) It mmap's 256K pages of anonymous memory.
>>>> 3) It writes executable code to that memory.
>>>> 4) It calls mprotect() with PROT_EXEC on that memory so
>>>> it can subsequently execute the code.
>>>>
>>>> The above mprotect() will fail if the mmap'd region's VMA gets merged with
>>>> the VMA for one of the thread stacks. That's because the default RHEL
>>>> SELinux policy is to not allow executable stacks.
>>> By the way, this is a daft policy. The policy you really want is
>>> EXEC|WRITE is not allowed. A non-writable stack is useless, so it's
>>> actually a superset of your current policy. Forbidding _simultaneous_
>>> write and executable is just good programming. This way, you don't need
>>> to care about the underlying VMA's current permissions, you just need
>>> to do:
>>>
>>> if ((prot & (PROT_EXEC|PROT_WRITE)) == (PROT_EXEC|PROT_WRITE))
>>>         return -EACCES;
>>
>> I am not totally sure if the application changes the VMA to read-only first.
>> Even if it does that, it highlights another possible issue when an anonymous
>> VMA is merged with a stack VMA. Either the mprotect() to write-protect the
>> VMA will fail or the application will segfault if it writes stuff to the
>> stack. This particular issue is not related to SELinux. It provides another
>> good reason why we should avoid merging stack VMA to anonymous VMA.
>
> mprotect will split the VMA into two VMAs, one that is
> PROT_READ|PROT_WRITE and one that is PROT_READ|PROT_EXEC.
>
But in this case, the latter still has PROT_WRITE.
This was reported by a large data analytics customer. They started getting infrequent random crashes in code they haven't touched in 10 years.
One of the threads in their program mmaps a large region using PROT_READ|PROT_WRITE, and that region just happens to be merged with the thread's stack.
Then they copy a small snippet of code to a location somewhere within that mapped region. For the one page that contains that code, they mprotect it to PROT_READ|PROT_WRITE|PROT_EXEC. I recall they're still reading and writing data elsewhere on that page.
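
For reference, a minimal userspace sketch of that sequence, together with the write+exec check suggested earlier in the thread. The names wx_check, page_floor and make_code_page_rwx are made up for illustration, and the sketch does not try to reproduce the merge with a thread stack; it only shows the shape of the mmap()/mprotect() calls. Whether the mprotect() succeeds depends on the system's policy.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace rendering of the check quoted above: reject any
 * request that asks for write and execute permission at the same time. */
int wx_check(int prot)
{
    const int wx = PROT_WRITE | PROT_EXEC;

    if ((prot & wx) == wx)
        return -EACCES;
    return 0;
}

/* Round an address down to the start of its page.
 * page_size must be a non-zero power of two. */
uintptr_t page_floor(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Map an anonymous read/write region, then ask for read/write/exec on the
 * single page containing offset code_off (assumed to be where the code
 * snippet was copied). Returns mprotect()'s result, or -1 with errno set
 * if the arguments are bad or mmap() fails. */
int make_code_page_rwx(size_t region_len, size_t code_off)
{
    long ps = sysconf(_SC_PAGESIZE);
    unsigned char *region;
    uintptr_t page;
    int rc, saved;

    if (ps <= 0 || code_off >= region_len) {
        errno = EINVAL;
        return -1;
    }

    region = mmap(NULL, region_len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    /* Only this one page changes protection; the rest stays read/write. */
    page = page_floor((uintptr_t)(region + code_off), (uintptr_t)ps);
    rc = mprotect((void *)page, (size_t)ps,
                  PROT_READ | PROT_WRITE | PROT_EXEC);

    saved = errno;              /* keep mprotect()'s errno across munmap() */
    munmap(region, region_len);
    errno = saved;
    return rc;
}
```

With this sketch, wx_check(PROT_READ|PROT_WRITE|PROT_EXEC) returns -EACCES, so the customer's request as described would be refused under a no-simultaneous-write-and-exec rule as well, independent of which VMA the page ended up in.
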
Joe