linux-kernel - Re: [PATCH] mm/mmap: Map MAP_STACK to VM


Message-ID: <10e453df-6911-e40d-8758-66caf9c59dbe@redhat.com>
Date:   Wed, 19 Apr 2023 12:00:26 -0400
From:   Joe Mario <jmario@...hat.com>
To:     Matthew Wilcox <willy@...radead.org>,
        Waiman Long <longman@...hat.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Barry Marson <bmarson@...hat.com>,
        Rafael Aquini <aquini@...hat.com>
Subject: Re: [PATCH] mm/mmap: Map MAP_STACK to VM_STACK



On 4/19/23 11:09 AM, Matthew Wilcox wrote:
> On Wed, Apr 19, 2023 at 11:07:04AM -0400, Waiman Long wrote:
>> On 4/18/23 23:46, Matthew Wilcox wrote:
>>> On Tue, Apr 18, 2023 at 09:16:37PM -0400, Waiman Long wrote:
>>>>   1) App runs creating lots of threads.
>>>>   2) It mmap's 256K pages of anonymous memory.
>>>>   3) It writes executable code to that memory.
>>>>   4) It calls mprotect() with PROT_EXEC on that memory so
>>>>      it can subsequently execute the code.
>>>>
>>>> The above mprotect() will fail if the mmap'd region's VMA gets merged with
>>>> the VMA for one of the thread stacks.  That's because the default RHEL
>>>> SELinux policy is to not allow executable stacks.
>>> By the way, this is a daft policy.  The policy you really want is
>>> EXEC|WRITE is not allowed.  A non-writable stack is useless, so it's
>>> actually a superset of your current policy.  Forbidding _simultaneous_
>>> write and executable is just good programming.  This way, you don't need
>>> to care about the underlying VMA's current permissions, you just need
>>> to do:
>>>
>>> 	if ((prot & (PROT_EXEC|PROT_WRITE)) == (PROT_EXEC|PROT_WRITE))
>>> 		return -EACCES;
>>
>> I am not totally sure if the application changes the VMA to read-only first.
>> Even if it does that, it highlights another possible issue when an anonymous
>> VMA is merged with a stack VMA. Either the mprotect() to write-protect the
>> VMA will fail or the application will segfault if it writes stuff to the
>> stack. This particular issue is not related to SELinux. It provides another
>> good idea why we should avoid merging stack VMA to anonymous VMA.
> 
> mprotect will split the VMA into two VMAs, one that is
> PROT_READ|PROT_WRITE and one that is PROT_READ|PROT_EXEC.
> 

But in this case, the latter still has PROT_WRITE.  

This was reported by a large data analytics customer.  They started getting infrequent random crashes in code they haven't touched in 10 years.

One of the threads in their program mmaps a large region using PROT_READ|PROT_WRITE, and that region just happens to be merged with the thread's stack.

Then they copy a small snippet of code to a location somewhere within that mapped region. For the one page that contains that code, they mprotect it to PROT_READ|PROT_WRITE|PROT_EXEC.  I recall they're still reading and writing data elsewhere on that page.
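
For reference, a minimal sketch of that sequence in C. This is not the customer's code: the helper names, parameters, and sizes are made up for illustration, and it does not try to force the merge with a thread stack (that depends on where the kernel happens to place the mapping). It only shows the mmap / copy / mprotect pattern, plus the check quoted above, which this pattern would trip because it asks for PROT_WRITE and PROT_EXEC together.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Mirrors the check quoted earlier in the thread: nonzero when both
 * PROT_WRITE and PROT_EXEC are requested at the same time. */
static int wants_write_and_exec(int prot)
{
	return (prot & (PROT_EXEC | PROT_WRITE)) == (PROT_EXEC | PROT_WRITE);
}

/*
 * Hypothetical helper (name and parameters are illustrative only).
 * Maps npages of anonymous read/write memory, copies len bytes from code
 * into page number page_idx, then asks for RWX on that single page.
 *
 * Returns -1 if the arguments or mmap() fail, 0 if mprotect() succeeds,
 * or the positive errno reported by mprotect() (for example EACCES when
 * a security policy refuses the request).
 */
static int map_copy_mprotect(size_t npages, size_t page_idx,
			     const void *code, size_t len)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t total;
	char *base, *page;
	int ret = 0;

	if (psz <= 0 || npages == 0 || page_idx >= npages || len > (size_t)psz)
		return -1;
	total = npages * (size_t)psz;

	/* Large anonymous region, initially read/write only. */
	base = mmap(NULL, total, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* Copy the snippet somewhere inside the region. */
	page = base + page_idx * (size_t)psz;
	memcpy(page, code, len);

	/* Only the page holding the snippet is changed; the kernel splits
	 * the VMA around it. Write access is kept because data on the same
	 * page is still being modified. */
	if (mprotect(page, (size_t)psz,
		     PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
		ret = errno;

	munmap(base, total);
	return ret;
}
```

Calling something like map_copy_mprotect(64, 3, buf, len) returns 0 when the mprotect is allowed. In the failing case described here, the same call would come back with an error only when the region had been merged with the stack VMA, which is why the crashes look random.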

Joe