[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3baf8814-0a9a-4de0-b568-62d241dbba0e@nvidia.com>
Date: Mon, 21 Oct 2024 12:25:08 -0700
From: John Hubbard <jhubbard@...dia.com>
To: David Hildenbrand <david@...hat.com>, Lorenzo Stoakes
<lorenzo.stoakes@...cle.com>
CC: Andrew Morton <akpm@...ux-foundation.org>, Suren Baghdasaryan
<surenb@...gle.com>, "Liam R . Howlett" <Liam.Howlett@...cle.com>, "Matthew
Wilcox" <willy@...radead.org>, Vlastimil Babka <vbabka@...e.cz>, "Paul E .
McKenney" <paulmck@...nel.org>, Jann Horn <jannh@...gle.com>,
<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>, Muchun Song
<muchun.song@...ux.dev>, Richard Henderson <richard.henderson@...aro.org>,
Ivan Kokshaysky <ink@...assic.park.msu.ru>, Matt Turner <mattst88@...il.com>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>, "James E . J . Bottomley"
<James.Bottomley@...senpartnership.com>, Helge Deller <deller@....de>, "Chris
Zankel" <chris@...kel.net>, Max Filippov <jcmvbkbc@...il.com>, Arnd Bergmann
<arnd@...db.de>, <linux-alpha@...r.kernel.org>, <linux-mips@...r.kernel.org>,
<linux-parisc@...r.kernel.org>, <linux-arch@...r.kernel.org>, Shuah Khan
<shuah@...nel.org>, Christian Brauner <brauner@...nel.org>,
<linux-kselftest@...r.kernel.org>, Sidhartha Kumar
<sidhartha.kumar@...cle.com>, Jeff Xu <jeffxu@...omium.org>, "Christoph
Hellwig" <hch@...radead.org>, <linux-api@...r.kernel.org>
Subject: Re: [PATCH v2 3/5] mm: madvise: implement lightweight guard page
mechanism
On 10/21/24 10:23 AM, David Hildenbrand wrote:
> On 21.10.24 19:15, Lorenzo Stoakes wrote:
>> On Mon, Oct 21, 2024 at 07:05:27PM +0200, David Hildenbrand wrote:
>>> On 20.10.24 18:20, Lorenzo Stoakes wrote:
>>>> Implement a new lightweight guard page feature, that is regions of userland
>>>> virtual memory that, when accessed, cause a fatal signal to arise.
>>>>
>>>> Currently users must establish PROT_NONE ranges to achieve this.
>>>>
>>>> However this is very costly memory-wise - we need a VMA for each and every
>>>> one of these regions AND they become unmergeable with surrounding VMAs.
>>>>
>>>> In addition repeated mmap() calls require repeated kernel context switches
>>>> and contention of the mmap lock to install these ranges, potentially also
>>>> having to unmap memory if installed over existing ranges.
>>>>
>>>> The lightweight guard approach eliminates the VMA cost altogether - rather
>>>> than establishing a PROT_NONE VMA, it operates at the level of page table
>>>> entries - poisoning PTEs such that accesses to them cause a fault followed
>>>> by a SIGSGEV signal being raised.
>>>>
>>>> This is achieved through the PTE marker mechanism, which a previous commit
>>>> in this series extended to permit this to be done, installed via the
>>>> generic page walking logic, also extended by a prior commit for this
>>>> purpose.
>>>>
>>>> These poison ranges are established with MADV_GUARD_POISON, and if the
>>>> range in which they are installed contain any existing mappings, they will
>>>> be zapped, i.e. free the range and unmap memory (thus mimicking the
>>>> behaviour of MADV_DONTNEED in this respect).
>>>>
>>>> Any existing poison entries will be left untouched. There is no nesting of
>>>> poisoned pages.
>>>>
>>>> Poisoned ranges are NOT cleared by MADV_DONTNEED, as this would be rather
>>>> unexpected behaviour, but are cleared on process teardown or unmapping of
>>>> memory ranges.
>>>>
>>>> Ranges can have the poison property removed by MADV_GUARD_UNPOISON -
>>>> 'remedying' the poisoning. The ranges over which this is applied, should
>>>> they contain non-poison entries, will be untouched, only poison entries
>>>> will be cleared.
>>>>
>>>> We permit this operation on anonymous memory only, and only VMAs which are
>>>> non-special, non-huge and not mlock()'d (if we permitted this we'd have to
>>>> drop locked pages which would be rather counterintuitive).
>>>>
>>>> Suggested-by: Vlastimil Babka <vbabka@...e.cz>
>>>> Suggested-by: Jann Horn <jannh@...gle.com>
>>>> Suggested-by: David Hildenbrand <david@...hat.com>
>>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>>>> ---
>>>> arch/alpha/include/uapi/asm/mman.h | 3 +
>>>> arch/mips/include/uapi/asm/mman.h | 3 +
>>>> arch/parisc/include/uapi/asm/mman.h | 3 +
>>>> arch/xtensa/include/uapi/asm/mman.h | 3 +
>>>> include/uapi/asm-generic/mman-common.h | 3 +
>>>> mm/madvise.c | 168 +++++++++++++++++++++++++
>>>> mm/mprotect.c | 3 +-
>>>> mm/mseal.c | 1 +
>>>> 8 files changed, 186 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
>>>> index 763929e814e9..71e13f27742d 100644
>>>> --- a/arch/alpha/include/uapi/asm/mman.h
>>>> +++ b/arch/alpha/include/uapi/asm/mman.h
>>>> @@ -78,6 +78,9 @@
>>>> #define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */
>>>> +#define MADV_GUARD_POISON 102 /* fatal signal on access to range */
>>>> +#define MADV_GUARD_UNPOISON 103 /* revoke guard poisoning */
>>>
>>> Just to raise it here: MADV_GUARD_INSTALL / MADV_GUARD_REMOVE or sth. like
>>> that would have been even clearer, at least to me.
Yes, I think so.
>>
>> :)
>>
>> It still feels like poisoning to me because we're explicitly putting
>> something in the page tables to make a range have different fault behaviour
>> like a HW poisoning, and 'installing' suggests backing or something like
>> this, I think that's more confusing.
>
> I connect "poison" to "SIGBUS" and "corrupt memory state", not to "there is nothing and there must not be anything". Thus my thinking. But again, not the end of the world, just wanted to raise it ...
"Poison" is used so far for fairly distinct things, and I'd very much like
to avoid extending its meaning to guard pages. It makes the other things
less unique, and it misses a naming and classification opportunity.
"Guard" and "guard page" are fairly unique names. That's valuable.
thanks,
--
John Hubbard
Powered by blists - more mailing lists