[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0b9c122a-c05a-b3df-c69f-85f520294adc@redhat.com>
Date: Thu, 24 Aug 2023 13:25:41 +0200
From: David Hildenbrand <david@...hat.com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Alexandru Elisei <alexandru.elisei@....com>, will@...nel.org,
oliver.upton@...ux.dev, maz@...nel.org, james.morse@....com,
suzuki.poulose@....com, yuzenghui@...wei.com, arnd@...db.de,
akpm@...ux-foundation.org, mingo@...hat.com, peterz@...radead.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
mhiramat@...nel.org, rppt@...nel.org, hughd@...gle.com,
pcc@...gle.com, steven.price@....com, anshuman.khandual@....com,
vincenzo.frascino@....com, eugenis@...gle.com, kcc@...gle.com,
hyesoo.yu@...sung.com, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, kvmarm@...ts.linux.dev,
linux-fsdevel@...r.kernel.org, linux-arch@...r.kernel.org,
linux-mm@...ck.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage
reuse
On 24.08.23 13:06, David Hildenbrand wrote:
> On 24.08.23 12:44, Catalin Marinas wrote:
>> On Thu, Aug 24, 2023 at 09:50:32AM +0200, David Hildenbrand wrote:
>>> after re-reading it 2 times, I still have no clue what your patch set is
>>> actually trying to achieve. Probably there is a way to describe how user
>>> space intents to interact with this feature, so to see which value this
>>> actually has for user space -- and if we are using the right APIs and
>>> allocators.
>>
>> I'll try with an alternative summary, hopefully it becomes clearer (I
>> think Alex is away until the end of the week, may not reply
>> immediately). If this still doesn't work, maybe we should try a
>> different implementation ;).
>>
>> The way MTE is implemented currently is to have a static carve-out of
>> the DRAM to store the allocation tags (a.k.a. memory colour). This is
>> what we call the tag storage. Each 16 bytes have 4 bits of tags, so this
>> means 1/32 of the DRAM, roughly 3% used for the tag storage. This is
>> done transparently by the hardware/interconnect (with firmware setup)
>> and normally hidden from the OS. So a checked memory access to location
>> X generates a tag fetch from location Y in the carve-out and this tag is
>> compared with the bits 59:56 in the pointer. The correspondence from X
>> to Y is linear (subject to a minimum block size to deal with some
>> address interleaving). The software doesn't need to know about this
>> correspondence as we have specific instructions like STG/LDG to location
>> X that lead to a tag store/load to Y.
>>
>> Now, not all memory used by applications is tagged (mmap(PROT_MTE)).
>> For example, some large allocations may not use PROT_MTE at all or only
>> for the first and last page since initialising the tags takes time. The
>> side-effect is that of these 3% DRAM, only part, say 1% is effectively
>> used. Some people want the unused tag storage to be released for normal
>> data usage (i.e. give it to the kernel page allocator).
>>
>> So the first complication is that a PROT_MTE page allocation at address
>> X will need to reserve the tag storage at location Y (and migrate any
>> data in that page if it is in use).
>>
>> To make things worse, pages in the tag storage/carve-out range cannot
>> use PROT_MTE themselves on current hardware, so this adds the second
>> complication - a heterogeneous memory layout. The kernel needs to know
>> where to allocate a PROT_MTE page from or migrate a current page if it
>> becomes PROT_MTE (mprotect()) and the range it is in does not support
>> tagging.
>>
>> Some other complications are arm64-specific like cache coherency between
>> tags and data accesses. There is a draft architecture spec which will be
>> released soon, detailing how the hardware behaves.
>>
>> To your question about user APIs/ABIs, that's entirely transparent. As
>> with the current kernel (without this dynamic tag storage), a user only
>> needs to ask for PROT_MTE mappings to get tagged pages.
>
> Thanks, that clarifies things a lot.
>
> So it sounds like you might want to provide that tag memory using CMA.
>
> That way, only movable allocations can end up on that CMA memory area,
> and you can allocate selected tag pages on demand (similar to the
> alloc_contig_range() use case).
>
> That also solves the issue that such tag memory must not be longterm-pinned.
>
> Regarding one complication: "The kernel needs to know where to allocate
> a PROT_MTE page from or migrate a current page if it becomes PROT_MTE
> (mprotect()) and the range it is in does not support tagging.",
> simplified handling would be if it's in a MIGRATE_CMA pageblock, it
> doesn't support tagging. You have to migrate to a !CMA page (for
> example, not specifying GFP_MOVABLE as a quick way to achieve that).
>
Okay, I now realize that this patch set effectively duplicates some CMA
behavior using a new migrate-type. Yeah, that's probably not what we
want just to identify if memory is taggable or not.
Maybe there is a way to just keep reusing most of CMA instead.
Another simpler idea to get started would be to just intercept the first
PROT_MTE, and allocate all CMA memory. In that case, systems that don't
ever use PROT_MTE can have that additional 3% of memory.
You probably know better how frequent it is that only a handful of
applications use PROT_MTE, such that there is still a significant
portion of tag memory to be reused (and if it's really worth optimizing
for that scenario).
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists