Message-ID: <0b9c122a-c05a-b3df-c69f-85f520294adc@redhat.com>
Date:   Thu, 24 Aug 2023 13:25:41 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Catalin Marinas <catalin.marinas@....com>
Cc:     Alexandru Elisei <alexandru.elisei@....com>, will@...nel.org,
        oliver.upton@...ux.dev, maz@...nel.org, james.morse@....com,
        suzuki.poulose@....com, yuzenghui@...wei.com, arnd@...db.de,
        akpm@...ux-foundation.org, mingo@...hat.com, peterz@...radead.org,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        mhiramat@...nel.org, rppt@...nel.org, hughd@...gle.com,
        pcc@...gle.com, steven.price@....com, anshuman.khandual@....com,
        vincenzo.frascino@....com, eugenis@...gle.com, kcc@...gle.com,
        hyesoo.yu@...sung.com, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, kvmarm@...ts.linux.dev,
        linux-fsdevel@...r.kernel.org, linux-arch@...r.kernel.org,
        linux-mm@...ck.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage
 reuse

On 24.08.23 13:06, David Hildenbrand wrote:
> On 24.08.23 12:44, Catalin Marinas wrote:
>> On Thu, Aug 24, 2023 at 09:50:32AM +0200, David Hildenbrand wrote:
>>> after re-reading it 2 times, I still have no clue what your patch set is
>>> actually trying to achieve. Probably there is a way to describe how user
>>> space intends to interact with this feature, so we can see what value it
>>> actually has for user space -- and whether we are using the right APIs and
>>> allocators.
>>
>> I'll try with an alternative summary, hopefully it becomes clearer (I
>> think Alex is away until the end of the week, may not reply
>> immediately). If this still doesn't work, maybe we should try a
>> different implementation ;).
>>
>> The way MTE is implemented currently is to have a static carve-out of
>> the DRAM to store the allocation tags (a.k.a. memory colour). This is
>> what we call the tag storage. Each 16 bytes have 4 bits of tags, so this
>> means 1/32 of the DRAM, roughly 3% used for the tag storage. This is
>> done transparently by the hardware/interconnect (with firmware setup)
>> and normally hidden from the OS. So a checked memory access to location
>> X generates a tag fetch from location Y in the carve-out and this tag is
>> compared with the bits 59:56 in the pointer. The correspondence from X
>> to Y is linear (subject to a minimum block size to deal with some
>> address interleaving). The software doesn't need to know about this
>> correspondence as we have specific instructions like STG/LDG to location
>> X that lead to a tag store/load to Y.
>>
>> Now, not all memory used by applications is tagged (mmap(PROT_MTE)).
>> For example, some large allocations may not use PROT_MTE at all or only
>> for the first and last page since initialising the tags takes time. The
>> side effect is that, of this 3% of DRAM, only part (say 1%) is effectively
>> used. Some people want the unused tag storage to be released for normal
>> data usage (i.e. give it to the kernel page allocator).
>>
>> So the first complication is that a PROT_MTE page allocation at address
>> X will need to reserve the tag storage at location Y (and migrate any
>> data in that page if it is in use).
>>
>> To make things worse, pages in the tag storage/carve-out range cannot
>> use PROT_MTE themselves on current hardware, so this adds the second
>> complication - a heterogeneous memory layout. The kernel needs to know
>> where to allocate a PROT_MTE page from or migrate a current page if it
>> becomes PROT_MTE (mprotect()) and the range it is in does not support
>> tagging.
>>
>> Some other complications are arm64-specific like cache coherency between
>> tags and data accesses. There is a draft architecture spec which will be
>> released soon, detailing how the hardware behaves.
>>
>> To your question about user APIs/ABIs, that's entirely transparent. As
>> with the current kernel (without this dynamic tag storage), a user only
>> needs to ask for PROT_MTE mappings to get tagged pages.
> 
> Thanks, that clarifies things a lot.
> 
> So it sounds like you might want to provide that tag memory using CMA.
> 
> That way, only movable allocations can end up on that CMA memory area,
> and you can allocate selected tag pages on demand (similar to the
> alloc_contig_range() use case).
> 
> That also solves the issue that such tag memory must not be longterm-pinned.
> 
> Regarding one complication: "The kernel needs to know where to allocate
> a PROT_MTE page from or migrate a current page if it becomes PROT_MTE
> (mprotect()) and the range it is in does not support tagging.": a
> simplified way to handle it would be that a page in a MIGRATE_CMA
> pageblock doesn't support tagging. You have to migrate to a !CMA page (for
> example, not specifying GFP_MOVABLE as a quick way to achieve that).
> 

Okay, I now realize that this patch set effectively duplicates some CMA
behavior using a new migrate type. Yeah, that's probably not what we
want just to identify whether memory is taggable or not.

Maybe there is a way to just keep reusing most of CMA instead.
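
To make the simplified rule concrete, here is a minimal userspace model in
C. It is not kernel code: the names (struct toy_page, toy_page_is_taggable(),
toy_needs_migration()) and the enum are hypothetical stand-ins, and the only
thing it encodes is the assumption that a page in a CMA pageblock cannot be
tagged, so a PROT_MTE user has to be moved off it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a pageblock's migrate type. */
enum toy_migratetype {
    TOY_MIGRATE_MOVABLE,
    TOY_MIGRATE_UNMOVABLE,
    TOY_MIGRATE_CMA,    /* assumed to cover the tag storage carve-out */
};

/* Hypothetical page descriptor; a real kernel page looks nothing like this. */
struct toy_page {
    enum toy_migratetype pageblock_type;
};

/* Assumption from the thread: a page in a CMA pageblock is not taggable. */
static bool toy_page_is_taggable(const struct toy_page *page)
{
    return page->pageblock_type != TOY_MIGRATE_CMA;
}

/* A page must be migrated only if it is about to be used with PROT_MTE
 * while sitting in a range that does not support tagging. */
static bool toy_needs_migration(const struct toy_page *page, bool wants_mte)
{
    return wants_mte && !toy_page_is_taggable(page);
}
```

With this model, a page in a TOY_MIGRATE_CMA pageblock needs migration only
once PROT_MTE is requested for it; untagged users can stay where they are.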


Another, simpler idea to get started would be to just intercept the first
PROT_MTE mapping and allocate all CMA memory at that point. That way,
systems that never use PROT_MTE can keep that additional 3% of memory.
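
A rough sketch of that "reserve everything on first PROT_MTE" idea, again as
a userspace C model rather than kernel code. struct toy_tag_storage,
toy_on_mmap() and toy_reusable_bytes() are hypothetical names, and the actual
work (migrating data out of the carve-out and allocating it) is reduced to
flipping a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the tag storage carve-out. */
struct toy_tag_storage {
    uint64_t total_bytes;   /* size of the carve-out */
    bool reserved;          /* true once any PROT_MTE mapping was seen */
};

/* Bytes of the carve-out the page allocator may still use for data. */
static uint64_t toy_reusable_bytes(const struct toy_tag_storage *ts)
{
    return ts->reserved ? 0 : ts->total_bytes;
}

/* Called for every mapping request in this model.
 * Returns true if this call triggered the one-time reservation. */
static bool toy_on_mmap(struct toy_tag_storage *ts, bool prot_mte)
{
    if (!prot_mte || ts->reserved)
        return false;

    /* First PROT_MTE user: a real implementation would have to migrate
     * any data out of the carve-out and allocate all of it here. */
    ts->reserved = true;
    return true;
}
```

The trade-off in this model is all-or-nothing: until the first PROT_MTE
mapping the whole carve-out is reusable, afterwards none of it is.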

You probably know better how common it is that only a handful of
applications use PROT_MTE, such that there is still a significant
portion of tag memory to be reused (and whether it's really worth
optimizing for that scenario).
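
For a feel of the numbers involved, the arithmetic quoted above (4 bits of
tag per 16 bytes of data, i.e. 1/32) can be written down as a small C helper.
This is only a back-of-the-envelope sketch: the names are hypothetical, and it
ignores the minimum block size and address interleaving that the real
data-to-tag mapping is subject to.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MTE_GRANULE_BYTES 16u   /* data bytes covered by one tag */
#define TOY_MTE_TAG_BITS       4u   /* bits per allocation tag */

/* Tag storage needed for data_bytes of tagged memory:
 * 4 bits per 16 bytes, which works out to data_bytes / 32. */
static uint64_t toy_tag_bytes(uint64_t data_bytes)
{
    return data_bytes / TOY_MTE_GRANULE_BYTES * TOY_MTE_TAG_BITS / 8u;
}

/* Tag storage that could be handed back for normal data, assuming the
 * carve-out is sized for all of DRAM but only tagged_bytes actually use
 * PROT_MTE. Expects tagged_bytes <= dram_bytes. */
static uint64_t toy_reclaimable_tag_bytes(uint64_t dram_bytes,
                                          uint64_t tagged_bytes)
{
    return toy_tag_bytes(dram_bytes) - toy_tag_bytes(tagged_bytes);
}
```

Under these assumptions a 4 KiB page needs 128 bytes of tag storage, and a
system with 8 GiB of DRAM of which 2 GiB is mapped with PROT_MTE would have
192 MiB of its 256 MiB tag storage available for reuse.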

-- 
Cheers,

David / dhildenb
