linux-kernel - Re: [PATCH v2 0/2] MTE support for KVM guest

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <17efa848-9bda-26b2-b70f-040c9fa3f2da@arm.com>
Date:   Thu, 10 Sep 2020 15:14:47 +0100
From:   Steven Price <steven.price@....com>
To:     Andrew Jones <drjones@...hat.com>
Cc:     Peter Maydell <Peter.Maydell@....com>,
        linux-kernel@...r.kernel.org, Juan Quintela <quintela@...hat.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Richard Henderson <richard.henderson@...aro.org>,
        qemu-devel@...gnu.org,
        "Dr. David Alan Gilbert" <dgilbert@...hat.com>,
        Marc Zyngier <maz@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>, kvmarm@...ts.cs.columbia.edu,
        linux-arm-kernel@...ts.infradead.org,
        Dave Martin <Dave.Martin@....com>
Subject: Re: [PATCH v2 0/2] MTE support for KVM guest

On 10/09/2020 14:56, Andrew Jones wrote:
> On Thu, Sep 10, 2020 at 10:21:04AM +0100, Steven Price wrote:
>> On 10/09/2020 07:29, Andrew Jones wrote:
>>> But if userspace created the memslots with memory already set with
>>> PROT_MTE, then this wouldn't be necessary, right? And, as long as
>>> there's still a way to access the memory with tag checking disabled,
>>> then it shouldn't be a problem.
>>
>> Yes, so one option would be to attempt to validate that the VMM has provided
>> memory pages with the PG_mte_tagged bit set (e.g. by mapping with PROT_MTE).
>> The tricky part here is that we support KVM_CAP_SYNC_MMU which means that
>> the VMM can change the memory backing at any time - so we could end up in
>> user_mem_abort() discovering that a page doesn't have PG_mte_tagged set - at
>> that point there's no nice way of handling it (other than silently upgrading
>> the page) so the VM is dead.
>>
>> So since enforcing that PG_mte_tagged is set isn't easy and provides a
>> hard-to-debug foot gun to the VMM I decided the better option was to let the
>> kernel set the bit automatically.
>>
> 
> The foot gun still exists when migration is considered, no? If userspace
> is telling a guest it can use MTE on its normal memory, but then doesn't
> prepare that memory correctly, or remember to migrate the tags correctly
> (which requires knowing the memory has tags and knowing how to get them),
> then I guess the VM is in trouble one way or another.

Well not all VMMs support migration, and it's only migration that is 
affected by this for a simple VMM (e.g. the changes to kvmtool are 
minimal for MTE). But yes fundamentally if a VMM enables MTE it needs to 
know how to deal with the extra tags everywhere.

> I feel like we should trust the VMM to ensure MTE will work on any memory
> the guest could use it on, and change the action in user_mem_abort() to
> abort the guest with a big error message if it sees the flag is missing.

I'm happy to change it, if you feel this is easier to debug.

>>>>>
>>>>> If userspace needs to write to guest memory then it should be due to
>>>>> a device DMA or other specific hardware emulation. Those accesses can
>>>>> be done with tag checking disabled.
>>>>
>>>> Yes, the question is can the VMM (sensibly) wrap the accesses with a
>>>> disable/renable tag checking for the process sequence. The alternative at
>>>> the moment is to maintain a separate (untagged) mapping for the purpose
>>>> which might present it's own problems.
>>>
>>> Hmm, so there's no easy way to disable tag checking when necessary? If we
>>> don't map the guest ram with PROT_MTE and continue setting the attribute
>>> in KVM, as this series does, then we don't need to worry about it tag
>>> checking when accessing the memory, but then we can't access the tags for
>>> migration.
>>
>> There's a "TCO" (Tag Check Override) bit in PSTATE which allows disabling
>> tag checking, so if it's reasonable to wrap accesses to the memory you can
>> simply set the TCO bit, perform the memory access and then unset TCO. That
>> would mean a single mapping with MTE enabled would work fine. What I don't
>> have a clue about is whether it's practical in the VMM to wrap guest
>> accesses like this.
>>
> 
> At least QEMU goes through many abstractions to get to memory already.
> There may already be a hook we could use, if not, it probably wouldn't
> be too hard to add one (famous last words).

Sounds good. My hope was that the abstractions were already in there.

Thanks,

Steve