linux-kernel - Re: [PATCH v2 0/2] MTE support for KVM guest

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200910135618.cvnlrgvhuy3amv6s@kamzik.brq.redhat.com>
Date:   Thu, 10 Sep 2020 15:56:18 +0200
From:   Andrew Jones <drjones@...hat.com>
To:     Steven Price <steven.price@....com>
Cc:     Peter Maydell <Peter.Maydell@....com>,
        linux-kernel@...r.kernel.org, Juan Quintela <quintela@...hat.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Richard Henderson <richard.henderson@...aro.org>,
        qemu-devel@...gnu.org,
        "Dr. David Alan Gilbert" <dgilbert@...hat.com>,
        Marc Zyngier <maz@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>, kvmarm@...ts.cs.columbia.edu,
        linux-arm-kernel@...ts.infradead.org,
        Dave Martin <Dave.Martin@....com>
Subject: Re: [PATCH v2 0/2] MTE support for KVM guest

On Thu, Sep 10, 2020 at 10:21:04AM +0100, Steven Price wrote:
> On 10/09/2020 07:29, Andrew Jones wrote:
> > But if userspace created the memslots with memory already set with
> > PROT_MTE, then this wouldn't be necessary, right? And, as long as
> > there's still a way to access the memory with tag checking disabled,
> > then it shouldn't be a problem.
> 
> Yes, so one option would be to attempt to validate that the VMM has provided
> memory pages with the PG_mte_tagged bit set (e.g. by mapping with PROT_MTE).
> The tricky part here is that we support KVM_CAP_SYNC_MMU which means that
> the VMM can change the memory backing at any time - so we could end up in
> user_mem_abort() discovering that a page doesn't have PG_mte_tagged set - at
> that point there's no nice way of handling it (other than silently upgrading
> the page) so the VM is dead.
> 
> So since enforcing that PG_mte_tagged is set isn't easy and provides a
> hard-to-debug foot gun to the VMM I decided the better option was to let the
> kernel set the bit automatically.
>

The foot gun still exists when migration is considered, no? If userspace
is telling a guest it can use MTE on its normal memory, but then doesn't
prepare that memory correctly, or remember to migrate the tags correctly
(which requires knowing the memory has tags and knowing how to get them),
then I guess the VM is in trouble one way or another.

I feel like we should trust the VMM to ensure MTE will work on any memory
the guest could use it on, and change the action in user_mem_abort() to
abort the guest with a big error message if it sees the flag is missing.
 
> > > > 
> > > > If userspace needs to write to guest memory then it should be due to
> > > > a device DMA or other specific hardware emulation. Those accesses can
> > > > be done with tag checking disabled.
> > > 
> > > Yes, the question is can the VMM (sensibly) wrap the accesses with a
> > > disable/renable tag checking for the process sequence. The alternative at
> > > the moment is to maintain a separate (untagged) mapping for the purpose
> > > which might present it's own problems.
> > 
> > Hmm, so there's no easy way to disable tag checking when necessary? If we
> > don't map the guest ram with PROT_MTE and continue setting the attribute
> > in KVM, as this series does, then we don't need to worry about it tag
> > checking when accessing the memory, but then we can't access the tags for
> > migration.
> 
> There's a "TCO" (Tag Check Override) bit in PSTATE which allows disabling
> tag checking, so if it's reasonable to wrap accesses to the memory you can
> simply set the TCO bit, perform the memory access and then unset TCO. That
> would mean a single mapping with MTE enabled would work fine. What I don't
> have a clue about is whether it's practical in the VMM to wrap guest
> accesses like this.
> 

At least QEMU goes through many abstractions to get to memory already.
There may already be a hook we could use, if not, it probably wouldn't
be too hard to add one (famous last words).

Thanks,
drew