linux-kernel - Re: [PATCH v11 2/6] arm64: kvm: Introduce MTE VM feature

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20210513152158.GB6801@arm.com>
Date:   Thu, 13 May 2021 16:21:58 +0100
From:   Catalin Marinas <catalin.marinas@....com>
To:     Steven Price <steven.price@....com>
Cc:     Marc Zyngier <maz@...nel.org>, Will Deacon <will@...nel.org>,
        James Morse <james.morse@....com>,
        Julien Thierry <julien.thierry.kdev@...il.com>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        kvmarm@...ts.cs.columbia.edu, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, Dave Martin <Dave.Martin@....com>,
        Mark Rutland <mark.rutland@....com>,
        Thomas Gleixner <tglx@...utronix.de>, qemu-devel@...gnu.org,
        Juan Quintela <quintela@...hat.com>,
        "Dr. David Alan Gilbert" <dgilbert@...hat.com>,
        Richard Henderson <richard.henderson@...aro.org>,
        Peter Maydell <peter.maydell@...aro.org>,
        Haibo Xu <Haibo.Xu@....com>, Andrew Jones <drjones@...hat.com>
Subject: Re: [PATCH v11 2/6] arm64: kvm: Introduce MTE VM feature

On Thu, May 13, 2021 at 11:57:39AM +0100, Steven Price wrote:
> On 12/05/2021 18:45, Catalin Marinas wrote:
> > On Wed, May 12, 2021 at 04:46:48PM +0100, Steven Price wrote:
> >> On 10/05/2021 19:35, Catalin Marinas wrote:
> >>>> On Thu, May 06, 2021 at 05:15:25PM +0100, Steven Price wrote:
> >>>>>> On Thu, Apr 29, 2021 at 05:06:41PM +0100, Steven Price wrote:
> >>>>>>> Given the changes to set_pte_at() which means that tags are restored from
> >>>>>>> swap even if !PROT_MTE, the only race I can see remaining is the creation of
> >>>>>>> new PROT_MTE mappings. As you mention an attempt to change mappings in the
> >>>>>>> VMM memory space should involve a mmu notifier call which I think serialises
> >>>>>>> this. So the remaining issue is doing this in a separate address space.
> >>>>>>>
> >>>>>>> So I guess the potential problem is:
> >>>>>>>
> >>>>>>>    * allocate memory MAP_SHARED but !PROT_MTE
> >>>>>>>    * fork()
> >>>>>>>    * VM causes a fault in parent address space
> >>>>>>>    * child does a mprotect(PROT_MTE)
> >>>>>>>
> >>>>>>> With the last two potentially racing. Sadly I can't see a good way of
> >>>>>>> handling that.
> > [...]
> >>> Options:
> >>>
> >>> 1. Change the mte_sync_tags() code path to set the flag after clearing
> >>>     and avoid reading stale tags. We document that mprotect() on
> >>>     MAP_SHARED may lead to tag loss. Maybe we can intercept this in the
> >>>     arch code and return an error.
> >>
> >> This is the best option I've come up with so far - but it's not a good
> >> one! We can replace the set_bit() with a test_and_set_bit() to catch the
> >> race after it has occurred - but I'm not sure what we can do about it
> >> then (we've already wiped the data). Returning an error doesn't seem
> >> particularly useful at that point, a message in dmesg is about the best
> >> I can come up with.
> > 
> > What I meant about intercepting is on something like
> > arch_validate_flags() to prevent VM_SHARED and VM_MTE together but only
> > for mprotect(), not mmap(). However, arch_validate_flags() is currently
> > called on both mmap() and mprotect() paths.
> 
> I think even if we were to restrict mprotect() there would be corner
> cases around swapping in. For example if a page mapped VM_SHARED|VM_MTE
> is faulted simultaneously in both processes then we have the same situation:
> 
>  * with test_and_set_bit() one process could potentially see the tags
> before they have been restored - i.e. a data leak.
> 
>  * with separated test and set then one process could write to the tags
> before the second restore has completed causing a lost update.

I don't think this can happen. We have shmem_swapin_page() which I think
would be called on any faulting pte that was sharing such page. It takes
the page lock around arch_swap_restore().

-- 
Catalin