Message-ID: <a66997b6-60e4-4bfc-9437-89924c2ed3aa@oracle.com>
Date: Fri, 30 May 2025 15:47:08 -0700
From: Anthony Yznaga <anthony.yznaga@...cle.com>
To: Jann Horn <jannh@...gle.com>
Cc: akpm@...ux-foundation.org, willy@...radead.org, markhemm@...glemail.com,
        viro@...iv.linux.org.uk, david@...hat.com, khalid@...nel.org,
        andreyknvl@...il.com, dave.hansen@...el.com, luto@...nel.org,
        brauner@...nel.org, arnd@...db.de, ebiederm@...ssion.com,
        catalin.marinas@....com, linux-arch@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhiramat@...nel.org,
        rostedt@...dmis.org, vasily.averin@...ux.dev, xhao@...ux.alibaba.com,
        pcc@...gle.com, neilb@...e.de, maz@...nel.org
Subject: Re: [PATCH v2 08/20] mm/mshare: flush all TLBs when updating PTEs in
 an mshare range



On 5/30/25 10:46 AM, Jann Horn wrote:
> On Fri, May 30, 2025 at 6:30 PM Anthony Yznaga
> <anthony.yznaga@...cle.com> wrote:
>> On 5/30/25 7:41 AM, Jann Horn wrote:
>>> On Fri, Apr 4, 2025 at 4:18 AM Anthony Yznaga <anthony.yznaga@...cle.com> wrote:
>>>> Unlike the mm of a task, an mshare host mm is not updated on context
>>>> switch. In particular this means that mm_cpumask is never updated
>>>> which results in TLB flushes for updates to mshare PTEs only being
>>>> done on the local CPU. To ensure entries are flushed for non-local
>>>> TLBs, set up an mmu notifier on the mshare mm and use the
>>>> .arch_invalidate_secondary_tlbs callback to flush all TLBs.
>>>> arch_invalidate_secondary_tlbs guarantees that TLB entries will be
>>>> flushed before pages are freed when unmapping pages in an mshare region.
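
[For reference: the mechanism the commit message describes amounts to
roughly the following. This is a sketch only, not the actual patch; the
ops struct, callback name, and the info->mmu_notifier field are
illustrative.]

/*
 * The mshare host mm is never context-switched in, so mm_cpumask()
 * stays empty and per-CPU flushes would miss remote CPUs; fall back
 * to a global flush whenever host-mm PTEs are invalidated.
 */
static void mshare_invalidate_secondary_tlbs(struct mmu_notifier *mn,
		struct mm_struct *mm,
		unsigned long start, unsigned long end)
{
	flush_tlb_all();
}

static const struct mmu_notifier_ops mshare_mmu_notifier_ops = {
	.arch_invalidate_secondary_tlbs = mshare_invalidate_secondary_tlbs,
};

/* Registered once on the host mm when the mshare region is set up
 * (error handling omitted): */
info->mmu_notifier.ops = &mshare_mmu_notifier_ops;
mmu_notifier_register(&info->mmu_notifier, host_mm);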
>>>
>>> Thanks for working on this, I think this is a really nice feature.
>>>
>>> An issue that I think this series doesn't address is:
>>> There could be mmu_notifiers (for things like KVM or SVA IOMMU) that
>>> want to be notified on changes to an mshare VMA; if those are not
>>> invoked, we could get UAF of page contents. So either we propagate MMU
>>> notifier invocations in the host mm into the mshare regions that use
>>> it, or we'd have to somehow prevent a process from using MMU notifiers
>>> and mshare at the same time.
>>
>> Thanks, Jann. I've noted this as an issue. Ultimately I think the
>> notifier calls will need to be propagated. It's going to be tricky, but
>> I have some ideas.
> 
> Very naively, I think you could register your own notifier on the
> host mm, with callbacks vaguely like the one below that walk the
> rmap of the mshare file and invoke nested mmu notifiers on each VMA
> that maps the file: basically unmap_mapping_pages(), except with
> unmap_mapping_range_vma() replaced by a notifier invocation?
> 
> static int mshare_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>     const struct mmu_notifier_range *range)
> {
>   struct vm_area_struct *vma;
>   pgoff_t first_index, last_index;
> 
>   /* host_mm and mapping are the mshare host mm and the mshare
>    * file's address_space, presumably reachable from mn. */
>   if (range->end < host_mm->mmap_base)
>     return 0;
>   first_index = (max(range->start, host_mm->mmap_base) -
>                  host_mm->mmap_base) / PAGE_SIZE;
>   last_index = (range->end - host_mm->mmap_base) / PAGE_SIZE;
>   i_mmap_lock_read(mapping);
>   vma_interval_tree_foreach(vma, &mapping->i_mmap, first_index, last_index) {
>     struct mmu_notifier_range nested_range;
> 
>     [... same math as in unmap_mapping_range_tree ...]
>     mmu_notifier_range_init(&nested_range, range->event, range->flags,
>                             vma->vm_mm, nested_start, nested_end);
>     mmu_notifier_invalidate_range_start(&nested_range);
>   }
>   i_mmap_unlock_read(mapping);
>   return 0;
> }
> 
> And ensure that when mm_take_all_locks() encounters an mshare VMA, it
> basically recursively does mm_take_all_locks() on the mshare host mm?
> 
> I think that might be enough to make it work, and the rest beyond that
> would be optimizations?

I figured the vma interval tree would need to be walked. I hadn't 
considered mm_take_all_locks(), though. This is definitely a good 
starting point. Thanks for this!
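
[To make the mm_take_all_locks() idea concrete, the recursion might
look roughly like this; vma_is_mshare() and mshare_host_mm() are
hypothetical helper names, and the elided parts stand for the
function's existing body.]

int mm_take_all_locks(struct mm_struct *mm)
{
	struct vm_area_struct *vma;
	VMA_ITERATOR(vmi, mm, 0);

	[... existing locking of each VMA's anon_vmas / address_spaces ...]

	for_each_vma(vmi, vma) {
		/* Recurse into the host mm backing an mshare VMA so its
		 * notifier chain is also frozen during registration. */
		if (vma_is_mshare(vma) &&
		    mm_take_all_locks(mshare_host_mm(vma)))
			goto out_unlock;
	}

	return 0;

out_unlock:
	mm_drop_all_locks(mm);
	return -EINTR;
}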

