linux-kernel - Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZN63m5Dej5MBLTqr@yzhao56-desk.sh.intel.com>
Date:   Fri, 18 Aug 2023 08:13:15 +0800
From:   Yan Zhao <yan.y.zhao@...el.com>
To:     David Hildenbrand <david@...hat.com>
CC:     John Hubbard <jhubbard@...dia.com>, <linux-mm@...ck.org>,
        <linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
        <pbonzini@...hat.com>, <seanjc@...gle.com>,
        <mike.kravetz@...cle.com>, <apopple@...dia.com>, <jgg@...dia.com>,
        <rppt@...nel.org>, <akpm@...ux-foundation.org>,
        <kevin.tian@...el.com>, Mel Gorman <mgorman@...hsingularity.net>,
        <alex.williamson@...hat.com>
Subject: Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in
 a VM

On Thu, Aug 17, 2023 at 09:38:37AM +0200, David Hildenbrand wrote:
> On 17.08.23 07:05, Yan Zhao wrote:
> > On Wed, Aug 16, 2023 at 11:00:36AM -0700, John Hubbard wrote:
> > > On 8/16/23 02:49, David Hildenbrand wrote:
> > > > But do 32bit architectures even care about NUMA hinting? If not, just
> > > > ignore them ...
> > > 
> > > Probably not!
> > > 
> > > ...
> > > > > So, do you mean that let kernel provide a per-VMA allow/disallow
> > > > > mechanism, and
> > > > > it's up to the user space to choose between per-VMA and complex way or
> > > > > global and simpler way?
> > > > 
> > > > QEMU could do either way. The question would be if a per-vma settings
> > > > makes sense for NUMA hinting.
> > > 
> > >  From our experience with compute on GPUs, a per-mm setting would suffice.
> > > No need to go all the way to VMA granularity.
> > > 
> > After an offline internal discussion, we think a per-mm setting is also
> > enough for device passthrough in VMs.
> > 
> > BTW, if we want a per-VMA flag, compared to VM_NO_NUMA_BALANCING, do you
> > think it's of any value to providing a flag like VM_MAYDMA?
> > Auto NUMA balancing or other components can decide how to use it by
> > themselves.
> 
> Short-lived DMA is not really the problem. The problem is long-term pinning.
> 
> There was a discussion about letting user space similarly hint that
> long-term pinning might/will happen.
> 
> Because when long-term pinning a page we have to make sure to migrate it off
> of ZONE_MOVABLE / MIGRATE_CMA.
> 
> But the kernel prefers to place pages there.
> 
> So with vfio in QEMU, we might preallocate memory for the guest and place it
> on ZONE_MOVABLE/MIGRATE_CMA, just so long-term pinning has to migrate all
> these fresh pages out of these areas again.
> 
> So letting the kernel know about that in this context might also help.
>
Thanks! Glad to know it :)
But consider for GPUs case as what John mentioned, since the memory is
not even pinned, maybe they still need flag VM_NO_NUMA_BALANCING ?
For VMs, we hint VM_NO_NUMA_BALANCING for passthrough devices supporting
IO page fault (so no need to pin), and VM_MAYLONGTERMDMA to avoid misplace
and migration.

Is that good?
Or do you think just a per-mm flag like MMF_NO_NUMA is good enough for
now?