Message-ID: <5c9e52ab-d46a-c939-b48f-744b9875ce95@redhat.com>
Date: Thu, 17 Aug 2023 09:38:37 +0200
From: David Hildenbrand <david@...hat.com>
To: Yan Zhao <yan.y.zhao@...el.com>, John Hubbard <jhubbard@...dia.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, pbonzini@...hat.com, seanjc@...gle.com,
mike.kravetz@...cle.com, apopple@...dia.com, jgg@...dia.com,
rppt@...nel.org, akpm@...ux-foundation.org, kevin.tian@...el.com,
Mel Gorman <mgorman@...hsingularity.net>,
alex.williamson@...hat.com
Subject: Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a
VM
On 17.08.23 07:05, Yan Zhao wrote:
> On Wed, Aug 16, 2023 at 11:00:36AM -0700, John Hubbard wrote:
>> On 8/16/23 02:49, David Hildenbrand wrote:
>>> But do 32bit architectures even care about NUMA hinting? If not, just
>>> ignore them ...
>>
>> Probably not!
>>
>> ...
>>>> So, do you mean that the kernel could provide a per-VMA
>>>> allow/disallow mechanism, and it's up to user space to choose
>>>> between the per-VMA (more complex) way and the global (simpler) way?
>>>
>>> QEMU could do it either way. The question would be whether a per-VMA
>>> setting makes sense for NUMA hinting.
>>
>> From our experience with compute on GPUs, a per-mm setting would suffice.
>> No need to go all the way to VMA granularity.
>>
> After an offline internal discussion, we think a per-mm setting is also
> enough for device passthrough in VMs.
>
> BTW, if we want a per-VMA flag, then compared to VM_NO_NUMA_BALANCING, do
> you think there is any value in providing a flag like VM_MAYDMA?
> Auto NUMA balancing or other components could then decide for themselves
> how to honor it.
Short-lived DMA is not really the problem. The problem is long-term pinning.
There was a discussion about letting user space similarly hint that
long-term pinning might/will happen.
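Just to illustrate the shape such a hint could take, a minimal sketch
(MADV_LONGTERM_PIN is invented here purely for illustration, no such
advice value exists today):

	#include <sys/mman.h>

	/* Hypothetical advice value, for illustration only. */
	#define MADV_LONGTERM_PIN 26

	/*
	 * Hint that this range will be long-term pinned, so the kernel
	 * should avoid populating it from ZONE_MOVABLE / MIGRATE_CMA.
	 */
	static int hint_longterm_pin(void *addr, size_t len)
	{
		return madvise(addr, len, MADV_LONGTERM_PIN);
	}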
Because when long-term pinning a page, we have to make sure to migrate
it off ZONE_MOVABLE / MIGRATE_CMA first, even though those are exactly
the areas where the kernel prefers to place pages.
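For reference, the FOLL_LONGTERM GUP path already gates on roughly this
check; a simplified sketch of the folio_is_longterm_pinnable() logic,
not the exact mm code (the CONFIG_CMA / device-coherent details are
elided):

	/*
	 * Simplified sketch: a folio is long-term pinnable only if it
	 * sits neither on a CMA/isolated pageblock nor in ZONE_MOVABLE;
	 * otherwise GUP has to migrate it out before taking the pin.
	 */
	static inline bool longterm_pinnable(struct folio *folio)
	{
		int mt = folio_migratetype(folio);

		if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
			return false;
		return !is_zone_movable_page(&folio->page);
	}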
So with vfio in QEMU, we might preallocate memory for the guest and
place it on ZONE_MOVABLE / MIGRATE_CMA, only for long-term pinning to
then have to migrate all of these fresh pages out of those areas again.
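In user-space terms, the wasted work looks roughly like this (sketch
only, the vfio ioctl details are elided):

	#include <string.h>
	#include <sys/mman.h>

	/* Sketch: guest RAM preallocation as QEMU might do it. */
	static void *alloc_guest_ram(size_t len)
	{
		void *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (ram == MAP_FAILED)
			return NULL;

		/* Preallocation: the kernel prefers ZONE_MOVABLE here. */
		memset(ram, 0, len);

		/*
		 * Later, vfio long-term pins this RAM for DMA
		 * (VFIO_IOMMU_MAP_DMA), which first has to migrate all
		 * of these fresh pages off ZONE_MOVABLE / MIGRATE_CMA
		 * again. A hint before the preallocation would avoid
		 * that migration entirely.
		 */
		return ram;
	}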
So letting the kernel know about that in this context might also help.
--
Cheers,
David / dhildenb