Message-ID: <5c9e52ab-d46a-c939-b48f-744b9875ce95@redhat.com>
Date: Thu, 17 Aug 2023 09:38:37 +0200
From: David Hildenbrand <david@...hat.com>
To: Yan Zhao <yan.y.zhao@...el.com>, John Hubbard <jhubbard@...dia.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, pbonzini@...hat.com, seanjc@...gle.com,
mike.kravetz@...cle.com, apopple@...dia.com, jgg@...dia.com,
rppt@...nel.org, akpm@...ux-foundation.org, kevin.tian@...el.com,
Mel Gorman <mgorman@...hsingularity.net>,
alex.williamson@...hat.com
Subject: Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a
VM
On 17.08.23 07:05, Yan Zhao wrote:
> On Wed, Aug 16, 2023 at 11:00:36AM -0700, John Hubbard wrote:
>> On 8/16/23 02:49, David Hildenbrand wrote:
>>> But do 32bit architectures even care about NUMA hinting? If not, just
>>> ignore them ...
>>
>> Probably not!
>>
>> ...
>>>> So, do you mean that the kernel could provide a per-VMA
>>>> allow/disallow mechanism, and it's up to user space to choose
>>>> between the per-VMA (more complex) way and the global (simpler) way?
>>>
>>> QEMU could do it either way. The question would be whether a per-VMA
>>> setting makes sense for NUMA hinting.
>>
>> From our experience with compute on GPUs, a per-mm setting would suffice.
>> No need to go all the way to VMA granularity.
>>
> After an offline internal discussion, we think a per-mm setting is also
> enough for device passthrough in VMs.
>
> BTW, if we want a per-VMA flag, then compared to VM_NO_NUMA_BALANCING, do
> you think there is any value in providing a flag like VM_MAYDMA?
> Auto NUMA balancing or other components could then decide for themselves
> how to honor it.
Short-lived DMA is not really the problem. The problem is long-term pinning.
There was a discussion about letting user space similarly hint that
long-term pinning might/will happen.
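Just to illustrate the shape such a hint could take, a minimal sketch
(MADV_LONGTERM_PIN is invented here purely for illustration, no such
advice value exists today):

	#include <sys/mman.h>

	/* Hypothetical advice value, for illustration only. */
	#define MADV_LONGTERM_PIN 26

	/*
	 * Hint that this range will be long-term pinned, so the kernel
	 * should avoid populating it from ZONE_MOVABLE / MIGRATE_CMA.
	 */
	static int hint_longterm_pin(void *addr, size_t len)
	{
		return madvise(addr, len, MADV_LONGTERM_PIN);
	}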
Because when long-term pinning a page, we have to make sure to migrate
it off ZONE_MOVABLE / MIGRATE_CMA first, even though those are exactly
the areas where the kernel prefers to place pages.
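For reference, the FOLL_LONGTERM GUP path already gates on roughly this
check; a simplified sketch of the folio_is_longterm_pinnable() logic,
not the exact mm code (the CONFIG_CMA / device-coherent details are
elided):

	/*
	 * Simplified sketch: a folio is long-term pinnable only if it
	 * sits neither on a CMA/isolated pageblock nor in ZONE_MOVABLE;
	 * otherwise GUP has to migrate it out before taking the pin.
	 */
	static inline bool longterm_pinnable(struct folio *folio)
	{
		int mt = folio_migratetype(folio);

		if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
			return false;
		return !is_zone_movable_page(&folio->page);
	}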
So with vfio in QEMU, we might preallocate memory for the guest and
place it on ZONE_MOVABLE / MIGRATE_CMA, only for long-term pinning to
then have to migrate all of these fresh pages out of those areas again.
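In user-space terms, the wasted work looks roughly like this (sketch
only, the vfio ioctl details are elided):

	#include <string.h>
	#include <sys/mman.h>

	/* Sketch: guest RAM preallocation as QEMU might do it. */
	static void *alloc_guest_ram(size_t len)
	{
		void *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (ram == MAP_FAILED)
			return NULL;

		/* Preallocation: the kernel prefers ZONE_MOVABLE here. */
		memset(ram, 0, len);

		/*
		 * Later, vfio long-term pins this RAM for DMA
		 * (VFIO_IOMMU_MAP_DMA), which first has to migrate all
		 * of these fresh pages off ZONE_MOVABLE / MIGRATE_CMA
		 * again. A hint before the preallocation would avoid
		 * that migration entirely.
		 */
		return ram;
	}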
So letting the kernel know about that in this context might also help.
--
Cheers,
David / dhildenb