Message-ID: <bab1c156-ed5a-4c1d-8f0a-dd1e39e17c99@oracle.com>
Date: Mon, 28 Apr 2025 12:11:40 -0700
From: jane.chu@...cle.com
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: logane@...tatee.com, hch@....de, gregkh@...uxfoundation.org,
willy@...radead.org, kch@...dia.com, axboe@...nel.dk,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-pci@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-block@...r.kernel.org
Subject: Re: Report: Performance regression from ib_umem_get on zone device
pages
On 4/24/2025 5:01 AM, Jason Gunthorpe wrote:
> On Wed, Apr 23, 2025 at 10:35:06PM -0700, jane.chu@...cle.com wrote:
>>
>> On 4/23/2025 4:28 PM, Jason Gunthorpe wrote:
>>>> The flow of a single test run:
>>>> 1. reserve virtual address space for (61440 * 2MB) via mmap with PROT_NONE
>>>> and MAP_ANONYMOUS | MAP_NORESERVE | MAP_PRIVATE
>>>> 2. mmap ((61440 * 2MB) / 12) from each of the 12 device-dax to the
>>>> reserved virtual address space sequentially to form a contiguous VA
>>>> space
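(For anyone reproducing this, the flow above boils down to roughly the
sketch below; the /dev/daxN.0 paths, the MAP_FIXED placement and the
error handling are placeholders rather than the exact test code.)

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>

#define HPAGE_2M   (2UL << 20)
#define NR_HPAGES  61440UL
#define NR_DAX     12

int main(void)
{
	size_t total = NR_HPAGES * HPAGE_2M;	/* 120GB of VA */
	size_t chunk = total / NR_DAX;		/* 10GB per device-dax */
	char path[64];
	char *base;
	int i, fd;

	/* 1. reserve a contiguous VA range with no backing store */
	base = mmap(NULL, total, PROT_NONE,
		    MAP_ANONYMOUS | MAP_NORESERVE | MAP_PRIVATE, -1, 0);
	if (base == MAP_FAILED) {
		perror("reserve");
		return 1;
	}

	/* 2. map each device-dax into the reservation, back to back */
	for (i = 0; i < NR_DAX; i++) {
		snprintf(path, sizeof(path), "/dev/dax%d.0", i); /* placeholder names */
		fd = open(path, O_RDWR);
		if (fd < 0) {
			perror(path);
			return 1;
		}
		if (mmap(base + (size_t)i * chunk, chunk, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
			perror("dax mmap");
			return 1;
		}
	}

	/* [base, base + total) is now one contiguous VA range backed by
	 * the 12 device-dax instances, ready to register as a single MR */
	return 0;
}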
>>> Like is there any chance that each of these 61440 VMAs is a single
>>> 2MB folio from device-dax, or could it be?
>>>
>>> IIRC device-dax could not use folios until 6.15 so I'm assuming
>>> it is not folios even if it is a pmd mapping?
>>
>> I just ran the mr registration stress test in 6.15-rc3, much better!
>>
>> What's changed? Is it folios for device-dax? None of the code in
>> ib_umem_get() has changed though, it still loops through 'npages' doing
>
> I don't know, it is kind of strange that it changed. If device-dax is
> now using folios then it does change the access pattern to the struct
> page array somewhat, especially it moves all the writes to the head
> page of the 2MB section, which maybe impacts the caching?
6.15-rc3 is orders of magnitude better.
Agreed that device-dax's switch to folios is likely the hero. I've yet
to check the code and bisect; maybe pin_user_pages_fast() now adds
folios to page_list[] instead of 4K pages? If so, the 511/512 reduction
in the size of page_list[] could drastically improve the downstream
call performance in spite of the thrashing, that is, if the thrashing
is still there.
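For scale, a back-of-the-envelope on what that reduction means for this
test, assuming one page_list[] slot per 4K page today (my reading, not
yet verified against 6.15-rc3):

#include <stdio.h>

int main(void)
{
	unsigned long hpages   = 61440UL;	/* 2MB mappings in the test */
	unsigned long pages_4k = hpages * 512;	/* one entry per 4K page */
	unsigned long bytes    = pages_4k * sizeof(void *);

	printf("4K-page entries:   %lu (~%lu MB of pointers)\n",
	       pages_4k, bytes >> 20);
	printf("per-folio entries: %lu\n", hpages); /* the 511/512 cut */
	return 0;
}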
I'll report my findings.
Thanks,
-jane
>
> Jason