Message-ID: <b2b81b99-29ee-4122-99ef-4a6094f4ec5c@nvidia.com>
Date: Fri, 23 Jan 2026 17:25:08 +1100
From: Jordan Niethe <jniethe@...dia.com>
To: Matthew Brost <matthew.brost@...el.com>
Cc: linux-mm@...ck.org, balbirs@...dia.com, akpm@...ux-foundation.org,
 linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
 david@...hat.com, ziy@...dia.com, apopple@...dia.com,
 lorenzo.stoakes@...cle.com, lyude@...hat.com, dakr@...nel.org,
 airlied@...il.com, simona@...ll.ch, rcampbell@...dia.com,
 mpenttil@...hat.com, jgg@...dia.com, willy@...radead.org,
 linuxppc-dev@...ts.ozlabs.org, intel-xe@...ts.freedesktop.org, jgg@...pe.ca,
 Felix.Kuehling@....com
Subject: Re: [PATCH v2 00/11] Remove device private pages from physical
 address space

Hi,

On 14/1/26 16:41, Jordan Niethe wrote:
> Hi,
> 
> On 9/1/26 17:22, Matthew Brost wrote:
>> On Fri, Jan 09, 2026 at 12:27:50PM +1100, Jordan Niethe wrote:
>>> Hi,
>>> On 9/1/26 11:31, Matthew Brost wrote:
>>>> On Fri, Jan 09, 2026 at 11:01:13AM +1100, Jordan Niethe wrote:
>>>>> Hi,
>>>>>
>>>>> On 8/1/26 16:42, Jordan Niethe wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 8/1/26 13:25, Jordan Niethe wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 8/1/26 05:36, Matthew Brost wrote:
>>>>>>>>
>>>>>>>> Thanks for the series. For some reason Intel's CI couldn't apply this
>>>>>>>> series to drm-tip to get results [1]. I'll manually apply it, run all
>>>>>>>> our SVM tests, and get back to you with results plus a review of the
>>>>>>>> changes here. For future reference, if you want to use our CI system,
>>>>>>>> the series must apply to drm-tip; feel free to rebase this series and
>>>>>>>> just send it to the intel-xe list if you want CI.
>>>>>>>
>>>>>>> Thanks, I'll rebase on drm-tip and send to the intel-xe list.
>>>>>>
>>>>>> For reference the rebase on drm-tip on the intel-xe list:
>>>>>>
>>>>>> https://patchwork.freedesktop.org/series/159738/
>>>>>>
>>>>>> Will watch the CI results.
>>>>>
>>>>> The series causes some failures in the intel-xe tests:
>>>>> https://patchwork.freedesktop.org/series/159738/#rev4
>>>>>
>>>>> Working through the failures now.
>>>>>
>>>>
>>>> Yeah, I saw the failures. I haven't had time to look at the patches on
>>>> my end quite yet. I'm scrambling to get a few things into the 6.20/7.0
>>>> PR, so I may not have bandwidth to look in depth until the middle of
>>>> next week, but digging in is on my TODO list.
>>>
>>> Sure, that's completely fine. The failures seem pretty directly related
>>> to the series, so I think I'll be able to make good progress.
>>>
>>> For example https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/bat-bmg-2/igt@xe_evict@evict-beng-small.html
>>>
>>> It looks like I missed that xe_pagemap_destroy_work() needs to be
>>> updated to remove the call to devm_release_mem_region() now that we are
>>> no longer reserving a memory region.
>>
>> +1
>>
>> So this is the one I’d be most concerned about [1].
>> xe_exec_system_allocator is our SVM test, which does almost all the
>> ridiculous things possible in user space to stress SVM. It’s blowing up
>> in the core MM—but the source of the bug could be anywhere (e.g., Xe
>> SVM, GPU SVM, migrate device layer, or core MM). I’ll try to help when I
>> have bandwidth.
>>
>> Matt
>>
>> [1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/shard-bmg-9/igt@xe_exec_system_allocator@threads-many-large-execqueues-free-nomemset.html
> 
> A similar fault in lruvec_stat_mod_folio() can be reproduced if
> memremap_device_private_pagemap() is called with NUMA_NO_NODE instead of
> (say) numa_node_id() for the nid parameter.
> 
> The xe_svm driver uses devm_memremap_device_private_pagemap(), which uses
> dev_to_node() for the nid parameter. I suspect this is causing something
> similar to happen.
> 
> When memremap_pages() calls pagemap_range() we have the following logic:
> 
>          if (nid < 0)
>                  nid = numa_mem_id();
> 
> I think we might need to add this to memremap_device_private_pagemap() to handle
> the NUMA_NO_NODE case. Still confirming.

This was the problem, fixed in v3.

> 
> Thanks,
> Jordan.
> 
>>
>>>
>>>
>>> Thanks,
>>> Jordan.
>>>
>>>>
>>>> Matt
>>>>
>>>>> Thanks,
>>>>> Jordan.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Jordan.
>>>>>>
>>>>>>>
>>>>>>> Jordan.
>>>>>>>
>>>>>>>>
>>>>>>>> I was also wondering if Nvidia could help review one of our core MM
>>>>>>>> patches [2], which is gating enabling 2M device pages too?
>>>>>>>>
>>>>>>>> Matt
>>>>>>>>
>>>>>>>> [1] https://patchwork.freedesktop.org/series/159738/
>>>>>>>> [2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
> 

