Message-ID: <33b4b93b-5ab6-4a3b-b3b2-c9b3cbc9d929@amd.com>
Date: Wed, 4 Dec 2024 10:11:33 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: Dan Williams <dan.j.williams@...el.com>, linux-kernel@...r.kernel.org,
 linux-cxl@...r.kernel.org
Cc: bharata@....com, Huang Ying <ying.huang@...el.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 David Hildenbrand <david@...hat.com>, Davidlohr Bueso <dave@...olabs.net>,
 Jonathan Cameron <jonathan.cameron@...wei.com>,
 Dave Jiang <dave.jiang@...el.com>,
 Alison Schofield <alison.schofield@...el.com>,
 Vishal Verma <vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>,
 Alistair Popple <apopple@...dia.com>,
 Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
 Bjorn Helgaas <bhelgaas@...gle.com>, Baoquan He <bhe@...hat.com>,
 ilpo.jarvinen@...ux.intel.com,
 Mika Westerberg <mika.westerberg@...ux.intel.com>,
 Fontenot Nathan <Nathan.Fontenot@....com>, Wei Huang <wei.huang2@....com>,
 regressions@...ts.linux.dev
Subject: Re: [RFC PATCH] resource: Fix CXL node not populated issue



On 12/4/2024 9:25 AM, Dan Williams wrote:
> [ add regressions@...ts.linux.dev ]
> 
> Next time make the subject of the patch:
> 
>     Revert "resource: fix region_intersects() vs add_memory_driver_managed()"
> 
> ...to make it clear that this is a revert, not a fix.
> 
> The revert should be applied if a fix does not materialize in the next few weeks.
> 

Agreed regarding the fix.
One thing to note is that this patch is not an exact revert.

> Raghavendra K T wrote:
>> Before:
>> ~]$ numastat -m
>> ...
>>                            Node 0          Node 1           Total
>>                   --------------- --------------- ---------------
>> MemTotal               128096.18       128838.48       256934.65
>>
>> After:
>> $ numastat -m
>> .....
>>                            Node 0          Node 1          Node 2           Total
>>                   --------------- --------------- --------------- ---------------
>> MemTotal               128054.16       128880.51       129024.00       385958.67
>>
>> Current patch reverts the effect of first commit where the issue is seen.
> 
> Might you be able to dig a bit further into the details like memory map
> for this platform and ACPI SRAT tables? A dmesg comparison of the good
> and bad cases would be useful (those can be shared via a github gist).
> Even better would be some debug instrumentation to identify which call
> to __region_intersects() started behaving differently resulting in a
> whole node disappearing.
> 
> In terms of the urgency of fixing this it would also help to know how
> prevalent the system this was found on is in the wild.

I have compared dmesg and /proc/iomem for both the success and failure cases.

A. dmesg:

1. Address ranges are different
2. Extra messages are printed about Demotion targets

Fallback order for Node 0: 0 1 2
Fallback order for Node 1: 1 0 2
Fallback order for Node 2: 2 0 1
Built 3 zonelists, mobility grouping on.  Total pages: 66145521
Policy zone: Normal
....
Demotion targets for Node 0: preferred: 2, fallback: 2
Demotion targets for Node 1: preferred: 2, fallback: 2
Demotion targets for Node 2: null
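
A minimal sketch, not part of the comparison above, of how two saved dmesg logs could be checked for a node that disappeared. It assumes the logs contain "Fallback order for Node N:" lines in the format shown above; the function names are hypothetical.

```python
import re

# Matches lines such as "Fallback order for Node 2: 2 0 1".
# A leading dmesg timestamp, if present, is ignored because we search
# anywhere in the text rather than anchoring at the start of a line.
FALLBACK_RE = re.compile(r"Fallback order for Node (\d+):\s*([\d ]+)")


def nodes_in_dmesg(text):
    """Return the sorted node ids that have a fallback-order line."""
    return sorted({int(m.group(1)) for m in FALLBACK_RE.finditer(text)})


def missing_nodes(good_text, bad_text):
    """Return node ids present in the good log but absent from the bad one."""
    return sorted(set(nodes_in_dmesg(good_text)) - set(nodes_in_dmesg(bad_text)))
```

Calling missing_nodes() with the contents of the two saved logs returns the ids of nodes that only the good boot populated.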

B. /proc/iomem

$ vimdiff success fail

  164 4050000000-604fffffff : Soft Reserved           |  164 4050000000-604fffffff : Soft Reserved
  165   4050000000-604fffffff : CXL Window 0          |  165   4050000000-604fffffff : CXL Window 0
  166     4080000000-5fffffffff : dax1.0              |  ------------------------------------------
  167       4080000000-5fffffffff : System RAM (kmem) |  ------------------------------------------
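
The same comparison can be scripted instead of read from vimdiff. This is a rough sketch assuming both files are saved copies of /proc/iomem, where each nesting level is indented by two spaces; the helper names are hypothetical.

```python
def parse_iomem(text):
    """Parse /proc/iomem-style text into (depth, range, name) tuples.

    Depth is derived from the leading indentation, assuming two spaces
    per nesting level.
    """
    entries = []
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip blank or unrelated lines
        rng, name = line.split(" : ", 1)
        depth = (len(rng) - len(rng.lstrip(" "))) // 2
        entries.append((depth, rng.strip(), name.strip()))
    return entries


def only_in_first(first_text, second_text):
    """Return entries of the first dump that are absent from the second."""
    second = set(parse_iomem(second_text))
    return [entry for entry in parse_iomem(first_text) if entry not in second]
```

Passing the contents of the success and fail files to only_in_first() lists the resources that were registered only in the success case.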


I will get more details from the ACPI SRAT table, etc.

Thanks and Regards
  Raghu
