Message-ID: <33b4b93b-5ab6-4a3b-b3b2-c9b3cbc9d929@amd.com>
Date: Wed, 4 Dec 2024 10:11:33 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: Dan Williams <dan.j.williams@...el.com>, linux-kernel@...r.kernel.org,
linux-cxl@...r.kernel.org
Cc: bharata@....com, Huang Ying <ying.huang@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>, Davidlohr Bueso <dave@...olabs.net>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
Dave Jiang <dave.jiang@...el.com>,
Alison Schofield <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>,
Alistair Popple <apopple@...dia.com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Bjorn Helgaas <bhelgaas@...gle.com>, Baoquan He <bhe@...hat.com>,
ilpo.jarvinen@...ux.intel.com,
Mika Westerberg <mika.westerberg@...ux.intel.com>,
Fontenot Nathan <Nathan.Fontenot@....com>, Wei Huang <wei.huang2@....com>,
regressions@...ts.linux.dev
Subject: Re: [RFC PATCH] resource: Fix CXL node not populated issue
On 12/4/2024 9:25 AM, Dan Williams wrote:
> [ add regressions@...ts.linux.dev ]
>
> Next time make the subject of the patch:
>
> Revert "resource: fix region_intersects() vs add_memory_driver_managed()"
>
> ...to make it clear that this is a revert, not a fix.
>
> The revert should be applied if a fix does not materialize in the next few weeks.
>
Agreed regarding the fix.
One thing to note is that it is not an exact revert.
> Raghavendra K T wrote:
>> Before:
>> ~]$ numastat -m
>> ...
>> Node 0 Node 1 Total
>> --------------- --------------- ---------------
>> MemTotal 128096.18 128838.48 256934.65
>>
>> After:
>> $ numastat -m
>> .....
>> Node 0 Node 1 Node 2 Total
>> --------------- --------------- --------------- ---------------
>> MemTotal 128054.16 128880.51 129024.00 385958.67
>>
>> Current patch reverts the effect of first commit where the issue is seen.
>
> Might you be able to dig a bit further into the details like memory map
> for this platform and ACPI SRAT tables? A dmesg comparison of the good
> and bad cases would be useful (those can be shared via a github gist).
> Even better would be some debug instrumentation to identify which call
> to __region_intersects() started behaving differently resulting in a
> whole node disappearing.
>
> In terms of the urgency of fixing this it would also help to know how
> prevalent the system this was found on is in the wild.
I have compared dmesg and /proc/iomem for both the success and fail cases.
A. dmesg:
1. Address ranges are different
2. Extra messages printing the demotion targets
Fallback order for Node 0: 0 1 2
Fallback order for Node 1: 1 0 2
Fallback order for Node 2: 2 0 1
Built 3 zonelists, mobility grouping on. Total pages: 66145521
Policy zone: Normal
....
Demotion targets for Node 0: preferred: 2, fallback: 2
Demotion targets for Node 1: preferred: 2, fallback: 2
Demotion targets for Node 2: null
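To repeat this check across several boots, the demotion-target lines can be pulled out of saved dmesg output with a small script. This is only a minimal sketch: the helper name demotion_targets is hypothetical, and it assumes the messages keep the "Demotion targets for Node N: ..." format shown above.

```python
import re

# Assumed line format, taken from the dmesg excerpt above:
#   Demotion targets for Node 0: preferred: 2, fallback: 2
#   Demotion targets for Node 2: null
DEMOTION_RE = re.compile(r"Demotion targets for Node (\d+): (.*)")


def demotion_targets(dmesg_text):
    """Map node id -> target description string, or None when dmesg says 'null'."""
    targets = {}
    for match in DEMOTION_RE.finditer(dmesg_text):
        node = int(match.group(1))
        rest = match.group(2).strip()
        targets[node] = None if rest == "null" else rest
    return targets


if __name__ == "__main__":
    sample = (
        "Demotion targets for Node 0: preferred: 2, fallback: 2\n"
        "Demotion targets for Node 1: preferred: 2, fallback: 2\n"
        "Demotion targets for Node 2: null\n"
    )
    for node, target in sorted(demotion_targets(sample).items()):
        print(node, target)
```

An empty result for a given boot would mean no demotion-target messages were found in that log.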
B. /proc/iomem
$ vimdiff success fail
success                                    | fail
4050000000-604fffffff : Soft Reserved      | 4050000000-604fffffff : Soft Reserved
4050000000-604fffffff : CXL Window 0       | 4050000000-604fffffff : CXL Window 0
4080000000-5fffffffff : dax1.0             | ------------------------------------
4080000000-5fffffffff : System RAM (kmem)  | ------------------------------------
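The same comparison can be scripted when there are many /proc/iomem dumps to go through. The sketch below is an illustration only: parse_iomem and only_in_first are hypothetical helper names, and the sample text is limited to the entries quoted above. It lists resources that appear in the first dump but not in the second.

```python
def parse_iomem(text):
    """Return a set of (start, end, name) tuples from /proc/iomem-style text."""
    entries = set()
    for line in text.splitlines():
        rng, sep, name = line.strip().partition(" : ")
        if not sep:
            continue  # not a "start-end : name" line
        start, dash, end = rng.partition("-")
        if not dash:
            continue
        try:
            entries.add((int(start, 16), int(end, 16), name.strip()))
        except ValueError:
            continue  # range was not hexadecimal; skip the line
    return entries


def only_in_first(first_text, second_text):
    """Resources present in the first dump but missing from the second, sorted."""
    return sorted(parse_iomem(first_text) - parse_iomem(second_text))


# Sample data: only the entries quoted in this mail, not full dumps.
SUCCESS = """\
4050000000-604fffffff : Soft Reserved
4050000000-604fffffff : CXL Window 0
4080000000-5fffffffff : dax1.0
4080000000-5fffffffff : System RAM (kmem)
"""
FAIL = """\
4050000000-604fffffff : Soft Reserved
4050000000-604fffffff : CXL Window 0
"""

if __name__ == "__main__":
    for start, end, name in only_in_first(SUCCESS, FAIL):
        print(f"{start:x}-{end:x} : {name}")
```

With the entries above it reports the dax1.0 and System RAM (kmem) ranges, which are the two resources missing in the fail case.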
I will get more details from the ACPI SRAT table, etc.
Thanks and Regards
Raghu