Message-ID: <6752acd92baf0_10a08329424@dwillia2-xfh.jf.intel.com.notmuch>
Date: Thu, 5 Dec 2024 23:50:49 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Raghavendra K T <raghavendra.kt@....com>, Dan Williams
<dan.j.williams@...el.com>, <linux-kernel@...r.kernel.org>,
<linux-cxl@...r.kernel.org>
CC: <bharata@....com>, Huang Ying <ying.huang@...el.com>, Andrew Morton
<akpm@...ux-foundation.org>, David Hildenbrand <david@...hat.com>, "Davidlohr
Bueso" <dave@...olabs.net>, Jonathan Cameron <jonathan.cameron@...wei.com>,
Dave Jiang <dave.jiang@...el.com>, Alison Schofield
<alison.schofield@...el.com>, Vishal Verma <vishal.l.verma@...el.com>, "Ira
Weiny" <ira.weiny@...el.com>, Alistair Popple <apopple@...dia.com>, "Andy
Shevchenko" <andriy.shevchenko@...ux.intel.com>, Bjorn Helgaas
<bhelgaas@...gle.com>, Baoquan He <bhe@...hat.com>,
<ilpo.jarvinen@...ux.intel.com>, Mika Westerberg
<mika.westerberg@...ux.intel.com>, Fontenot Nathan <Nathan.Fontenot@....com>,
Wei Huang <wei.huang2@....com>, <regressions@...ts.linux.dev>
Subject: Re: [RFC PATCH] resource: Fix CXL node not populated issue
Raghavendra K T wrote:
>
>
> On 12/4/2024 9:25 AM, Dan Williams wrote:
> > [ add regressions@...ts.linux.dev ]
> >
> > Next time make the subject of the patch:
> >
> > Revert "resource: fix region_intersects() vs add_memory_driver_managed()"
> >
> > ...to make it clear that this is a revert, not a fix.
> >
> > The revert should be applied if a fix does not materialize in the next few weeks.
> >
>
> Agreed regarding the fix.
> One thing to note: the patch is not an exact revert.
>
> > Raghavendra K T wrote:
> >> Before:
> >> ~]$ numastat -m
> >> ...
> >>                   Node 0          Node 1           Total
> >>          --------------- --------------- ---------------
> >> MemTotal       128096.18       128838.48       256934.65
> >>
> >> After:
> >> $ numastat -m
> >> .....
> >>                   Node 0          Node 1          Node 2           Total
> >>          --------------- --------------- --------------- ---------------
> >> MemTotal       128054.16       128880.51       129024.00       385958.67
> >>
> >> The current patch reverts the effect of the first commit where the issue is seen.
> >
> > Might you be able to dig a bit further into the details like memory map
> > for this platform and ACPI SRAT tables? A dmesg comparison of the good
> > and bad cases would be useful (those can be shared via a github gist).
> > Even better would be some debug instrumentation to identify which call
> > to __region_intersects() started behaving differently resulting in a
> > whole node disappearing.
> >
> > In terms of the urgency of fixing this it would also help to know how
> > prevalent the system this was found on is in the wild.
>
> I have compared dmesg and /proc/iomem for both the success and fail cases.
>
> A. dmesg:
>
> 1. Address ranges are different
> 2. Extra messages printing the demotion targets
>
> Fallback order for Node 0: 0 1 2
> Fallback order for Node 1: 1 0 2
> Fallback order for Node 2: 2 0 1
> Built 3 zonelists, mobility grouping on. Total pages: 66145521
> Policy zone: Normal
> ....
> Demotion targets for Node 0: preferred: 2, fallback: 2
> Demotion targets for Node 1: preferred: 2, fallback: 2
> Demotion targets for Node 2: null
>
> B. /proc/iomem
>
> $ vimdiff success fail
>
>     4050000000-604fffffff : Soft Reserved      | 164 4050000000-604fffffff : Soft Reserved
> 165 4050000000-604fffffff : CXL Window 0       | 165 4050000000-604fffffff : CXL Window 0
> 166 4080000000-5fffffffff : dax1.0             | ------------------------------------------
> 167 4080000000-5fffffffff : System RAM (kmem)  | ------------------------------------------
My eyes only know how to read unified diff (diff -u) format. Is this
saying that in the failure case the System RAM range for dax1.0 is
missing?
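
For reference, running `diff -u success fail` on the two captures would give the unified format. The same comparison can be sketched in Python as below. This is only an illustration under assumptions: the two strings are abbreviated stand-ins built from the lines quoted above, the indentation is guessed, the fail capture is assumed to lack both the dax1.0 and System RAM (kmem) lines, and the helper names `unified_iomem_diff` and `range_mib` are hypothetical.

```python
import difflib

# Abbreviated stand-ins for the two /proc/iomem captures (assumed content,
# taken from the quoted vimdiff lines; indentation is a guess).
success = """\
4050000000-604fffffff : Soft Reserved
  4050000000-604fffffff : CXL Window 0
    4080000000-5fffffffff : dax1.0
      4080000000-5fffffffff : System RAM (kmem)
"""
fail = """\
4050000000-604fffffff : Soft Reserved
  4050000000-604fffffff : CXL Window 0
"""


def unified_iomem_diff(good: str, bad: str) -> str:
    """Return a `diff -u` style comparison of two /proc/iomem captures."""
    return "".join(
        difflib.unified_diff(
            good.splitlines(keepends=True),
            bad.splitlines(keepends=True),
            fromfile="success",
            tofile="fail",
        )
    )


def range_mib(span: str) -> float:
    """Size in MiB of an inclusive 'start-end' hex range as shown in /proc/iomem."""
    start, end = (int(part, 16) for part in span.split("-"))
    return (end - start + 1) / 2**20


diff_text = unified_iomem_diff(success, fail)
print(diff_text)
print(range_mib("4080000000-5fffffffff"), "MiB")
```

Under those assumptions the lines prefixed with `-` are the ones present only in the success capture. The size helper reports 129024 MiB for 4080000000-5fffffffff, the same figure as the Node 2 MemTotal in the "After" numastat output.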