Message-ID: <6752acd92baf0_10a08329424@dwillia2-xfh.jf.intel.com.notmuch>
Date: Thu, 5 Dec 2024 23:50:49 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Raghavendra K T <raghavendra.kt@....com>, Dan Williams
	<dan.j.williams@...el.com>, <linux-kernel@...r.kernel.org>,
	<linux-cxl@...r.kernel.org>
CC: <bharata@....com>, Huang Ying <ying.huang@...el.com>, Andrew Morton
	<akpm@...ux-foundation.org>, David Hildenbrand <david@...hat.com>, "Davidlohr
 Bueso" <dave@...olabs.net>, Jonathan Cameron <jonathan.cameron@...wei.com>,
	Dave Jiang <dave.jiang@...el.com>, Alison Schofield
	<alison.schofield@...el.com>, Vishal Verma <vishal.l.verma@...el.com>, "Ira
 Weiny" <ira.weiny@...el.com>, Alistair Popple <apopple@...dia.com>, "Andy
 Shevchenko" <andriy.shevchenko@...ux.intel.com>, Bjorn Helgaas
	<bhelgaas@...gle.com>, Baoquan He <bhe@...hat.com>,
	<ilpo.jarvinen@...ux.intel.com>, Mika Westerberg
	<mika.westerberg@...ux.intel.com>, Fontenot Nathan <Nathan.Fontenot@....com>,
	Wei Huang <wei.huang2@....com>, <regressions@...ts.linux.dev>
Subject: Re: [RFC PATCH] resource: Fix CXL node not populated issue

Raghavendra K T wrote:
> 
> 
> On 12/4/2024 9:25 AM, Dan Williams wrote:
> > [ add regressions@...ts.linux.dev ]
> > 
> > Next time make the subject of the patch:
> > 
> >     Revert "resource: fix region_intersects() vs add_memory_driver_managed()"
> > 
> > ...to make it clear that this is a revert, not a fix.
> > 
> > The revert should be applied if a fix does not materialize in the next few weeks.
> > 
> 
> Agreed regarding the fix.
> One thing to note is that it is not an exact revert.
> 
> > Raghavendra K T wrote:
> >> Before:
> >> ~]$ numastat -m
> >> ...
> >>                            Node 0          Node 1           Total
> >>                   --------------- --------------- ---------------
> >> MemTotal               128096.18       128838.48       256934.65
> >>
> >> After:
> >> $ numastat -m
> >> .....
> >>                            Node 0          Node 1          Node 2           Total
> >>                   --------------- --------------- --------------- ---------------
> >> MemTotal               128054.16       128880.51       129024.00       385958.67
> >>
> >> Current patch reverts the effect of first commit where the issue is seen.
> > 
> > Might you be able to dig a bit further into the details like memory map
> > for this platform and ACPI SRAT tables? A dmesg comparison of the good
> > and bad cases would be useful (those can be shared via a github gist).
> > Even better would be some debug instrumentation to identify which call
> > to __region_intersects() started behaving differently resulting in a
> > whole node disappearing.
> > 
> > In terms of the urgency of fixing this it would also help to know how
> > prevalent the system this was found on is in the wild.
> 
> I have compared dmesg and /proc/iomem for both the success and fail cases.
> 
> A. dmesg:
> 
> 1. Address ranges are different
> 2. An extra message printing the demotion targets
> 
> Fallback order for Node 0: 0 1 2
> Fallback order for Node 1: 1 0 2
> Fallback order for Node 2: 2 0 1
> Built 3 zonelists, mobility grouping on.  Total pages: 66145521
> Policy zone: Normal
> ....
> Demotion targets for Node 0: preferred: 2, fallback: 2
> Demotion targets for Node 1: preferred: 2, fallback: 2
> Demotion targets for Node 2: null
> 
> B. /proc/iomem
> 
> $ vimdiff success fail
> 
>   164 4050000000-604fffffff : Soft Reserved            |  164 4050000000-604fffffff : Soft Reserved
>   165   4050000000-604fffffff : CXL Window 0           |  165   4050000000-604fffffff : CXL Window 0
>   166     4080000000-5fffffffff : dax1.0               |  ----------------------------------------
>   167       4080000000-5fffffffff : System RAM (kmem)  |  ----------------------------------------

My eyes only know how to read unified diff (diff -u) format. Is this
saying that in the failure case the System RAM range for dax1.0 is
missing?
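
For reference, `diff -u success fail` on the two captures would give the
format I can read. The same comparison can be sketched in Python with
difflib, as below. The two excerpts are an assumption: they are typed in
from the ranges quoted above, and the "fail" side assumes the dax1.0 and
System RAM (kmem) entries really are absent, which is exactly what still
needs confirming. The helper name iomem_unified_diff() is made up for
illustration.

```python
import difflib


def iomem_unified_diff(good_text, bad_text):
    """Return unified-diff lines between two /proc/iomem captures."""
    good = good_text.splitlines()
    bad = bad_text.splitlines()
    # lineterm="" keeps the header lines free of trailing newlines,
    # matching the newline-free lines produced by splitlines().
    return list(
        difflib.unified_diff(
            good, bad, fromfile="success", tofile="fail", lineterm=""
        )
    )


# Assumed excerpts, copied from the ranges quoted in this thread.
# Indentation mirrors the nesting that /proc/iomem prints.
good = """\
4050000000-604fffffff : Soft Reserved
  4050000000-604fffffff : CXL Window 0
    4080000000-5fffffffff : dax1.0
      4080000000-5fffffffff : System RAM (kmem)
"""

# Assumption: the failing boot lacks the two innermost entries.
bad = """\
4050000000-604fffffff : Soft Reserved
  4050000000-604fffffff : CXL Window 0
"""

for line in iomem_unified_diff(good, bad):
    print(line)
```

If the assumption holds, the only "-" lines in the output are the dax1.0
and System RAM (kmem) ranges, and there are no "+" lines.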
