Message-ID: <CAAd53p56=CpWpPEOD2YdCneJX-XxO93MHMQHbLRB7VCYweW7SQ@mail.gmail.com>
Date: Wed, 17 May 2023 15:49:25 +0800
From: Kai-Heng Feng <kai.heng.feng@...onical.com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>,
"kao, acelan" <acelan.kao@...onical.com>,
Borislav Petkov <bp@...en8.de>,
James Morse <james.morse@....com>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Robert Richter <rric@...nel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] EDAC/Intel: Fix shift-out-of-bounds when DIMM/NVDIMM is absent
On Wed, May 17, 2023 at 1:13 AM Luck, Tony <tony.luck@...el.com> wrote:
>
> >> [ 13.875282] Hardware name: HP HP Z4 G5 Workstation Desktop PC/8962, BIOS U61 Ver. 01.01.15 04/19/2023
>
>
> >> When a DIMM slot is empty, the read value of mtr can be 0xffffffff, therefore
>
> > Looked like a buggy BIOS/hw that didn't set the mtr register.
> >
> > 1. Did you print the mtr register whose value was 0xffffffff?
> > 2. Can you take a dmesg log with kernel "CONFIG_EDAC_DEBUG=y" enabled?
> > 3. What was the CPU? Please take the output of "lscpu".
> > 4. Did you verify your patch that the issue was fixed on your systems?
>
> I wonder if BIOS is "hiding" some devices from the OS? The 0xffffffff return is
> the standard PCI response for reading a non-existent register. But that doesn't
> quite make sense with having a "dimm present" bit in the MTR register. If
> the register only exists if the DIMM is present, then there is no need for
> a "dimm present" bit.
I wonder whether the "non-existent register" read is intentional?
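
For illustration only, here is a minimal sketch of the kind of guard being discussed: treat an all-ones read as "nothing answered" before trusting any field in the value. The helper names and the position of the "dimm present" bit are assumptions made for this example, not the driver's actual macros or the code in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names and bit position, for illustration only. */
#define MTR_ALL_ONES    0xffffffffu
#define MTR_PRESENT_BIT 15u

/* A PCI config read of a register that does not exist returns all ones. */
static bool mtr_read_is_valid(uint32_t mtr)
{
    return mtr != MTR_ALL_ONES;
}

/*
 * Trust the "dimm present" bit only after the read itself is known to be
 * valid; otherwise an all-ones value would look like a populated slot.
 */
static bool mtr_dimm_present(uint32_t mtr)
{
    return mtr_read_is_valid(mtr) && ((mtr >> MTR_PRESENT_BIT) & 1u) != 0;
}
```

With a check like this, a value of 0xffffffff is rejected before any row/column fields are decoded and used as shift counts, while a register with only the present bit set is still reported as populated.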
>
> Some "lspci" output may also be useful.
The lspci output can be found in [1]:
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217453
Kai-Heng
>
> -Tony