Message-ID: <CAAd53p5YhCpFgHat6Qv+T6id53NhJ=5W85wVeJvO6BW_W06kFg@mail.gmail.com>
Date: Wed, 17 May 2023 15:47:24 +0800
From: Kai-Heng Feng <kai.heng.feng@...onical.com>
To: "Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>
Cc: "Luck, Tony" <tony.luck@...el.com>,
"kao, acelan" <acelan.kao@...onical.com>,
Borislav Petkov <bp@...en8.de>,
James Morse <james.morse@....com>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Robert Richter <rric@...nel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] EDAC/Intel: Fix shift-out-of-bounds when DIMM/NVDIMM is absent
On Tue, May 16, 2023 at 8:53 PM Zhuo, Qiuxu <qiuxu.zhuo@...el.com> wrote:
>
> > From: Kai-Heng Feng <kai.heng.feng@...onical.com>
> > ...
> > Subject: [PATCH] EDAC/Intel: Fix shift-out-of-bounds when DIMM/NVDIMM
> > is absent
> >
> > The following splat can be found on many systems equipped with EDAC:
> > [ 13.875276] UBSAN: shift-out-of-bounds in
> > drivers/edac/skx_common.c:369:16
> > [ 13.875279] shift exponent -66 is negative
> > [ 13.875280] CPU: 11 PID: 519 Comm: systemd-udevd Not tainted 6.4.0-rc1+
> > #1
> > [ 13.875282] Hardware name: HP HP Z4 G5 Workstation Desktop PC/8962,
> > BIOS U61 Ver. 01.01.15 04/19/2023
> > [ 13.875283] Call Trace:
> > [ 13.875285] <TASK>
> > [ 13.875287] dump_stack_lvl+0x48/0x70
> > [ 13.875295] dump_stack+0x10/0x20
> > [ 13.875297] __ubsan_handle_shift_out_of_bounds+0x156/0x310
> > [ 13.875302] ? __kmem_cache_alloc_node+0x196/0x300
> > [ 13.875307] skx_get_dimm_info.cold+0xac/0x15d [i10nm_edac]
> > [ 13.875312] i10nm_get_dimm_config+0x240/0x360 [i10nm_edac]
> > [ 13.875316] ? kasprintf+0x4e/0x80
> > [ 13.875321] skx_register_mci+0x12b/0x1d0 [i10nm_edac]
> > [ 13.875324] ? __pfx_i10nm_get_dimm_config+0x10/0x10 [i10nm_edac]
> > [ 13.875329] i10nm_init+0x89f/0x1d10 [i10nm_edac]
> > [ 13.875333] ? __pfx_i10nm_init+0x10/0x10 [i10nm_edac]
> > [ 13.875337] do_one_initcall+0x46/0x240
> > [ 13.875342] ? kmalloc_trace+0x2a/0xb0
> > [ 13.875346] do_init_module+0x6a/0x280
> > [ 13.875350] load_module+0x2419/0x2500
> > [ 13.875353] ? security_kernel_post_read_file+0x5c/0x80
> > [ 13.875358] __do_sys_finit_module+0xcc/0x150
> > [ 13.875360] ? __do_sys_finit_module+0xcc/0x150
> > [ 13.875363] __x64_sys_finit_module+0x18/0x30
> > [ 13.875365] do_syscall_64+0x59/0x90
> > [ 13.875368] ? syscall_exit_to_user_mode+0x2a/0x50
> > [ 13.875371] ? do_syscall_64+0x69/0x90
> > [ 13.875372] ? do_syscall_64+0x69/0x90
> > [ 13.875373] ? do_syscall_64+0x69/0x90
> > [ 13.875374] ? do_syscall_64+0x69/0x90
> > [ 13.875375] ? syscall_exit_to_user_mode+0x2a/0x50
> > [ 13.875376] ? do_syscall_64+0x69/0x90
> > [ 13.875377] ? do_syscall_64+0x69/0x90
> > [ 13.875378] ? do_syscall_64+0x69/0x90
> > [ 13.875379] ? sysvec_call_function+0x4e/0xb0
> > [ 13.875381] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> >
> > When a DIMM slot is empty, the read value of mtr can be 0xffffffff, therefore
>
> Looked like a buggy BIOS/hw that didn't set the mtr register.
If that's the case, I suspect the bug comes from Intel BIOS RC,
because the issue happens on different vendors' hardware.
>
> 1. Did you print the mtr register whose value was 0xffffffff?
Yes, 0xffffffff is the value. mcddrtcfg is also 0xffffffff.
> 2. Can you take a dmesg log with kernel "CONFIG_EDAC_DEBUG=y" enabled?
> 3. What was the CPU? Please take the output of "lscpu".
Both attached in Bugzilla [1].
> 4. Did you verify your patch that the issue was fixed on your systems?
I did, that's why I sent the patch to the mailing list.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217453
Kai-Heng
>
> Thanks!
> -Qiuxu
>
> > the wrong "ranks" value creates shift-out-of-bounds error. The same issue
> > can be found on NVDIMM too.
> >
> > So only consider DIMM/NVDIMM is present when the value of
> > mtr/mcddrtcfg is not ~0.
> > ...
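
For illustration, the failure mode described in the quoted patch text can be sketched in userspace C. This is not the kernel code: decode_attr(), dimm_present(), shift_exponent() and the bit positions used below are hypothetical stand-ins, assumed only for the sake of the example. The idea is that each field decoder returns -EINVAL when the raw field is out of range, so an all-ones register yields three error codes whose sum becomes the shift exponent. On Linux EINVAL is 22, so the sum is -66, which is consistent with the "shift exponent -66 is negative" line in the splat.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Extract bits lo..hi (inclusive) of v. */
static uint32_t get_bitfield(uint32_t v, unsigned lo, unsigned hi)
{
	assert(lo <= hi && hi < 32);
	uint64_t mask = (1ull << (hi - lo + 1)) - 1;
	return (uint32_t)((v >> lo) & mask);
}

/*
 * Hypothetical decoder: returns the raw field plus 'add', or -EINVAL
 * when the raw field lies outside [minval, maxval].
 */
static int decode_attr(uint32_t reg, unsigned lo, unsigned hi,
		       int add, int minval, int maxval)
{
	int val = (int)get_bitfield(reg, lo, hi);

	if (val < minval || val > maxval)
		return -EINVAL;
	return val + add;
}

/*
 * Hypothetical presence check in the spirit of the patch: an all-ones
 * read (~0) is treated as "no DIMM", whatever the present bit says.
 * Bit 15 as the present bit is an assumption of this sketch.
 */
static int dimm_present(uint32_t mtr)
{
	return mtr != UINT32_MAX && get_bitfield(mtr, 15, 15);
}

/*
 * Exponent that a size computation such as 1ull << (rows + cols + ranks)
 * would use. The field layout here is assumed, not taken from a datasheet.
 */
static int shift_exponent(uint32_t mtr)
{
	int ranks = decode_attr(mtr, 12, 13, 0, 0, 2);
	int rows  = decode_attr(mtr, 2, 4, 12, 1, 6);
	int cols  = decode_attr(mtr, 0, 1, 10, 0, 2);

	return rows + cols + ranks;
}
```

With these assumed fields, shift_exponent(0xffffffff) is the sum of three -EINVAL values, while a plausible populated value such as 0x9010 gives 27. Checking dimm_present() first and skipping the size computation when it returns 0 avoids the negative shift altogether.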