Message-ID: <202308021529.35b3ad6c-oliver.sang@intel.com>
Date: Wed, 2 Aug 2023 16:15:28 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Tony Lindgren <tony@...mide.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
<linux-kernel@...r.kernel.org>, <linux-serial@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jirislaby@...nel.org>,
"Andy Shevchenko" <andriy.shevchenko@...el.com>,
Dhruva Gole <d-gole@...com>,
Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
John Ogness <john.ogness@...utronix.de>,
Johan Hovold <johan@...nel.org>,
"Sebastian Andrzej Siewior" <bigeasy@...utronix.de>,
Vignesh Raghavendra <vigneshr@...com>, <oliver.sang@...el.com>
Subject: Re: [PATCH v5 3/3] serial: core: Fix serial core controller port
name to show controller id
Hello,
kernel test robot noticed a machine hang on:
commit: 4de64f4800a581e7eeba6392b3b2ce2131195145 ("[PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id")
url: https://github.com/intel-lab-lkp/linux/commits/Tony-Lindgren/serial-core-Controller-id-cannot-be-negative/20230725-134452
base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing
patch link: https://lore.kernel.org/all/20230725054216.45696-4-tony@atomide.com/
patch subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id
in testcase: boot
compiler: gcc-12
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202308021529.35b3ad6c-oliver.sang@intel.com
From the serial console, the last output we observed is:
[ 15.584772][ T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT)
[ 15.597328][ T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT)
[ 15.610326][ T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
[ 15.623375][ T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
[ 15.640145][ T19] intel_rapl_common: Found RAPL domain package
[ 15.655890][ T19] intel_rapl_common: Found RAPL domain dram
[ 15.661983][ T19] intel_rapl_common: package-0:package:long_term locked by BIOS
[ 15.678564][ T19] intel_rapl_common: package-0:package:short_term locked by BIOS
[ 15.695259][ T19] intel_rapl_common: package-0:dram:long_term locked by BIOS
[ 15.713068][ T158] intel_rapl_common: Found RAPL domain package
[ 15.728719][ T158] intel_rapl_common: Found RAPL domain dram
[ 15.734743][ T158] intel_rapl_common: package-1:package:long_term locked by BIOS
[ 15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s
[ 15.761297][ T158] intel_rapl_common: package-1:package:short_term locked by BIOS
[ 15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s
[ 15.768866][ T158] intel_rapl_common: package-1:dram:long_term locked by BIOS
[ 15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s
[ 15.812245][ T1154] raid6: avx2x4 gen() 18060 MB/s
[ 15.834244][ T1154] raid6: avx2x2 gen() 18076 MB/s
[ 15.856244][ T1154] raid6: avx2x1 gen() 13836 MB/s
[ 15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s
[ 15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled
[ 15.890254][ T1154] raid6: using avx512x2 recovery algorithm
[ 15.897891][ T1154] xor: measuring software checksum speed
[ 15.904013][ T1154] prefetch64-sse : 31308 MB/sec
[ 15.909878][ T1154] generic_sse : 22929 MB/sec
[ 15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec)
[ 16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no
[ 16.054593][ T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930)
After that, the machine hangs. (The whole dmesg captured from the serial
console is attached.) The issue is 100% reproducible with this commit.
With the parent commit, we never observed the boot failure.
It is unclear to us why this commit causes this behavior on our machine.
Could you help check the dmesg, the config, and the kernel command line (which
is also captured in dmesg), and let us know if anything needs to be updated to
be compatible with this change? Thanks!
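When comparing the attached serial log against a log from the parent commit,
it may help to extract the last timestamped message before the hang. The
following is only a minimal sketch, assuming the printk line format seen in the
serial output quoted above ("[ seconds][ Tpid] message"). The names
parse_dmesg, last_entry and SAMPLE are hypothetical and are not part of
lkp-tests.

```python
import re

# Matches printk lines with caller id, e.g.
# "[ 16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no"
# The caller field is assumed to be T<pid> (task) or C<cpu> (CPU context).
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\[\s*([TC]\d+)\]\s*(.*)$")

# Two lines copied from the serial output in this report (illustration only).
SAMPLE = """\
[ 16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no
[ 16.054593][ T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930)
"""


def parse_dmesg(text):
    """Return (timestamp, caller, message) tuples for lines in printk format."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2), match.group(3)))
    return entries


def last_entry(entries):
    """Return the final parsed entry in file order, or None if there is none."""
    return entries[-1] if entries else None


if __name__ == "__main__":
    entry = last_entry(parse_dmesg(SAMPLE))
    if entry is not None:
        timestamp, caller, message = entry
        print(f"last message at {timestamp:.6f}s from {caller}: {message}")
```

Running the same function over a log from the parent commit would show which
messages normally follow the point where this boot stops.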
To reproduce:
# build kernel
cd linux
cp config-6.5.0-rc2-00003-g4de64f4800a5 .config
make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
View attachment "config-6.5.0-rc2-00003-g4de64f4800a5" of type "text/plain" (159815 bytes)
View attachment "job-script" of type "text/plain" (5424 bytes)
Download attachment "dmesg.xz" of type "application/x-xz" (24624 bytes)