Message-ID: <202308021529.35b3ad6c-oliver.sang@intel.com>
Date:   Wed, 2 Aug 2023 16:15:28 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Tony Lindgren <tony@...mide.com>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        <linux-kernel@...r.kernel.org>, <linux-serial@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jiri Slaby <jirislaby@...nel.org>,
        "Andy Shevchenko" <andriy.shevchenko@...el.com>,
        Dhruva Gole <d-gole@...com>,
        Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
        John Ogness <john.ogness@...utronix.de>,
        Johan Hovold <johan@...nel.org>,
        "Sebastian Andrzej Siewior" <bigeasy@...utronix.de>,
        Vignesh Raghavendra <vigneshr@...com>, <oliver.sang@...el.com>
Subject: Re: [PATCH v5 3/3] serial: core: Fix serial core controller port
 name to show controller id



Hello,

kernel test robot noticed a machine hang on:

commit: 4de64f4800a581e7eeba6392b3b2ce2131195145 ("[PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id")
url: https://github.com/intel-lab-lkp/linux/commits/Tony-Lindgren/serial-core-Controller-id-cannot-be-negative/20230725-134452
base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing
patch link: https://lore.kernel.org/all/20230725054216.45696-4-tony@atomide.com/
patch subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id

in testcase: boot

compiler: gcc-12
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory

(please refer to the attached dmesg/kmsg for the entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202308021529.35b3ad6c-oliver.sang@intel.com



From the serial console, the last output we observed is:

[   15.584772][  T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT)
[   15.597328][  T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT)
[   15.610326][  T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
[   15.623375][  T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
[   15.640145][   T19] intel_rapl_common: Found RAPL domain package
[   15.655890][   T19] intel_rapl_common: Found RAPL domain dram
[   15.661983][   T19] intel_rapl_common: package-0:package:long_term locked by BIOS
[   15.678564][   T19] intel_rapl_common: package-0:package:short_term locked by BIOS
[   15.695259][   T19] intel_rapl_common: package-0:dram:long_term locked by BIOS
[   15.713068][  T158] intel_rapl_common: Found RAPL domain package
[   15.728719][  T158] intel_rapl_common: Found RAPL domain dram
[   15.734743][  T158] intel_rapl_common: package-1:package:long_term locked by BIOS
[   15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s
[   15.761297][  T158] intel_rapl_common: package-1:package:short_term locked by BIOS
[   15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s
[   15.768866][  T158] intel_rapl_common: package-1:dram:long_term locked by BIOS
[   15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s
[   15.812245][ T1154] raid6: avx2x4   gen() 18060 MB/s
[   15.834244][ T1154] raid6: avx2x2   gen() 18076 MB/s
[   15.856244][ T1154] raid6: avx2x1   gen() 13836 MB/s
[   15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s
[   15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled
[   15.890254][ T1154] raid6: using avx512x2 recovery algorithm
[   15.897891][ T1154] xor: measuring software checksum speed
[   15.904013][ T1154]    prefetch64-sse  : 31308 MB/sec
[   15.909878][ T1154]    generic_sse     : 22929 MB/sec
[   15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec)
[   16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no
[   16.054593][  T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930)


Then the machine is just stuck there (the whole dmesg captured from the serial
console is attached). The issue is 100% reproducible with this commit.
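As a minimal sketch, the last kernel message before a hang can be pulled out of a captured serial log, or out of the decompressed dmesg.xz attachment, with a small shell function. The function name last_kmsg is hypothetical and is not part of lkp-tests; it assumes every kernel message starts with a "[ seconds.microseconds]" timestamp, as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not part of lkp-tests): print the last kernel
# message line from a serial/dmesg log read on stdin. A kernel message is
# assumed to start with a "[   15.584772]"-style timestamp; other lines
# (e.g. userspace output on the console) are ignored.
last_kmsg() {
	grep -E '^\[ *[0-9]+\.[0-9]+\]' | tail -n 1
}

# Example, assuming the attached dmesg.xz is in the current directory:
#   xz -dc dmesg.xz | last_kmsg
```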

With the parent commit, we never observed the boot failure.

It is not clear to us why this commit causes this behavior on our machine.
Could you help check the dmesg, the config, and the kernel command line (which
is also captured in the dmesg), and let us know if anything needs to be updated
to be compatible with this change? Thanks!
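As a minimal sketch for the kernel command line mentioned above, the line the kernel logs at boot as "Kernel command line: ..." can be extracted from the attached dmesg. The function name kernel_cmdline is hypothetical and not part of lkp-tests; it assumes the log is plain text read on stdin and prints nothing if that line is missing.

```shell
#!/bin/sh
# Hypothetical helper (not part of lkp-tests): print the first
# "Kernel command line: ..." entry from a dmesg/serial log on stdin,
# with everything up to and including that prefix stripped.
kernel_cmdline() {
	sed -n 's/.*Kernel command line: //p' | head -n 1
}

# Example, assuming the attached dmesg.xz is in the current directory:
#   xz -dc dmesg.xz | kernel_cmdline
```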



To reproduce:

        # build kernel
        cd linux
        cp config-6.5.0-rc2-00003-g4de64f4800a5 .config
        make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
        make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
        cd <mod-install-dir>
        find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached to this email

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp dirs to run from a clean state.



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Attachment: "config-6.5.0-rc2-00003-g4de64f4800a5" (text/plain, 159815 bytes)
Attachment: "job-script" (text/plain, 5424 bytes)
Attachment: "dmesg.xz" (application/x-xz, 24624 bytes)
