Message-ID: <9bc70c0d-00b3-ff5a-e8bd-d4315e367fad@intel.com>
Date: Tue, 16 Jun 2020 10:50:05 +0800
From: Rong Chen <rong.a.chen@...el.com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Nicolas Boichat <drinkcat@...omium.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Dmitry Vyukov <dvyukov@...gle.com>,
Masahiro Yamada <yamada.masahiro@...ionext.com>,
Kees Cook <keescook@...omium.org>,
Petr Mladek <pmladek@...e.com>,
Thomas Gleixner <tglx@...utronix.de>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
Joe Lawrence <joe.lawrence@...hat.com>,
Uladzislau Rezki <urezki@...il.com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Andrey Ryabinin <aryabinin@...tuozzo.com>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: Re: [kmemleak] b751c52bb5: BUG:kernel_hang_in_boot_stage
On 6/10/20 6:56 PM, Catalin Marinas wrote:
> On Wed, Jun 10, 2020 at 03:51:56PM +0800, kernel test robot wrote:
>> FYI, we noticed the following commit (built with gcc-7):
>>
>> commit: b751c52bb587ae66f773b15204ef7a147467f4c7 ("kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> in testcase: boot
>>
>> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
> [...]
>> BUG: kernel hang in boot stage
>>
>> To reproduce:
>>
>> # build kernel
>> cd linux
>> cp config-5.3.0-11789-gb751c52bb587a .config
>> make HOSTCC=gcc-7 CC=gcc-7 ARCH=i386 olddefconfig prepare modules_prepare bzImage
> I've never tried kmemleak on i386.
>
> Anyway, I'm not sure what caused the hang (or whether it's a hang at
> all) but I suspect prior to the above commit, kmemleak probably just
> disabled itself (early log buffer exceeded). So the bug may have been
> there already, only that kmemleak started working and tripped over it
> when the log buffer increased.
Hi,
Sorry for the late reply. I can reproduce the problem with the command
"bin/lkp qemu -k <bzImage> job-script", and the kernel hangs:
[ 0.333897] -----------------------------------------------------
[ 0.334561] |block | try |context|
[ 0.335170] -----------------------------------------------------
[ 0.335760] context: ok | ok | ok |
[ 0.337995] try: ok | ok | ok |
[ 0.340089] block: ok | ok | ok |
[ 0.342175] spinlock: ok | ok | ok |
[ 0.344481] -------------------------------------------------------
[ 0.345068] Good, all 261 testcases passed! |
[ 0.345514] ---------------------------------
KVM internal error. Suberror: 3
extra data[0]: 80000b0e
extra data[1]: 31
extra data[2]: 182
extra data[3]: bfff0
EAX=00000000 EBX=00200297 ECX=00000000 EDX=ffffffff
ESI=d2e997c0 EDI=d2e997f0 EBP=d2bbb038 ESP=c00bfff4
EIP=f4dccd57 EFL=00210046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0060 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =007b 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =00d8 23331000 ffffffff 00809300 DPL=0 DS16 [-WA]
GS =00e0 f6422900 00000018 00409100 DPL=0 DS [--A]
LDT=0000 00000000 ffffffff 00c00000
TR =0080 ff403000 0000206b 00008b00 DPL=0 TSS32-busy
GDT= ff401000 000000ff
IDT= ff400000 000007ff
CR0=80050033 CR2=00000000 CR3=130fc000 CR4=00000690
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000fffe0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
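For reference, here is a minimal sketch of how extra data[0] above could be decoded. It assumes that the word holds the VMX IDT-vectoring information field (bits 7:0 vector, bits 10:8 type, bit 11 error code valid, bit 31 valid). That assumption and the helper name decode_vectoring_info are illustrative only and are not taken from the QEMU output.

```python
def decode_vectoring_info(value: int) -> dict:
    """Split a 32-bit IDT-vectoring information word into its fields.

    Assumed layout: bits 7:0 vector, bits 10:8 interruption type,
    bit 11 error-code-valid, bit 31 valid.
    """
    return {
        "valid": bool((value >> 31) & 0x1),
        "error_code_valid": bool((value >> 11) & 0x1),
        "type": (value >> 8) & 0x7,
        "vector": value & 0xFF,
    }


# extra data[0] from the dump above (QEMU prints it in hex).
info = decode_vectoring_info(0x80000B0E)
print(info)
```

Under that assumption the fields come out as valid, type 3 (hardware exception), vector 14 (page fault), with an error code present.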
>
> Is there a chance that the kernel got much slower with kmemleak enabled
> and the test scripts timed out?
No, the parent commit's log is:
[ 0.313845] -----------------------------------------------------
[ 0.314608] |block | try |context|
[ 0.315314] -----------------------------------------------------
[ 0.315974] context: ok | ok | ok |
[ 0.318261] try: ok | ok | ok |
[ 0.320478] block: ok | ok | ok |
[ 0.322562] spinlock: ok | ok | ok |
[ 0.324825] -------------------------------------------------------
[ 0.325403] Good, all 261 testcases passed! |
[ 0.325809] ---------------------------------
[ 0.326221] kmemleak: Early log buffer exceeded (401), please increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE
[ 0.327065] ACPI: Core revision 20190816
[ 0.327585] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[ 0.328545] APIC: Switch to symmetric I/O mode setup
[ 0.329009] Enabling APIC mode: Flat. Using 1 I/O APICs
[ 0.329572] masked ExtINT on CPU#0
[ 0.330686] ENABLING IO-APIC IRQs
[ 0.331001] init IO_APIC IRQs
[ 0.331274] apic 0 pin 0 not connected
>
> Does this problem still exist with the latest mainline?
Yes, it still exists in v5.7.
Best Regards,
Rong Chen