linux-kernel - Re: 5.1 and 5.1.1: BUG: unable to handle kernel paging request at ffffea0002030000

Message-ID: <20190521124310.GM18914@techsingularity.net>
Date:   Tue, 21 May 2019 13:43:10 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Justin Piszcz <jpiszcz@...idpixels.com>
Cc:     LKML <linux-kernel@...r.kernel.org>
Subject: Re: 5.1 and 5.1.1: BUG: unable to handle kernel paging request at
 ffffea0002030000

On Tue, May 21, 2019 at 05:01:06AM -0400, Justin Piszcz wrote:
> On Mon, May 20, 2019 at 7:56 AM Mel Gorman <mgorman@...hsingularity.net> wrote:
> >
> > On Sun, May 12, 2019 at 04:27:45AM -0400, Justin Piszcz wrote:
> > > Hello,
> > >
> > > I've turned off zram/zswap and I am still seeing the following during
> > > periods of heavy I/O, I am returning to 5.0.xx in the meantime.
> > >
> > > Kernel: 5.1.1
> > > Arch: x86_64
> > > Dist: Debian x86_64
> > >
> > > [29967.019411] BUG: unable to handle kernel paging request at ffffea0002030000
> > > [29967.019414] #PF error: [normal kernel read fault]
> > > [29967.019415] PGD 103ffee067 P4D 103ffee067 PUD 103ffed067 PMD 0
> > > [29967.019417] Oops: 0000 [#1] SMP PTI
> > > [29967.019419] CPU: 10 PID: 77 Comm: khugepaged Tainted: G T 5.1.1 #4
> > > [29967.019420] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.2 01/16/2015
> > > [29967.019424] RIP: 0010:isolate_freepages_block+0xb9/0x310
> > > [29967.019425] Code: 24 28 48 c1 e0 06 40 f6 c5 1f 48 89 44 24 20 49 8d 45 79 48 89 44 24 18 44 89 f0 4d 89 ee 45 89 fd 41 89 c7 0f 84 ef 00 00 00 <48> 8b 03 41 83 c4 01 a9 00 00 01 00 75 0c 48 8b 43 08 a8 01 0f 84
> >
> > If you have debugging symbols installed, can you translate the faulting
> > address with the following?
> >
> > ADDR=`nm /path/to/vmlinux-or-debuginfo-file | grep "t isolate_freepages_block\$" | awk '{print $1}'`
> > addr2line -i -e vmlinux `printf "0x%lX" $((0x$ADDR+0xb9))`
> 
> Another event this morning, this occurred when copying a single ~25GB
> backup file from one block device (3ware HW RAID) to a SW
> RAID-1 (mdadm):
> 
> With this event, it was a fault and khugepaged is not stuck at 100%
> but this may be related as the stack trace is similar where
> compaction_alloc is utilizing most of the CPU:
> https://lkml.org/lkml/2019/5/9/225
> 
> # ADDR=`nm /usr/src/linux/vmlinux | grep "t isolate_freepages_block\$" | awk '{print $1}'`
> # echo $ADDR
> ffffffff812274f0
> # addr2line -i -e /usr/src/linux/vmlinux `printf "0x%lX" $((0x$ADDR+0x83d))`
> compaction.c:?
> # addr2line -i -e /usr/src/linux/vmlinux `printf "0x%lX" $((0x$ADDR+0x8d0))`
> compaction.c:?
> 

Please use the offset 0xb9

addr2line -i -e /usr/src/linux/vmlinux `printf "0x%lX" $((0x$ADDR+0xb9))`
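
For reference, the 0xb9 comes from the RIP line of the oops, isolate_freepages_block+0xb9/0x310: the first number is the offset of the faulting instruction from the start of the function, the second is the function's size. A minimal sketch of the address arithmetic follows. It assumes bash, whose 64-bit arithmetic wraps so that kernel addresses print correctly, and fault_addr is a hypothetical helper name, not something from the kernel tree or binutils.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): add the oops offset to the
# symbol's start address and print it in a form addr2line accepts.
#   $1 = symbol address as printed by nm (hex, no 0x prefix)
#   $2 = offset from the RIP line (hex, with 0x prefix)
fault_addr() {
    printf '0x%lX\n' "$(( 0x$1 + $2 ))"
}

# Values from this thread: the nm output for isolate_freepages_block
# and the +0xb9 offset from the RIP line.
fault_addr ffffffff812274f0 0xb9
```

The printed address can then be passed to addr2line -i -e /usr/src/linux/vmlinux in place of the inline printf.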

-- 
Mel Gorman
SUSE Labs