linux-kernel - Re: 5.1 and 5.1.1: BUG: unable to handle kernel paging request at ffffea0002030000

Message-ID: <20190524114329.hujd3qvtusz6uyfk@butterfly.localdomain>
Date:   Fri, 24 May 2019 13:43:29 +0200
From:   Oleksandr Natalenko <oleksandr@...hat.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Justin Piszcz <jpiszcz@...idpixels.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: 5.1 and 5.1.1: BUG: unable to handle kernel paging request at
 ffffea0002030000

On Tue, May 21, 2019 at 01:43:10PM +0100, Mel Gorman wrote:
> On Tue, May 21, 2019 at 05:01:06AM -0400, Justin Piszcz wrote:
> > On Mon, May 20, 2019 at 7:56 AM Mel Gorman <mgorman@...hsingularity.net> wrote:
> > >
> > > On Sun, May 12, 2019 at 04:27:45AM -0400, Justin Piszcz wrote:
> > > > Hello,
> > > >
> > > > I've turned off zram/zswap and I am still seeing the following during
> > > > periods of heavy I/O; I am returning to 5.0.xx in the meantime.
> > > >
> > > > Kernel: 5.1.1
> > > > Arch: x86_64
> > > > Dist: Debian x86_64
> > > >
> > > > [29967.019411] BUG: unable to handle kernel paging request at ffffea0002030000
> > > > [29967.019414] #PF error: [normal kernel read fault]
> > > > [29967.019415] PGD 103ffee067 P4D 103ffee067 PUD 103ffed067 PMD 0
> > > > [29967.019417] Oops: 0000 [#1] SMP PTI
> > > > [29967.019419] CPU: 10 PID: 77 Comm: khugepaged Tainted: G            T 5.1.1 #4
> > > > [29967.019420] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.2 01/16/2015
> > > > [29967.019424] RIP: 0010:isolate_freepages_block+0xb9/0x310
> > > > [29967.019425] Code: 24 28 48 c1 e0 06 40 f6 c5 1f 48 89 44 24 20 49 8d 45 79 48 89 44 24 18 44 89 f0 4d 89 ee 45 89 fd 41 89 c7 0f 84 ef 00 00 00 <48> 8b 03 41 83 c4 01 a9 00 00 01 00 75 0c 48 8b 43 08 a8 01 0f 84
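The oops already narrows things down. The page-table dump stops at `PMD 0`, which means the walk found no PMD entry and nothing was mapped at ffffea0002030000. The marked bytes `<48> 8b 03` decode to `mov (%rbx),%rax`. That is a 64-bit load, which matches the `normal kernel read fault`. The kernel's own `scripts/decodecode` can disassemble the whole `Code:` line.

On x86_64 the faulting address falls in the vmemmap region, the virtual array of `struct page`. The POSIX sh sketch below maps the address back to a page frame number. It makes two assumptions that the report does not state, and both depend on kernel configuration and KASLR:

- the vmemmap base is the default, non-randomized 0xffffea0000000000;
- `struct page` is 64 bytes.

```shell
#!/bin/sh
# Sketch: map a vmemmap address from an oops back to the page frame number (PFN)
# whose struct page was being read.
# ASSUMPTIONS (not from the report): the default, non-randomized x86_64 vmemmap
# region 0xffffea0000000000-0xffffeaffffffffff, and sizeof(struct page) == 64.
# Both depend on kernel config and KASLR; adjust them for the kernel in question.
VMEMMAP_PREFIX=0xffffea   # top 24 bits of the assumed vmemmap region
STRUCT_PAGE_SIZE=64
PAGE_SHIFT=12             # 4 KiB pages on x86_64

# vmemmap_to_pfn ADDR: print the PFN (hex) described by the struct page at ADDR.
# ADDR must be a lowercase, 0x-prefixed 64-bit address.
vmemmap_to_pfn() {
    # Stripping the prefix leaves the 40-bit offset into the struct page array,
    # which fits comfortably in shell arithmetic.
    vm_off=${1#"$VMEMMAP_PREFIX"}
    if [ "$vm_off" = "$1" ]; then
        echo "$1 is outside the assumed vmemmap region" >&2
        return 1
    fi
    printf '0x%x\n' $(( 0x$vm_off / STRUCT_PAGE_SIZE ))
}

# pfn_to_phys PFN: print the physical address of the first byte of that frame.
pfn_to_phys() {
    printf '0x%x\n' $(( $1 << PAGE_SHIFT ))
}

pfn=$(vmemmap_to_pfn 0xffffea0002030000)
echo "pfn=$pfn phys=$(pfn_to_phys "$pfn")"
```

For this oops the sketch prints `pfn=0x80c00 phys=0x80c00000`. Under the assumptions above, that means isolate_freepages_block was reading the `struct page` of a frame just above the 2 GiB mark, and no memmap was mapped for that frame.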
> > >
> > > If you have debugging symbols installed, can you translate the faulting
> > > address with the following?
> > >
> > > ADDR=`nm /path/to/vmlinux-or-debuginfo-file | grep "t isolate_freepages_block\$" | awk '{print $1}'`
> > > addr2line -i -e vmlinux `printf "0x%lX" $((0x$ADDR+0xb9))`
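Mel's recipe takes the symbol's start address from `nm`, adds the offset from the oops, and passes the sum to `addr2line`. The same steps can be wrapped in a reusable POSIX sh function; `sym_plus_offset` is a hypothetical helper name used only for this sketch. The function reads `nm` output on stdin, so it can be tried without a vmlinux, and it matches only text symbols (`t`/`T`), which is what the original `grep "t isolate_freepages_block\$"` selects.

```shell
#!/bin/sh
# Sketch of the nm + offset step as a function. sym_plus_offset is a hypothetical
# helper name. It reads `nm` output on stdin and prints SYMBOL's address + OFFSET.
sym_plus_offset() {
    spo_sym=$1 spo_off=$2
    # nm prints "ADDRESS TYPE NAME"; t/T mark text (code) symbols.
    spo_base=$(awk -v s="$spo_sym" '($2 == "t" || $2 == "T") && $3 == s { print $1; exit }')
    if [ -z "$spo_base" ]; then
        echo "text symbol $spo_sym not found" >&2
        return 1
    fi
    # Assumes a 16-digit (64-bit) address. Add in two 32-bit halves so no
    # intermediate value overflows signed 64-bit shell arithmetic.
    spo_hi=${spo_base%????????}   # upper 8 hex digits
    spo_lo=${spo_base#????????}   # lower 8 hex digits
    spo_sum=$(( 0x$spo_lo + $spo_off ))
    spo_carry=$(( spo_sum >> 32 ))
    printf '0x%08X%08X\n' $(( (0x$spo_hi + spo_carry) & 0xFFFFFFFF )) $(( spo_sum & 0xFFFFFFFF ))
}

# Using the symbol address Justin reported for isolate_freepages_block:
echo 'ffffffff812274f0 t isolate_freepages_block' | sym_plus_offset isolate_freepages_block 0xb9
```

This prints `0xFFFFFFFF812275A9`, the same value Mel's one-liner computes with `ADDR=ffffffff812274f0`. On a real system the result would be passed on as `addr2line -i -e /usr/src/linux/vmlinux "$(nm /usr/src/linux/vmlinux | sym_plus_offset isolate_freepages_block 0xb9)"`. As Mel notes, that step only produces file and line numbers if the vmlinux carries debugging information.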
> > 
> > Another event this morning, this occurred when copying a single ~25GB
> > backup file from one block device (3ware HW RAID) to a SW
> > RAID-1 (mdadm):
> > 
> > This time it was a fault, and khugepaged is not stuck at 100% CPU, but
> > it may be related, since the stack trace is similar to the earlier report
> > where compaction_alloc was using most of the CPU:
> > https://lkml.org/lkml/2019/5/9/225
> > 
> > # ADDR=`nm /usr/src/linux/vmlinux | grep "t isolate_freepages_block\$" | awk '{print $1}'`
> > # echo $ADDR
> > ffffffff812274f0
> > # addr2line -i -e /usr/src/linux/vmlinux `printf "0x%lX" $((0x$ADDR+0x83d))`
> > compaction.c:?
> > # addr2line -i -e /usr/src/linux/vmlinux `printf "0x%lX" $((0x$ADDR+0x8d0))`
> > compaction.c:?
> > 
> 
> Please use the offset 0xb9
> 
> addr2line -i -e /usr/src/linux/vmlinux `printf "0x%lX" $((0x$ADDR+0xb9))`
> 
> -- 
> Mel Gorman
> SUSE Labs
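Mel's correction comes down to taking the offset from the `RIP:` line of the oops being decoded. In `isolate_freepages_block+0xb9/0x310`, the number after the slash is the size of the function. Any offset at or beyond `0x310` therefore cannot land inside that function, and both 0x83d and 0x8d0, tried above, exceed it.

The POSIX sh sketch below extracts the symbol, offset and size from a saved log and rejects an offset that falls outside the function. `rip_from_oops` is a hypothetical helper name. Kernel trees of this era also ship `scripts/faddr2line`, which accepts the `symbol+offset/size` string directly and does the symbol lookup itself.

```shell
#!/bin/sh
# Sketch: pull "symbol offset size" out of the RIP line of a saved oops (stdin),
# so the offset given to addr2line comes from the same oops as the symbol.
# rip_from_oops is a hypothetical helper name, not a kernel script.
rip_from_oops() {
    # "RIP: 0010:isolate_freepages_block+0xb9/0x310" -> "isolate_freepages_block+0xb9/0x310"
    rfo_loc=$(sed -n 's/.*RIP: [0-9a-fA-F]*:\([^ ]*\).*/\1/p' | head -n 1)
    if [ -z "$rfo_loc" ]; then
        echo "no RIP line found" >&2
        return 1
    fi
    rfo_sym=${rfo_loc%%+*}    # text before '+': the function name
    rfo_rest=${rfo_loc#*+}    # e.g. "0xb9/0x310"
    rfo_off=${rfo_rest%%/*}
    rfo_size=${rfo_rest#*/}
    # The number after '/' is the function's size; an offset at or past it
    # cannot point inside this function.
    if [ $(( $rfo_off )) -ge $(( $rfo_size )) ]; then
        echo "offset $rfo_off is outside $rfo_sym (size $rfo_size)" >&2
        return 1
    fi
    echo "$rfo_sym $rfo_off $rfo_size"
}

echo '[29967.019424] RIP: 0010:isolate_freepages_block+0xb9/0x310' | rip_from_oops
```

For the oops in this thread the sketch prints `isolate_freepages_block 0xb9 0x310`.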

Cc'ing myself since I sometimes observe such behaviour right after a KVM
VM is launched. No luck reproducing it on demand so far, though.

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer