Date:   Thu, 28 Jun 2018 14:35:07 +0530
From:   Abdul Haleem <abdhalee@...ux.vnet.ibm.com>
To:     Michael Ellerman <mpe@...erman.id.au>
Cc:     linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        linux-next <linux-next@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-scsi <linux-scsi@...r.kernel.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        sachinp <sachinp@...ux.vnet.ibm.com>,
        sim <sim@...ux.vnet.ibm.com>,
        manvanth <manvanth@...ux.vnet.ibm.com>,
        Brian King <brking@...ux.vnet.ibm.com>
Subject: Re: [next-20180601][nvme][ppc] Kernel Oops is triggered when
 creating lvm snapshots on nvme disks

On Tue, 2018-06-26 at 23:36 +1000, Michael Ellerman wrote:
> Abdul Haleem <abdhalee@...ux.vnet.ibm.com> writes:
> 
> > Greetings,
> >
> > Kernel Oops is seen on 4.17.0-rc7-next-20180601 kernel on a bare-metal
> > machine when running lvm snapshot tests on nvme disks.
> >
> > Machine Type: Power 8 bare-metal
> > kernel : 4.17.0-rc7-next-20180601
> > test:  
> > $ pvcreate -y /dev/nvme0n1
> > $ vgcreate avocado_vg /dev/nvme0n1
> > $ lvcreate --size 1.4T --name avocado_lv avocado_vg -y
> > $ mkfs.ext2 /dev/avocado_vg/avocado_lv
> > $ lvcreate --size 1G --snapshot --name avocado_sn /dev/avocado_vg/avocado_lv -y
> > $ lvconvert --merge /dev/avocado_vg/avocado_sn
> 
> > The last command results in an Oops:
> >
> > Unable to handle kernel paging request for data at address 0x000000d0
> > Faulting instruction address: 0xc0000000002dced4
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > LE SMP NR_CPUS=2048 NUMA PowerNV
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in: dm_snapshot dm_bufio nvme bnx2x iptable_mangle
> > ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
> > nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4
> > xt_tcpudp tun bridge stp llc iptable_filter dm_mirror dm_region_hash
> > dm_log dm_service_time vmx_crypto powernv_rng rng_core dm_multipath
> > kvm_hv binfmt_misc kvm nfsd ip_tables x_tables autofs4 xfs lpfc
> > crc_t10dif crct10dif_generic mdio nvme_fc libcrc32c nvme_fabrics
> > nvme_core crct10dif_common [last unloaded: nvme]
> > CPU: 70 PID: 157763 Comm: lvconvert Not tainted 4.17.0-rc7-next-20180601-autotest-autotest #1
> > NIP:  c0000000002dced4 LR: c000000000244d14 CTR: c000000000244cf0
> > REGS: c000001f81d6b5a0 TRAP: 0300   Not tainted  (4.17.0-rc7-next-20180601-autotest-autotest)
> > MSR:  900000010280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22442444  XER: 20000000
> > CFAR: c000000000008934 DAR: 00000000000000d0 DSISR: 40000000 SOFTE: 0
> > GPR00: c000000000244d14 c000001f81d6b820 c00000000109c400 c000003c9d080180
> > GPR04: 0000000000000001 c000001fad510000 c000001fad510000 0000000000000001
> > GPR08: 0000000000000000 f000000000000000 f000000000000008 0000000000000000
> > GPR12: c000000000244cf0 c000001ffffc4f80 00007fffa0e31090 00007fffd9d9b470
> > GPR16: 0000000000000000 000000000000005c 00007fffa0e3a5b0 00007fffa0e62040
> > GPR20: 0000010014ad7d50 0000010014ad7d20 00007fffa0e64210 0000000000000001
> > GPR24: 0000000000000000 c00000000081bae0 c000001ed2461b00 d00000000f859d08
> > GPR28: c000003c9d080180 c000000000244d14 0000000000000001 0000000000000000
> > NIP [c0000000002dced4] kmem_cache_free+0x1a4/0x2b0
> > LR [c000000000244d14] mempool_free_slab+0x24/0x40
> 
> Are you running with slub debugging enabled?
> Try booting with slub_debug=FZP

I was able to reproduce it again with slub_debug=FZP and DEBUG_INFO enabled
on 4.17.0-rc7-next-20180601, but there is not much in the traces beyond the
Oops stack trace itself.

cat /proc/cmdline
rw,slub_debug=FZP root=UUID=e62c58bb-2824-4075-a31d-455f1bb62504 
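
(slub_debug=FZP turns on SLUB's sanity checks (F), red zoning (Z) and
object poisoning (P) for all slab caches.)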

.config
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y
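
(Note: with CONFIG_SLUB_DEBUG_ON=y the full SLUB debug checks are already
enabled by default, so passing slub_debug=FZP on the command line should be
redundant here.)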


The faulting instruction points to the code path below:

gdb -batch vmlinux -ex 'list *(0xc000000000304fe0)'
0xc000000000304fe0 is in kmem_cache_free (mm/slab.h:231).
226	}
227	
228	static inline bool slab_equal_or_root(struct kmem_cache *s,
229					      struct kmem_cache *p)
230	{
231		return p == s || p == s->memcg_params.root_cache;
232	}
233	
234	/*
235	 * We use suffixes to the name in memcg because we can't have caches
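
For reference, mempool_free_slab() (mm/mempool.c, as of this kernel) is
just a thin wrapper that treats the mempool's pool_data as the kmem_cache
to free into:

void mempool_free_slab(void *element, void *pool_data)
{
	struct kmem_cache *mem = pool_data;
	kmem_cache_free(mem, element);
}

Given that the DAR is 0x00000000000000d0 and line 231 above only
dereferences its first kmem_cache argument, this looks like
kmem_cache_free() ending up with a NULL (or nearly NULL) cache pointer,
with 0xd0 presumably being the offset of memcg_params.root_cache inside
struct kmem_cache. Assuming the same vmlinux with DEBUG_INFO (and
CONFIG_MEMCG), one way to double-check that offset is:

gdb -batch vmlinux -ex 'print /x &((struct kmem_cache *)0)->memcg_params.root_cache'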

Detailed dmesg logs are attached.

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre



Attachment: "dmesg-slubon.txt" (text/plain, 93399 bytes)
