[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <66871c0a-d763-abfc-46f1-d6ce3441f9fb@suse.cz>
Date: Mon, 13 Jan 2020 14:44:50 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>,
Christoph Lameter <cl@...ux.com>
Subject: Re: [BUG] RIP: 0010:__kmalloc+0xa8/0x330 (general protection fault:
0000 [#1] SMP NOPTI)
+CC Christoph
On 1/11/20 5:33 PM, Mikhail Gavrilov wrote:
> Hi folks, I caused a kernel panic by just starting downloading
> simultaneously several big files by Google Chrome browser and removing
> games in parallel in the Steam client (because the disk was almost
> full)
>
> general protection fault: 0000 [#1] SMP NOPTI
> CPU: 15 PID: 104506 Comm: Chrome_IOThread Not tainted
> 5.5.0-0.rc5.git3.2.fc32.x86_64 #1
> Hardware name: System manufacturer System Product Name/ROG STRIX
> X570-I GAMING, BIOS 1405 11/19/2019
> RIP: 0010:__kmalloc+0xa8/0x330
> Code: e3 01 00 00 4d 8b 06 65 49 8b 50 08 65 4c 03 05 be 91 cc 5e 4d
> 8b 38 4d 85 ff 0f 84 22 02 00 00 41 8b 5e 20 49 8b 3e 4c 01 fb <48> 33
> 1b 49 33 9e d0 01 00 00 40 f6 c7 0f 0f 85 1f 02 00 00 48 8d
> RSP: 0018:ffffa4428b6bfb00 EFLAGS: 00010206
> RAX: 0000000000000000 RBX: 669e19e5410de38b RCX: 0000000000000000
> RDX: 000000000016fcfc RSI: 0000000000000400 RDI: 00000000001f4080
> RBP: 0000000000000cc0 R08: ffff889a7c1f4080 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000011
> R13: ffff889a76c07800 R14: ffff889a76c07800 R15: 669e19e5410de38b
> FS: 00007fd5dc49d700(0000) GS:ffff889a7c000000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00001a0409c04000 CR3: 000000078042e000 CR4: 0000000000340ee0
> Call Trace:
> ? shmem_initxattrs+0x89/0xd0
> shmem_initxattrs+0x89/0xd0
> security_inode_init_security+0xf8/0x140
> ? shmem_enabled_store+0x1f0/0x1f0
> shmem_mknod+0x76/0xe0
> lookup_open+0x5bd/0x820
> path_openat+0x33d/0xc90
> ? touch_atime+0x33/0xe0
> do_filp_open+0x91/0x100
> ? _raw_spin_unlock+0x1f/0x30
> ? __alloc_fd+0xe9/0x1d0
> do_sys_open+0x184/0x220
> do_syscall_64+0x5c/0xa0
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7fd5ee1d3134
> Code: 24 20 eb 8f 66 90 44 89 54 24 0c e8 26 4c f9 ff 44 8b 54 24 0c
> 44 89 e2 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d
> 00 f0 ff ff 77 32 44 89 c7 89 44 24 0c e8 58 4c f9 ff 8b 44
> RSP: 002b:00007fd5dc49bc30 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd5ee1d3134
> RDX: 00000000000000c2 RSI: 00001baff2871330 RDI: 00000000ffffff9c
> RBP: 00001baff2871330 R08: 0000000000000000 R09: 00007fd5dc49bcd8
> R10: 0000000000000180 R11: 0000000000000293 R12: 00000000000000c2
> R13: 00007fd5ee272c60 R14: 00007fd5dc49bcd0 R15: 8421084210842109
> Modules linked in: uinput rfcomm xt_CHECKSUM xt_MASQUERADE
> xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp tun bridge stp
> llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set
> nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat
> ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle
> iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables
> ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat
> snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio
> snd_hda_codec_hdmi iwlmvm snd_hda_intel kvm_amd snd_intel_dspcfg
> snd_usb_audio kvm snd_hda_codec snd_hda_core snd_usbmidi_lib btusb
> irqbypass snd_rawmidi mac80211 snd_hwdep uvcvideo btrtl snd_seq btbcm
> videobuf2_vmalloc btintel videobuf2_memops snd_seq_device
> videobuf2_v4l2 crct10dif_pclmul videobuf2_common bluetooth
> crc32_pclmul libarc4 snd_pcm videodev joydev iwlwifi eeepc_wmi xpad mc
> snd_timer ff_memless ghash_clmulni_intel asus_wmi ecdh_generic
> sparse_keymap ecc video sp5100_tco wmi_bmof pcspkr snd cfg80211
> k10temp ccp i2c_piix4 soundcore rfkill acpi_cpufreq binfmt_misc
> ip_tables hid_logitech_hidpp hid_logitech_dj amdgpu amd_iommu_v2
> gpu_sched ttm drm_kms_helper drm igb nvme crc32c_intel dca nvme_core
> i2c_algo_bit wmi pinctrl_amd fuse
> ---[ end trace 8503eed9a4b0cd11 ]---
> RIP: 0010:__kmalloc+0xa8/0x330
> Code: e3 01 00 00 4d 8b 06 65 49 8b 50 08 65 4c 03 05 be 91 cc 5e 4d
> 8b 38 4d 85 ff 0f 84 22 02 00 00 41 8b 5e 20 49 8b 3e 4c 01 fb <48> 33
> 1b 49 33 9e d0 01 00 00 40 f6 c7 0f 0f 85 1f 02 00 00 48 8d
> RSP: 0018:ffffa4428b6bfb00 EFLAGS: 00010206
> RAX: 0000000000000000 RBX: 669e19e5410de38b RCX: 0000000000000000
> RDX: 000000000016fcfc RSI: 0000000000000400 RDI: 00000000001f4080
> RBP: 0000000000000cc0 R08: ffff889a7c1f4080 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000011
> R13: ffff889a76c07800 R14: ffff889a76c07800 R15: 669e19e5410de38b
> FS: 00007fd5dc49d700(0000) GS:ffff889a7c000000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00001a0409c04000 CR3: 000000078042e000 CR4: 0000000000340ee0
>
> I don’t think that git bisect is really possible here because the
> state on the disk will be different each time (there are no more
> deleted files) and there is no exact case that would reproduce the
> error.
But is the bug reproducible at all? And is it always the same stack trace?
> $ /usr/src/kernels/`uname -r`/scripts/faddr2line
> /lib/debug/lib/modules/`uname -r`/vmlinux __kmalloc+0xa8/0x330
> __kmalloc+0xa8/0x330:
> freelist_ptr at mm/slub.c:261
> (inlined by) freelist_dereference at mm/slub.c:272
> (inlined by) get_freepointer at mm/slub.c:278
> (inlined by) get_freepointer_safe at mm/slub.c:292
> (inlined by) slab_alloc_node at mm/slub.c:2726
> (inlined by) slab_alloc at mm/slub.c:2767
> (inlined by) __kmalloc at mm/slub.c:3799
>
> From the trace, I see that the problem comes from mm/slub.c so I added
> this report in the linux-mm mailing list please correct me if I'm
> wrong.
SLUB is most likely just a victim of somebody else doing something wrong
with kmalloced objects. You can boot with extra debugging that could
tell us more, i.e. add this boot kernel parameter:
slub_debug=FU,kmalloc-*
Or a more thorough version, but making the system even slower:
slub_debug=FZPU,kmalloc-*
Vlastimil
> --
> Best Regards,
> Mike Gavrilov.
>
Powered by blists - more mailing lists