Message-ID: <a6c98d75-078a-797f-a582-9687324e8c02@intel.com>
Date:   Mon, 3 Jun 2019 14:07:26 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Stephen Rothwell <sfr@...b.auug.org.au>,
        Matthew Wilcox <willy@...radead.org>
Cc:     Linux Next Mailing List <linux-next@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: linux-next: runtime failure of next-20190603
On 6/2/19 11:22 PM, Stephen Rothwell wrote:
> My qemu powerpc_pseries_le_defconfig boots failed today with the
> following output at shutdown time:
...
> [   32.112430] NIP [c000000000bbeb38] xas_load+0x8/0xd0
...
> Reverting commit
> 
> fa858b6eec3f ("XArray: Add xas_replace")
> 
> made the failure go away.
I'm seeing a similar softlockup:
> [124384.345395] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [TaskSchedulerFo:22804]
> [124384.345405] Modules linked in: bridge stp llc ctr ccm hid_logitech_hidpp hid_logitech_dj xt_MASQUERADE rfcomm hid_generic usbhid hid bnep iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables bpfilter arc4 iwlmvm intel_rapl snd_hda_codec_hdmi x86_pkg_temp_thermal mac80211 wmi_bmof coretemp snd_hda_codec_realtek snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec aesni_intel snd_hwdep aes_x86_64 snd_hda_core glue_helper crypto_simd thinkpad_acpi cryptd snd_pcm nvram iwlwifi btusb ledtrig_audio btrtl snd_seq_midi btbcm snd_seq_midi_event btintel snd_rawmidi bluetooth snd_seq snd_timer snd_seq_device ecdh_generic cfg80211 ecc snd joydev soundcore kvm_intel mac_hid wmi kvm irqbypass squashfs zstd_decompress lz4_decompress netconsole rtsx_pci_sdmmc rtsx_pci
> [124384.345426] CPU: 1 PID: 22804 Comm: TaskSchedulerFo Not tainted 5.2.0-rc2 #14
> [124384.345427] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET70W (1.43 ) 01/28/2019
> [124384.345431] RIP: 0010:xas_load+0x2c/0x80
> [124384.345432] Code: 89 fb e8 67 ff ff ff eb 5b 48 3d 00 10 00 00 76 5f 0f b6 48 fe 48 8d 70 fe 38 4b 10 77 52 48 8b 53 08 48 d3 ea 83 e2 3f 89 d0 <48> 8d 44 c6 20 48 8b 40 08 48 89 73 18 48 89 c1 83 e1 03 48 83 f9
> [124384.345433] RSP: 0018:ffffc900095f3a70 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
> [124384.345434] RAX: 0000000000000022 RBX: ffffc900095f3a80 RCX: 0000000000000006
> [124384.345435] RDX: 0000000000000022 RSI: ffff888085eb4490 RDI: ffffc900095f3a80
> [124384.345435] RBP: 00000000001dc8b0 R08: 0000000000000001 R09: ffff8884216fab80
> [124384.345436] R10: ffff8884216fa000 R11: ffff8884216fa000 R12: 0000000000000000
> [124384.345437] R13: ffff88840a006bd8 R14: 00000000001dc8b0 R15: ffff88810cc12580
> [124384.345437] FS:  00007f1aa5b77700(0000) GS:ffff888411880000(0000) knlGS:0000000000000000
> [124384.345438] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [124384.345439] CR2: 00007f1ae3ffd9d0 CR3: 00000003e925c001 CR4: 00000000003626e0
> [124384.345439] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [124384.345440] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [124384.345440] Call Trace:
> [124384.345464]  find_get_entry+0x74/0x1a0
> [124384.345466]  pagecache_get_page+0x27/0x250
> [124384.345467]  __try_to_reclaim_swap.isra.38+0x47/0xe0
> [124384.345469]  free_swap_and_cache+0x6e/0x70
> [124384.345470]  unmap_page_range+0x444/0xa50
> [124384.345472]  unmap_vmas+0x81/0xe0
> [124384.345474]  exit_mmap+0xab/0x1a0
> [124384.345477]  mmput+0x5d/0x130
> [124384.345478]  do_exit+0x2af/0xbf0
> [124384.345480]  do_group_exit+0x3d/0xb0
> [124384.345481]  get_signal+0x12d/0x8b0
> [124384.345483]  do_signal+0x36/0x6a0
> [124384.345485]  ? __might_fault+0x2b/0x30
> [124384.345486]  ? _copy_from_user+0x5b/0x90
> [124384.345488]  ? exit_to_usermode_loop+0x4a/0xb0
> [124384.345489]  exit_to_usermode_loop+0x62/0xb0
> [124384.345507]  do_syscall_64+0xfc/0x120
> [124384.345508]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
I saw this over a period of time and caught a bunch of different
softlockup messages.  They almost all appear under xas_load, though:
 RIP: 0010:xas_load+0x13/0x80
 RIP: 0010:xas_load+0x17/0x80
 RIP: 0010:xas_load+0x1b/0x80
 RIP: 0010:xas_load+0x27/0x80
 RIP: 0010:xas_load+0x2c/0x80
 RIP: 0010:xas_load+0x35/0x80
 RIP: 0010:xas_load+0x35/0x80
 RIP: 0010:xas_load+0x35/0x80
 RIP: 0010:xas_load+0x35/0x80
 RIP: 0010:xas_load+0x35/0x80
 RIP: 0010:xas_load+0x45/0x80
 RIP: 0010:xas_load+0x66/0x80
 RIP: 0010:xas_load+0xb/0x80
 RIP: 0010:xas_start+0x45/0x90
So it seems like it's actively spinning in a fairly big loop since
it's hitting a bunch of different places.
I only hit this once, though.  It's not easily reproducible for me.  I
haven't tried the above revert.
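
For anyone collecting more of these, here is a minimal sketch of how the
RIP samples listed above could be tallied per function and offset.  It is
not part of the original report: the helper names (tally_rips, summarize)
and the SAMPLES string are made up for illustration, and the regex assumes
the "RIP: 0010:func+0xOFF/0xLEN" form seen in the messages above.

```python
import re
from collections import Counter, defaultdict

# Assumed line format: "RIP: 0010:xas_load+0x2c/0x80"
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

# The RIP lines quoted in this message (hypothetical constant name).
SAMPLES = """\
RIP: 0010:xas_load+0x13/0x80
RIP: 0010:xas_load+0x17/0x80
RIP: 0010:xas_load+0x1b/0x80
RIP: 0010:xas_load+0x27/0x80
RIP: 0010:xas_load+0x2c/0x80
RIP: 0010:xas_load+0x35/0x80
RIP: 0010:xas_load+0x35/0x80
RIP: 0010:xas_load+0x35/0x80
RIP: 0010:xas_load+0x35/0x80
RIP: 0010:xas_load+0x35/0x80
RIP: 0010:xas_load+0x45/0x80
RIP: 0010:xas_load+0x66/0x80
RIP: 0010:xas_load+0xb/0x80
RIP: 0010:xas_start+0x45/0x90
"""


def tally_rips(log_text):
    """Count how often each (function, offset) pair shows up as RIP."""
    counts = Counter()
    for match in RIP_RE.finditer(log_text):
        counts[(match.group(1), int(match.group(2), 16))] += 1
    return counts


def summarize(counts):
    """Map function -> (samples, distinct offsets, lowest, highest offset)."""
    per_func = defaultdict(list)
    for (func, offset), n in counts.items():
        per_func[func].append((offset, n))
    summary = {}
    for func, pairs in per_func.items():
        offsets = [offset for offset, _ in pairs]
        total = sum(n for _, n in pairs)
        summary[func] = (total, len(offsets), min(offsets), max(offsets))
    return summary


if __name__ == "__main__":
    for func, (total, distinct, lo, hi) in sorted(summarize(tally_rips(SAMPLES)).items()):
        print(f"{func}: {total} samples, {distinct} distinct offsets, {lo:#x}..{hi:#x}")
```

Many distinct offsets spread across one function would be consistent with
the CPU spinning in a loop there rather than sitting on one instruction,
but a handful of watchdog samples is only a hint, not proof.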