Message-ID: <KL1P15301MB000645A10FE2B20421A7F3DBBFC50@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM>
Date:   Fri, 2 Mar 2018 22:28:50 +0000
From:   Dexuan Cui <decui@...rosoft.com>
To:     "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        Jan Kara <jack@...e.cz>, Amir Goldstein <amir73il@...il.com>,
        Miklos Szeredi <mszeredi@...hat.com>
CC:     Haiyang Zhang <haiyangz@...rosoft.com>,
        "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
        Jork Loeser <Jork.Loeser@...rosoft.com>
Subject: Any known soft lockup issue with vfs_write()->fsnotify()?

Hi,
Recently, people have been hitting a soft lockup issue in vfs_write()->fsnotify().
Detailed call traces are available at:
https://github.com/coreos/bugs/issues/2356
https://github.com/coreos/bugs/issues/2364

The kernel versions showing the issue are:
4.14.11-coreos 
4.14.19-coreos
4.13.0-1009 -- this is the kernel with which I'm personally seeing the lockup.

I have not had a chance to try the latest mainline kernel yet.

Linux runs fine for many hours before the lockup error message suddenly appears.
I have NOT found a consistent way to reproduce the lockup yet.

It looks like the kernel is stuck in fsnotify() while trying to acquire the fsnotify_mark_srcu lock.

Running "git log fs/notify/fsnotify.c" on the latest mainline shows some recent patches that might help.

I'd like to check if this is a known issue.

Looking forward to your insights!

Thanks,
-- Dexuan

For your convenience, here is a call trace from the first link:

18h 30m 8.626s(   4ms): ip-172-45-43-199 login: [67361.641359] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [java:87260]
18h 42m 40.116s(751490ms): [67361.644600] Modules linked in: xfs xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_comment xt_mark veth nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crc32c_generic vxlan ip6_udp_tunnel udp_tunnel overlay mousedev psmouse sb_edac i2c_piix4 i2c_core evdev edac_core button xenfs xen_privcmd sch_fq_codel nls_ascii nls_cp437 vfat fat dm_verity dm_bufio ext4 crc16 mbcache jbd2 fscrypto crc32c_intel ata_piix aesni_intel xen_blkfront libata aes_x86_64 crypto_simd cryptd glue_helper scsi_mod ixgbevf dm_mirror dm_region_hash dm_log dm_mod dax
18h 42m 40.142s(  26ms): [67361.668103] CPU: 10 PID: 87260 Comm: java Not tainted 4.14.11-coreos #1
18h 42m 40.142s(   0ms): [67361.670391] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
18h 42m 40.144s(   2ms): [67361.672581] task: ffff90d6009dbc80 task.stack: ffffb2388f704000
18h 42m 40.149s(   5ms): [67361.674604] RIP: 0010:fsnotify+0x166/0x520
18h 42m 40.149s(   0ms): [67361.675971] RSP: 0018:ffffb2388f707e10 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
18h 42m 40.150s(   1ms): [67361.678462] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
18h 42m 40.152s(   2ms): [67361.680986] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffffffff907294c0
18h 42m 40.157s(   5ms): [67361.683340] RBP: ffff90d2eddffed8 R08: 0000000000000000 R09: 0000000000000000
18h 42m 40.157s(   0ms): [67361.685709] R10: ffffdd941da7a100 R11: 0000000000000000 R12: ffff90d2eddfff00
18h 42m 40.159s(   2ms): [67361.688199] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
18h 42m 40.165s(   6ms): [67361.690579] FS:  00007f491c3f4700(0000) GS:ffff90d6ef880000(0000) knlGS:0000000000000000
18h 42m 40.165s(   0ms): [67361.693227] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
18h 42m 40.166s(   1ms): [67361.695206] CR2: 000000c421288300 CR3: 00000005ba5fc002 CR4: 00000000001606e0
18h 42m 40.175s(   9ms): [67361.697655] Call Trace:
18h 42m 40.175s(   0ms): [67361.698499]  vfs_write+0x14f/0x1a0
18h 42m 40.175s(   0ms): [67361.699656]  SyS_write+0x52/0xc0
18h 42m 40.175s(   0ms): [67361.700745]  do_syscall_64+0x59/0x1c0
18h 42m 40.175s(   0ms): [67361.701996]  entry_SYSCALL64_slow_path+0x25/0x25
18h 42m 40.175s(   0ms): [67361.703536] RIP: 0033:0x7f4b5566643d
18h 42m 40.176s(   1ms): [67361.704751] RSP: 002b:00007f491c3f0ef0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
18h 42m 40.179s(   3ms): [67361.707375] RAX: ffffffffffffffda RBX: 0000000000000032 RCX: 00007f4b5566643d
18h 42m 40.188s(   9ms): [67361.709849] RDX: 00000000000000f0 RSI: 00007f491c3f0f50 RDI: 0000000000000f40
18h 42m 40.188s(   0ms): [67361.712205] RBP: 00007f491c3f0f20 R08: 00007f491c3f1030 R09: 00000005f4ff70d8
18h 42m 40.188s(   0ms): [67361.714573] R10: 0000000000052f06 R11: 0000000000000293 R12: 00000000000000f0
18h 42m 40.188s(   0ms): [67361.716924] R13: 00007f491c3f0f50 R14: 0000000000000f40 R15: 0000000000000000
18h 42m 40.191s(   3ms): [67361.719331] Code: 40 4c 89 7c 24 48 4c 89 7c 24 08 8b 44 24 18 25 00 00 03 00 89 44 24 34 4d 85 e4 0f 95 c2 48 83 7c 24 08 00 0f 95 c1 89 c8 08 d0 <0f> 84 96 03 00 00 84 d2 0f 84 e8 02 00 00 48 8b 44 24 40 84 c9