linux-kernel - Re: [Nbd] NBD: Disconnect connection/kill NBD server cause kernel bug even kernel hang

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CA+2rt42oY+=QSv1Ku9ui=M_qqRjRBofSgDHFk_R6F2L49VZGEQ@mail.gmail.com>
Date:	Mon, 21 Sep 2015 15:13:11 -0700
From:	Sheng Yang <sheng@...ker.org>
To:	Wouter Verhelst <w@...r.be>
Cc:	Markus Pargmann <mpa@...gutronix.de>,
	nbd-general@...ts.sourceforge.net, linux-kernel@...r.kernel.org
Subject: Re: [Nbd] NBD: Disconnect connection/kill NBD server cause kernel bug
 even kernel hang

Thank you Wouter!

(Add back lkml since I got another kernel issue...)

In fact I just hit another issue with nbd-server/client on the same
machine. The following commands hang after a while:

dd if=/dev/zero of=./test

When process hanging, dd/nbd-server are no long showing in high cpu
utilization. Ctrl-C or any operation on the NBD device mount point
won't work and hang immediately.

I enabled kernel lockup detection and found this:

[  240.127132] INFO: task kworker/u8:1:28 blocked for more than 120 seconds.
[  240.129350]       Not tainted 4.3.0-rc2+ #13
[  240.131392] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  240.135290] kworker/u8:1    D ffff88013fc95f58 12160    28      2 0x00000000
[  240.138358] Workqueue: writeback wb_workfn (flush-43:0)
[  240.140886]  ffff88013a46b5d8 0000000000000082 ffffffff8106bd3a
ffffffff00000000
[  240.144799]  ffff8800bafe3d80 ffff88013a460000 ffff88013a46c000
ffff88013fc95f40
[  240.148712]  7fffffffffffffff ffff880139524e30 ffff880139524e60
ffff88013a46b5f0
[  240.152568] Call Trace:
[  240.153915]  [<ffffffff8106bd3a>] ? __queue_delayed_work+0x9a/0x190
[  240.156202]  [<ffffffff8191734e>] schedule+0x2e/0x70
[  240.158188]  [<ffffffff8191b41c>] schedule_timeout+0x1bc/0x2b0
[  240.160373]  [<ffffffff810f677a>] ? __delayacct_blkio_start+0x1a/0x30
[  240.162714]  [<ffffffff810f677a>] ? __delayacct_blkio_start+0x1a/0x30
[  240.164964]  [<ffffffff810446de>] ? kvm_clock_get_cycles+0x1e/0x20
[  240.167126]  [<ffffffff810be5b0>] ? ktime_get+0x90/0x110
[  240.169074]  [<ffffffff810f677a>] ? __delayacct_blkio_start+0x1a/0x30
[  240.171301]  [<ffffffff8191677f>] io_schedule_timeout+0x9f/0x110
[  240.173409]  [<ffffffff813193d5>] get_request+0x425/0x7d0
[  240.175385]  [<ffffffff81318fb0>] ?
trace_event_raw_event_block_get_rq+0xe0/0xe0
[  240.178256]  [<ffffffff813201c8>] ? ioc_lookup_icq+0x88/0xd0
[  240.180206]  [<ffffffff81092990>] ? wait_woken+0x80/0x80
[  240.182038]  [<ffffffff8131ceeb>] blk_queue_bio+0xcb/0x2f0
[  240.183948]  [<ffffffff8131870f>] generic_make_request+0xaf/0xf0
[  240.185932]  [<ffffffff813187b2>] submit_bio+0x62/0x140
[  240.187778]  [<ffffffff8113583b>] ? __test_set_page_writeback+0x15b/0x240
[  240.189837]  [<ffffffff81216699>] ext4_io_submit+0x39/0x50
[  240.191648]  [<ffffffff8121685a>] ext4_bio_write_page+0x18a/0x3b0
[  240.193548]  [<ffffffff8120da18>] mpage_submit_page+0x58/0x80
[  240.195405]  [<ffffffff8120dc75>] mpage_map_and_submit_buffers+0x145/0x260
[  240.197481]  [<ffffffff81212d5a>] ext4_writepages+0x5ba/0xcb0
[  240.199335]  [<ffffffff813d5291>] ? vp_notify+0x11/0x20
[  240.201590]  [<ffffffff811367cc>] do_writepages+0x1c/0x30
[  240.203935]  [<ffffffff811b9820>] __writeback_single_inode+0x50/0x300
[  240.206589]  [<ffffffff8191c012>] ? _raw_spin_unlock+0x22/0x30
[  240.209054]  [<ffffffff811ba108>] writeback_sb_inodes+0x298/0x520
[  240.211589]  [<ffffffff811ba414>] __writeback_inodes_wb+0x84/0xb0
[  240.214146]  [<ffffffff811ba730>] wb_writeback+0x250/0x2c0
[  240.216427]  [<ffffffff811bae43>] wb_workfn+0x213/0x390
[  240.218507]  [<ffffffff8106d113>] process_one_work+0x1c3/0x4c0
[  240.220757]  [<ffffffff8106d0a1>] ? process_one_work+0x151/0x4c0
[  240.223061]  [<ffffffff8106d736>] worker_thread+0x46/0x440
[  240.225225]  [<ffffffff8106d6f0>] ? rescuer_thread+0x2e0/0x2e0
[  240.227508]  [<ffffffff81072aff>] kthread+0xdf/0x100
[  240.229433]  [<ffffffff81072a20>] ? kthread_create_on_node+0x200/0x200
[  240.231809]  [<ffffffff8191cb9f>] ret_from_fork+0x3f/0x70
[  240.233888]  [<ffffffff81072a20>] ? kthread_create_on_node+0x200/0x200
[  240.236301] 4 locks held by kworker/u8:1/28:
[  240.238074]  #0:  ("writeback"){......}, at: [<ffffffff8106d0a1>]
process_one_work+0x151/0x4c0
[  240.241916]  #1:  ((&(&wb->dwork)->work)){......}, at:
[<ffffffff8106d0a1>] process_one_work+0x151/0x4c0
[  240.245960]  #2:  (&type->s_umount_key#31){......}, at:
[<ffffffff8118e9e6>] trylock_super+0x16/0x50
[  240.249995]  #3:  (jbd2_handle){......}, at: [<ffffffff81256f72>]
start_this_handle+0x112/0x440
[  240.253729] INFO: task jbd2/nbd0-8:2235 blocked for more than 120 seconds.
[  240.256081]       Not tainted 4.3.0-rc2+ #13
[  240.257692] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  240.260691] jbd2/nbd0-8     D ffff88013fd95f58 13488  2235      2 0x00000000
[  240.263250]  ffff8800ba29bc58 0000000000000086 ffff8800ba29bd40
0000000000000000
[  240.266402]  ffff8801394c9480 ffff8800ba156680 ffff8800ba29c000
ffff8800ba29bd58
[  240.269519]  ffff8800ba156680 ffff8800ba29bd40 ffff8800bb4cea00
ffff8800ba29bc70
[  240.272677] Call Trace:
[  240.273805]  [<ffffffff8191734e>] schedule+0x2e/0x70
[  240.275565]  [<ffffffff81259d35>]
jbd2_journal_commit_transaction+0x245/0x1760
[  240.278277]  [<ffffffff8108001c>] ? build_sched_domain+0x24c/0x2e0
[  240.280335]  [<ffffffff81092990>] ? wait_woken+0x80/0x80
[  240.282200]  [<ffffffff810b644a>] ? try_to_del_timer_sync+0x4a/0x60
[  240.284285]  [<ffffffff8125e780>] kjournald2+0xc0/0x260
[  240.286059]  [<ffffffff81092990>] ? wait_woken+0x80/0x80
[  240.287865]  [<ffffffff8125e6c0>] ? commit_timeout+0x10/0x10
[  240.289761]  [<ffffffff81072aff>] kthread+0xdf/0x100
[  240.291485]  [<ffffffff81072a20>] ? kthread_create_on_node+0x200/0x200
[  240.293585]  [<ffffffff8191cb9f>] ret_from_fork+0x3f/0x70
[  240.295404]  [<ffffffff81072a20>] ? kthread_create_on_node+0x200/0x200
[  240.297513] no locks held by jbd2/nbd0-8/2235.
[  240.299098] INFO: task dd:2255 blocked for more than 120 seconds.
[  240.301078]       Not tainted 4.3.0-rc2+ #13
[  240.302586] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  240.305422] dd              D ffff88013fd95f58 13240  2255   2254 0x00000004
[  240.307864]  ffff880139b3ba38 0000000000000086 0000000000000035
0000000000000000
[  240.310966]  ffff88013ab68000 ffff8801394c9480 ffff880139b3c000
0000000000000000
[  240.314027]  ffff8800bacc8900 0000000000000035 0000000000000000
ffff880139b3ba50
[  240.317130] Call Trace:
[  240.318222]  [<ffffffff8191734e>] schedule+0x2e/0x70
[  240.319979]  [<ffffffff81256b0f>] wait_transaction_locked+0x7f/0xb0
[  240.322010]  [<ffffffff81092990>] ? wait_woken+0x80/0x80
[  240.323872]  [<ffffffff81256d82>] add_transaction_credits+0x212/0x2a0
[  240.325980]  [<ffffffff81256f65>] start_this_handle+0x105/0x440
[  240.327882]  [<ffffffff81079cf5>] ? finish_task_switch+0xa5/0x1b0
[  240.329948]  [<ffffffff81256a5b>] ? new_handle+0x1b/0x50
[  240.331771]  [<ffffffff812574ea>] jbd2__journal_start+0xba/0x190
[  240.333716]  [<ffffffff8121416c>] ? ext4_da_write_begin+0x10c/0x2d0
[  240.335806]  [<ffffffff8123f13c>] __ext4_journal_start_sb+0x6c/0xd0
[  240.337835]  [<ffffffff8121416c>] ext4_da_write_begin+0x10c/0x2d0
[  240.339875]  [<ffffffff8123f1dd>] ? __ext4_journal_stop+0x3d/0xb0
[  240.341940]  [<ffffffff81128f62>] generic_perform_write+0xa2/0x1e0
[  240.343930]  [<ffffffff811a66d0>] ? file_update_time+0xa0/0xe0
[  240.345849]  [<ffffffff8112b13b>] __generic_file_write_iter+0x18b/0x1e0
[  240.347949]  [<ffffffff81209d60>] ext4_file_write_iter+0x210/0x420
[  240.349988]  [<ffffffff8118e26c>] ? __sb_start_write+0xec/0x100
[  240.351963]  [<ffffffff8118b195>] __vfs_write+0xa5/0xe0
[  240.353733]  [<ffffffff8118b7e4>] vfs_write+0xa4/0x190
[  240.355512]  [<ffffffff8118b71f>] ? vfs_read+0x10f/0x130
[  240.357310]  [<ffffffff8118c4a4>] SyS_write+0x44/0xa0
[  240.359014]  [<ffffffff8191c832>] entry_SYSCALL_64_fastpath+0x12/0x76
[  240.361142] 2 locks held by dd/2255:
[  240.362552]  #0:  (sb_writers#5){......}, at: [<ffffffff8118e26c>]
__sb_start_write+0xec/0x100
[  240.366119]  #1:  (&sb->s_type->i_mutex_key#9){......}, at:
[<ffffffff81209cdf>] ext4_file_write_iter+0x18f/0x420

--Sheng

On Sun, Sep 20, 2015 at 11:52 PM, Wouter Verhelst <w@...r.be> wrote:
> Hi,
>
> [linux-kernel CC dropped, since this is irrelevant for them]
>
> On Wed, Sep 16, 2015 at 06:18:50PM -0700, Sheng Yang wrote:
>> Hi, Markus,
>>
>> I've experienced kernel bug and system hang regarding using
>> nbd-client/server on the same machine. Kernel verison is v4.2.
>>
>> I know http://nbd.sourceforge.net/ said:
>> "Please note that read-write nbd with client and server on the same
>> machine is a bad idea: expect deadlock within seconds". But it's
>> mentioned at http://thread.gmane.org/gmane.linux.kernel/869784 that
>> NBD should be able to handle client/server on the same machine.
>
> Yes, that bug has been fixed. I need to figure out how I can update
> those pages again (sourceforge changed the procedure a few times), and
> remove that note.
>
> (the rest is kernel, so for Markus, not me)
>
> --
> It is easy to love a country that is famous for chocolate and beer
>
>   -- Barack Obama, speaking in Brussels, Belgium, 2014-03-26
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/