linux-kernel - Re: [Nbd] NBD: Disconnect connection/kill NBD server cause kernel bug even kernel hang

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150924061512.GQ32203@pengutronix.de>
Date:	Thu, 24 Sep 2015 08:15:12 +0200
From:	Markus Pargmann <mpa@...gutronix.de>
To:	Sheng Yang <sheng@...ker.org>
Cc:	Paul Clements <paul.clements@...sios.com>,
	"nbd-general@...ts.sourceforge.net" 
	<nbd-general@...ts.sourceforge.net>,
	kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [Nbd] NBD: Disconnect connection/kill NBD server cause kernel
 bug even kernel hang

Hi,

Thanks for the pointer Paul.

On Mon, Sep 21, 2015 at 05:33:21PM -0700, Sheng Yang wrote:
> Thank you Paul! That's exactly the issue I met. I've read the whole
> thread and got a general idea of the issue.
> 
> I try to summarize it and please correct me if I'm wrong:
> 
> 1. The issue is the result of kill_bdev() when connection has been cut
> when IO is still flying.
> 2. Other block devices driver didn't have this issue because they
> normally keep the buffer and device, deny all the requests and only
> kill the device when all request has been settled down and device has
> been umounted.
> 3. Why NBD cannot handle it in the same way because NBD has dependency
> on userspace nfs-client, which would handle the reconnection/retry. If
> DO_IT ioctl didn't return, then there is no way for userspace to
> reconnect. How to coordinate between kernel and userspace on
> reconnecting while leaving the device open until it's safe to close
> leads to several choices and would leads to quite amount of change.
> 
> Sounds like Goswin's suggestion really make sense(
> http://sourceforge.net/p/nbd/mailman/message/31678340/ ). This issue
> has been there for years and impact the stability of NBD a lot. I
> think we should get it fixed.

It may make sense as well to create a kernel thread to replace the
userspace thread which is blocking in the DO_IT ioctl.

This way we could even react on degraded (long RTT) but working
connections. The userspace thread could return while the nbd in general
does still work.  As soon as all users closed the block device, we can
kill the threads that keep it alive. Otherwise the userspace thread can
update the socket while the nbd is still working and jump into DO_IT
again.

To force a disconnect we still have the NBD_DISCONNECT ioctl.

This would require some mechanism to switch sockets on the fly, so it
may be necessary to listen on multiple sockets for a time. Or have a
small inactive phase where we wait on the remaining requests and switch
afterwards. However this could allow some softer mechanism than the
currently available 'timeout' which completely removes the device.

Best Regards,

Markus

> 
> --Sheng
> 
> On Mon, Sep 21, 2015 at 4:06 PM, Paul Clements
> <paul.clements@...sios.com> wrote:
> > Hi Sheng,
> >
> > this is a known issue (not related to nbd-client/nbd-server on the same box,
> > but any termination of the nbd connection with mounted fs and I/O ongoing)
> >
> > It was discussed at length in Nov 2013 in this thread (along with possible
> > solutions):
> >
> > http://sourceforge.net/p/nbd/mailman/nbd-general/thread/20131114075827.GA13554%40quack.suse.cz/#msg31636917
> >
> > Maybe Markus has some thoughts on how to fix it. I think there are a couple
> > options.
> >
> > --
> > Paul
> >
> > On Wed, Sep 16, 2015 at 9:18 PM, Sheng Yang <sheng@...ker.org> wrote:
> >>
> >> Hi, Markus,
> >>
> >> I've experienced kernel bug and system hang regarding using
> >> nbd-client/server on the same machine. Kernel verison is v4.2.
> >>
> >> I know http://nbd.sourceforge.net/ said:
> >> "Please note that read-write nbd with client and server on the same
> >> machine is a bad idea: expect deadlock within seconds". But it's
> >> mentioned at http://thread.gmane.org/gmane.linux.kernel/869784 that
> >> NBD should be able to handle client/server on the same machine.
> >>
> >> The bug can be easily reproduce like this;
> >>
> >> truncate -s 1G nbd.vol
> >> mkfs.ext4 nbd.vol
> >> <add nbd.vol to nbd-server's config, named "nbd-vol">
> >> service nbd-server restart
> >>
> >> nbd-client localhost /dev/nbd0 -N nbd-vol
> >> mount /dev/ndb0 /mnt/tmp
> >> wget https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.2.tar.xz
> >> --limit-rate=100k
> >>
> >> <then in another console>
> >> pkill nbd-server
> >>
> >> Then the following would be triggered:
> >>
> >> [ 2886.348891] block nbd0: Attempted send on closed socket
> >> [ 2886.350579] blk_update_request: I/O error, dev nbd0, sector 1016
> >> [ 2886.352210] EXT4-fs warning (device nbd0):
> >> htree_dirblock_to_tree:959: inode #2: lblock 0: comm ls: error -5
> >> reading directory block
> >> [ 2886.352236] block nbd0: Attempted send on closed socket
> >> [ 2886.353646] blk_update_request: I/O error, dev nbd0, sector 524
> >> [ 2886.355267] block nbd0: Attempted send on closed socket
> >> [ 2886.356665] blk_update_request: I/O error, dev nbd0, sector 526
> >> [ 2886.358271] block nbd0: Attempted send on closed socket
> >> [ 2886.359654] blk_update_request: I/O error, dev nbd0, sector 528
> >> [ 2886.361253] block nbd0: Attempted send on closed socket
> >> [ 2886.362597] blk_update_request: I/O error, dev nbd0, sector 530
> >> [ 2886.364106] block nbd0: Attempted send on closed socket
> >> [ 2886.365400] blk_update_request: I/O error, dev nbd0, sector 532
> >> [ 2886.366905] block nbd0: Attempted send on closed socket
> >> [ 2886.368205] blk_update_request: I/O error, dev nbd0, sector 534
> >> [ 2886.369714] block nbd0: Attempted send on closed socket
> >> [ 2886.371025] blk_update_request: I/O error, dev nbd0, sector 536
> >> [ 2886.372560] block nbd0: Attempted send on closed socket
> >> [ 2886.373871] blk_update_request: I/O error, dev nbd0, sector 538
> >> [ 2886.375283] block nbd0: Attempted send on closed socket
> >> [ 2886.376504] blk_update_request: I/O error, dev nbd0, sector 540
> >> ...
> >> [ 2886.405309] block nbd0: Attempted send on closed socket
> >> [ 2886.406440] EXT4-fs error (device nbd0): __ext4_get_inode_loc:3927:
> >> inode #2: block 261: comm ls: unable to read itable block
> >> [ 2886.408816] ------------[ cut here ]------------
> >> [ 2886.409800] kernel BUG at fs/buffer.c:3025!
> >> [ 2886.409800] invalid opcode: 0000 [#1] SMP
> >> [ 2886.409800] Modules linked in:
> >> [ 2886.409800] CPU: 3 PID: 17137 Comm: ls Not tainted 4.2.0 #11
> >> [ 2886.409800] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >> BIOS 1.7.5-20150306_163512-brownie 04/01/2014
> >> [ 2886.409800] task: ffff8800b8a50000 ti: ffff8800b7b64000 task.ti:
> >> ffff8800b7b64000
> >> [ 2886.409800] RIP: 0010:[<ffffffff811a778e>]  [<ffffffff811a778e>]
> >> submit_bh_wbc+0x16e/0x180
> >> [ 2886.409800] RSP: 0018:ffff8800b7b67b58  EFLAGS: 00010246
> >> [ 2886.409800] RAX: 0000000000000005 RBX: ffff8800adb849c0 RCX:
> >> 0000000000000000
> >> [ 2886.409800] RDX: 0000000000000000 RSI: ffff8800adb849c0 RDI:
> >> 0000000000000411
> >> [ 2886.409800] RBP: ffff8800b7b67b88 R08: fffffffffffffff0 R09:
> >> 0000000000000000
> >> [ 2886.409800] R10: 0000000000000218 R11: 0000000000000006 R12:
> >> 0000000000000411
> >> [ 2886.409800] R13: ffff8800b88fc400 R14: 000000000002facc R15:
> >> ffff88013a390000
> >> [ 2886.409800] FS:  00007f83cf856840(0000) GS:ffff88013fd80000(0000)
> >> knlGS:0000000000000000
> >> [ 2886.409800] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >> [ 2886.409800] CR2: 0000000000f75388 CR3: 0000000139896000 CR4:
> >> 00000000000006e0
> >> [ 2886.409800] Stack:
> >> [ 2886.409800]  ffff88013a395358 ffff8800adb849c0 0000000000000411
> >> ffff8800b88fc400
> >> [ 2886.409800]  000000000002facc ffff88013a390000 ffff8800b7b67ba8
> >> ffffffff811a9449
> >> [ 2886.409800]  ffffffff81f209e0 ffff8800adb849c0 ffff8800b7b67bb8
> >> ffffffff811a94ee
> >> [ 2886.409800] Call Trace:
> >> [ 2886.409800]  [<ffffffff811a9449>] __sync_dirty_buffer+0x59/0xf0
> >> [ 2886.409800]  [<ffffffff811a94ee>] sync_dirty_buffer+0xe/0x10
> >> [ 2886.409800]  [<ffffffff8120de31>] ext4_commit_super+0x181/0x200
> >> [ 2886.409800]  [<ffffffff8120e255>] __ext4_error_inode+0x85/0x140
> >> [ 2886.409800]  [<ffffffff811f0fdd>] __ext4_get_inode_loc+0x1fd/0x3d0
> >> [ 2886.409800]  [<ffffffff811f2a78>] ext4_get_inode_loc+0x18/0x20
> >> [ 2886.409800]  [<ffffffff811f442f>] ext4_reserve_inode_write+0x1f/0x80
> >> [ 2886.409800]  [<ffffffff811f7773>] ? ext4_dirty_inode+0x43/0x60
> >> [ 2886.409800]  [<ffffffff811f44d8>] ext4_mark_inode_dirty+0x48/0x1e0
> >> [ 2886.409800]  [<ffffffff811f7773>] ext4_dirty_inode+0x43/0x60
> >> [ 2886.409800]  [<ffffffff8119e603>] __mark_inode_dirty+0x163/0x310
> >> [ 2886.409800]  [<ffffffff8118d932>] generic_update_time+0x72/0xc0
> >> [ 2886.409800]  [<ffffffff810a8bd1>] ? current_fs_time+0x11/0x60
> >> [ 2886.409800]  [<ffffffff8118f143>] touch_atime+0x83/0xa0
> >> [ 2886.409800]  [<ffffffff81187096>] iterate_dir+0xd6/0x120
> >> [ 2886.409800]  [<ffffffff8118749c>] SyS_getdents+0x7c/0xe0
> >> [ 2886.409800]  [<ffffffff811871a0>] ? fillonedir+0xc0/0xc0
> >> [ 2886.409800]  [<ffffffff810485fc>] ? trace_do_page_fault+0x3c/0xb0
> >> [ 2886.409800]  [<ffffffff818cc8ae>] entry_SYSCALL_64_fastpath+0x12/0x71
> >> [ 2886.409800] Code: 09 e7 14 00 48 83 c4 08 31 c0 5b 41 5c 41 5d 41
> >> 5e 41 5f 5d c3 40 f6 c7 01 0f 84 ff fe ff ff f0 80 63 01 f7 e9 f5 fe
> >> ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 55
> >> 31 c9
> >> [ 2886.409800] RIP  [<ffffffff811a778e>] submit_bh_wbc+0x16e/0x180
> >> [ 2886.409800]  RSP <ffff8800b7b67b58>
> >> [ 2886.461658] ---[ end trace 46f00e222737016e ]---
> >>
> >>
> >> I am trying to use NBD for some HA scenario, I got above BUG using
> >> default nbd-server. After this, any access to /mnt/tmp or /dev/nbd0
> >> would hang.
> >>
> >> We also has a customerized nbd server, and in the case of killing the
> >> connection would result in resursive fault and complete hang the
> >> system. Something like this happened:
> >>
> >> [  170.552688] block nbd0: Receive control failed (result -32)
> >> [  171.253234] block nbd0: Attempted send on closed socket
> >> [  171.254772] blk_update_request: I/O error, dev nbd0, sector 8464
> >> [  171.256586] block nbd0: Attempted send on closed socket
> >> [  171.258044] blk_update_request: I/O error, dev nbd0, sector 8720
> >> [  171.259953] block nbd0: Attempted send on closed socket
> >> [  171.261415] blk_update_request: I/O error, dev nbd0, sector 8976
> >> [  171.263300] block nbd0: Attempted send on closed socket
> >> [  171.264719] blk_update_request: I/O error, dev nbd0, sector 9232
> >> [  171.266570] block nbd0: Attempted send on closed socket
> >> [  171.268016] blk_update_request: I/O error, dev nbd0, sector 9488
> >> [  171.269604] block nbd0: Attempted send on closed socket
> >> [  171.270903] blk_update_request: I/O error, dev nbd0, sector 9744
> >> [  171.272470] block nbd0: Attempted send on closed socket
> >> [  171.273904] blk_update_request: I/O error, dev nbd0, sector 10000
> >> [  171.275523] block nbd0: Attempted send on closed socket
> >> [  171.276815] blk_update_request: I/O error, dev nbd0, sector 10256
> >> [  171.278499] block nbd0: Attempted send on closed socket
> >> [  171.279851] blk_update_request: I/O error, dev nbd0, sector 10512
> >> [  171.281383] block nbd0: Attempted send on closed socket
> >> [  171.282592] blk_update_request: I/O error, dev nbd0, sector 10768
> >> [  171.284021] block nbd0: Attempted send on closed socket
> >> [  171.285231] block nbd0: Attempted send on closed socket
> >> [  171.286568] block nbd0: Attempted send on closed socket
> >> [  171.287973] block nbd0: Attempted send on closed socket
> >> [  171.289283] block nbd0: Attempted send on closed socket
> >> [  171.290565] block nbd0: Attempted send on closed socket
> >> [  173.711085] ------------[ cut here ]------------
> >> [  173.712036] kernel BUG at fs/buffer.c:3025!
> >> [  173.712036] invalid opcode: 0000 [#1] SMP
> >> [  173.712036] Modules linked in:
> >> [  173.712036] CPU: 0 PID: 2328 Comm: jbd2/nbd0-8 Not tainted 4.2.0 #10
> >> [  173.712036] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >> BIOS 1.7.5-0150306_163512-brownie 04/01/2014
> >> [  173.712036] task: ffff8800bb8d0000 ti: ffff8800ad154000 task.ti:
> >> ffff8800ad154000
> >> [  173.712036] RIP: 0010:[<ffffffff811a778e>]  [<ffffffff811a778e>]
> >> submit_bh_wbc+0x16e/0x180
> >> [  173.712036] RSP: 0018:ffff8800ad157bf8  EFLAGS: 00010246
> >> [  173.712036] RAX: 0000000004000005 RBX: ffff880139eb6e38 RCX:
> >> 0000000000000000
> >> [  173.712036] RDX: 0000000000000000 RSI: ffff880139eb6e38 RDI:
> >> 0000000000000411
> >> [  173.712036] RBP: ffff8800ad157c28 R08: ffff8800ad154000 R09:
> >> 0000000000000001
> >> [  173.712036] R10: 00000000fffe12ab R11: 0000000000000020 R12:
> >> ffff880135f04000
> >> [  173.712036] R13: 0000000000000411 R14: ffff880139eb6e38 R15:
> >> ffff8800b7855000
> >> [  173.712036] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000)
> >> knlGS:0000000000000000
> >> [  173.712036] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >> [  173.712036] CR2: 00007fe561c38000 CR3: 0000000001e0c000 CR4:
> >> 00000000000006f0
> >> [  173.712036] Stack:
> >> [  173.712036]  0000000000000000 ffff880135f04000 ffff880135f04000
> >> 0000000000000411
> >> [  173.712036]  ffff880139eb6e38 ffff8800b7855000 ffff8800ad157c38
> >> ffffffff811a77bd
> >> [  173.712036]  ffff8800ad157c78 ffffffff81241c17 0000000000000000
> >> ffff880135f04000
> >> [  173.712036] Call Trace:
> >> [  173.712036]  [<ffffffff811a77bd>] submit_bh+0xd/0x10
> >> [  173.712036]  [<ffffffff81241c17>] jbd2_write_superblock+0x87/0x170
> >> [  173.712036]  [<ffffffff812425f2>]
> >> jbd2_journal_update_sb_log_tail+0x32/0x80
> >> [  173.712036]  [<ffffffff818cab91>] ? mutex_lock+0x11/0x40
> >> [  173.712036]  [<ffffffff8123b4ef>]
> >> jbd2_journal_commit_transaction+0x5ff/0x1680
> >> [  173.712036]  [<ffffffff81080513>] ? dequeue_entity+0x163/0x710
> >> [  173.712036]  [<ffffffff81083b7d>] ? dequeue_task_fair+0x50d/0x7b0
> >> [  173.712036]  [<ffffffff810a9c33>] ? lock_timer_base.isra.35+0x53/0x70
> >> [  173.712036]  [<ffffffff810a9e0a>] ? try_to_del_timer_sync+0x4a/0x60
> >> [  173.712036]  [<ffffffff8123fa36>] kjournald2+0xb6/0x230
> >> [  173.712036]  [<ffffffff8108c3c0>] ? wait_woken+0x80/0x80
> >> [  173.712036]  [<ffffffff8123f980>] ? commit_timeout+0x10/0x10
> >> [  173.712036]  [<ffffffff8106ec44>] kthread+0xc4/0xe0
> >> [  173.712036]  [<ffffffff8106eb80>] ? kthread_create_on_node+0x180/0x180
> >> [  173.712036]  [<ffffffff818ccc9f>] ret_from_fork+0x3f/0x70
> >> [  173.712036]  [<ffffffff8106eb80>] ? kthread_create_on_node+0x180/0x180
> >> [  173.712036] Code: 09 e7 14 00 48 83 c4 08 31 c0 5b 41 5c 41 5d 41
> >> 5e 41 5f 5d c3 40 f6 c7 01 0f 84 ff fe ff ff f0 80 63 01 f7 e9 f5 fe
> >> ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 55
> >> 31 c9
> >> [  173.712036] RIP  [<ffffffff811a778e>] submit_bh_wbc+0x16e/0x180
> >> [  173.712036]  RSP <ffff8800ad157bf8>
> >> [  173.776031] ---[ end trace d41ec1552544a207 ]---
> >> [  199.615574] block nbd0: Attempted send on closed socket
> >> [  199.617102] blk_update_request: 6 callbacks suppressed
> >> [  199.618090] blk_update_request: I/O error, dev nbd0, sector 8456
> >> [  199.620187] EXT4-fs error (device nbd0): __ext4_get_inode_loc:3927:
> >> inode #12: block 1057: comm kworker/u8:1: unable to read itable block
> >> [  199.623876] ------------[ cut here ]------------
> >> [  199.624839] kernel BUG at fs/buffer.c:3025!
> >> [  199.624839] invalid opcode: 0000 [#2] SMP
> >> [  199.624839] Modules linked in:
> >> [  199.624839] CPU: 2 PID: 25 Comm: kworker/u8:1 Tainted: G      D
> >>     4.2.0 #10
> >> [  199.624839] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >> BIOS 1.7.5-20150306_163512-brownie 04/01/2014
> >> [  199.624839] Workqueue: writeback wb_workfn (flush-43:0)
> >> [  199.624839] task: ffff88013abbd240 ti: ffff88013a424000 task.ti:
> >> ffff88013a424000
> >> [  199.624839] RIP: 0010:[<ffffffff811a778e>]  [<ffffffff811a778e>]
> >> submit_bh_wbc+0x16e/0x180
> >> [  199.624839] RSP: 0018:ffff88013a4274e8  EFLAGS: 00010246
> >> [  199.624839] RAX: 0000000000620005 RBX: ffff8800bb78af70 RCX:
> >> 0000000000000000
> >> [  199.624839] RDX: 0000000000000000 RSI: ffff8800bb78af70 RDI:
> >> 0000000000000411
> >> [  199.624839] RBP: ffff88013a427518 R08: fffffffffffffff0 R09:
> >> 0000000000000000
> >> [  199.624839] R10: 0000000000000226 R11: 0000000000000006 R12:
> >> 0000000000000411
> >> [  199.624839] R13: ffff8800b8d70400 R14: 0000000000043220 R15:
> >> ffff880135f02800
> >> [  199.624839] FS:  0000000000000000(0000) GS:ffff88013fd00000(0000)
> >> knlGS:0000000000000000
> >> [  199.624839] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >> [  199.624839] CR2: 0000000002024808 CR3: 00000000b8630000 CR4:
> >> 00000000000006e0
> >> [  199.624839] Stack:
> >> [  199.624839]  ffff880135f06358 ffff8800bb78af70 0000000000000411
> >> ffff8800b8d70400
> >> [  199.624839]  0000000000043220 ffff880135f02800 ffff88013a427538
> >> ffffffff811a9449
> >> [  199.624839]  ffffffff81f209e0 ffff8800bb78af70 ffff88013a427548
> >> ffffffff811a94ee
> >> [  199.624839] Call Trace:
> >> [  199.624839]  [<ffffffff811a9449>] __sync_dirty_buffer+0x59/0xf0
> >> [  199.624839]  [<ffffffff811a94ee>] sync_dirty_buffer+0xe/0x10
> >> [  199.624839]  [<ffffffff8120de31>] ext4_commit_super+0x181/0x200
> >> [  199.624839]  [<ffffffff8120e255>] __ext4_error_inode+0x85/0x140
> >> [  199.624839]  [<ffffffff8108c400>] ? autoremove_wake_function+0x40/0x40
> >> [  199.624839]  [<ffffffff811f0fdd>] __ext4_get_inode_loc+0x1fd/0x3d0
> >> [  199.624839]  [<ffffffff811f2a78>] ext4_get_inode_loc+0x18/0x20
> >> [  199.624839]  [<ffffffff811f442f>] ext4_reserve_inode_write+0x1f/0x80
> >> [  199.624839]  [<ffffffff81218fdd>] ? __ext4_ext_dirty+0x6d/0x90
> >> [  199.624839]  [<ffffffff811f44d8>] ext4_mark_inode_dirty+0x48/0x1e0
> >> [  199.624839]  [<ffffffff81218fdd>] __ext4_ext_dirty+0x6d/0x90
> >> [  199.624839]  [<ffffffff8121abdc>] ext4_ext_insert_extent+0x6ac/0x1310
> >> [  199.624839]  [<ffffffff8121e098>] ext4_ext_map_blocks+0x258/0xdc0
> >> [  199.624839]  [<ffffffff811f1ba4>] ext4_map_blocks+0x124/0x4f0
> >> [  199.624839]  [<ffffffff811f4b51>] ext4_writepages+0x4e1/0xc50
> >> [  199.624839]  [<ffffffff8132ab85>] ? find_next_bit+0x15/0x20
> >> [  199.624839]  [<ffffffff81084cbb>] ? find_busiest_group+0x11b/0x980
> >> [  199.624839]  [<ffffffff81123199>] do_writepages+0x19/0x30
> >> [  199.624839]  [<ffffffff8119e7f9>] __writeback_single_inode+0x49/0x2b0
> >> [  199.624839]  [<ffffffff8119ef64>] writeback_sb_inodes+0x274/0x4c0
> >> [  199.624839]  [<ffffffff8119f234>] __writeback_inodes_wb+0x84/0xb0
> >> [  199.624839]  [<ffffffff8119f47f>] wb_writeback+0x21f/0x290
> >> [  199.624839]  [<ffffffff8119fc9e>] wb_workfn+0x2fe/0x480
> >> [  199.624839]  [<ffffffff810698a6>] process_one_work+0x156/0x3e0
> >> [  199.624839]  [<ffffffff81069e59>] worker_thread+0x49/0x450
> >> [  199.624839]  [<ffffffff81069e10>] ? rescuer_thread+0x2e0/0x2e0
> >> [  199.624839]  [<ffffffff81069e10>] ? rescuer_thread+0x2e0/0x2e0
> >> [  199.624839]  [<ffffffff8106ec44>] kthread+0xc4/0xe0
> >> [  199.624839]  [<ffffffff8106eb80>] ? kthread_create_on_node+0x180/0x180
> >> [  199.624839]  [<ffffffff818ccc9f>] ret_from_fork+0x3f/0x70
> >> [  199.624839]  [<ffffffff8106eb80>] ? kthread_create_on_node+0x180/0x180
> >> [  199.624839] Code: 09 e7 14 00 48 83 c4 08 31 c0 5b 41 5c 41 5d 41
> >> 5e 41 5f 5d c3 40 f6 c7 01 0f 84 ff fe ff ff f0 80 63 01 f7 e9 f5 fe
> >> ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 55
> >> 31 c9
> >> [  199.624839] RIP  [<ffffffff811a778e>] submit_bh_wbc+0x16e/0x180
> >> [  199.624839]  RSP <ffff88013a4274e8>
> >> [  199.706502] ---[ end trace d41ec1552544a208 ]---
> >> [  199.707614] kworker/u8:1 (25) used greatest stack depth: 11104 bytes
> >> left
> >> [  199.708873] BUG: unable to handle kernel paging request at
> >> ffffffffffffffd8
> >> [  199.709861] IP: [<ffffffff8106f1cb>] kthread_data+0xb/0x20
> >> [  199.709861] PGD 1e0d067 PUD 1e0f067 PMD 0
> >> [  199.709861] Oops: 0000 [#3] SMP
> >> [  199.709861] Modules linked in:
> >> [  199.709861] CPU: 2 PID: 25 Comm: kworker/u8:1 Tainted: G      D
> >>     4.2.0 #10
> >> [  199.709861] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >> BIOS 1.7.5-0150306_163512-brownie 04/01/2014
> >> [  199.709861] task: ffff88013abbd240 ti: ffff88013a424000 task.ti:
> >> ffff88013a424000
> >> [  199.709861] RIP: 0010:[<ffffffff8106f1cb>]  [<ffffffff8106f1cb>]
> >> kthread_data+0xb/0x20
> >> [  199.709861] RSP: 0018:ffff88013a427188  EFLAGS: 00010092
> >> [  199.709861] RAX: 0000000000000000 RBX: 0000000000000002 RCX:
> >> 000000000000000d
> >> [  199.709861] RDX: 000000000000000d RSI: 0000000000000002 RDI:
> >> ffff88013abbd240
> >> [  199.709861] RBP: ffff88013a427188 R08: 0000000000000000 R09:
> >> 0000000000000000
> >> [  199.709861] R10: 0000000000000000 R11: 0000000000000006 R12:
> >> 0000000000014fc0
> >> [  199.709861] R13: ffff88013fd14fc0 R14: ffff88013abbd240 R15:
> >> 0000000000000002
> >> [  199.709861] FS:  0000000000000000(0000) GS:ffff88013fd00000(0000)
> >> nlGS:0000000000000000
> >> [  199.709861] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >> [  199.709861] CR2: 0000000000000028 CR3: 00000000b8630000 CR4:
> >> 00000000000006e0
> >> [  199.709861] Stack:
> >> [  199.709861]  ffff88013a4271a8 ffffffff8106a2d0 ffff88013a4271a8
> >> ffff88013fd14fc0
> >> [  199.709861]  ffff88013a4271f8 ffffffff818c8dac ffff880100000000
> >> ffff88013abbd240
> >> [  199.709861]  ffff88013a4271f8 ffff88013a428000 0000000000002b60
> >> ffff88013a426dd8
> >> [  199.709861] Call Trace:
> >> [  199.709861]  [<ffffffff8106a2d0>] wq_worker_sleeping+0x10/0x90
> >> [  199.709861]  [<ffffffff818c8dac>] __schedule+0x70c/0x900
> >> [  199.709861]  [<ffffffff818c8fd2>] schedule+0x32/0x80
> >> [  199.709861]  [<ffffffff810561b8>] do_exit+0x718/0xb00
> >> [  199.709861]  [<ffffffff81006c79>] oops_end+0x99/0xd0
> >> [  199.709861]  [<ffffffff81007036>] die+0x46/0x60
> >> [  199.709861]  [<ffffffff81003d38>] do_trap+0x138/0x140
> >> [  199.709861]  [<ffffffff810043a5>] do_error_trap+0x75/0xe0
> >> [  199.709861]  [<ffffffff811a778e>] ? submit_bh_wbc+0x16e/0x180
> >> [  199.709861]  [<ffffffff813d1e1f>] ? univ8250_console_write+0x19f/0x2e0
> >> [  199.709861]  [<ffffffff81099597>] ? print_time.part.13+0x67/0x90
> >> [  199.709861]  [<ffffffff81099625>] ? print_prefix+0x65/0xa0
> >> [  199.709861]  [<ffffffff8100451b>] do_invalid_op+0x1b/0x20
> >> [  199.709861]  [<ffffffff818ce19e>] invalid_op+0x1e/0x30
> >> [  199.709861]  [<ffffffff811a778e>] ? submit_bh_wbc+0x16e/0x180
> >> [  199.709861]  [<ffffffff810a9bcc>] ? internal_add_timer+0x6c/0x80
> >> [  199.709861]  [<ffffffff811a9449>] __sync_dirty_buffer+0x59/0xf0
> >> [  199.709861]  [<ffffffff811a94ee>] sync_dirty_buffer+0xe/0x10
> >> [  199.709861]  [<ffffffff8120de31>] ext4_commit_super+0x181/0x200
> >> [  199.709861]  [<ffffffff8120e255>] __ext4_error_inode+0x85/0x140
> >> [  199.709861]  [<ffffffff8108c400>] ? autoremove_wake_function+0x40/0x40
> >> [  199.709861]  [<ffffffff811f0fdd>] __ext4_get_inode_loc+0x1fd/0x3d0
> >> [  199.709861]  [<ffffffff811f2a78>] ext4_get_inode_loc+0x18/0x20
> >> [  199.709861]  [<ffffffff811f442f>] ext4_reserve_inode_write+0x1f/0x80
> >> [  199.709861]  [<ffffffff81218fdd>] ? __ext4_ext_dirty+0x6d/0x90
> >> [  199.709861]  [<ffffffff811f44d8>] ext4_mark_inode_dirty+0x48/0x1e0
> >> [  199.709861]  [<ffffffff81218fdd>] __ext4_ext_dirty+0x6d/0x90
> >> [  199.709861]  [<ffffffff8121abdc>] ext4_ext_insert_extent+0x6ac/0x1310
> >> [  199.709861]  [<ffffffff8121e098>] ext4_ext_map_blocks+0x258/0xdc0
> >> [  199.709861]  [<ffffffff811f1ba4>] ext4_map_blocks+0x124/0x4f0
> >> [  199.709861]  [<ffffffff811f4b51>] ext4_writepages+0x4e1/0xc50
> >> [  199.709861]  [<ffffffff8132ab85>] ? find_next_bit+0x15/0x20
> >> [  199.709861]  [<ffffffff81084cbb>] ? find_busiest_group+0x11b/0x980
> >> [  199.709861]  [<ffffffff81123199>] do_writepages+0x19/0x30
> >> [  199.709861]  [<ffffffff8119e7f9>] __writeback_single_inode+0x49/0x2b0
> >> [  199.709861]  [<ffffffff8119ef64>] writeback_sb_inodes+0x274/0x4c0
> >> [  199.709861]  [<ffffffff8119f234>] __writeback_inodes_wb+0x84/0xb0
> >> [  199.709861]  [<ffffffff8119f47f>] wb_writeback+0x21f/0x290
> >> [  199.709861]  [<ffffffff8119fc9e>] wb_workfn+0x2fe/0x480
> >> [  199.709861]  [<ffffffff810698a6>] process_one_work+0x156/0x3e0
> >> [  199.709861]  [<ffffffff81069e59>] worker_thread+0x49/0x450
> >> [  199.709861]  [<ffffffff81069e10>] ? rescuer_thread+0x2e0/0x2e0
> >> [  199.709861]  [<ffffffff81069e10>] ? rescuer_thread+0x2e0/0x2e0
> >> [  199.709861]  [<ffffffff8106ec44>] kthread+0xc4/0xe0
> >> [  199.709861]  [<ffffffff8106eb80>] ? kthread_create_on_node+0x180/0x180
> >> [  199.709861]  [<ffffffff818ccc9f>] ret_from_fork+0x3f/0x70
> >> [  199.709861]  [<ffffffff8106eb80>] ? kthread_create_on_node+0x180/0x180
> >> [  199.709861] Code: 8b 03 48 8b 7b 08 48 83 c3 10 4c 89 e6 ff d0 48
> >> 8b 03 48 85 c0 75 eb e9 32 ff ff ff 0f 1f 40 00 48 8b 87 f8 04 00 00
> >> 55 48 89 e5 <48> 8b 40 d8 5d c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00
> >> 00 00
> >> [  199.709861] RIP  [<ffffffff8106f1cb>] kthread_data+0xb/0x20
> >> [  199.709861]  RSP <ffff88013a427188>
> >> [  199.709861] CR2: ffffffffffffffd8
> >> [  199.709861] ---[ end trace d41ec1552544a209 ]---
> >> [  199.709861] Fixing recursive fault but reboot is needed!
> >>
> >>
> >> It looks like when underlaying device is gone, filesystem still trying
> >> to write to it, and wasn't handle gracefully.
> >>
> >> Is there anyway we can fix it or probably add a queue for pending
> >> actions for a relative long timeout? Just in case later the connection
> >> would come back.
> >>
> >> And I would be glad to help working on this issue if needed.
> >>
> >> Thanks!
> >>
> >>
> >> --Sheng
> >>
> >>
> >> ------------------------------------------------------------------------------
> >> Monitor Your Dynamic Infrastructure at Any Scale With Datadog!
> >> Get real-time metrics from all of your servers, apps and tools
> >> in one place.
> >> SourceForge users - Click here to start your Free Trial of Datadog now!
> >> http://pubads.g.doubleclick.net/gampad/clk?id=241902991&iu=/4140
> >> _______________________________________________
> >> Nbd-general mailing list
> >> Nbd-general@...ts.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/nbd-general
> >
> >
> 

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

Download attachment "signature.asc" of type "application/pgp-signature" (820 bytes)