Message-ID: <CA+55aFwD9McVapb0svQrrvP1k6iSkqz5ENNGXY6b+Yo-k7wOsg@mail.gmail.com>
Date: Wed, 26 Oct 2016 12:06:21 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Dave Jones <davej@...emonkey.org.uk>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Chris Mason <clm@...com>,
Andy Lutomirski <luto@...capital.net>,
Andy Lutomirski <luto@...nel.org>, Jens Axboe <axboe@...com>,
Al Viro <viro@...iv.linux.org.uk>, Josef Bacik <jbacik@...com>,
David Sterba <dsterba@...e.com>,
linux-btrfs <linux-btrfs@...r.kernel.org>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Dave Chinner <david@...morbit.com>
Subject: Re: bio linked list corruption.
On Wed, Oct 26, 2016 at 11:42 AM, Dave Jones <davej@...emonkey.org.uk> wrote:
>
> The stacks show nearly all of them are stuck in sync_inodes_sb
That's just wb_wait_for_completion(), and it means that some IO isn't
completing.
There's also a lot of processes waiting for inode_lock(), and a few
waiting for mnt_want_write().
Ignoring those, we have:
> [<ffffffffa009554f>] btrfs_wait_ordered_roots+0x3f/0x200 [btrfs]
> [<ffffffffa00470d1>] btrfs_sync_fs+0x31/0xc0 [btrfs]
> [<ffffffff811fbd4e>] sync_filesystem+0x6e/0xa0
> [<ffffffff811fbebc>] SyS_syncfs+0x3c/0x70
> [<ffffffff8100255c>] do_syscall_64+0x5c/0x170
> [<ffffffff817908cb>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
Don't know this one. There's a couple of them. Could there be some
ABBA deadlock on the ordered roots waiting?
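For reference, an ABBA deadlock is when one code path takes lock A and then lock B while another path takes B and then A, so each can end up holding one lock and waiting forever for the other. A minimal sketch of how such an inversion could be spotted from recorded lock acquisition orders follows. The function name, the lock names, and the input format are all hypothetical and for illustration only; this is not how btrfs or the kernel's lockdep actually implement it.

```python
from collections import defaultdict


def find_abba_pairs(acquisition_orders):
    """Return the lock pairs that different code paths take in opposite orders.

    acquisition_orders is a list of lists; each inner list holds the locks
    one (hypothetical) code path acquires, in order, without releasing.
    """
    # edges[a] holds every lock that was acquired while a was already held.
    edges = defaultdict(set)
    for order in acquisition_orders:
        for i, held in enumerate(order):
            for later in order[i + 1:]:
                edges[held].add(later)

    # An ABBA inversion exists when both a -> b and b -> a were observed.
    pairs = set()
    for a, afters in edges.items():
        for b in afters:
            if a != b and a in edges.get(b, ()):
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical lock names; two paths take the same locks in opposite order.
    paths = [["root_A", "root_B"], ["root_B", "root_A"]]
    print(find_abba_pairs(paths))
```

If every path takes the locks in the same global order, the function returns an empty list, which is the usual way to rule this class of deadlock out.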
> [<ffffffff8131ae87>] call_rwsem_down_write_failed+0x17/0x30
> [<ffffffffa008ed32>] btrfs_fallocate+0xb2/0xfd0 [btrfs]
> [<ffffffff811c6c3e>] vfs_fallocate+0x13e/0x220
> [<ffffffff811c79f3>] SyS_fallocate+0x43/0x80
> [<ffffffff8100255c>] do_syscall_64+0x5c/0x170
> [<ffffffff817908cb>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
This one is also inode_lock(), and is interesting only because it's
fallocate(), which has shown up so many times before.
But there are other threads blocked on do_truncate, or
btrfs_file_write_iter instead, or on lseek, so this is not different
for any other reason.
> [<ffffffff81149fbf>] wait_on_page_bit+0xaf/0xc0
> [<ffffffff8114a121>] __filemap_fdatawait_range+0x151/0x170
> [<ffffffff8114d79c>] filemap_fdatawait_keep_errors+0x1c/0x20
> [<ffffffff811f59b3>] sync_inodes_sb+0x273/0x300
> [<ffffffff811fbd37>] sync_filesystem+0x57/0xa0
> [<ffffffff811fbebc>] SyS_syncfs+0x3c/0x70
> [<ffffffff8100255c>] do_syscall_64+0x5c/0x170
> [<ffffffff817908cb>] entry_SYSCALL64_slow_path+0x25/0x25
> [<ffffffffffffffff>] 0xffffffffffffffff
This is actually waiting on the page. Possibly this is the IO that is
never completing, and keeps the inode lock.
> [<ffffffffa009576b>] btrfs_start_ordered_extent+0x5b/0xb0 [btrfs]
> [<ffffffffa008bf5d>] lock_and_cleanup_extent_if_need+0x22d/0x290 [btrfs]
> [<ffffffffa008d1e8>] __btrfs_buffered_write+0x1b8/0x6e0 [btrfs]
> [<ffffffffa0090e60>] btrfs_file_write_iter+0x170/0x550 [btrfs]
> [<ffffffff811c97d8>] do_iter_readv_writev+0xa8/0x100
> [<ffffffff811ca162>] do_readv_writev+0x172/0x210
> [<ffffffff811ca42a>] vfs_writev+0x3a/0x50
> [<ffffffff811ca5c0>] do_pwritev+0xb0/0xd0
> [<ffffffff811cb57c>] SyS_pwritev+0xc/0x10
> [<ffffffff8100255c>] do_syscall_64+0x5c/0x170
> [<ffffffff817908cb>] entry_SYSCALL64_slow_path+0x25/0x25
Hmm. This is the one that *started* the ordered extents (as opposed to
the ones waiting for it).
I dunno. There might be a lost IO. More likely it's the same
corruption that causes it, it just didn't result in an oops this time.
Linus