[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+55aFxeWe2VQaW30qGR0syiZ75jSwFwg3Ac+wS20KDtf5UKNw@mail.gmail.com>
Date: Mon, 5 Dec 2016 09:55:07 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Vegard Nossum <vegard.nossum@...il.com>
Cc: Dave Jones <davej@...emonkey.org.uk>, Chris Mason <clm@...com>,
Jens Axboe <axboe@...com>,
Andy Lutomirski <luto@...capital.net>,
Andy Lutomirski <luto@...nel.org>,
Al Viro <viro@...iv.linux.org.uk>, Josef Bacik <jbacik@...com>,
David Sterba <dsterba@...e.com>,
linux-btrfs <linux-btrfs@...r.kernel.org>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Dave Chinner <david@...morbit.com>
Subject: Re: bio linked list corruption.
On Mon, Dec 5, 2016 at 9:09 AM, Vegard Nossum <vegard.nossum@...il.com> wrote:
>
> The warning shows that it made it past the list_empty_careful() check
> in finish_wait() but then bugs out on the &wait->task_list
> dereference.
>
> Anything stick out?
I hate that shmem waitqueue garbage. It's really subtle.
I think the problem is that "wake_up_all()" in shmem_fallocate()
doesn't necessarily wake up everything. It wakes up TASK_NORMAL -
which does include TASK_UNINTERRUPTIBLE, but doesn't actually mean
"everything on the list".
I think that what happens is that the waiters somehow move from
TASK_UNINTERRUPTIBLE to TASK_RUNNING early, and this means that
wake_up_all() will ignore them, leave them on the list, and now that
list on stack is no longer empty at the end.
And the way *THAT* can happen is that the task is on some *other*
waitqueue as well, and that other waiqueue wakes it up. That's not
impossible, you can certainly have people on wait-queues that still
take faults.
Or somebody just uses a directed wake_up_process() or something.
Since you apparently can recreate this fairly easily, how about trying
this stupid patch?
NOTE! This is entirely untested. I may have screwed this up entirely.
You get the idea, though - just remove the wait queue head from the
list - the list entries stay around, but nothing points to the stack
entry (that we're going to free) any more.
And add the warning to see if this actually ever triggers (and because
I'd like to see the callchain when it does, to see if it's another
waitqueue somewhere or what..)
Linus
View attachment "patch.diff" of type "text/plain" (469 bytes)
Powered by blists - more mailing lists