Message-ID: <CA+55aFzw1_N29mSttZ-TKn7S0-MKbvAfBr4PH+_KBMYr2Uoj+Q@mail.gmail.com>
Date: Tue, 6 Dec 2016 08:33:41 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Vegard Nossum <vegard.nossum@...il.com>,
Ingo Molnar <mingo@...nel.org>,
Dave Jones <davej@...emonkey.org.uk>, Chris Mason <clm@...com>,
Jens Axboe <axboe@...com>,
Andy Lutomirski <luto@...capital.net>,
Andy Lutomirski <luto@...nel.org>,
Al Viro <viro@...iv.linux.org.uk>, Josef Bacik <jbacik@...com>,
David Sterba <dsterba@...e.com>,
linux-btrfs <linux-btrfs@...r.kernel.org>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Dave Chinner <david@...morbit.com>
Subject: Re: bio linked list corruption.

On Tue, Dec 6, 2016 at 12:16 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>>
>> Of course, I'm really hoping that this shmem.c use is the _only_ such
>> case. But I doubt it.
>
> $ git grep DECLARE_WAIT_QUEUE_HEAD_ONSTACK | wc -l
> 28

Hmm. Most of them seem to be ok, because they use "wait_event()",
which will always remove itself from the wait-queue. And they do it
from the place that allocates the wait-queue.

IOW, the mm/shmem.c case really was fairly special, because it just
did "prepare_to_wait()", and then did a finish_wait() - and not in the
thread that allocated it on the stack.

So it's really that "some _other_ thread allocated the waitqueue on
the stack, and now we're doing a wait on it" that is bad.

So the normal pattern seems to be:

 - allocate wq on the stack
 - pass it on to a waker
 - wait for it

and that's ok, because as part of "wait for it" we will also be
cleaning things up.
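
The safe pattern above can be sketched with a toy userspace model. This is
not the kernel API: the toy_* names and the hand-rolled circular list are
assumptions made for illustration, and the waker is left out. It only shows
that when the owner of the on-stack queue is also the waiter, the queue is
empty again before the owner's stack frame goes away.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a circular doubly linked list standing in for a wait queue. */
struct node { struct node *prev, *next; };
struct toy_wq_head  { struct node list; };
struct toy_wq_entry { struct node link; };

static void toy_init_head(struct toy_wq_head *h)
{
    h->list.prev = h->list.next = &h->list;
}

static bool toy_head_empty(const struct toy_wq_head *h)
{
    return h->list.next == &h->list;
}

/* Stands in for prepare_to_wait(): link the entry onto the queue. */
static void toy_prepare_to_wait(struct toy_wq_head *h, struct toy_wq_entry *e)
{
    e->link.next = h->list.next;
    e->link.prev = &h->list;
    h->list.next->prev = &e->link;
    h->list.next = &e->link;
}

/* Stands in for finish_wait(): the waiter unlinks its own entry. */
static void toy_finish_wait(struct toy_wq_entry *e)
{
    e->link.prev->next = e->link.next;
    e->link.next->prev = e->link.prev;
    e->link.prev = e->link.next = &e->link;
}

/*
 * Owner allocates the queue on its stack and waits on it itself.
 * Returns true if the queue is empty just before the frame is released.
 */
static bool toy_owner_waits(void)
{
    struct toy_wq_head head;    /* allocate wq on the stack */
    struct toy_wq_entry self;   /* the owner's own wait entry */

    toy_init_head(&head);
    /* ... &head would be passed on to a waker here ... */
    toy_prepare_to_wait(&head, &self);
    /* ... the waker would signal the condition here ... */
    toy_finish_wait(&self);     /* "wait for it" ends by cleaning up */

    return toy_head_empty(&head);
}
```

Calling toy_owner_waits() returns true: nothing is still linked to the
on-stack queue when the function returns.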

The reason mm/shmem.c was buggy was that it did

 - allocate wq on the stack
 - pass it on to somebody else to wait for
 - wake it up

and *that* is buggy, because it's the waiter, not the waker, that
normally cleans things up.
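
The buggy ordering above can be sketched the same way. Again this is a toy
userspace model, not the kernel code or the mm/shmem.c source: the toy_*
names are hypothetical, the "other thread" is just an entry passed in by the
caller, and the wake-up is simplified so that it never unlinks anything. It
shows that the owner reaches its return with a foreign entry still linked to
the on-stack queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a circular doubly linked list standing in for a wait queue. */
struct node { struct node *prev, *next; };
struct toy_wq_head  { struct node list; };
struct toy_wq_entry { struct node link; };

static void toy_init_head(struct toy_wq_head *h)
{
    h->list.prev = h->list.next = &h->list;
}

static bool toy_head_empty(const struct toy_wq_head *h)
{
    return h->list.next == &h->list;
}

/* Stands in for prepare_to_wait(): link the entry onto the queue. */
static void toy_prepare_to_wait(struct toy_wq_head *h, struct toy_wq_entry *e)
{
    e->link.next = h->list.next;
    e->link.prev = &h->list;
    h->list.next->prev = &e->link;
    h->list.next = &e->link;
}

/* Stands in for finish_wait(): the waiter unlinks its own entry. */
static void toy_finish_wait(struct toy_wq_entry *e)
{
    e->link.prev->next = e->link.next;
    e->link.next->prev = e->link.prev;
    e->link.prev = e->link.next = &e->link;
}

/* Simplified wake-up (assumption): counts waiters, never unlinks them. */
static int toy_wake_up_all(const struct toy_wq_head *h)
{
    int woken = 0;
    for (const struct node *n = h->list.next; n != &h->list; n = n->next)
        woken++;
    return woken;
}

/*
 * Owner allocates the queue on its stack; `other` belongs to a different
 * thread. Returns true if `other` is still linked to the on-stack queue at
 * the point where the owner would return.
 */
static bool toy_owner_wakes(struct toy_wq_entry *other)
{
    struct toy_wq_head head;            /* allocate wq on the stack */
    toy_init_head(&head);

    toy_prepare_to_wait(&head, other);  /* somebody else waits on it */
    int woken = toy_wake_up_all(&head); /* owner wakes it up */

    /* At this point the buggy pattern returns and `head` dies. */
    bool still_linked = woken == 1 && !toy_head_empty(&head) &&
                        other->link.next == &head.list;

    /* Unlink only so this demo itself leaves no dangling pointers. */
    toy_finish_wait(other);
    return still_linked;
}
```

Calling toy_owner_wakes(&entry) with a caller-owned entry returns true.
Without the final unlink, which exists only to keep the demo well defined,
the entry would keep pointing into a stack frame that no longer exists.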

Is there some good way to find this kind of pattern automatically, I wonder....

             Linus