Message-ID: <CA+55aFwjSZ_Q4xS21z60THsKr99A2nU4VKTkav47DzJj0+Ewnw@mail.gmail.com>
Date: Tue, 22 Aug 2017 12:15:20 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Liang, Kan" <kan.liang@...el.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Mel Gorman <mgorman@...e.de>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Ingo Molnar <mingo@...e.hu>, Andi Kleen <ak@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>, Jan Kara <jack@...e.cz>,
linux-mm <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk
On Tue, Aug 22, 2017 at 11:56 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>
> Won't we now prematurely terminate the wait when we get a spurious
> wakeup?
I think there are two answers to that:
(a) do we even care?
(b) what spurious wakeup?
The "do we even care" question is because wait_on_page_bit() by
definition isn't really serializing. And I'm not even talking about
memory ordering, although that is true too - I'm talking just
fundamentally: by definition, when we're not locking, by the time
wait_on_page_bit() returns to the caller, the bit could obviously have
changed again.
So I think wait_on_page_bit() is by definition not really guaranteeing
that the bit really is clear. And I don't think we really have
cases that matter.
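
To make the "waiting is not locking" point concrete, here is a minimal
userspace sketch using C11 atomics. It is not kernel code: page_model,
MODEL_BIT, model_wait_on_bit() and model_trylock_bit() are hypothetical
names invented for this illustration. The wait-only helper just reports
that the bit was clear at some instant, while the lock-style helper is the
only one that tells the caller it actually owns the bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page flag bit; not a real kernel flag. */
#define MODEL_BIT 0x1u

struct page_model {
    atomic_uint flags;
};

/*
 * Wait-only: returns once the bit was seen clear at some instant.
 * Nothing stops another thread from setting it again right afterwards.
 */
static void model_wait_on_bit(struct page_model *p, unsigned int bit)
{
    assert(bit != 0);
    while (atomic_load_explicit(&p->flags, memory_order_acquire) & bit)
        ; /* a real waiter would sleep here instead of spinning */
}

/*
 * Lock-style: atomically sets the bit and reports whether this caller
 * was the one that changed it from clear to set.
 */
static bool model_trylock_bit(struct page_model *p, unsigned int bit)
{
    unsigned int old;

    assert(bit != 0);
    old = atomic_fetch_or_explicit(&p->flags, bit, memory_order_acquire);
    return !(old & bit);
}
```

A caller that needs the bit to stay in a known state has to use something
like the second helper; the return of the first one says nothing about the
present.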
But if we do - say, 'fsync()' waiting for a page to finish
writeback - where would you get spurious wakeups from? They normally
happen when we have nested waiting (eg a page fault happens
while we have other wait queues active), and I'm not seeing that being
an issue here.
That said, I do think we might want to perhaps make a "careful" vs
"just wait a bit" version of this if the patch works out.
The patch is primarily for testing this particular case. I actually
think it's probably ok in general, but maybe there really is some
special case that could have multiple wakeup sources and it needs to
see *this* particular one.
(We could perhaps handle that case by checking "is the wait-queue
empty now" instead, and just get rid of the re-arming, rather than
breaking out of the loop immediately after the io_schedule()).
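
The three loop shapes discussed above can be compared with a small
single-threaded simulation. This is only a sketch under stated assumptions,
not the patch itself: struct sim, struct wakeup and sim_sleep() are
hypothetical, sim_sleep() merely stands in for io_schedule() by replaying a
scripted list of wakeups, and "is the wait-queue empty now" is read here as
"did the waker remove our entry from the queue", which is one possible
interpretation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One scripted wakeup (hypothetical model, not a kernel structure). */
struct wakeup {
    bool via_queue;   /* waker went through our queue and dequeued us */
    bool bit_cleared; /* the page bit was cleared before this wakeup */
};

struct sim {
    bool bit_set;     /* the bit we are waiting on */
    bool queued;      /* is our wait entry still on the queue? */
    const struct wakeup *script;
    size_t len, pos;
};

static struct sim sim_init(const struct wakeup *script, size_t len)
{
    struct sim s = { true, false, script, len, 0 };
    return s;
}

/* Stand-in for io_schedule(): returns false when the script runs out. */
static bool sim_sleep(struct sim *s)
{
    assert(s->pos <= s->len);
    if (s->pos == s->len)
        return false;
    if (s->script[s->pos].bit_cleared)
        s->bit_set = false;
    if (s->script[s->pos].via_queue)
        s->queued = false;
    s->pos++;
    return true;
}

/* "Careful": re-arm and re-check the bit after every wakeup. */
static bool wait_careful(struct sim *s)
{
    while (s->bit_set) {
        s->queued = true; /* re-arming on each iteration */
        if (!sim_sleep(s))
            break;
    }
    return !s->bit_set;
}

/* "Just wait a bit": sleep at most once, then return regardless. */
static bool wait_once(struct sim *s)
{
    if (s->bit_set) {
        s->queued = true;
        sim_sleep(s);
    }
    return !s->bit_set;
}

/* No re-arming: queue once, then sleep until a waker has dequeued us. */
static bool wait_until_dequeued(struct sim *s)
{
    if (s->bit_set) {
        s->queued = true;
        while (s->queued && sim_sleep(s))
            ;
    }
    return !s->bit_set;
}
```

With a script of one spurious wakeup followed by one real wakeup, wait_once()
returns after the first sleep with the bit still set, while the other two
variants keep sleeping until the real wakeup arrives.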
Linus