Message-ID: <CALCETrXd3zMkeiOq_tnrLXonX-KY9A1pyB2So+GHP6rcts2KEA@mail.gmail.com>
Date: Tue, 30 Aug 2016 11:23:06 -0700
From: Andy Lutomirski <luto@...capital.net>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Rik van Riel <riel@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Benjamin Serebrin <serebrin@...gle.com>,
Borislav Petkov <bp@...e.de>, Ingo Molnar <mingo@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Mel Gorman <mgorman@...e.de>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH RFC UGLY] x86,mm,sched: make lazy TLB mode even lazier
On Mon, Aug 29, 2016 at 6:14 PM, H. Peter Anvin <hpa@...or.com> wrote:
> On August 29, 2016 4:55:02 PM PDT, Andy Lutomirski <luto@...capital.net> wrote:
>>On Aug 29, 2016 7:54 AM, "Rik van Riel" <riel@...hat.com> wrote:
>>>
>>> On Sun, 2016-08-28 at 01:11 -0700, Andy Lutomirski wrote:
>>> > On Aug 25, 2016 9:06 PM, "Rik van Riel" <riel@...hat.com> wrote:
>>> > >
>>> > > Subject: x86,mm,sched: make lazy TLB mode even lazier
>>> > >
>>> > > Lazy TLB mode can result in an idle CPU being woken up for a TLB
>>> > > flush, when all it really needed to do was flush %cr3 before the
>>> > > next context switch.
>>> > >
>>> > > This is mostly fine on bare metal, though sub-optimal from a
>>> > > power saving point of view, and deeper C states could make TLB
>>> > > flushes take a little longer than desired.
>>> > >
>>> > > On virtual machines, the pain can be much worse, especially if a
>>> > > currently non-running VCPU is woken up for a TLB invalidation
>>> > > IPI, on a CPU that is busy running another task. It could take
>>> > > a while before that IPI is handled, leading to performance
>>> > > issues.
>>> > >
>>> > > This patch is still ugly, and the sched.h include needs to be
>>> > > cleaned up a lot (how would the scheduler people like to see the
>>> > > context switch blocking abstracted?)
>>> > >
>>> > > This patch deals with the issue by introducing a third tlb state,
>>> > > TLBSTATE_FLUSH, which causes %cr3 to be flushed at the next
>>> > > context switch. A CPU is transitioned from TLBSTATE_LAZY to
>>> > > TLBSTATE_FLUSH with the rq lock held, to prevent context
>>> > > switches.
>>> > >
>>> > > Nothing is done for a CPU that is already in TLBSTATE_FLUSH mode.
>>> > >
>>> > > This patch is totally untested, because I am at a conference
>>> > > right now, and Benjamin has the test case :)
>>> > >
>>> >
>>> > I haven't had a chance to seriously read the code yet, but what
>>> > happens when the mm is deleted outright? Or is the idea that a
>>> > reference is held until all the lazy users are gone, too?
>>>
>>> Worst case we send a TLB flush to a CPU that does
>>> not need it.
>>>
>>> As not sending an IPI will be faster than sending
>>> one, I do not think the tradeoff will be much
>>> different for a system with PCID.
>>
>>If we were fully non-lazy, we wouldn't need to send these IPIs at all,
>>right? We would just keep cr3 pointing at swapper_pg_dir when not
>>actively using the mm. The problem with doing that without PCID is
>>that cr3 writes are really slow. Or am I missing something?
>
> Writing cr3 on a PCID system doesn't (necessarily) flush the TLB
> context. The whole reason for PCIDs is to *enable* lazy TLB by not
> making it necessary to flush a TLB context during the running of
> another process. As such, this methodology should help a PCID system
> even more: we can remember if we need to flush a TLB context during
> the scheduling of said task, without needing any IPI.

What I mean, more precisely, is: when unusing an mm, if we have PCID,
we could actually switch to swapper_pg_dir without flushing the TLB.
Then, when we resume the old task, we can use the tracking (that I add
in my patches) to decide whether its TLB entries need to be flushed.

I'm not sure this would actually improve matters in any meaningful way.

--Andy
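
For readers following the thread, the idea in the reply above (switch to
swapper_pg_dir without flushing, then use tracking to decide on resume
whether a flush is needed) can be modelled in a few lines of userspace C.
This is only a sketch and is not taken from any patch in this thread:
struct mm_model, struct cpu_model, tlb_gen, request_flush(), unuse_mm() and
switch_to_mm() are hypothetical names, and a per-mm generation counter is
just one assumed way to implement "the tracking".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct mm_model {
    uint64_t tlb_gen;  /* bumped each time this mm needs a TLB flush */
};

struct cpu_model {
    const struct mm_model *loaded_mm;  /* NULL models cr3 == swapper_pg_dir */
    const struct mm_model *cached_mm;  /* mm whose entries may still be cached */
    uint64_t cached_gen;               /* generation those entries reflect */
    unsigned int flushes;              /* number of simulated local flushes */
};

/* A remote CPU asks for a flush: record it in the mm, send no IPI. */
static void request_flush(struct mm_model *mm)
{
    mm->tlb_gen++;
}

/* Stop using the mm without flushing; with PCID the entries can stay. */
static void unuse_mm(struct cpu_model *cpu)
{
    cpu->loaded_mm = NULL;
}

/* Resume an mm; flush only if the cached entries could be stale. */
static bool switch_to_mm(struct cpu_model *cpu, const struct mm_model *mm)
{
    bool need_flush;

    assert(mm != NULL);
    need_flush = cpu->cached_mm != mm || cpu->cached_gen != mm->tlb_gen;
    if (need_flush) {
        cpu->flushes++;  /* stands in for the real local TLB flush */
        cpu->cached_mm = mm;
        cpu->cached_gen = mm->tlb_gen;
    }
    cpu->loaded_mm = mm;
    return need_flush;
}
```

In this model a CPU that leaves an mm and comes back to it pays for a flush
only if request_flush() was called in between. The model tracks a single
cached mm per CPU and ignores PCID allocation and memory ordering, both of
which a real implementation would have to handle.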