Message-ID: <4b542b49b2994c9d8c4c73b9e3b42dde@AcuMS.aculab.com>
Date: Mon, 18 Mar 2024 15:53:07 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Pasha Tatashin' <pasha.tatashin@...een.com>, Matthew Wilcox
<willy@...radead.org>
CC: "H. Peter Anvin" <hpa@...or.com>, Kent Overstreet
<kent.overstreet@...ux.dev>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "x86@...nel.org"
<x86@...nel.org>, "bp@...en8.de" <bp@...en8.de>, "brauner@...nel.org"
<brauner@...nel.org>, "bristot@...hat.com" <bristot@...hat.com>,
"bsegall@...gle.com" <bsegall@...gle.com>, "dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>, "dianders@...omium.org"
<dianders@...omium.org>, "dietmar.eggemann@....com"
<dietmar.eggemann@....com>, "eric.devolder@...cle.com"
<eric.devolder@...cle.com>, "hca@...ux.ibm.com" <hca@...ux.ibm.com>,
"hch@...radead.org" <hch@...radead.org>, "jacob.jun.pan@...ux.intel.com"
<jacob.jun.pan@...ux.intel.com>, "jgg@...pe.ca" <jgg@...pe.ca>,
"jpoimboe@...nel.org" <jpoimboe@...nel.org>, "jroedel@...e.de"
<jroedel@...e.de>, "juri.lelli@...hat.com" <juri.lelli@...hat.com>,
"kinseyho@...gle.com" <kinseyho@...gle.com>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"lstoakes@...il.com" <lstoakes@...il.com>, "luto@...nel.org"
<luto@...nel.org>, "mgorman@...e.de" <mgorman@...e.de>, "mic@...ikod.net"
<mic@...ikod.net>, "michael.christie@...cle.com"
<michael.christie@...cle.com>, "mingo@...hat.com" <mingo@...hat.com>,
"mjguzik@...il.com" <mjguzik@...il.com>, "mst@...hat.com" <mst@...hat.com>,
"npiggin@...il.com" <npiggin@...il.com>, "peterz@...radead.org"
<peterz@...radead.org>, "pmladek@...e.com" <pmladek@...e.com>,
"rick.p.edgecombe@...el.com" <rick.p.edgecombe@...el.com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>, "surenb@...gle.com"
<surenb@...gle.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
"urezki@...il.com" <urezki@...il.com>, "vincent.guittot@...aro.org"
<vincent.guittot@...aro.org>, "vschneid@...hat.com" <vschneid@...hat.com>
Subject: RE: [RFC 00/14] Dynamic Kernel Stacks
From: Pasha Tatashin
> Sent: 18 March 2024 15:31
>
> On Mon, Mar 18, 2024 at 11:19 AM Matthew Wilcox <willy@...radead.org> wrote:
> >
> > On Mon, Mar 18, 2024 at 11:09:47AM -0400, Pasha Tatashin wrote:
> > > The TLB load is going to be exactly the same as today, we already use
> > > small pages for VMA mapped stacks. We won't need to have extra
> > > flushing either, the mappings are in the kernel space, and once pages
> > > are removed from the page table, no one is going to access that VA
> > > space until that thread enters the kernel again. We will need to
> > > invalidate the VA range only when the pages are mapped, and only on
> > > the local cpu.
> >
> > No; we can pass pointers to our kernel stack to other threads. The
> > obvious one is a mutex; we put a mutex_waiter on our own stack and
> > add its list_head to the mutex's waiter list. I'm sure you can
> > think of many other places we do this (eg wait queues, poll(), select(),
> > etc).
>
> Hm, it means that stack is sleeping in the kernel space, and has its
> stack pages mapped and invalidated on the local CPU, but access from
> the remote CPU to that stack pages would be problematic.
>
> I think we still won't need IPI, but VA-range invalidation is actually
> needed on unmaps, and should happen during context switch so every
> time we go off-cpu. Therefore, what Brian/Andy have suggested makes
> more sense instead of kernel/enter/exit paths.

I think you'll need to broadcast an invalidate.

Consider:
CPU A: task allocates extra pages and adds something to some list.
CPU B: accesses that data and maybe modifies it.
       Some page-table walk sets up the TLB.
CPU A: task detects the modify, removes the item from the list,
       collapses back the stack and sleeps.
       Stack pages freed.
CPU A: task wakes up (on the same cpu for simplicity).
       Goes down a deep stack and puts an item on a list.
       Different physical pages are allocated.
CPU B: accesses the associated KVA.
       It better not have a cached TLB entry.

Doesn't that need an IPI?
Freeing the pages is much harder than allocating them.
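
The sequence above can be walked through with a toy model. This is only a sketch, not kernel code: the names (PageTable, Cpu, run_scenario), the address 0x1000 and the "phys-1"/"phys-2" labels are all made up for illustration, and each CPU's TLB is modelled as a plain dictionary that caches whatever the shared page table said at the time of the walk. The broadcast flag stands in for an IPI-driven remote invalidate.

```python
class PageTable:
    """Shared kernel page table: virtual address -> physical page label."""

    def __init__(self):
        self.entries = {}


class Cpu:
    """A CPU with a private TLB that caches page-table lookups."""

    def __init__(self, name, page_table):
        self.name = name
        self.page_table = page_table
        self.tlb = {}

    def translate(self, va):
        # On a TLB miss, walk the page table and cache the result.
        if va not in self.tlb:
            self.tlb[va] = self.page_table.entries[va]
        return self.tlb[va]

    def invalidate(self, va):
        # Drop any cached translation for this address on this CPU only.
        self.tlb.pop(va, None)


def run_scenario(broadcast):
    """Replay the CPU A / CPU B sequence; return what CPU B ends up seeing."""
    pt = PageTable()
    cpu_a, cpu_b = Cpu("A", pt), Cpu("B", pt)
    va = 0x1000  # hypothetical kernel virtual address inside A's stack

    # CPU A: stack grows, a page is mapped, an item on it is put on a list.
    pt.entries[va] = "phys-1"
    cpu_a.translate(va)

    # CPU B: follows the list pointer; its page-table walk fills its TLB.
    cpu_b.translate(va)

    # CPU A: stack collapses, the page is unmapped and freed.
    del pt.entries[va]
    cpu_a.invalidate(va)          # local invalidate only
    if broadcast:
        cpu_b.invalidate(va)      # stand-in for an IPI / remote invalidate

    # CPU A: wakes up, goes deep again, a different physical page is mapped.
    pt.entries[va] = "phys-2"
    cpu_a.invalidate(va)          # local invalidate on map

    # CPU B: accesses the same KVA again.
    return cpu_b.translate(va)
```

With only local invalidation, run_scenario(False) leaves CPU B using the translation to the freed page; with broadcast=True it picks up the new mapping. The model ignores everything real hardware adds (TLB capacity and eviction, global pages, speculative walks), so it only illustrates the ordering problem, not its cost.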
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)