Message-ID: <bcc4c9fa41ace6f9d98d88d24d1bd67a469cbfeb.camel@kernel.org>
Date: Tue, 31 May 2022 18:15:36 +0200
From: nicolas saenz julienne <nsaenz@...nel.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Phil Auld <pauld@...hat.com>,
Alex Belits <abelits@...vell.com>,
Xiongfeng Wang <wangxiongfeng2@...wei.com>,
Neeraj Upadhyay <quic_neeraju@...cinc.com>,
Thomas Gleixner <tglx@...utronix.de>,
Yu Liao <liaoyu15@...wei.com>,
Boqun Feng <boqun.feng@...il.com>,
"Paul E . McKenney" <paulmck@...nel.org>,
Marcelo Tosatti <mtosatti@...hat.com>,
Paul Gortmaker <paul.gortmaker@...driver.com>,
Uladzislau Rezki <uladzislau.rezki@...y.com>,
Joel Fernandes <joel@...lfernandes.org>,
Mark Rutland <mark.rutland@....com>
Subject: Re: [PATCH 20/21] rcu/context_tracking: Merge dynticks counter and
context tracking states
On Tue, 2022-05-31 at 16:23 +0200, Frederic Weisbecker wrote:
> On Mon, May 30, 2022 at 08:02:57PM +0200, nicolas saenz julienne wrote:
> > Hi Frederic,
> >
> > On Thu, 2022-05-19 at 16:58 +0200, Frederic Weisbecker wrote:
> > > Updating the context tracking state and the RCU dynticks counter
> > > atomically in a single operation is a first step towards improving CPU
> > > isolation. This makes the context tracking state updates fully ordered
> > > > and therefore allows for later enhancements such as postponing some work
> > > > while a task is running isolated in userspace until it eventually comes back
> > > to the kernel.
> > >
> > > > The state field is divided into two parts:
> > >
> > > > 1) Two lower bits for context tracking state:
> > >
> > > > CONTEXT_KERNEL = 0,
> > > CONTEXT_IDLE = 1,
> > > CONTEXT_USER = 2,
> > > CONTEXT_GUEST = 3,
> > >
> > > 2) Higher bits for RCU eqs dynticks counting:
> > >
> > > RCU_DYNTICKS_IDX = 4
> > >
> > > The dynticks counting is always incremented by this value.
> > > > (state & RCU_DYNTICKS_IDX) means we are NOT in an extended quiescent
> > > > state. This makes a collision between two RCU dynticks snapshots more
> > > > likely, but wrapping around 28 bits of eqs dynticks increments still
> > > > takes some bad luck (also, rdp.dynticks_snap could be converted from
> > > > int to long?)
> > >
> > > > Some RCU eqs functions have been renamed to better reflect their broader
> > > > scope, which now includes the context tracking state.
> > >
> > > Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> > > Cc: Paul E. McKenney <paulmck@...nel.org>
> > > Cc: Peter Zijlstra <peterz@...radead.org>
> > > Cc: Thomas Gleixner <tglx@...utronix.de>
> > > Cc: Neeraj Upadhyay <quic_neeraju@...cinc.com>
> > > Cc: Uladzislau Rezki <uladzislau.rezki@...y.com>
> > > Cc: Joel Fernandes <joel@...lfernandes.org>
> > > Cc: Boqun Feng <boqun.feng@...il.com>
> > > Cc: Nicolas Saenz Julienne <nsaenz@...nel.org>
> > > Cc: Marcelo Tosatti <mtosatti@...hat.com>
> > > Cc: Xiongfeng Wang <wangxiongfeng2@...wei.com>
> > > > Cc: Yu Liao <liaoyu15@...wei.com>
> > > Cc: Phil Auld <pauld@...hat.com>
> > > > Cc: Paul Gortmaker <paul.gortmaker@...driver.com>
> > > Cc: Alex Belits <abelits@...vell.com>
> > > ---
> >
> > While working on a feature on top of this series (IPI deferral stuff) I believe
> > I've found a discrepancy in how context state is being updated:
> >
> > - When servicing an IRQ from user-space, we increment dynticks, and clear the
> > ct state to show we're in-kernel.
> >
> > - When servicing an IRQ from idle/guest or an NMI from any context we only
> > increment the dynticks counter. The ct state remains unchanged.
>
> Hmm, an IRQ from userspace does:
>
> ct_user_enter()
> //run in user
> //-----IRQ
> ct_user_exit()
> ct_irq_enter()
> ct_irq_exit()
> ct_user_enter()
> //run in user
>
> An IRQ from guest does:
>
> for (;;) {
>     context_tracking_guest_enter()
>     //vmrun
>     //IRQ pending
>     #VMEXIT
>     context_tracking_guest_exit()
>     local_irq_enable()
>     ct_irq_enter()
>     ct_irq_exit()
>     local_irq_disable()
> }
>
>
> (Although I see there is an "sti" right before "vmrun", so it looks
> possible to have ct_irq_enter() after context_tracking_guest_enter()
> if a host IRQ fires between the sti and the vmrun. I might be
> missing some KVM subtlety, though.)
>
> An IRQ from idle does just:
>
> ct_idle_enter()
> //IRQ
> ct_irq_enter()
> ct_irq_exit()
> ct_idle_exit()
>
> So guest looks mostly ok to me (except for the little sti before vmrun for
> which I have a doubt).
Yes, shouldn't have mentioned guests. I got carried away.
> But idle at least is an exception and CONTEXT_IDLE will remain during the
> interrupt handling. It's not that trivial to handle the idle case because
> ct_irq_exit() needs to know that it is called between ct_idle_enter() and
> ct_idle_exit().
Just for the record, this behaviour predates this series, so it's not
something the series needs to fix.

Something like this should work, right?
ct_idle_enter()
    //IRQ or NMI
    if (__ct_state() == CONTEXT_IDLE)
        ct_idle_exit()
    ct_irq_enter()
    ...
    ct_irq_exit()
    if (needs_update_state()) //using irqentry_state_t for ex.
        ct_idle_enter()
ct_idle_exit()
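
To make the sequence above concrete, here is a minimal userspace toy model of it, not kernel code. It assumes the state word layout from the patch description (two low bits of context, counter stepped by RCU_DYNTICKS_IDX) and assumes that idle entry/exit adjust the word by RCU_DYNTICKS_IDX plus or minus CONTEXT_IDLE. All toy_* names are hypothetical, and the single idle_interrupted flag stands in for whatever would be carried in irqentry_state_t; it does not handle an NMI nesting inside an IRQ.

```c
#include <assert.h>
#include <stdbool.h>

/* Values quoted from the patch description earlier in this thread. */
enum { CONTEXT_KERNEL = 0, CONTEXT_IDLE = 1 };
#define RCU_DYNTICKS_IDX  4
#define TOY_CT_STATE_MASK 3u	/* hypothetical name for the two low bits */

/* Toy per-CPU state; idle_interrupted stands in for irqentry_state_t. */
struct toy_ct {
	unsigned int state;
	bool idle_interrupted;
};

static inline void toy_ct_init(struct toy_ct *ct)
{
	/* Start in kernel context with RCU watching (not in an EQS). */
	ct->state = RCU_DYNTICKS_IDX | CONTEXT_KERNEL;
	ct->idle_interrupted = false;
}

static inline int toy_ct_state(const struct toy_ct *ct)
{
	return (int)(ct->state & TOY_CT_STATE_MASK);
}

/* Assumed offsets: one counter step plus the context delta. */
static inline void toy_idle_enter(struct toy_ct *ct)
{
	assert(toy_ct_state(ct) == CONTEXT_KERNEL);
	ct->state += RCU_DYNTICKS_IDX + CONTEXT_IDLE;
}

static inline void toy_idle_exit(struct toy_ct *ct)
{
	assert(toy_ct_state(ct) == CONTEXT_IDLE);
	ct->state += RCU_DYNTICKS_IDX - CONTEXT_IDLE;
}

/* Proposed IRQ/NMI entry: leave idle first, remember that we did. */
static inline void toy_irq_enter(struct toy_ct *ct)
{
	ct->idle_interrupted = (toy_ct_state(ct) == CONTEXT_IDLE);
	if (ct->idle_interrupted)
		toy_idle_exit(ct);
	/* ct_irq_enter() would nest here; omitted in this toy. */
}

/* Proposed IRQ/NMI exit: restore the idle state if we interrupted it. */
static inline void toy_irq_exit(struct toy_ct *ct)
{
	/* ct_irq_exit() would unnest here; omitted in this toy. */
	if (ct->idle_interrupted) {
		toy_idle_enter(ct);
		ct->idle_interrupted = false;
	}
}
```

In this model the low bits read CONTEXT_KERNEL for the whole duration of the handler, at the cost of two extra counter steps per interrupt taken from idle.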
Note that it's not a big issue as we can work around this behaviour by checking
through dynticks whether a CPU is really idle.
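
As a rough illustration of that workaround, the sketch below decodes a snapshot of the state word and treats a CPU as really idle only when the low bits say CONTEXT_IDLE and the RCU_DYNTICKS_IDX bit is clear (i.e. it is in an extended quiescent state). The constants come from the patch description; the mask name and the sketch_* helpers are hypothetical and are not functions from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Values quoted from the patch description earlier in this thread. */
enum {
	CONTEXT_KERNEL = 0,
	CONTEXT_IDLE   = 1,
	CONTEXT_USER   = 2,
	CONTEXT_GUEST  = 3,
};
#define RCU_DYNTICKS_IDX     4
#define SKETCH_CT_STATE_MASK 3u	/* hypothetical name for the two low bits */

/* The counter step must not overlap the context bits. */
static_assert((RCU_DYNTICKS_IDX & SKETCH_CT_STATE_MASK) == 0,
	      "dynticks increment overlaps context state bits");

/* Hypothetical helper: context tracking state held in the low bits. */
static inline int sketch_ct_state(unsigned int state)
{
	return (int)(state & SKETCH_CT_STATE_MASK);
}

/* Hypothetical helper: bit set means NOT in an extended quiescent state. */
static inline bool sketch_rcu_watching(unsigned int state)
{
	return (state & RCU_DYNTICKS_IDX) != 0;
}

/*
 * Hypothetical helper: an IRQ or NMI taken from idle leaves the low bits
 * at CONTEXT_IDLE but sets the dynticks bit, so both must be checked.
 */
static inline bool sketch_cpu_really_idle(unsigned int state)
{
	return sketch_ct_state(state) == CONTEXT_IDLE &&
	       !sketch_rcu_watching(state);
}
```

A caller would pass in one snapshot of the per-CPU state word; how that snapshot is read and ordered against the remote CPU is outside the scope of this sketch.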
Do you think it's worth fixing nonetheless?

Regards,
Nicolas