Message-ID: <alpine.LFD.2.02.1105050110440.3005@ionos>
Date: Thu, 5 May 2011 01:29:49 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Andi Kleen <andi@...stfloor.org>
cc: Dave Kleikamp <dkleikamp@...il.com>,
Chris Mason <chris.mason@...cle.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Tim Chen <tim.c.chen@...ux.intel.com>,
linux-kernel@...r.kernel.org, lenb@...nel.org, paulmck@...ibm.com
Subject: Re: idle issues running sembench on 128 cpus
On Thu, 5 May 2011, Andi Kleen wrote:
> > No, it does not even need refcounting. We can access it outside of the
>
> Ok.
>
> > lock as this is atomic context called on the cpu which is about to go
> > idle and therefore the device cannot go away. Easy and straightforward
> > fix.
>
> Ok. Patch appended. Looks good?
Mostly. See below.
> BTW why must the lock be irqsave?
Good question. Probably safety-first paranoia :)
Indeed, that code should only be called from irq-disabled regions, so
we could avoid the irqsave there. Otherwise it would need to be irqsave
for the obvious reason: an interrupt on the same CPU that takes the same
lock would deadlock.
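The irqsave question can be illustrated with a small user-space model. Every name below is hypothetical: a thread-local flag stands in for the CPU's interrupt-enable state and a C11 test-and-set loop stands in for a raw spinlock. The point it shows is that the irqsave flavour restores whatever state the caller had, so it is safe from any context, while in a path where every caller already has interrupts off the save/restore is a no-op and a plain lock is enough.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread flag standing in for "this CPU has irqs enabled". */
static _Thread_local bool model_irqs_enabled = true;

/* Hypothetical test-and-set spinlock standing in for a raw spinlock. */
typedef struct { atomic_bool held; } model_spinlock_t;

static void model_spin_lock(model_spinlock_t *l)
{
    while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
        ; /* spin until the holder releases it */
}

static void model_spin_unlock(model_spinlock_t *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}

/* irqsave flavour: record the caller's irq state, disable, then lock. */
static bool model_spin_lock_irqsave(model_spinlock_t *l)
{
    bool flags = model_irqs_enabled;
    model_irqs_enabled = false;
    model_spin_lock(l);
    return flags;
}

/* Unlock, then restore exactly the state the caller had before. */
static void model_spin_unlock_irqrestore(model_spinlock_t *l, bool flags)
{
    model_spin_unlock(l);
    model_irqs_enabled = flags;
}

/* Plain flavour: only correct when the caller already runs with irqs off. */
static void model_spin_lock_irqs_off(model_spinlock_t *l)
{
    assert(!model_irqs_enabled);
    model_spin_lock(l);
}
```

Called from an irq-disabled idle path, model_spin_lock_irqsave() saves "disabled" and restores "disabled", which is exactly the redundant work that dropping the irqsave would avoid.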
> > > But yes it would be still good to fix Nehalem too.
> > >
> > > One fix would be to make all the masks hierarchical,
> > > similar to what RCU does. Perhaps even some code
> > > could be shared with RCU on that because it's a very
> > > similar problem.
> >
> > In theory. It's not about the mask. The mask is uninteresting. It's
> > about the expiry time, which we have to protect. There is nothing
> > hierarchical about that. It all boils down to _ONE_ single functional
>
> The mask can be used to see if another thread on this core is still
> running. If yes, you don't need that. Right now Linux doesn't
> know that, but it could be taught. The only problem is that once
> the other guy goes idle too, their timeouts have to be merged.
>
> This would cut contention in half.
That makes sense, but merging the timeouts race-free will be a real
PITA.
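A minimal sketch of the per-core idea, assuming two SMT siblings per core; all names are hypothetical. Only the last sibling to go idle reports that it must touch the shared broadcast state, and the merged timeout is simply the earliest sibling expiry. The sketch deliberately leaves out the hard part mentioned above: a sibling can wake and reprogram its timer between the merge and the broadcast registration, and nothing here closes that race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: two SMT threads per core; bit t set means thread t is idle. */
#define THREADS_PER_CORE 2
#define CORE_ALL_IDLE ((1u << THREADS_PER_CORE) - 1)

struct model_core {
    atomic_uint idle_mask;
    /* Next local timer expiry of each sibling, merged when the core idles. */
    _Atomic uint64_t next_event[THREADS_PER_CORE];
};

/*
 * Mark thread t idle with local expiry 'expiry'. Returns true only for the
 * last sibling to go idle, i.e. the only caller that would need the global
 * broadcast lock. If a sibling is still running, the core's timer is alive.
 */
static bool model_core_enter_idle(struct model_core *c, unsigned t,
                                  uint64_t expiry)
{
    assert(t < THREADS_PER_CORE);
    atomic_store(&c->next_event[t], expiry);
    unsigned old = atomic_fetch_or(&c->idle_mask, 1u << t);
    return (old | (1u << t)) == CORE_ALL_IDLE;
}

static void model_core_exit_idle(struct model_core *c, unsigned t)
{
    atomic_fetch_and(&c->idle_mask, ~(1u << t));
}

/* Merged timeout for the whole core: the earliest sibling expiry. */
static uint64_t model_core_merged_expiry(struct model_core *c)
{
    uint64_t min = UINT64_MAX;
    for (unsigned t = 0; t < THREADS_PER_CORE; t++) {
        uint64_t e = atomic_load(&c->next_event[t]);
        if (e < min)
            min = e;
    }
    return min;
}
```

With two threads per core, at most one of the two idle entries per core reaches the global lock, which is where the "cut contention in half" estimate comes from.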
> Also if it's HPET you could actually use multiple independent HPET channels.
> I remember us discussing this a long time ago... Not sure if it's worth
> it, but it may be a small relief.
Multiple broadcast devices. That still sounds horrible :)
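For comparison, a sketch of the multiple-channel idea under stated assumptions: MODEL_NR_CHANNELS independent comparators, a fixed cpu % n mapping, and one lock per channel (all hypothetical). CPUs on different channels never share a lock, which is the relief suggested; the price is that every channel becomes its own broadcast device to manage. Only the idle-enter path is modeled; removing an expiry when a CPU leaves idle is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical number of independent broadcast channels (e.g. comparators). */
#define MODEL_NR_CHANNELS 4

struct model_bc_channel {
    atomic_bool lock;     /* per-channel lock instead of one global lock */
    uint64_t next_event;  /* earliest pending expiry; 0 means none */
};

/* Static storage: all locks start free and no expiry is pending. */
static struct model_bc_channel model_channels[MODEL_NR_CHANNELS];

/* Fixed CPU -> channel mapping; CPUs on different channels never contend. */
static unsigned model_cpu_to_channel(unsigned cpu)
{
    return cpu % MODEL_NR_CHANNELS;
}

static void model_channel_lock(struct model_bc_channel *ch)
{
    while (atomic_exchange_explicit(&ch->lock, true, memory_order_acquire))
        ; /* spin */
}

static void model_channel_unlock(struct model_bc_channel *ch)
{
    atomic_store_explicit(&ch->lock, false, memory_order_release);
}

/*
 * Register a CPU's expiry with its own channel only. Returns the channel's
 * earliest expiry, i.e. what that channel's comparator would be armed for.
 */
static uint64_t model_channel_enter(unsigned cpu, uint64_t expiry)
{
    struct model_bc_channel *ch = &model_channels[model_cpu_to_channel(cpu)];
    uint64_t next;

    assert(expiry != 0); /* 0 is reserved for "nothing pending" */
    model_channel_lock(ch);
    if (ch->next_event == 0 || expiry < ch->next_event)
        ch->next_event = expiry;
    next = ch->next_event;
    model_channel_unlock(ch);
    return next;
}
```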
> > device and you don't want to miss out your deadline just because you
> > decided to be extra clever. RCU does not care much whether you run the
> > callbacks a tick later or not. Time and timekeeping does.
>
> You can at least check lockless if someone else has a <= timeout, right?
Might be worth a try. Need some sleep to remember why I discarded that
idea long ago.
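The lockless check could look roughly like the sketch below (hypothetical names; a C11 atomic holds the earliest expiry the broadcast device is armed for). A CPU whose own expiry is no earlier than the armed one returns without touching the lock; otherwise it takes the lock and re-checks, since another CPU may have armed an earlier time in the meantime. It only covers entering idle. It does not handle the CPU owning the earlier expiry leaving idle, which would require rearming for a later time, so it should not be read as settling why the idea was discarded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global: earliest expiry the broadcast device is armed for. */
static _Atomic uint64_t model_bc_next_event = UINT64_MAX;
static atomic_bool model_bc_lock;
static unsigned model_slow_path_count; /* lock acquisitions, for illustration */

/* Entering idle with local expiry 'expiry'. */
static void model_broadcast_enter(uint64_t expiry)
{
    /* Lockless fast path: someone already has a <= timeout armed. */
    if (atomic_load_explicit(&model_bc_next_event, memory_order_acquire)
        <= expiry)
        return;

    while (atomic_exchange_explicit(&model_bc_lock, true, memory_order_acquire))
        ; /* spin */
    model_slow_path_count++;
    /* Re-check under the lock before moving the armed time earlier. */
    if (expiry < atomic_load_explicit(&model_bc_next_event, memory_order_relaxed))
        atomic_store_explicit(&model_bc_next_event, expiry, memory_order_release);
    atomic_store_explicit(&model_bc_lock, false, memory_order_release);
}
```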
> -Andi
>
> ---
>
> Move C3 stop test outside lock
>
> Avoid taking locks in the idle path for systems where the timer
> doesn't stop in C3.
>
> Signed-off-by: Andi Kleen <ak@...ux.intel.com>
>
> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index da800ff..9cf0415 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -456,23 +456,22 @@ void tick_broadcast_oneshot_control(unsigned long reason)
> unsigned long flags;
> int cpu;
>
> - raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
> -
> /*
> * Periodic mode does not care about the enter/exit of power
> * states
> */
> if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
> - goto out;
> + return;
>
> + cpu = raw_smp_processor_id();
Why raw_? As I said above, this should always be called with irqs
disabled.
If it ever gets called from an irq-enabled, preemptible and
migratable context, then we open up a very narrow but ugly-to-debug
race window, as we could look at the wrong per-cpu device. With
CONFIG_DEBUG_PREEMPT, smp_processor_id() would at least warn about
such a caller; raw_smp_processor_id() skips that check.
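The difference between the checked and raw accessors can be modeled in user space. The names are hypothetical and the check is simplified: the kernel's debug variant also accepts a few other cases, such as a task bound to a single CPU. The checked accessor refuses to hand out a CPU number unless the caller cannot migrate, which here means preemption or interrupts are disabled; the raw one returns it unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread stand-ins for CPU state. */
static _Thread_local bool model_irqs_enabled = true;
static _Thread_local int model_preempt_count;
static _Thread_local unsigned model_cpu;

/* The CPU number stays valid only if the caller cannot be migrated. */
static bool model_cpu_id_is_stable(void)
{
    return model_preempt_count > 0 || !model_irqs_enabled;
}

/* Raw flavour: no check; the caller may migrate right after reading it. */
static unsigned model_raw_processor_id(void)
{
    return model_cpu;
}

/* Checked flavour: catches callers in migratable context. */
static unsigned model_processor_id(void)
{
    assert(model_cpu_id_is_stable());
    return model_cpu;
}
```

In the idle path interrupts are already off, so the checked flavour costs nothing in correct use and flags the "narrow but ugly" case if a caller ever breaks that rule.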
> bc = tick_broadcast_device.evtdev;
> - cpu = smp_processor_id();
> td = &per_cpu(tick_cpu_device, cpu);
> dev = td->evtdev;
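The quoted hunk is trimmed, so only part of the patch is visible. The following is a loose user-space model of the shape the commit message describes: return early, without the lock, for periodic mode and for a per-CPU device whose timer keeps running in C3. All names are hypothetical stand-ins (the kernel's clockevents code has a CLOCK_EVT_FEAT_C3STOP feature bit, but this sketch does not claim to reproduce the patch's exact test).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the broadcast mode and a "stops in C3" bit. */
enum model_mode { MODEL_MODE_PERIODIC, MODEL_MODE_ONESHOT };
#define MODEL_FEAT_C3STOP 0x1u

struct model_evtdev { unsigned features; };

static enum model_mode model_bc_mode = MODEL_MODE_ONESHOT;
static atomic_bool model_bc_lock;
static unsigned model_lock_taken; /* lock acquisitions, for illustration */

/*
 * Must be called with irqs disabled: 'dev' is this CPU's own device, so it
 * cannot go away and may be inspected before taking the lock.
 */
static void model_oneshot_control(const struct model_evtdev *dev)
{
    assert(dev != 0);
    if (model_bc_mode == MODEL_MODE_PERIODIC)
        return; /* periodic mode ignores idle enter/exit */
    if (!(dev->features & MODEL_FEAT_C3STOP))
        return; /* local timer keeps running: no broadcast needed */

    while (atomic_exchange_explicit(&model_bc_lock, true, memory_order_acquire))
        ; /* spin */
    model_lock_taken++;
    /* ... broadcast bookkeeping would happen here ... */
    atomic_store_explicit(&model_bc_lock, false, memory_order_release);
}
```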
Thanks,
tglx