linux-kernel - Re: idle issues running sembench on 128 cpus

Message-ID: <alpine.LFD.2.02.1105050110440.3005@ionos>
Date:	Thu, 5 May 2011 01:29:49 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Andi Kleen <andi@...stfloor.org>
cc:	Dave Kleikamp <dkleikamp@...il.com>,
	Chris Mason <chris.mason@...cle.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	linux-kernel@...r.kernel.org, lenb@...nel.org, paulmck@...ibm.com
Subject: Re: idle issues running sembench on 128 cpus

On Thu, 5 May 2011, Andi Kleen wrote:
> > No, it does not even need refcounting. We can access it outside of the
> 
> Ok.
> 
> > lock as this is atomic context called on the cpu which is about to go
> > idle and therefore the device cannot go away. Easy and straightforward
> > fix.
> 
> Ok. Patch appended. Looks good?

Mostly. See below.
 
> BTW why must the lock be irqsave?

Good question. Probably safety first paranoia :)

Indeed that code should only be called from irq disabled regions, so
we could avoid the irqsave there. Otherwise that needs to be irqsave
for obvious reasons.

> > > But yes it would be still good to fix Nehalem too.
> > > 
> > > One fix would be to make all the masks hierarchical,
> > > similar to what RCU does. Perhaps even some code 
> > > could be shared with RCU on that because it's a very
> > > similar problem.
> > 
> > In theory. It's not about the mask. The mask is uninteresting. It's
> > about the expiry time, which we have to protect. There is nothing
> > hierarchical about that. It all boils down to _ONE_ single functional
> 
> The mask can be used to see if another thread on this core is still
> running. If yes you don't need that. Right now Linux doesn't 
> know that, but it could be taught. The only problem is that once
> the other guy goes idle too their timeouts have to be merged.
> 
> This would cut contention in half.

That makes sense, but merging the timeouts race free will be a real
PITA.

> Also if it's HPET you could actually use multiple independent HPET channels.
> I remember us discussing this a long time ago... Not sure if it's worth
> it, but it may be a small relief.

Multiple broadcast devices. That still sounds horrible :)
 
> > device and you don't want to miss out your deadline just because you
> > decided to be extra clever. RCU does not care much whether you run the
> > callbacks a tick later or not. Time and timekeeping do.
> 
> You can at least check lockless if someone else has a <= timeout, right?

Might be worth a try. Need some sleep to remember why I discarded that
idea long ago.
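
To make the lockless check concrete, here is a minimal userspace sketch of the
idea, not kernel code. Every name in it (bc_lock, bc_next_event,
bc_request_expiry, bc_lock_taken) is hypothetical, and a pthread mutex stands
in for the broadcast lock. It only models pulling the shared expiry earlier. It
deliberately ignores the case where the device fires and the expiry moves
later, which is where a real implementation would need the most care.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
static pthread_mutex_t bc_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic uint64_t bc_next_event = UINT64_MAX; /* earliest programmed expiry */
static unsigned long bc_lock_taken;                  /* counts slow-path entries */

/* Returns true if the shared expiry had to be pulled forward under the lock. */
static bool bc_request_expiry(uint64_t my_expiry)
{
	bool reprogrammed = false;

	/* Lockless peek: an earlier or equal expiry already covers this caller. */
	if (atomic_load_explicit(&bc_next_event, memory_order_acquire) <= my_expiry)
		return false;

	pthread_mutex_lock(&bc_lock);
	bc_lock_taken++;
	/* Recheck under the lock; another thread may have won the race. */
	if (my_expiry < atomic_load_explicit(&bc_next_event, memory_order_relaxed)) {
		atomic_store_explicit(&bc_next_event, my_expiry, memory_order_release);
		reprogrammed = true;
	}
	pthread_mutex_unlock(&bc_lock);
	return reprogrammed;
}
```

In this model a caller whose expiry is later than or equal to the programmed
one never touches the lock, so only callers that actually need an earlier
event contend on it.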

> -Andi
> 
> ---
> 
> Move C3 stop test outside lock
> 
> Avoid taking locks in the idle path for systems where the timer
> doesn't stop in C3.
> 
> Signed-off-by: Andi Kleen <ak@...ux.intel.com>
> 
> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index da800ff..9cf0415 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -456,23 +456,22 @@ void tick_broadcast_oneshot_control(unsigned long reason)
>  	unsigned long flags;
>  	int cpu;
>  
> -	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
> -
>  	/*
>  	 * Periodic mode does not care about the enter/exit of power
>  	 * states
>  	 */
>  	if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
> -		goto out;
> +		return;
>  
> +	cpu = raw_smp_processor_id();

Why raw_ ? As I said above this should always be called with irqs
disabled.

If that ever gets called from an irq enabled, preemptible and
migratable context then we just open up a very narrow but ugly to
debug race window as we can look at the wrong per cpu device.

>  	bc = tick_broadcast_device.evtdev;
> -	cpu = smp_processor_id();
>  	td = &per_cpu(tick_cpu_device, cpu);
>  	dev = td->evtdev;

Thanks,

	tglx