Message-ID: <20080824211946.GI6851@linux.vnet.ibm.com>
Date:	Sun, 24 Aug 2008 14:19:46 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Manfred Spraul <manfred@...orfullife.com>
Cc:	linux-kernel@...r.kernel.org, cl@...ux-foundation.org,
	mingo@...e.hu, akpm@...ux-foundation.org, dipankar@...ibm.com,
	josht@...ux.vnet.ibm.com, schamp@....com, niv@...ibm.com,
	dvhltc@...ibm.com, ego@...ibm.com, laijs@...fujitsu.com,
	rostedt@...dmis.org
Subject: Re: [PATCH, RFC, tip/core/rcu] scalable classic RCU implementation

On Sun, Aug 24, 2008 at 08:25:02PM +0200, Manfred Spraul wrote:
> Paul E. McKenney wrote:
>>>> + */
>>>> +struct rcu_node {
>>>> +	spinlock_t lock;
>>>> +	unsigned long	qsmask;	/* CPUs or groups that need to switch in      */
>>>> +				/*  order for current grace period to proceed.*/
>>>> +	unsigned long	qsmaskinit;
>>>> +				/* Per-GP initialization for qsmask.	      */
>>>>         
>>> I'm not sure if a bitmap is the right storage. If I understand the code
>>> correctly, it contains two pieces of information:
>>> 1) If the bitmap is clear, then all cpus have completed whatever they
>>> need to do.
>>> A counter is more efficient than a bitmap. In particular, it would allow
>>> choosing the optimal fan-out, independent of 32/64 bits.
>>> 2) Whether the current cpu must do something to complete the current
>>> grace period.
>>> This is local information; usually (always?) only the current cpu needs
>>> to know if it must do something.
>>> But this doesn't need to be stored in a shared structure; the information
>>> could be stored in a per-cpu structure.
>>
>> I am using the bitmap in force_quiescent_state() to work out which CPUs
>> need their dynticks state checked and which need to be sent reschedule
>> IPIs.  I could scan all of the per-CPU rcu_data structures, but am
>> assuming that after a few jiffies there would typically be relatively
>> few CPUs still needing to do a quiescent state.  Given this assumption,
>> on systems with large numbers of CPUs, scanning the bitmask greatly
>> reduces the number of cache misses compared to scanning the rcu_data
>> structures.
>>   
> It's an optimization question: which is rarer, force_quiescent_state() or 
> "normal" cpu_quiet() calls?
> You have optimized for force_quiescent_state(); I have optimized for 
> "normal" cpu_quiet() calls. [ok, I admit: force_quiescent_state() is still 
> missing in my code].

;-)
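
Just to make sure I understand the counter-based scheme you describe, here
is a quick user-space sketch of how I read it (the names are mine, not from
either patch): each node keeps only a count of CPUs that still owe a
quiescent state, and the "do I need to do anything?" question is answered
from purely per-CPU data.

/*
 * User-space model of the counter-based scheme (hypothetical names,
 * not from either patch): one shared counter per node, plus a purely
 * local per-CPU flag saying whether this CPU still owes a quiescent
 * state for the current grace period.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

struct model_rcu_node {
        atomic_int qs_pending;          /* CPUs that still need to report */
};

struct model_rcu_data {
        bool qs_needed;                 /* local: must this CPU report? */
};

static struct model_rcu_node node;
static struct model_rcu_data per_cpu[NR_CPUS];

/* Start a grace period: every CPU owes a quiescent state. */
static void gp_start(void)
{
        atomic_store(&node.qs_pending, NR_CPUS);
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
                per_cpu[cpu].qs_needed = true;
}

/* Called on the local CPU when it passes through a quiescent state. */
static void cpu_quiet(int cpu)
{
        if (!per_cpu[cpu].qs_needed)    /* purely local check */
                return;
        per_cpu[cpu].qs_needed = false;
        if (atomic_fetch_sub(&node.qs_pending, 1) == 1)
                printf("grace period complete\n");
}

int main(void)
{
        gp_start();
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
                cpu_quiet(cpu);
        return 0;
}

If that matches your intent, then I agree that the per-node shared state
shrinks to a single counter, independent of the word size and hence of the
fan-out.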

> Do you have any statistics?

If the system is completely busy, then I would expect normal cpu_quiet()
calls to be more common.  But if the system were sized for peak
workload, it would spend a fair amount of time with many of the CPUs
idle.  Power-conservation measures would hopefully push the idleness
into single cores/dies/whatever, which could then be powered down.

A large fraction of the systems I see have utilizations well under 50%.
And latency concerns would also focus attention on force_quiescent_state().

That said, I haven't had much to do with systems having more than 128
CPUs.
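
For what it's worth, here is a rough user-space model of the bitmask
argument (again, illustrative names only, not the actual patch code): once
only a few CPUs remain, the set bits in qsmask point directly at the CPUs
that still need a dynticks check or a reschedule IPI, so
force_quiescent_state() never has to walk the per-CPU rcu_data structures.

/*
 * User-space model of the qsmask argument (hypothetical names, not the
 * actual patch code): one word per node whose set bits identify the
 * CPUs still holding up the grace period.
 */
#include <stdio.h>

static unsigned long qsmask;            /* bit N set: CPU N not yet quiet */

static void check_or_ipi(int cpu)
{
        /* stand-in for the dynticks check / reschedule IPI */
        printf("poking CPU %d\n", cpu);
}

static void force_quiescent_state_model(void)
{
        unsigned long mask = qsmask;

        while (mask) {
                int cpu = __builtin_ctzl(mask); /* lowest set bit (gcc/clang builtin) */

                check_or_ipi(cpu);
                mask &= mask - 1;               /* clear that bit */
        }
}

int main(void)
{
        qsmask = (1UL << 3) | (1UL << 27);      /* only CPUs 3 and 27 remain */
        force_quiescent_state_model();
        return 0;
}

A counter, in contrast, says how many CPUs remain but not which ones, so
forcing quiescent states would have to fall back to scanning per-CPU state
anyway.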

							Thanx, Paul
