Message-ID: <20091122174203.GE9029@linux.vnet.ibm.com>
Date:	Sun, 22 Nov 2009 09:42:03 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc:	linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
	dipankar@...ibm.com, akpm@...ux-foundation.org,
	josh@...htriplett.org, dvhltc@...ibm.com, niv@...ibm.com,
	tglx@...utronix.de, peterz@...radead.org, rostedt@...dmis.org,
	Valdis.Kletnieks@...edu, dhowells@...hat.com
Subject: Re: [PATCH tip/core/rcu 0/3] rcu: resend of grace-period stall and
	cleanup patches

On Sun, Nov 22, 2009 at 12:05:42PM -0500, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck@...ux.vnet.ibm.com) wrote:
> > Hello!
> > 
> > This patch series is a resend of the three RCU patches that are candidates
> > for the upcoming 2.6.33 merge window, but that are not yet in -tip.
> > These are:
> > 
> > 1.	A fix for a grace-period-stall bug that occurs on large
> > 	machines.
> [...]
> 
> Hi Paul,
> 
> I was thinking about the last bugs you discovered. Some characteristics
> they had in common were that they occur only on large machines (32+ or
> 64+ CPUs). This is caused by the fact that some of your code is only
> covered by tests when the number of CPUs goes over the architecture size
> (in bits).
> 
> I managed to cover this kind of scenario with smaller state-space in the
> LTTng formal models (but it also applies to kernel code) by tweaking the
> code, with bitmasks, to ensure that the number of bits the code uses is,
> e.g., no more than the minimum amount of required bits. Therefore, you
> are ensured to run into overflow scenarios either more quickly or, as in
> this case, on decently-sized hardware.

You mean by setting CONFIG_RCU_FANOUT=2 in order to get three levels
of rcu_node hierarchy on an eight-CPU machine, which would otherwise
require more than 1024 CPUs on a 32-bit system or more than 4096 CPUs on
a 64-bit system?  ;-)

http://paulmck.livejournal.com/14969.html
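
The arithmetic behind those numbers can be sketched as follows.  This
is only an illustrative model, not kernel code: the helper name
rcu_node_levels() is hypothetical, and it assumes a uniform fanout at
every level of the tree, with the default fanout equal to the word size
(32 or 64), as the figures above imply.

```python
def rcu_node_levels(nr_cpus: int, fanout: int) -> int:
    """Return the smallest number of tree levels that can cover nr_cpus.

    Hypothetical helper for illustration only: it assumes every level
    of the hierarchy uses the same fanout.
    """
    if nr_cpus < 1 or fanout < 2:
        raise ValueError("need nr_cpus >= 1 and fanout >= 2")
    levels = 1
    capacity = fanout  # CPUs that a one-level tree can cover
    while capacity < nr_cpus:
        capacity *= fanout  # each added level multiplies the coverage
        levels += 1
    return levels


if __name__ == "__main__":
    # (CPUs, fanout) pairs taken from the discussion above.
    for cpus, fanout in [(8, 2), (1024, 32), (1025, 32), (4096, 64), (4097, 64)]:
        print(cpus, fanout, rcu_node_levels(cpus, fanout))
```

Under these assumptions, a fanout of 2 reaches three levels at eight
CPUs, whereas a fanout of 32 or 64 stays at two levels until the CPU
count exceeds 1024 or 4096, respectively.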

But yes, the largest machine I have access to has "only" 128 CPUs,
and it is often heavily used by others.  So I heartily agree with
your point, which is that we should use various techniques to test
code on smaller machines in the ways that larger machines would stress it.
Of course, my favorite such technique is differential profiling, which
allows performance results collected on small machines to reveal problems
that would only show up on large machines:

http://www.rdrop.com/users/paulmck/scalability/paper/profiling.2002.06.04.pdf

(This is a revision of a paper that appeared in the 1995 MASCOTS
conference and in the 1999 Software Practice & Experience journal.)

							Thanx, Paul