lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 30 Aug 2013 11:53:42 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Waiman Long <waiman.long@...com>
Cc:	Ingo Molnar <mingo@...nel.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Jeff Layton <jlayton@...hat.com>,
	Miklos Szeredi <mszeredi@...e.cz>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Andi Kleen <andi@...stfloor.org>,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	"Norton, Scott J" <scott.norton@...com>
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless
 update of refcount

On Fri, Aug 30, 2013 at 11:33 AM, Waiman Long <waiman.long@...com> wrote:
>
> I tested your patch on a 2-socket (12 cores, 24 threads) DL380 with 2.9GHz
> Westmere-EX CPUs, the test results of your test program (max threads
> increased to 24 to match the thread count) were:
>
> with patch = 68M
> w/o patch = 12M

Ok, that's certainly noticeable.

> I have reviewed the patch, and it looks good to me with the exception that I
> added a cpu_relax() call at the end of while loop in the CMPXCHG_LOOP macro.

Yeah, that's probably a good idea.

> I also got the perf data of the test runs with and without the patch.

So the perf data would be *much* more interesting for a more varied
load. I know pretty much exactly what happens with my silly
test-program, and as you can see it never really gets to the actual
spinlock, because that test program will only ever hit the fast-path
case.

It would be much more interesting to see another load that may trigger
the d_lock actually being taken. So:

> For the other test cases that I am interested in, like the AIM7 benchmark,
> your patch may not be as good as my original one. I got 1-3M JPM (varied
> quite a lot in different runs) in the short workloads on a 80-core system.
> My original one got 6M JPM. However, the test was done on 3.10 based kernel.
> So I need to do more test to see if that has an effect on the JPM results.

I'd really like to see a perf profile of that, particularly with some
call chain data for the relevant functions (ie "what it is that causes
us to get to spinlocks"). Because it may well be that you're hitting
some of the cases that I didn't see, and thus didn't notice.

In particular, I suspect AIM7 actually creates/deletes files and/or
renames them too. Or maybe I screwed up the dget_parent() special case
thing, which mattered because AIM7 did a lot of getcwd() calls or
someting odd like that.

                Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ