Message-Id: <1227450296.7685.20759.camel@twins>
Date: Sun, 23 Nov 2008 15:24:56 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Petr Tesarik <ptesarik@...e.cz>
Cc: Frank Mayhar <fmayhar@...gle.com>,
Christoph Lameter <cl@...ux-foundation.org>,
Doug Chapman <doug.chapman@...com>, mingo@...e.hu,
roland@...hat.com, adobriyan@...il.com, akpm@...ux-foundation.org,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: regression introduced by - timers: fix itimer/many thread hang
On Fri, 2008-11-21 at 19:42 +0100, Petr Tesarik wrote:
> > > In any event, while this particular implementation may not be optimal,
> > > at least it's _right_. Whatever happened to "make it right, then make
> > > it fast?"
> >
> > Well, I'm not thinking you did it right ;-)
> >
> > While I agree that the linear loop is sub-optimal, it only really
> > becomes a problem when you have hundreds or thousands of threads in your
> > application, which I'll argue to be insane anyway.
>
> This is just not true. I've seen a very real example of a lockup with a very
> sane number of threads (one per CPU), but on a very large machine (1024 CPUs
> IIRC). The application set per-process CPU profiling with an interval of 1
> tick, which translates to 1024 timers firing off with each tick...
>
> Well, yes, that was broken, too, but that's the way one quite popular FORTRAN
> compiler works...
I'm not sure what side you're arguing...
The current (per-cpu) code is utterly broken on large machines too. I've
asked SGI to run some tests on real NUMA machines (something like a
multi-brick Altix), and even moderately small machines with 256 CPUs in
them grind to a halt (or make progress at a snail's pace) when the
itimer stuff is enabled.
Furthermore, I really dislike the per-process-per-cpu memory cost: it
bloats applications and makes the new per-cpu alloc work rather more
difficult than it already is.
I basically think the whole process-wide itimer stuff is broken by
design; there is no way to make it work on reasonably large machines,
because the problem space just doesn't scale. You simply cannot maintain
a global count without bouncing cachelines like mad, so you might as
well accept that, use a single process-wide counter, and bounce only one
line instead of a line per CPU.
Furthermore, I stand by my claim that anything that runs more than a
handful of threads per physical core is utterly braindead and deserves
all the pain it can get. (Yes, I'm a firm believer in state machines and
don't think just throwing threads at a problem is a sane solution.)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/