linux-kernel - Re: regression introduced by - timers: fix itimer/many thread hang


Message-Id: <1227519208.7685.21951.camel@twins>
Date:	Mon, 24 Nov 2008 10:33:28 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Petr Tesarik <ptesarik@...e.cz>
Cc:	Frank Mayhar <fmayhar@...gle.com>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Doug Chapman <doug.chapman@...com>, mingo@...e.hu,
	roland@...hat.com, adobriyan@...il.com, akpm@...ux-foundation.org,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: regression introduced by - timers: fix itimer/many thread hang

On Mon, 2008-11-24 at 09:46 +0100, Petr Tesarik wrote:
> Peter Zijlstra wrote on Sun, 23 Nov 2008 at 15:24 +0100:
> > On Fri, 2008-11-21 at 19:42 +0100, Petr Tesarik wrote:
> > 
> > > > > In any event, while this particular implementation may not be optimal,
> > > > > at least it's _right_.  Whatever happened to "make it right, then make
> > > > > it fast?"
> > > >
> > > > Well, I'm not thinking you did it right ;-)
> > > >
> > > > While I agree that the linear loop is sub-optimal, it only really
> > > > becomes a problem when you have hundreds or thousands of threads in your
> > > > application, which I'll argue to be insane anyway.
> > > 
> > > This is just not true. I've seen a very real example of a lockup with a very 
> > > sane number of threads (one per CPU), but on a very large machine (1024 CPUs 
> > > IIRC). The application set per-process CPU profiling with an interval of 1 
> > > tick, which translates to 1024 timers firing off with each tick...
> > > 
> > > Well, yes, that was broken, too, but that's the way one quite popular FORTRAN 
> > > compiler works...
> > 
> > I'm not sure what side you're arguing...
> 
> In this particular case I'm arguing against both, it seems. The old
> behaviour is broken and the new one is not better. :(

OK, then we agree ;-)

> > The current (per-cpu) code is utterly broken on large machines too. I've
> > asked SGI to run some tests on real NUMA machines (something multi-brick
> > Altix) and even moderately small machines with 256 CPUs in them grind to
> > a halt (or make progress at a snail's pace) when the itimer stuff is
> > enabled.
> > 
> > Furthermore, I really dislike the per-process-per-cpu memory cost, it
> > bloats applications and makes the new per-cpu alloc work rather more
> > difficult than it already is.
> > 
> > I basically think the whole process wide itimer stuff is broken by
> > design, there is no way to make it work on reasonably large machines,
> > the whole problem space just doesn't scale. You simply cannot maintain a
> > global count without bouncing cachelines like mad, so you might as well
> > accept it and do the process wide counter and bounce only a single line,
> > instead of bouncing a line per-cpu.
> 
> Very true. Unfortunately per-process itimers are prescribed by the
> Single Unix Specification, so we have to cope with them in some way,
> while not letting a non-privileged process mount a DoS attack. This is
> going to be hard, and we'll probably have to twist the specification a
> bit to still conform to its wording. :((

Feel like reading the actual spec and trying to come up with a creative
interpretation? :-)

> I really don't think it's a good idea to set a per-process ITIMER_PROF
> to one timer tick on a large machine, but the kernel does allow any
> process to do it, and then it can even cause a hard freeze on some
> hardware. This is _not_ acceptable.
> 
> What is worse, we can't just limit the granularity of itimers, because
> threads can come into being _after_ the itimer was set.

Currently it has jiffy granularity, right? And the length of a jiffy
depends on a compile-time constant (HZ), so can't we, for the sake
of per-process itimers, pretend to have a 1 minute jiffy?

That should be as compliant as we are now, and utterly useless for
everybody, thereby discouraging its use, hmm? :-)
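
As a rough illustration of the granularity idea above, here is a minimal
userspace sketch, not kernel code. It assumes intervals are expressed in
microseconds; the helper names jiffy_us() and round_up_interval_us() are
hypothetical and do not exist in the kernel. It only shows the arithmetic:
the length of a jiffy follows from HZ, and a requested itimer interval is
rounded up to a whole number of (possibly much coarser) granules.

```c
#include <stdint.h>

/* Hypothetical helper: length of one jiffy in microseconds for a given
 * compile-time HZ value (for example 100, 250 or 1000). */
static uint64_t jiffy_us(unsigned int hz)
{
    return 1000000ULL / hz;
}

/* Hypothetical helper: round a requested interval up to the next
 * multiple of granularity_us. A zero request is passed through
 * unchanged, since zero has a special meaning to setitimer(). */
static uint64_t round_up_interval_us(uint64_t requested_us,
                                     uint64_t granularity_us)
{
    uint64_t granules;

    if (requested_us == 0 || granularity_us == 0)
        return requested_us;

    /* Ceiling division, then scale back to microseconds. */
    granules = (requested_us + granularity_us - 1) / granularity_us;
    return granules * granularity_us;
}
```

With HZ=1000, a one-tick request is 1000 us; with a pretend granularity of
one minute (60000000 us) it would be rounded up to a full minute, which is
the "compliant but useless" behaviour joked about above.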