Message-Id: <1227516403.4487.20.camel@nathan.suse.cz>
Date: Mon, 24 Nov 2008 09:46:43 +0100
From: Petr Tesarik <ptesarik@...e.cz>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Frank Mayhar <fmayhar@...gle.com>,
Christoph Lameter <cl@...ux-foundation.org>,
Doug Chapman <doug.chapman@...com>, mingo@...e.hu,
roland@...hat.com, adobriyan@...il.com, akpm@...ux-foundation.org,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: regression introduced by - timers: fix itimer/many thread hang

Peter Zijlstra wrote on Sun, 23 Nov 2008 at 15:24 +0100:
> On Fri, 2008-11-21 at 19:42 +0100, Petr Tesarik wrote:
>
> > > > In any event, while this particular implementation may not be optimal,
> > > > at least it's _right_. Whatever happened to "make it right, then make
> > > > it fast?"
> > >
> > > Well, I'm not thinking you did it right ;-)
> > >
> > > While I agree that the linear loop is sub-optimal, but it only really
> > > becomes a problem when you have hundreds or thousands of threads in your
> > > application, which I'll argue to be insane anyway.
> >
> > This is just not true. I've seen a very real example of a lockup with a very
> > sane number of threads (one per CPU), but on a very large machine (1024 CPUs
> > IIRC). The application set per-process CPU profiling with an interval of 1
> > tick, which translates to 1024 timers firing off with each tick...
> >
> > Well, yes, that was broken, too, but that's the way one quite popular FORTRAN
> > compiler works...
>
> I'm not sure what side you're arguing...

In this particular case I'm arguing against both, it seems. The old
behaviour is broken and the new one is no better. :(

> The current (per-cpu) code is utterly broken on large machines too, I've
> asked SGI to run some tests on real numa machines (something multi-brick
> altix) and even moderately small machines with 256 cpus in them grind to
> a halt (or make progress at a snails pace) when the itimer stuff is
> enabled.
>
> Furthermore, I really dislike the per-process-per-cpu memory cost, it
> bloats applications and makes the new per-cpu alloc work rather more
> difficult than it already is.
>
> I basically think the whole process wide itimer stuff is broken by
> design, there is no way to make it work on reasonably large machines,
> the whole problem space just doesn't scale. You simply cannot maintain a
> global count without bouncing cachelines like mad, so you might as well
> accept it and do the process wide counter and bounce only a single line,
> instead of bouncing a line per-cpu.

Very true. Unfortunately, per-process itimers are prescribed by the
Single UNIX Specification, so we have to cope with them somehow without
letting an unprivileged process mount a DoS attack. This is going to be
hard, and we'll probably have to bend the specification a bit while
still conforming to its wording. :((
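
To make the trade-off concrete, here is a small user-space sketch of the
two counter layouts being discussed. It is not kernel code: all names,
the NSLOTS constant (standing in for the number of CPUs) and the 64-byte
cache line size are assumptions made up for illustration.

```c
#include <stdatomic.h>

#define NSLOTS 4	/* hypothetical stand-in for the number of CPUs */

/* One counter for the whole process: every update hits the same line. */
struct shared_counter {
	atomic_ullong total;
};

/* One counter per slot, each on its own (assumed 64-byte) cache line. */
struct split_counter {
	struct {
		_Alignas(64) atomic_ullong v;
	} slot[NSLOTS];
};

static void shared_init(struct shared_counter *c)
{
	atomic_init(&c->total, 0);
}

static void shared_add(struct shared_counter *c, unsigned long long n)
{
	atomic_fetch_add_explicit(&c->total, n, memory_order_relaxed);
}

/* Reading the total is a single load. */
static unsigned long long shared_read(struct shared_counter *c)
{
	return atomic_load_explicit(&c->total, memory_order_relaxed);
}

static void split_init(struct split_counter *c)
{
	for (int i = 0; i < NSLOTS; i++)
		atomic_init(&c->slot[i].v, 0);
}

/* Updates touch only the caller's own slot. */
static void split_add(struct split_counter *c, int slot, unsigned long long n)
{
	atomic_fetch_add_explicit(&c->slot[slot].v, n, memory_order_relaxed);
}

/* Reading the total walks every slot, i.e. one cache line per slot. */
static unsigned long long split_read(struct split_counter *c)
{
	unsigned long long sum = 0;

	for (int i = 0; i < NSLOTS; i++)
		sum += atomic_load_explicit(&c->slot[i].v, memory_order_relaxed);
	return sum;
}
```

With the split layout the updates are cheap, but anything that needs the
process-wide total on every tick has to pull in a line per slot. With
the shared layout the total is one load, but every update bounces the
same line.
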

I really don't think it's a good idea to set a per-process ITIMER_PROF
to one timer tick on a large machine, but the kernel allows any process
to do it, and it can then even cause a hard freeze on some hardware.
This is _not_ acceptable.
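
For reference, this is roughly all that user space has to do to arm such
a timer. It is a minimal sketch, not taken from any real application:
the helper names tick_usec() and arm_prof_itimer() are made up here, and
treating sysconf(_SC_CLK_TCK) as the tick rate is an assumption (it
reports the user-visible clock tick rate, which need not match the
kernel's HZ).

```c
#include <sys/time.h>
#include <unistd.h>

/* Length of one tick in microseconds for a given tick rate
 * (hypothetical helper). */
static long tick_usec(long ticks_per_sec)
{
	return 1000000L / ticks_per_sec;
}

/*
 * Arm the process-wide profiling timer with the given period
 * (hypothetical helper). A period of 0 disarms it. The timer runs on
 * CPU time consumed by the process and delivers SIGPROF on expiry.
 * Returns 0 on success and -1 on error, like setitimer() itself.
 */
static int arm_prof_itimer(long period_usec)
{
	struct itimerval it;

	it.it_interval.tv_sec = period_usec / 1000000L;
	it.it_interval.tv_usec = period_usec % 1000000L;
	it.it_value = it.it_interval;	/* first expiry after one period */

	return setitimer(ITIMER_PROF, &it, NULL);
}
```

A profiler would install a SIGPROF handler first and then call
arm_prof_itimer(tick_usec(sysconf(_SC_CLK_TCK))). No privilege is needed
for any of this.
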

What is worse, we can't just limit the granularity of itimers, because
threads can be created _after_ the itimer was set, so a limit checked
when the timer is armed may no longer hold later.

> Furthermore, I stand by my claims that anything that runs more than a
> hand-full of threads per physical core is utterly braindead and deserves
> all the pain it can get. (Yes, I'm a firm believer in state machines and
> don't think just throwing threads at a problem is a sane solution).

Yes, anything with many threads per core is badly designed. My point is
that it's not the only broken case.

Petr Tesarik