lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1204314904.4850.23.camel@peace.smo.corp.google.com>
Date:	Fri, 29 Feb 2008 11:55:04 -0800
From:	Frank Mayhar <fmayhar@...gle.com>
To:	parag.warudkar@...il.com
Cc:	Alejandro Riveira Fernández 
	<ariveira@...il.com>, Andrew Morton <akpm@...ux-foundation.org>,
	bugme-daemon@...zilla.kernel.org, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Roland McGrath <roland@...hat.com>,
	Jakub Jelinek <jakub@...hat.com>
Subject: Re: [Bugme-new] [Bug 9906] New: Weird hang with NPTL and SIGPROF.

On Thu, 2008-02-07 at 11:53 -0500, Parag Warudkar wrote:
> On Thu, 7 Feb 2008, Parag Warudkar wrote:
> > Yep. I will enable PREEMPT and see if it reproduces for me.
> 
> Not reproducible with PREEMPT either. 

Okay, here's an analysis of the problem and a potential solution.  I
mentioned this in the bug itself but I'll repeat it here:

A couple of us here have been investigating this thing and have
concluded that the problem lies in the implementation of
run_posix_cpu_timers() and specifically in the quadratic nature of the
implementation.  It calls check_process_timers() to sum the
utime/stime/sched_time (in 2.6.18.5, under another name in 2.6.24+) of
all threads in the thread group.  This means that runtime there grows
with the number of threads.  It can go through the list _again_ if and
when it decides to rebalance expiry times.

After thinking through it, it seems clear that the critical number of
threads is that in which run_posix_cpu_timers() takes as long as or
longer than a tick to get its work done.  The system makes progress to
that point but after that everything goes to hell as it gets further and
further behind.  This explains all the symptoms we've seen, including
seeing run_posix_cpu_timers() at the top of a bunch of profiling stats
(I saw it get more than a third of overall processing time on a bunch of
tests, even where the system _didn't_ hang!).  It explains the fact that
things get slow right before they go to hell and it explains why under
certain conditions the system can recover (if the threads have started
exiting by the time it hangs, for example).

I've come up with a potential fix for the problem.  It does two things.
First, rather than summing the utime/stime/sched_time at interrupt it
adds all of those times to a new task_struct field on the group leader
then at interrupt just consults those fields; this avoids repeatedly
blowing the cache as well as a loop across all the threads.

Second, if there are more than 1000 threads in the process (as noted in
task->signal->live), it just punts all of the processing to a workqueue.

With these changes I've gone from a hang at 4500 (or fewer) threads to
running out of resources at more than 32000 threads on a single-CPU box.
When I've finished testing I'll polish the patch a bit and submit it to
the LKML but I thought you guys might want to know the state of things.

Oh, and one more note:  This bug is also dependent on HZ, since it
matters how long a tick is.  I've been running with HZ=1000.  A faster
machine or one with HZ=100 would potentially need to generate a _lot_
more threads to see the hang.
-- 
Frank Mayhar <fmayhar@...gle.com>
Google, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ