Message-ID: <20070724071320.GA12169@linux.vnet.ibm.com>
Date:	Tue, 24 Jul 2007 12:43:20 +0530
From:	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>
To:	Dhaval Giani <dhaval@...ux.vnet.ibm.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Balbir Singh <balbir@...ibm.com>, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: System hangs on running kernbench

On Wed, Jul 18, 2007 at 01:26:48PM +0530, Dhaval Giani wrote:
> Hi Andrew,
> 
> I was running kernbench on top of 2.6.22-rc6-mm1 and I got a Hangcheck
> alert (This is when kernbench reached make -j).
> 
> Also make -j is hanging.

[refer to http://marc.info/?l=linux-kernel&m=118474574807055 for the
complete report of this bug]

Ingo,
	Dhaval tracked the root cause of this problem down to cfs (by the
way, the cfs patches weren't git-bisect safe).

Basically, "make -s -j" workload hanged the machine, leading to lot of
OOM killings. This was on a 8-cpu machine with no swap space configured and
4GB RAM. The same workload works "fine" (runs to completion) on 2.6.22.

I played with the scheduler tunables a bit and found that the problem
goes away if I set sched_granularity_ns to 100ms (the default is 32ms).
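
For completeness, a minimal sketch of bumping the knob from userspace,
assuming it is exported as /proc/sys/kernel/sched_granularity_ns on this
-mm kernel (the value is in nanoseconds, so 100ms == 100000000); echoing
the value into that file as root does the same thing:

/* set_sched_granularity.c -- illustrative only; the procfs path is an
 * assumption and may differ between -mm releases. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        /* value is in nanoseconds; 100000000 == 100ms */
        const char *val = (argc > 1) ? argv[1] : "100000000";
        FILE *f = fopen("/proc/sys/kernel/sched_granularity_ns", "w");

        if (!f) {
                perror("fopen");
                return EXIT_FAILURE;
        }
        fprintf(f, "%s\n", val);
        if (fclose(f)) {
                perror("fclose");
                return EXIT_FAILURE;
        }
        return 0;
}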

So my theory is this: a 32ms preemption granularity is too low a value for
any compile thread to make useful progress. As a result of this rapid
context switching, the job retirement rate slows down compared to the job
arrival rate. This builds up job pressure on the system very quickly
(more quickly than would have happened with a 100ms granularity_ns or on a
2.6.22 kernel), leading to OOM kills (and the hang).
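
To make the arithmetic behind that theory concrete, here is a
back-of-envelope sketch (all rates and sizes below are invented for
illustration, not measured on the test box): by Little's law the number of
in-flight jobs is roughly arrival rate times average job lifetime, so
anything that inflates per-job lifetime, such as excessive context
switching, inflates the live-job count and, with no swap configured, the
memory those jobs pin:

/* oom_pressure_model.c -- hypothetical numbers, illustration only */
#include <stdio.h>

int main(void)
{
        double arrival_rate   = 50.0;   /* jobs spawned per second by "make -j" */
        double base_lifetime  = 0.5;    /* seconds per compile job, no overhead */
        double overhead       = 3.0;    /* lifetime multiplier from excessive preemption */
        double mem_per_job_mb = 80.0;   /* assumed RSS of one compiler instance */
        double ram_mb         = 4096.0; /* box has 4GB RAM, no swap */

        /* Little's law: live jobs = arrival rate * average lifetime */
        double live_ok  = arrival_rate * base_lifetime;
        double live_bad = arrival_rate * base_lifetime * overhead;

        printf("live jobs (fast retirement): %.0f  (~%.0f MB)\n",
               live_ok, live_ok * mem_per_job_mb);
        printf("live jobs (slow retirement): %.0f  (~%.0f MB)\n",
               live_bad, live_bad * mem_per_job_mb);
        printf("RAM available: %.0f MB -> %s\n", ram_mb,
               live_bad * mem_per_job_mb > ram_mb ? "OOM territory" : "fits");
        return 0;
}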

From the perspective of a user running with the default granularity_ns
value, this may be seen as a regression.

Perhaps these new cfs tunables are something users will have to get used to
and tune to appropriate settings for their systems. It would have been
nice for the kernel to auto-tune the settings based on workload, but I
guess that's harder.
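
Just to illustrate what such auto-tuning could look like (this is a
made-up heuristic, not anything in the current cfs code), the granularity
could, for example, be scaled with the number of online CPUs:

/* granularity_autotune.c -- one possible heuristic, purely illustrative */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical: scale a base granularity up with CPU count, on the theory
 * that bigger machines can tolerate longer slices before interactivity
 * suffers. Not the in-kernel cfs logic. */
static long pick_granularity_ns(long base_ns, long ncpus)
{
        long factor = 1;

        while (ncpus > 1) {     /* factor = 1 + log2(ncpus), roughly */
                ncpus >>= 1;
                factor++;
        }
        return base_ns * factor;
}

int main(void)
{
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
        long base_ns = 10 * 1000 * 1000;        /* 10ms base, made-up value */

        printf("%ld cpus -> granularity %ld ns\n",
               ncpus, pick_granularity_ns(base_ns, ncpus > 0 ? ncpus : 1));
        return 0;
}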

--
Regards,
vatsa
