Message-ID: <20110516062930.GA24836@elte.hu>
Date:	Mon, 16 May 2011 08:29:30 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	James Bottomley <James.Bottomley@...senPartnership.com>
Cc:	x86@...nel.org, linux-mm <linux-mm@...ck.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Mel Gorman <mgorman@...e.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mike Galbraith <efault@....de>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: Possible sandybridge livelock issue


* James Bottomley <James.Bottomley@...senPartnership.com> wrote:

> We've just come off a large round of debugging a kswapd problem over on
> linux-mm:
> 
> http://marc.info/?t=130392066000001
> 
> The upshot was that kswapd wasn't being allowed to sleep (which we're
> now fixing).  However, in spite of intensive efforts, the actual hang
> was only reproducible on sandybridge laptops.
> 
> When the hang occurred, kswapd basically pegged one core in 100% system
> time.  This looks like there's something specific to sandybridge that
> causes this type of bad interaction.  I was wondering if it could be
> something to do with a scheduling problem in turbo mode?  Once kswapd
> goes flat out, the core it's on will kick into turbo mode, which causes
> it to get preferentially scheduled there, leading to the live lock.

There's no explicit 'schedule Sandybridge differently' logic in the scheduler.

Thus turbo mode can only affect scheduling by executing code faster. Executing 
faster *does* mean more scheduling on that CPU: it's faster to do work so it's 
faster back to idle again.

I.e. I can see Sandybridge being special only due to timing and performance 
differences.

> The only evidence I have to support this theory is that when I reproduce the 
> problem with PREEMPT, the core pegs at 100% system time and stays there even 
> if I turn off the load.  However, if I can execute work that causes kswapd to 
> be kicked off the core it's running on, it will calm back down and go to 
> sleep.

At first sight this looks like some sort of kswapd problem: if you put kswapd 
into TASK_*INTERRUPTIBLE and schedule() it then the scheduler won't keep it 
running, on Sandybridge or elsewhere. The scheduler can't magically make kswapd 
runnable unless there's some big bug in it. So you first need to examine why 
kswapd never schedules to idle.
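
To make the point above concrete, here is a minimal toy model of it. This is 
a userspace sketch, not kernel code: Task, pick_next() and wake_up() are 
hypothetical names invented for illustration, and only the two state names 
are borrowed from the kernel. It assumes a single runqueue and ignores 
priorities, preemption and timers entirely.

```python
# Toy userspace model of "a sleeping task is never picked by the scheduler".
# Not kernel code: Task, pick_next and wake_up are illustrative names only.
from dataclasses import dataclass

TASK_RUNNING = "TASK_RUNNING"
TASK_INTERRUPTIBLE = "TASK_INTERRUPTIBLE"


@dataclass
class Task:
    name: str
    state: str = TASK_RUNNING


def pick_next(runqueue):
    """Return the first runnable task, or None if the CPU should go idle."""
    for task in runqueue:
        if task.state == TASK_RUNNING:
            return task
    return None


def wake_up(task):
    """Only an explicit wakeup makes a sleeping task runnable again."""
    task.state = TASK_RUNNING


kswapd = Task("kswapd")
runqueue = [kswapd]

# While kswapd never changes its state, it is picked every single time.
picked_while_running = pick_next(runqueue)

# kswapd marks itself as sleeping and yields: nothing is runnable, so idle.
kswapd.state = TASK_INTERRUPTIBLE
picked_while_sleeping = pick_next(runqueue)

# It becomes eligible again only after something wakes it up.
wake_up(kswapd)
picked_after_wakeup = pick_next(runqueue)
```

In this model the only way for kswapd to keep occupying the CPU is for it to 
stay in the runnable state, or to be woken again straight away, which is why 
the question is directed at kswapd rather than at the scheduler.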

Thanks,

	Ingo