linux-kernel - Re: [BUG] long freezes on thinkpad t60

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20070618012055.81a7c837.akpm@linux-foundation.org>
Date:	Mon, 18 Jun 2007 01:20:55 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Ingo Molnar <mingo@...e.hu>,
	Ravikiran G Thirumalai <kiran@...lex86.org>
Cc:	Miklos Szeredi <miklos@...redi.hu>, cebbert@...hat.com,
	chris@...ee.ca, linux-kernel@...r.kernel.org, tglx@...utronix.de,
	torvalds@...ux-foundation.org
Subject: Re: [BUG] long freezes on thinkpad t60

On Mon, 18 Jun 2007 10:12:04 +0200 Ingo Molnar <mingo@...e.hu> wrote:

> ---------------------------------------------------->
> Subject: [patch] x86: fix spin-loop starvation bug
> From: Ingo Molnar <mingo@...e.hu>
> 
> Miklos Szeredi reported very long pauses (several seconds, sometimes 
> more) on his T60 (with a Core2Duo) which he managed to track down to 
> wait_task_inactive()'s open-coded busy-loop. He observed that an 
> interrupt on one core tries to acquire the runqueue-lock but does not 
> succeed in doing so for a very long time - while wait_task_inactive() on 
> the other core loops waiting for the first core to deschedule a task 
> (which it wont do while spinning in an interrupt handler).
> 
> The problem is: both the spin_lock() code and the wait_task_inactive() 
> loop uses cpu_relax()/rep_nop(), so in theory the CPU should have 
> guaranteed MESI-fairness to the two cores - but that didnt happen: one 
> of the cores was able to monopolize the cacheline that holds the 
> runqueue lock, for extended periods of time.
> 
> This patch changes the spin-loop to assert an atomic op after every REP 
> NOP instance - this will cause the CPU to express its "MESI interest" in 
> that cacheline after every REP NOP.

Kiran, if you're still able to reproduce that zone->lru_lock starvation problem,
this would be a good one to try...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/