linux-kernel - Re: [BUG] long freezes on thinkpad t60

Message-ID: <467303C9.9030706@redhat.com>
Date:	Fri, 15 Jun 2007 17:25:29 -0400
From:	Chuck Ebbert <cebbert@...hat.com>
To:	Miklos Szeredi <miklos@...redi.hu>
CC:	mingo@...e.hu, chris@...ee.ca, linux-kernel@...r.kernel.org,
	tglx@...utronix.de
Subject: Re: [BUG] long freezes on thinkpad t60

On 06/14/2007 12:04 PM, Miklos Szeredi wrote:
> I've got some more info about this bug.  It is gathered with
> nmi_watchdog=2 and a modified nmi_watchdog_tick(), which instead of
> calling die_nmi() just prints a line and calls show_registers().
> 
> This makes the machine actually survive the NMI tracing.  The attached
> traces are gathered over about an hour of stressing.  An mp3 player is
> also going on continually, and I can hear a couple of seconds of
> "looping" quite often, but it gets as far as the NMI trace only
> rarely.  AFAICS only the last pair shows a trace for both CPUs during
> the same "freeze".
> 
> I've put some effort into understanding what's going on, but I'm not
> familiar with how interrupts work and that sort of thing.
> 
> The pattern that emerges is that on CPU0 we have an interrupt, which
> is trying to acquire the rq lock, but can't.
> 
> On CPU1 we have strace which is doing wait_task_inactive(), which sort
> of spins acquiring and releasing the rq lock.  I've checked some of
> the traces and it is just before acquiring the rq lock, or just after
> releasing it, but is not actually holding it.
> 
> So is it possible that wait_task_inactive() could be starving the
> other waiters of the rq spinlock?  Any ideas?

Spinlocks aren't fair, so this kind of problem is always a possibility.
I think maybe we need another kind of unlock that gives another processor
a fair chance at the lock. Some things you could try to see if they help:

- add smp_mb() after the unlock
- replace cpu_relax() with usleep()
- use an xchg instruction to do the unlock, like i386 does when
  CONFIG_X86_OOSTORE is set
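
To illustrate what "a fair chance at the lock" could mean, here is a
minimal userspace sketch of a ticket lock, one well-known way to make a
spinlock hand itself over in FIFO order. It is not one of the suggestions
above and not the kernel's spinlock implementation; the names
struct ticket_lock, ticket_lock_init(), ticket_lock_acquire() and
ticket_lock_release() are hypothetical, and C11 <stdatomic.h> is assumed
in place of the kernel's own primitives.

```c
#include <stdatomic.h>

/* Hypothetical userspace ticket lock; NOT the kernel's spinlock_t. */
struct ticket_lock {
	atomic_uint next;	/* next ticket number to hand out */
	atomic_uint owner;	/* ticket number currently allowed in */
};

static void ticket_lock_init(struct ticket_lock *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
	/* Take a ticket; waiters are served strictly in ticket order. */
	unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
						    memory_order_relaxed);

	/* Spin until our ticket comes up (kernel code would cpu_relax()). */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		;
}

static void ticket_lock_release(struct ticket_lock *l)
{
	/* Only the holder writes owner, so a plain load plus store is enough. */
	unsigned int cur = atomic_load_explicit(&l->owner,
						memory_order_relaxed);

	atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}
```

With a lock like this, a CPU that releases and immediately re-acquires
in a loop has to take a new ticket each time, so it queues behind any
CPU that was already waiting instead of being able to win the lock again
right away.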
