lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 6 Mar 2008 14:57:39 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	mingo@...e.hu, g.liakhovetski@....de, rjw@...k.pl,
	linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
	bunk@...nel.org, gregkh@...e.de
Subject: Re: 2.6.25-rc3-git3: Reported regressions from 2.6.24

On Thu, 6 Mar 2008 13:36:32 -0800
Andrew Morton <akpm@...ux-foundation.org> wrote:

> On Thu, 6 Mar 2008 21:59:51 +0100
> Ingo Molnar <mingo@...e.hu> wrote:
> 
> > 
> > * Andrew Morton <akpm@...ux-foundation.org> wrote:
> > 
> > > I'd love to poke around in kgdb (what does kthread_stop_info.k point 
> > > at?) but it seems that -mm's copy of kgdb got taken away when I wasn't 
> > > looking. Can I have it back please?
> > 
> > it's in the full x86.git or you can pick up the kgdb-light tree:
> > 
> >   http://people.redhat.com/mingo/kgdb-light.git/README
> > 
> 
> We'll see.
> 
> Meanwhile, further investigation show that cpu_callback() (the one in
> kernel/softlockup.c) is waiting on this thread:
> 
> watchdog/1    R  running task        0     8      2 task_struct:ffff81025f1089e0

Note the "/1".

>  ffff81025f10deb0 0000000000000046 0000000000000000 0000000000000246
>  ffff81025f10de20 ffff81025f1089e0 ffff81025f1080c0 ffff81025f108d30
>  000000015f10de50 00000000ffff2adf ffffffffffffffff ffffffffffffffff
> Call Trace:
>  [<ffffffff80263290>] ? watchdog+0x0/0x1dc
>  [<ffffffff802632d6>] watchdog+0x46/0x1dc
>  [<ffffffff80263290>] ? watchdog+0x0/0x1dc
>  [<ffffffff8024704d>] kthread+0x44/0x6b
>  [<ffffffff8020cd88>] child_rip+0xa/0x12
>  [<ffffffff80247009>] ? kthread+0x0/0x6b
>  [<ffffffff8020cd7e>] ? child_rip+0x0/0x12
> 
> kthread_stop_info.k=ffff81025f1089e0
> 
> (gdb) l *0xffffffff802632d6
> 0xffffffff802632d6 is in watchdog (kernel/softlockup.c:229).
> 224              */
> 225             while (!kthread_should_stop()) {
> 226                     touch_softlockup_watchdog();
> 227                     schedule();
> 228     
> 229                     if (kthread_should_stop())
> 230                             break;
> 231     
> 232                     if (this_cpu == check_cpu) {
> 233                             if (sysctl_hung_task_timeout_secs)
> 
> so this watchdog thread seems to be runnable, but not running.  What would
> cause this?  

At the start of the sysrq-T trace we have:

sd 1:0:0:0: [sdb] Stopping disk
sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Stopping disk
ACPI: PCI interrupt for device 0000:05:00.1 disabled
ACPI: PCI interrupt for device 0000:05:00.0 disabled
ACPI: Preparing to enter system sleep state S5
Disabling non-boot CPUs ...
CPU 1 is now offline
SysRq : Show State
  task                        PC stack   pid father

So CPU 1 is offline.  But the comatose watchdog thread is pinned to CPU 1. 
Could this be related to the problem?  By what means is a task which is
pinned to a going-away CPU handled?  How is this guy supposed to ever run
again?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ