linux-kernel - nfsroot on multiple-NIC serial-over-LAN system -> deadlock?

Message-ID: <87k54cu8z5.fsf@hades.wkstn.nix>
Date:	Tue, 19 May 2009 21:35:26 +0100
From:	Nix <nix@...eri.org.uk>
To:	linux-kernel@...r.kernel.org
Cc:	linux-net@...r.kernel.org
Subject: nfsroot on multiple-NIC serial-over-LAN system -> deadlock?

I'm running 2.6.30-rc (git head as of yesterday) and bootstrapping a
bunch of machines from bare metal via PXE/pxelinux/nfsroot. nfsroot
plainly *works*: I've got several machines booting happily.

But then I come to a machine with multiple NICs and IPMI, and things
fall over. I have to specify the NIC to use manually, or it goes into a
DHCP-probing deadlock (cause undiagnosed, but it looks just like the one
below, so it may be the same problem). But if I give the NIC info by
hand, I *still* see a deadlock:

[   89.613880] IP-Config: Complete:
[   89.616943]      device=eth0, addr=192.168.14.15, mask=255.255.255.0, gw=192.168.14.1,
[   89.624921]      host=spindle, domain=, nis-domain=(none),
[   89.630430]      bootserver=192.168.14.18, rootserver=192.168.14.18, rootpath=
[   90.333195] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[   90.340668] 0000:03:00.0: eth0: 10/100 speed: disabling TSO
[  325.182384] INFO: task swapper:1 blocked for more than 120 seconds.
[  325.188653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  325.196473] swapper       D 00000014     0     1      0
[  325.201766]  f7061eec 00000046 dd66aa4a 00000014 00000000 00000000 00000000 c05d1480
[  325.209749]  c05d1480 00000000 00000000 f705ec40 f705eed4 c2805480 00000000 ded7f8e3
[  325.217743]  00000014 00000000 c0548160 00000000 00000000 00000000 00000000 f705eed4
[  325.225742] Call Trace:
[  325.228202]  [<c0408ebc>] schedule+0x8/0x17
[  325.232391]  [<c0408fa6>] schedule_timeout+0x17/0x164
[  325.237454]  [<c01346d1>] ? __wake_up+0x31/0x3b
[  325.241987]  [<c040844e>] wait_for_common+0xaa/0xfc
[  325.246872]  [<c013ae99>] ? default_wake_function+0x0/0xd
[  325.252271]  [<c0408512>] wait_for_completion+0x12/0x14
[  325.257498]  [<c014d003>] flush_cpu_workqueue+0x59/0x62
[  325.262720]  [<c014ced7>] ? wq_barrier_func+0x0/0xd
[  325.267605]  [<c014d177>] flush_workqueue+0x2b/0x49
[  325.272485]  [<c014d1a2>] flush_scheduled_work+0xd/0xf
[  325.277626]  [<c0585578>] kernel_init+0x10e/0x152
[  325.282340]  [<c058546a>] ? kernel_init+0x0/0x152
[  325.287045]  [<c011d8cf>] kernel_thread_helper+0x7/0x10

The cause of this hang is unclear. I'd expect to see something like

 Looking up port of RPC 100003/2 on 192.168.14.18
 Looking up port of RPC 100005/1 on 192.168.14.18
 VFS: Mounted root (nfs filesystem) readonly on device 0:15.

at this point, but I don't. Just dead silence.
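
For readers unfamiliar with the expected messages: the numbers in
"RPC 100003/2" are an ONC RPC program number and version. 100003 is the
NFS program and 100005 is the mount daemon, so those two lines are the
kernel asking the server's portmapper where NFS v2 and mountd v1 live.
A minimal Python sketch that decodes such lines (describe_lookup and the
small table are illustrative helpers, not part of any kernel tooling):

```python
import re

# Well-known ONC RPC program numbers; only the ones relevant here are listed.
RPC_PROGRAMS = {
    100000: "portmapper",
    100003: "nfs",
    100005: "mountd",
}

# Matches the kernel's "Looking up port of RPC <prog>/<vers> on <host>" message.
LOOKUP_RE = re.compile(r"Looking up port of RPC (\d+)/(\d+) on (\S+)")


def describe_lookup(line):
    """Return (program_name, version, host) for a lookup line, or None.

    Hypothetical helper for illustration; unknown programs map to "unknown".
    """
    match = LOOKUP_RE.search(line)
    if match is None:
        return None
    program = int(match.group(1))
    version = int(match.group(2))
    host = match.group(3)
    return (RPC_PROGRAMS.get(program, "unknown"), version, host)


expected = [
    " Looking up port of RPC 100003/2 on 192.168.14.18",
    " Looking up port of RPC 100005/1 on 192.168.14.18",
]
for line in expected:
    print(describe_lookup(line))
```

On the failing machine neither lookup message appears, so the hang comes
before the first portmapper query is logged.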

The boot parameters were:

 root=/dev/nfs ip=192.168.14.15:192.168.14.18:192.168.14.1:255.255.255.0:spindle:eth0:off nfsroot=/mnt/spindle-root console=ttyS0,115200

(IP addresses are definitely correct, and the interface name is
apparently correct because we can see it bring the link up in the kernel
messages: if I use the other interface name, there's no such chatter in
the log.)
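
To make the ip= string above easier to check by eye, here is a rough
Python sketch that splits it into the fields given in the kernel's
nfsroot documentation
(ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>).
The function names are made up for illustration, and the command-line
splitter is deliberately simplistic (no quoting support):

```python
# Field order of the ip= parameter as described in the nfsroot documentation.
IP_FIELDS = ("client_ip", "server_ip", "gw_ip", "netmask",
             "hostname", "device", "autoconf")


def parse_ip_param(value):
    """Split an ip= value into named fields (hypothetical helper).

    Missing trailing fields come back as empty strings.
    """
    parts = value.split(":")
    if len(parts) > len(IP_FIELDS):
        raise ValueError("too many fields in ip= value: %r" % value)
    parts += [""] * (len(IP_FIELDS) - len(parts))
    return dict(zip(IP_FIELDS, parts))


def parse_cmdline(cmdline):
    """Return key/value pairs from a kernel command line (simplified)."""
    params = {}
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        params[key] = val if sep else None
    return params


cmdline = ("root=/dev/nfs "
           "ip=192.168.14.15:192.168.14.18:192.168.14.1:255.255.255.0:spindle:eth0:off "
           "nfsroot=/mnt/spindle-root console=ttyS0,115200")

params = parse_cmdline(cmdline)
ip = parse_ip_param(params["ip"])
for name in IP_FIELDS:
    print("%-10s %s" % (name, ip[name]))
```

The decoded fields line up with the "IP-Config: Complete" output:
device eth0, address 192.168.14.15, gateway 192.168.14.1, and server
192.168.14.18, with autoconfiguration off.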

(I'm using IPMI and a serial console, with the console redirected by
IPMI over the same NIC: but as this uses a distinct MAC --- hell, a
distinct processor --- it surely can't interfere. Can it?)

Any ideas? Am I missing something obvious? (Probably.)