Message-ID: <503C8762.5070607@itechnical.de>
Date: Tue, 28 Aug 2012 10:54:58 +0200
From: Heiko Nardmann <heiko.nardmann@...chnical.de>
To: linux-kernel@...r.kernel.org
Subject: Q: dlm_recoverd takes 100%
Hi all,
perhaps someone can tell me which mailing list to contact if this is the wrong place for my question.
In a two-node cluster I see 'dlm_recoverd' using 100% of one CPU for
around 6 minutes. Here is a small excerpt from the 'top' output
during that period:
top - 10:51:01 up 3 days, 17:21, 5 users, load average: 10.19, 5.39, 2.76
Tasks: 536 total, 3 running, 533 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.2%us, 6.6%sy, 0.0%ni, 92.1%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 12183344k total, 11827540k used, 355804k free, 160332k buffers
Swap: 14417912k total, 0k used, 14417912k free, 8299364k cached
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 3121 root      20   0     0    0    0 R 100.0  0.0  3:36.15 dlm_recoverd
The cluster nodes use a shared SAN (GFS2). The second node was being
rebooted while I observed this behaviour. The real problem is that my
application is unable to open a file on the SAN during these 6 minutes.
Once the second node has finished rebooting, all is fine again and the
application succeeds in opening the file. So I am not sure what
causes these two symptoms.
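To put a number on the blocked open, a minimal sketch like the following could be used. It only times a single open() call. The function name time_open and the path passed on the command line are hypothetical placeholders, not part of my application, and it assumes the file on the GFS2 mount is readable by the calling user.

```python
import os
import sys
import time


def time_open(path, flags=os.O_RDONLY):
    """Return the number of seconds a single os.open() call on path takes.

    Hypothetical helper for illustration only; it measures wall-clock time
    with a monotonic clock and closes the descriptor right away.
    """
    start = time.monotonic()
    fd = os.open(path, flags)          # blocks here while the open is stalled
    elapsed = time.monotonic() - start
    os.close(fd)
    return elapsed


if __name__ == "__main__":
    # Assumed usage: pass a file on the shared filesystem as the first argument.
    if len(sys.argv) > 1:
        print(f"open() took {time_open(sys.argv[1]):.3f} s")
```

Running this in a loop while the other node reboots would show whether the open stalls for the whole period during which dlm_recoverd is busy or only for part of it.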
Thanks in advance for any hint!
Kind regards,
Heiko