Message-ID: <48C83546.7090206@xs4all.nl>
Date:	Wed, 10 Sep 2008 22:59:50 +0200
From:	Rambaldi <rambaldi@...all.nl>
To:	Jeremy Fitzhardinge <jeremy@...p.org>
CC:	linux-kernel <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: 2.6.27-rc6 xen soft lockup

Jeremy Fitzhardinge wrote:
> Rambaldi wrote:
>> The machine has two Intel(R) Xeon(R) E5420s, so that gives a total
>> of 8 cpus. During the lockup the cpu load, as measured with cacti,
>> was about 4%, with an increase to 15% at the time the BUG was
>> triggered. So I would say mostly idle, but not very idle.
> 
> So that's the cpu load within the domain?  How about the overall system
> load?  What other domains are running?
> 
Yes, I was talking about the load in the domU.
The system as a whole was idle. Cacti, monitoring dom0, showed no signs of
network or disk activity, and the cpu load in dom0 was also near zero.
The system hosts a total of 6 domUs. I have no cpu load measurements for the
other domUs, but judging from the lack of disk and network activity and the
constant cpu temperature readings, I would say their cpu load was also
mostly idle.


>>> Did anything fail or misbehave?
>> No, nothing failed or misbehaved (as far as I could tell).
>>
>> By dynticks I guess you mean CONFIG_NO_HZ; this option is not set.
> 
> (In general it's a good idea to set it for virtual machines, to avoid
> spuriously scheduling vcpus.)
> 
Ok, I will change that.
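
For the record, that should just be a one-line change to my .config
(a sketch; the sed assumes the option currently reads "not set", and
in menuconfig the same option lives under "Processor type and
features -> Tickless System (Dynamic Ticks)"):

  cd /usr/src/linux
  # Flip the tickless option and let oldconfig resolve any
  # dependent options:
  sed -i 's/# CONFIG_NO_HZ is not set/CONFIG_NO_HZ=y/' .config
  make oldconfig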

>> I have attached my .config. I have also attached the output of
>>   (date; cat /proc/interrupts; sleep 10; date; cat /proc/interrupts) > /tmp/interrupts
>> to give an impression of the number of interrupts after 11:30 hours
>> of uptime.
> 
> Well, there were 1001 interrupts on cpu 1 in that interval, which shows
> that the timer interrupts are going at full rate on the idle cpu.
> 
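For reference, here is a variant of my measurement that makes the
per-cpu delta easier to read (a sketch only; the snapshot file names
are made up):

  # Take two snapshots 10 s apart instead of one combined file:
  cat /proc/interrupts > /tmp/irq.before
  sleep 10
  cat /proc/interrupts > /tmp/irq.after
  # Compare the timer rows: with CONFIG_NO_HZ unset, an idle cpu
  # should tick at the full HZ rate, i.e. roughly HZ * 10 events
  # between the snapshots -- the ~1001 you saw on cpu 1 suggests
  # HZ=100 here.
  diff /tmp/irq.before /tmp/irq.after | grep -i timer
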
> I'm a bit confused.  I'm not sure what would trigger a lockup at that
> point, unless it really stopped taking interrupts for a while. 
> Unfortunately the RIP and backtrace are not particularly helpful.  I'm
> assuming the message is spurious, and indicates some other kind of
> timekeeping bug.
> 
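(As an aside: the BUG line comes from the softlockup watchdog, which
fires when a cpu fails to run its watchdog thread within a threshold,
so a vcpu that briefly stopped taking timer interrupts could trip it
spuriously. One could inspect the threshold like this; a sketch,
assuming the 2.6.27-era sysctl name:)

  # Softlockup threshold in seconds; the "BUG: soft lockup" message
  # fires when a cpu's watchdog thread hasn't run for this long:
  cat /proc/sys/kernel/softlockup_thresh
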
>> Any other info that you need?
> 
> Full dmesg output, for completeness.
> 
>     J
> 
The dmesg output is attached.

Thanks for looking into it.

R


