Date:	Thu, 20 Aug 2009 02:10:25 +0200 (CEST)
From:	Marton Balint <cus@...ekas.hu>
To:	Peter Zijlstra <peterz@...radead.org>
cc:	Andreas Mohr <andi@...as.de>, linux-kernel@...r.kernel.org,
	mingo@...e.hu
Subject: Re: CPU scheduler weirdness?


On Wed, 19 Aug 2009, Peter Zijlstra wrote:

> On Wed, 2009-08-19 at 14:34 +0200, Marton Balint wrote:
>>
>> On Wed, 19 Aug 2009, Peter Zijlstra wrote:
>>
>>> On Wed, 2009-08-19 at 14:01 +0200, Marton Balint wrote:
>>>> On Wed, 19 Aug 2009, Peter Zijlstra wrote:
>>>>> On Tue, 2009-08-18 at 21:49 +0200, Marton Balint wrote:
>>>>>
>>>>>> In the meantime, I was able to create a tiny C program which always
>>>>>> successfully reproduces the bug. It's basically an endless loop which does
>>>>>> not stop while the process is running on the last CPU core. The program
>>>>>> creates multiple instances of itself, to be able to keep all of the CPU
>>>>>> cores busy. After 1 second, the processes running on other than the last
>>>>>> CPU core die, the processes running on the last CPU core remain stuck
>>>>>> there...
>>>>>>
>>>>>> I tested it on my dual core system, if someone could test it on a quad
>>>>>> core and report back that would probably be useful.
>>>>>>
>>>>>> Usage: ./schedtest <number of CPU cores>
>>>>>>
>>>>>> And don't forget to kill the stuck processes after using the program! :)
>>>>>
>>>>> So what's the bug? Sure one task will stay on the cpu, and because there
>>>>> is no contention it doesn't get migrated, and therefore won't quit,
>>>>> how's that a problem?
>>>>
>>>> Problem is that more than one process remains on that CPU core, and none
>>>> of them get migrated to other (idle) cores. I tested it with my E8400
>>>> processor and 2.6.31-rc5-git3 kernel.
>>>
>>> Only one remains here.. on a c2q running 2.6.31-rc6-tip
>>>
>>> Do you have a .config handy?
>>>
>>
>> Yes it's in my original post:
>>
>> http://marc.info/?l=linux-kernel&m=125012584709800&w=2
>
> Right you are... so I built a kernel with the cgroup scheduler in and
> tested it on a dual-core opteron machine, but I can't seem to reproduce
> this.
>
> Are you using cgroups in any way, or do you simply have it enabled in
> your config?

No, it's just enabled. Actually the kernel is from the openSUSE build 
service:

http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_11.1/x86_64/

But the problem is present for both the kernel-default kernel and the 
kernel-vanilla kernel which does not contain any suse-specific patches.

This evening I had a bit more time to test, and I made a surprising
discovery: I can only reproduce the bug if the kernel module of my TV
tuner card is loaded. I have a Leadtek Winfast 2000 XP Expert TV card,
which uses the cx8800 kernel module. The problem seems to be related to
the infrared sensor of the TV card: I recompiled the module with the
'case CX88_BOARD_WINFAST2000XP_EXPERT:' line removed from cx88-input.c,
and I couldn't reproduce the bug with the new kernel module.
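
For anyone who wants to retest without digging up the attachment, here is
a rough sketch of a reproducer along the lines of the description quoted
above. It is not the original schedtest source. The function names
(should_exit, spin, spawn_spinners) and the separate nprocs parameter are
made up for illustration, and it assumes glibc's sched_getcpu() is
available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>      /* sched_getcpu(), a glibc extension */
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* A spinner may quit only after the 1 second grace period,
 * and only if it is not running on the last CPU core. */
static int should_exit(int cpu, int last_cpu, double elapsed_sec)
{
    return elapsed_sec >= 1.0 && cpu != last_cpu;
}

/* Seconds elapsed since *start, measured on the monotonic clock. */
static double seconds_since(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - start->tv_sec)
         + (double)(now.tv_nsec - start->tv_nsec) / 1e9;
}

/* Busy loop: never returns. The process exits once should_exit() holds,
 * so a process that stays on the last core keeps spinning forever. */
static void spin(int last_cpu)
{
    struct timespec start;

    assert(last_cpu >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (should_exit(sched_getcpu(), last_cpu, seconds_since(&start)))
            _exit(0);
    }
}

/* Fork nprocs spinners on a machine with ncores cores (the last core is
 * ncores - 1). Returns 0 in the parent on success, -1 on bad arguments
 * or fork failure. nprocs is an assumption: the description only says
 * "multiple instances". */
int spawn_spinners(int ncores, int nprocs)
{
    if (ncores < 1 || nprocs < 1)
        return -1;

    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();

        if (pid < 0) {
            perror("fork");
            return -1;
        }
        if (pid == 0)
            spin(ncores - 1);   /* child: never returns */
    }
    return 0;
}
```

Calling spawn_spinners() from main() with the core count taken from
argv[1] would match the "./schedtest <number of CPU cores>" usage. As
with the original program, any process left on the last core has to be
killed by hand afterwards.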

Regards,
   Marton
