Message-ID: <alpine.LNX.2.00.0909040933540.7248@cinke.fazekas.hu>
Date:	Fri, 4 Sep 2009 09:53:51 +0200 (CEST)
From:	Marton Balint <cus@...ekas.hu>
To:	Mike Galbraith <efault@....de>
cc:	Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>,
	Andreas Mohr <andi@...as.de>, linux-kernel@...r.kernel.org
Subject: Re: CPU scheduler weirdness?



On Fri, 4 Sep 2009, Mike Galbraith wrote:

> On Thu, 2009-09-03 at 23:57 +0200, Marton Balint wrote:
>
>>>> In the meantime, I updated my original C program and also created a kernel
>>>> module (schedtest_mod.c) which causes the same scheduling problems as the
>>>> kernel module of my TV card. The kernel module is a skeleton of the
>>>> infrared sensor polling code in cx88-input.c. It uses
>>>> schedule_delayed_work, which seems to cause the problem. The C program
>>>> (schedtest.c) is also updated: it now detects the number of CPU cores, and
>>>> the command line parameter is now the number of the CPU core on which the
>>>> schedtest processes will not quit (previously this was always the last
>>>> core).
>>>>
>>>> So to reproduce the bug on a dual core system, compile and insert the
>>>> kernel module (schedtest_mod.c). Then check dmesg; it should show which
>>>> CPU core the delayed_work is running on. Pass the CPU core id
>>>> of the _other_ CPU core as a command line parameter to the updated
>>>> schedtest program.
>>>>
>>>> And by the way, thank you guys for the help so far, hopefully we'll get to
>>>> the bottom of this :)
>>>
>>> I reproduced the bug with the previously provided kernel module and C program
>>> on a different computer (it's a laptop with a core2 duo P8400 CPU), and also
>>> bisected the bug to this commit:
>>>
>>> sched: fine-tune SD_MC_INIT:
>>> 14800984706bf6936bbec5187f736e928be5c218
>>>
>>> If I add the removed SD_BALANCE_NEWIDLE back to the flags, then everything
>>> works as expected. So what would be the correct fix for this bug? Revert the
>>> patch? Or just add SD_BALANCE_NEWIDLE to the flags?
>
> Or, figure out what's going weird with that module loaded.

The problem is most likely caused by schedule_delayed_work: a work
function is called every time a CPU wakes up.

>> Ingo, Peter, could either of you have a look at the commit that caused
>> this bug? Is it OK to revert it? Or is a fix necessary somewhere else? I'm
>> pushing this because I hope that this bug will get fixed in the upcoming
>> stable kernel...
>
> Where does your schedtest.c and schedtest_mod.c live?

They were attached to one of my previous mails; I'm inlining them here to
make the discussion easier. Thanks for looking into this.

Regards,
   Marton


schedtest_mod.c
-------------------
#include <linux/module.h>
#include <linux/init.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/smp.h>

static int i;
static struct delayed_work d_work;

/* Re-arm the work every millisecond; log the current CPU every 500 runs. */
static void schedtest_work(struct work_struct *work)
{
	schedule_delayed_work(&d_work, msecs_to_jiffies(1));
	if (i++ % 500 == 0) {
		printk(KERN_DEBUG "schedtest: I am on CPU %d.\n", get_cpu());
		put_cpu();
	}
}

static int __init schedtest_init_module(void)
{
	INIT_DELAYED_WORK(&d_work, schedtest_work);
	schedule_delayed_work(&d_work, 0);
	return 0;
}

/* Stop the self-rearming work before the module goes away. */
static void __exit schedtest_cleanup_module(void)
{
	cancel_delayed_work_sync(&d_work);
}

module_init(schedtest_init_module);
module_exit(schedtest_cleanup_module);

MODULE_LICENSE("GPL");



schedtest.c:
--------------------

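#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <utmpx.h>
#include <sys/time.h>
#include <unistd.h>

/* Usage: ./schedtest <cpu core to test> */

/* Millisecond part of the current wall-clock time (0..999). */
int milliseconds(void) {
   struct timeval tv;
   gettimeofday(&tv, 0);
   return tv.tv_usec/1000;
}

int main(int argc, char *argv[]) {
   int lives = 1000, time, lasttime = -1, childs, cores, core_to_test;
   cores = sysconf(_SC_NPROCESSORS_ONLN);
   childs = cores * 2;
   if (argc > 1)
     core_to_test = atoi(argv[1]);
   else
     core_to_test = cores-1;
   /* Fork a chain of processes: each child forks the next one. */
   while (childs-- && !fork());
   /* Busy loop: a life is lost for every new millisecond seen on a core
      other than core_to_test, so only processes on that core survive. */
   while (lives) {
     time = milliseconds();
     if (lasttime != time && sched_getcpu() != core_to_test)
        lives--;
     lasttime = time;
   }
   return 0;
}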
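
To make the quit rule of the loop above easier to follow, here is a minimal
sketch that replays a recorded trace of samples instead of reading the clock
and sched_getcpu(). The function name lives_left and the ms/cpu trace arrays
are hypothetical and not part of schedtest.c; the sketch only assumes the
rule is "lose one life per new millisecond seen on a core other than
core_to_test".

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of schedtest.c: run a trace of
 * (millisecond value, CPU id) samples through the same rule the
 * main loop uses and return how many lives remain.
 */
int lives_left(const int *ms, const int *cpu, size_t n,
               int core_to_test, int lives)
{
    int lasttime = -1;  /* no sample seen yet */
    size_t k;

    for (k = 0; k < n && lives > 0; k++) {
        /* A life is lost only on a new millisecond value observed
           while running on a core other than core_to_test. */
        if (ms[k] != lasttime && cpu[k] != core_to_test)
            lives--;
        lasttime = ms[k];
    }
    return lives;
}
```

A process that never leaves core_to_test keeps all of its lives and spins
forever, while one that sits on another core loses a life each millisecond
and exits after roughly a second.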
