Message-ID: <ba7d8f720804201121h6c024dcrabd301c8b4be7a4@mail.gmail.com>
Date:	Sun, 20 Apr 2008 14:21:39 -0400
From:	"Dan Upton" <upton.dan.linux@...il.com>
To:	linux-kernel@...r.kernel.org
Subject: migration thread and active_load_balance

Back again with more questions about the scheduler, as I've spent two
or three days trying to debug on my own and I'm just not getting
anywhere.

Basically, I'm trying to add a new active balancing mechanism.  I drew
up a diagram of how migration_thread calls active_load_balance and
so on, and I use a flag (set by writing to a file in sysfs) to
determine whether to use the standard iterator for the CFS runqueue or
a different iterator I wrote.  The new iterator seems to work fine, as
I've been using it (again, with a flag) to replace the regular
iterator when it's reached from schedule via idle_balance.  I then
tried adding an extra conditional in migration_thread that sets up
some state and then calls active_load_balance, but I was getting
deadlocks.  I'm not really sure why, since all I've really changed is
to add a few variables to struct rq and struct cfs_rq.
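
For illustration only, here is a minimal user-space sketch of the
flag-selected iterator idea described above.  It is not the kernel
code: struct task, the two iterators, pick_iterator() and
count_candidates() are hypothetical names, and the "alternate"
iterator (which skips light tasks) is just a placeholder policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a runnable task; not a kernel type. */
struct task {
    int id;
    int weight;
    struct task *next;          /* next task in runqueue order */
};

/* A pair of callbacks that walk the runqueue, selected at run time. */
struct task_iterator {
    struct task *(*start)(struct task *head);
    struct task *(*next)(struct task *cur);
};

/* Standard iterator: visit every task in order. */
static struct task *std_start(struct task *head) { return head; }
static struct task *std_next(struct task *cur) { return cur ? cur->next : NULL; }

/* Placeholder alternate policy: skip tasks whose weight is below 2. */
static struct task *skip_light(struct task *t)
{
    while (t && t->weight < 2)
        t = t->next;
    return t;
}
static struct task *alt_start(struct task *head) { return skip_light(head); }
static struct task *alt_next(struct task *cur) { return cur ? skip_light(cur->next) : NULL; }

static const struct task_iterator std_iter = { std_start, std_next };
static const struct task_iterator alt_iter = { alt_start, alt_next };

/* use_alt stands in for the flag that would be written through sysfs. */
static const struct task_iterator *pick_iterator(int use_alt)
{
    return use_alt ? &alt_iter : &std_iter;
}

/* Count the tasks the selected iterator would offer to the balancer. */
static int count_candidates(struct task *head, int use_alt)
{
    const struct task_iterator *it = pick_iterator(use_alt);
    int n = 0;

    assert(it != NULL);
    for (struct task *t = it->start(head); t; t = it->next(t))
        n++;
    return n;
}
```

The point of the sketch is only that the caller never needs to know
which iterator is in use; the flag is consulted in one place.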

Next I tried doing only my state setup and restore in that conditional,
without actually calling active_load_balance, which has given me an
even more frustrating result--the kernel does not deadlock, but it
does seem to crash in such a manner as to require a hard reset of the
machine.  (For instance, at one point I got an "invalid page state in
process 'init'" message from the kernel; if I try to reboot from GNOME,
though, it hangs.)  I don't understand this at all, since as far as I
can tell I'm using thread-local variables and really all I'm doing
right now is assignments to them.  Unless, of course, the struct rq
(from rq = cpu_rq(cpu);) is being manipulated elsewhere, leading
to some sort of race condition...
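
A minimal user-space sketch of that concern, under stated assumptions:
a pthread mutex stands in for whatever lock protects the runqueue, and
struct fake_rq, fake_balance() and balance_with_alt_iter() are
hypothetical names, not kernel code.  It only shows the pattern of
keeping setup, call and restore inside a single locked region so that
no other path can observe or change the extra fields halfway through.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for a per-CPU runqueue. */
struct fake_rq {
    pthread_mutex_t lock;       /* plays the role of the runqueue lock */
    int use_alt_iter;           /* extra state added for the experiment */
    int saved_use_alt_iter;     /* value to restore afterwards */
    int balance_calls;          /* how often the balancer stand-in ran */
};

static void fake_rq_init(struct fake_rq *rq)
{
    pthread_mutex_init(&rq->lock, NULL);
    rq->use_alt_iter = 0;
    rq->saved_use_alt_iter = 0;
    rq->balance_calls = 0;
}

/* Stand-in for the balancing call; the caller must already hold rq->lock. */
static void fake_balance(struct fake_rq *rq)
{
    assert(rq->use_alt_iter == 1);  /* setup must be visible here */
    rq->balance_calls++;
}

/* Setup, call and restore all happen while the lock is held. */
static void balance_with_alt_iter(struct fake_rq *rq)
{
    pthread_mutex_lock(&rq->lock);
    rq->saved_use_alt_iter = rq->use_alt_iter;  /* setup */
    rq->use_alt_iter = 1;
    fake_balance(rq);
    rq->use_alt_iter = rq->saved_use_alt_iter;  /* restore */
    pthread_mutex_unlock(&rq->lock);
}
```

If the same fields were written outside the locked region, another
thread taking the lock in between could see the flag set without the
matching restore, which is the sort of race suspected above.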

Anyway, like I said, I've spent several days trying to understand this
error by putting in printk()s galore and doing traces through the
source code to figure out the call chain, but I'm really stuck here.
Can anybody shed some light, or point me to some more thorough
documentation on the scheduler and active load balancing?

Thanks,
-dan