linux-kernel - Re: migration thread and active_load_balance

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <ba7d8f720804221219wc1f04ecp3df0ec627dbc6c8e@mail.gmail.com>
Date:	Tue, 22 Apr 2008 15:19:33 -0400
From:	"Dan Upton" <upton.dan.linux@...il.com>
To:	"Dmitry Adamushko" <dmitry.adamushko@...il.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: migration thread and active_load_balance

On Mon, Apr 21, 2008 at 4:39 PM, Dmitry Adamushko
<dmitry.adamushko@...il.com> wrote:
> On 21/04/2008, Dan Upton <upton.dan.linux@...il.com> wrote:
>  > On Mon, Apr 21, 2008 at 7:03 AM, Dmitry Adamushko
>  >
>  > <dmitry.adamushko@...il.com> wrote:
>  >
>  > > On 21/04/2008, Dan Upton <upton.dan.linux@...il.com> wrote:
>  >  >  > [ ... ]
>  >  >
>  >  > >
>  >  >  >  kernel BUG at kernel/sched.c:2103
>  >  >
>  >  >  and what's this line in your patched sched.c?
>  >  >
>  >  >  is it -- BUG_ON(!irqs_disabled());  ?
>  >  >
>  >  >  anything in your unposted code (e.g. find_coolest_cpu()) that might
>  >  >  re-enable the interrupts before __migrate_task() is called?
>  >  >
>  >  >  If you post your modifications as a patch
>  >  >  (Documentation/applying-patches.txt) that contains _all_ relevant
>  >  >  modifications, it'd be easier to guess what's wrong.
>  >
>  >
>  > Yes, that's the line.  I don't recall ever reenabling interrupts,
>
>  migration_thread() -> find_coolest_cpu() -> get_temperature() ->
>  rdmsr_on_cpu() -> [ if your configuration is SMP ] ->
>  smp_call_function_single() ->
>
>  (arch/x86/kernel/smpcommon.c)
>  ...
>         if (cpu == me) {
>                 local_irq_disable();
>                 func(info);
>                 local_irq_enable();   <----------- REENABLES the interrupts
>                 put_cpu();
>                 return 0;
>         }
>  ...
>
>  as a result, __migrate_task() -> double_rq_lock() -> BUG_ON(!irqs_disabled())
>  gives you an "oops".
>
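The chain above comes down to one detail: the self-call path forces
interrupts back on instead of restoring whatever state the caller had.
A small userspace simulation makes the difference concrete. It is not
kernel code: every sim_* name and the other helpers are hypothetical
stand-ins, and sim_irq_save()/sim_irq_restore() only model the idea
behind the kernel's local_irq_save()/local_irq_restore().

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated interrupt state: true means "interrupts disabled".
 * Userspace stand-in only, not the real kernel primitives. */
static bool irqs_off;

static void sim_irq_disable(void) { irqs_off = true; }
static void sim_irq_enable(void)  { irqs_off = false; }

/* Models local_irq_save(): remember the old state, then disable. */
static bool sim_irq_save(void)
{
    bool was_off = irqs_off;
    irqs_off = true;
    return was_off;
}

/* Models local_irq_restore(): put back exactly what was saved. */
static void sim_irq_restore(bool was_off) { irqs_off = was_off; }

/* Mirrors the quoted "cpu == me" branch: disable, run, then
 * unconditionally enable, regardless of the caller's state. */
static void call_on_self_enable(void (*func)(void))
{
    sim_irq_disable();
    func();
    sim_irq_enable();
}

/* Same call, but the caller's interrupt state survives. */
static void call_on_self_restore(void (*func)(void))
{
    bool flags = sim_irq_save();
    func();
    sim_irq_restore(flags);
}

/* Placeholder for the MSR read done by the called function. */
static void read_sensor(void) { }

/* The invariant double_rq_lock() checks with BUG_ON(!irqs_disabled()). */
static bool rq_lock_precondition_holds(void) { return irqs_off; }

/* Models the migration path: interrupts are off, a helper runs, then
 * code that needs them still off (__migrate_task()) is reached.
 * Returns whether the invariant still held at that point. */
static bool migrate_after(void (*call)(void (*)(void)))
{
    sim_irq_disable();
    call(read_sensor);
    bool ok = rq_lock_precondition_holds();
    sim_irq_enable();
    return ok;
}
```

Here migrate_after(call_on_self_enable) returns false, the simulated
counterpart of the BUG_ON firing, while migrate_after(call_on_self_restore)
returns true.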

Ah, how about that.  Thanks, I've at least fixed the oops by caching
the return values from get_temperature() and using those instead of
calling rdmsr_on_cpu() when called from migration_thread().  Everything
works up to the point where I uncomment the new call to
active_load_balance(), which again yields a deadlock.  (Man, I love
working in the scheduler...)  Anyway, I'll keep trying to debug that on
my own, but did anybody notice anything I'm doing that might lead to
deadlock?
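The workaround described above (reading temperatures from a cache so the
migration path never issues rdmsr_on_cpu()) might look roughly like the
following userspace sketch. The original find_coolest_cpu() and
get_temperature() were never posted, so SIM_NR_CPUS, cached_temp[],
update_cached_temp() and find_coolest_cpu_cached() are all hypothetical,
and the sketch ignores the locking a real in-kernel cache would need.

```c
#include <assert.h>
#include <limits.h>

#define SIM_NR_CPUS 4  /* hypothetical CPU count for the sketch */

/* Last temperature recorded for each CPU. Assumed to be written only
 * from a context where reading the MSRs is safe; the migration path
 * only reads it. */
static int cached_temp[SIM_NR_CPUS];

/* Stand-in for storing one get_temperature() result in the cache. */
static void update_cached_temp(int cpu, int temp)
{
    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    cached_temp[cpu] = temp;
}

/* Cache-only analogue of find_coolest_cpu(): no MSR access and no
 * cross-CPU call, so the caller's interrupt state is left alone.
 * On ties, the lowest-numbered CPU wins. */
static int find_coolest_cpu_cached(void)
{
    int best = 0;
    int best_temp = INT_MAX;

    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
        if (cached_temp[cpu] < best_temp) {
            best_temp = cached_temp[cpu];
            best = cpu;
        }
    }
    return best;
}
```

The trade-off is freshness: migration decisions use the last recorded
reading rather than a live MSR read.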

-dan