Date:	Mon, 7 Jul 2008 13:31:41 +0200
From:	"Dmitry Adamushko" <dmitry.adamushko@...il.com>
To:	miaox@...fujitsu.com
Cc:	"Lai Jiangshan" <laijs@...fujitsu.com>,
	"Ingo Molnar" <mingo@...e.hu>,
	"Heiko Carstens" <heiko.carstens@...ibm.com>,
	"Peter Zijlstra" <a.p.zijlstra@...llo.nl>,
	"Avi Kivity" <avi@...ranet.com>, linux-kernel@...r.kernel.org,
	"Andrew Morton" <akpm@...ux-foundation.org>
Subject: Re: [BUG] CFS vs cpu hotplug

2008/7/7 Miao Xie <miaox@...fujitsu.com>:
> on 3:59 Lai Jiangshan wrote:
>> Dmitry Adamushko wrote:
>>>
>>> [ ... ]
>>>
>>> We should see then all tasks that have been migrated (or failed to be
>>> migrated) during migration_call(CPU_DEAD, ...).
>>>
>> Thank you. I'll test it again with your debugging patch applied
>> and get more info.
>
> I tested it with Dmitry's patch, and found that all the tasks on the offline
> cpu were migrated to an online cpu by migrate_live_tasks() in migration_call().
> But some tasks (such as klogd and others) were moved back to the offline cpu
> immediately, before the BUG_ON(rq->nr_running != 0) check, even before
> acquiring rq's lock.
>
>        static int __cpuinit
>        migration_call(struct notifier_block *nfb, unsigned long action, void *
>        {
>                ...
>                switch (action) {
>                ...
>                case CPU_DEAD:
>                case CPU_DEAD_FROZEN:
>                        cpuset_lock();
>                        migrate_live_tasks(cpu);
>                        rq = cpu_rq(cpu);
>                        ...
>                        spin_lock_irq(&rq->lock);
>                        ...
>                        migrate_dead_tasks(cpu);
>                        spin_unlock_irq(&rq->lock);
>                        cpuset_unlock();
>                        migrate_nr_uninterruptible(rq);
>                        BUG_ON(rq->nr_running != 0);
>                        ...
>                        break;
>                }
>                ...
>        }
>
> By debugging, I found this bug was caused by select_task_rq_fair().

Thanks for tracking this down!


> After migrating the tasks on the offline cpu to an online cpu, the kernel would
> wake up these migrated tasks quickly by try_to_wake_up(). try_to_wake_up() would
> invoke select_task_rq_fair() to find a lower-load cpu in sched domains for them.
> But the sched domains weren't updated and the offline cpu was still in the sched
> domains.

Hmm... if so, then it is the stale sched domains that should be fixed,
not select_task_rq_fair(). I don't think this is expected behavior.


> So select_task_rq_fair() might return the offline cpu's id, then the
> bug occurred.
>
> I fixed the bug just by checking select_task_rq_fair()'s return value in
> try_to_wake_up().
>
> [ ... ]


-- 
Best regards,
Dmitry Adamushko