Date:	Thu, 10 Mar 2016 13:54:17 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Niklas Cassel <niklas.cassel@...s.com>
Cc:	tj@...nel.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] sched: leaf_cfs_rq_list use after free

On Fri, Mar 04, 2016 at 11:41:17AM +0100, Niklas Cassel wrote:

> A snippet of the trace_printks I added while analyzing the problem.
> The prints show that a certain cfs_rq gets re-added after it has been removed,
> and that update_blocked_averages then uses a cfs_rq which has already been freed:
> 
>          systemd-1     [000]    22.664453: bprint:               alloc_fair_sched_group: allocated cfs_rq 0x8efb0780 tg 0x8efb1800 tg->css.id 0
>          systemd-1     [000]    22.664479: bprint:               alloc_fair_sched_group: allocated cfs_rq 0x8efb1680 tg 0x8efb1800 tg->css.id 0
>          systemd-1     [000]    22.664481: bprint:               cpu_cgroup_css_alloc: tg 0x8efb1800 tg->css.id 0
>          systemd-1     [000]    22.664547: bprint:               cpu_cgroup_css_online: tg 0x8efb1800 tg->css.id 80
>          systemd-874   [001]    27.389000: bprint:               list_add_leaf_cfs_rq: cfs_rq 0x8efb1680 cpu 1 on_list 0x0
>     migrate_cert-820   [001]    27.421337: bprint:               update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x1

>      kworker/0:1-24    [000]    27.421356: bprint:               cpu_cgroup_css_offline: tg 0x8efb1800 tg->css.id 80

So we take the cgroup offline

>      kworker/0:1-24    [000]    27.421445: bprint:               list_del_leaf_cfs_rq: cfs_rq 0x8efb1680 cpu 1 on_list 0x1

Remove our cfs_rq from the list

>     migrate_cert-820   [001]    27.421506: bprint:               list_add_leaf_cfs_rq: cfs_rq 0x8efb1680 cpu 1 on_list 0x0

And stuff it back on again -> *FAIL*
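For readers following along, this is roughly what the two helpers in
kernel/sched/fair.c look like (a trimmed sketch; the real add side also has
ordering logic for where the entry lands in the leaf list):

        static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
        {
                if (!cfs_rq->on_list) {
                        /* leaf-list ordering w.r.t. parent cfs_rqs elided */
                        list_add_tail_rcu(&cfs_rq->leaf_cfs_rq_list,
                                          &rq_of(cfs_rq)->leaf_cfs_rq_list);
                        cfs_rq->on_list = 1;
                }
        }

        static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
        {
                if (cfs_rq->on_list) {
                        list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
                        cfs_rq->on_list = 0;
                }
        }

Nothing in there re-checks whether the group is still online, so a racing
enqueue on another CPU can flip on_list back to 1 after the offline path has
cleared it -- which is exactly what the trace shows.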

>    system-status-815   [001]    27.491358: bprint:               update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x1
>      kworker/0:1-24    [000]    27.501561: bprint:               cpu_cgroup_css_free: tg 0x8efb1800 tg->css.id 80
>     migrate_cert-820   [001]    27.511337: bprint:               update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x1
>      ksoftirqd/0-3     [000]    27.521830: bprint:               free_fair_sched_group: freeing cfs_rq 0x8efb0780 tg 0x8efb1800 tg->css.id 80
>      ksoftirqd/0-3     [000]    27.521857: bprint:               free_fair_sched_group: freeing cfs_rq 0x8efb1680 tg 0x8efb1800 tg->css.id 80
>           logger-1252  [001]    27.531355: bprint:               update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x6b6b6b6b
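
That last on_list value is the tell: 0x6b is the slab free-poison byte
(POISON_FREE in include/linux/poison.h), so update_blocked_averages is reading
through memory the allocator has already poisoned:

        /* include/linux/poison.h */
        #define POISON_FREE     0x6b    /* byte written over freed objects */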
> 
> 
> I've reproduced this on v4.4, but I've also managed to reproduce the bug
> after cherry-picking the following patches
> (all but one were marked for v4.4 stable):
> 
> 6fe1f34 sched/cgroup: Fix cgroup entity load tracking tear-down
> d6e022f workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
> 041bd12 Revert "workqueue: make sure delayed work run in local cpu"
> 8bb5ef7 cgroup: make sure a parent css isn't freed before its children
> aa226ff cgroup: make sure a parent css isn't offlined before its children
> e93ad19 cpuset: make mm migration asynchronous

Hmm, that is most unfortunate indeed.

Can you describe a reliable reproducer?

So we only call list_add_leaf_cfs_rq() through enqueue_task_fair(),
which means someone is still running inside that cgroup.
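
To make that concrete, the enqueue side looks roughly like this in the
v4.4-era kernel/sched/fair.c (heavily trimmed, only the part relevant here):

        static void
        enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
        {
                /* update_curr(), load accounting, vruntime placement elided */

                if (cfs_rq->nr_running == 1) {
                        list_add_leaf_cfs_rq(cfs_rq);
                        check_enqueue_throttle(cfs_rq);
                }
        }

So the list_add at 27.421506 would mean migrate_cert-820 put the first task
back onto that group's cfs_rq on CPU 1 while the group was already being torn
down on CPU 0.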

TJ, we only call offline once the cgroup is empty, don't we?
