Message-ID: <YjrWOrd4Ze3/6sl2@hirez.programming.kicks-ass.net>
Date:   Wed, 23 Mar 2022 09:11:38 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Chengming Zhou <zhouchengming@...edance.com>
Cc:     mingo@...hat.com, acme@...nel.org, mark.rutland@....com,
        alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
        namhyung@...nel.org, eranian@...gle.com,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        duanxiongchun@...edance.com, songmuchun@...edance.com
Subject: Re: [External] Re: [PATCH v2 1/6] perf/core: Fix incosistency
 between cgroup sched_out and sched_in

On Tue, Mar 22, 2022 at 11:28:41PM +0800, Chengming Zhou wrote:
> On 2022/3/22 11:16 PM, Chengming Zhou wrote:
> > Hi Peter,
> > 
> > On 2022/3/22 10:54 PM, Peter Zijlstra wrote:
> >> On Tue, Mar 22, 2022 at 09:38:21PM +0800, Chengming Zhou wrote:
> >>> On 2022/3/22 8:59 PM, Peter Zijlstra wrote:
> >>>> On Tue, Mar 22, 2022 at 08:08:29PM +0800, Chengming Zhou wrote:
> >>>>> There is a race problem that can trigger WARN_ON_ONCE(cpuctx->cgrp)
> >>>>> in perf_cgroup_switch().
> >>>>>
> >>>>> CPU1					CPU2
> >>>>> (in context_switch)			(attach running task)
> >>>>> perf_cgroup_sched_out(prev, next)
> >>>>> 	cgrp1 == cgrp2 is True
> >>>>> 					next->cgroups = cgrp3
> >>>>> 					perf_cgroup_attach()
> >>>>> perf_cgroup_sched_in(prev, next)
> >>>>> 	cgrp1 == cgrp3 is False
> 
> I see, you must have been misled by my wrong drawing above ;-)
> I'm sorry, perf_cgroup_attach() on the right should be put at the bottom.
> 
> CPU1						CPU2
> (in context_switch)				(attach running task)
> perf_cgroup_sched_out(prev, next)
> 	cgrp1 == cgrp2 is True
> 						next->cgroups = cgrp3
> perf_cgroup_sched_in(prev, next)
> 	cgrp1 == cgrp3 is False
> 						__perf_cgroup_move()
> 

Ohhhh, you're talking about CPU2 running cgroup_migrate_execute()...
clear as mud this :/
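
For illustration only (this is not code from the patch or from the
kernel): the interleaving in the corrected diagram can be replayed in a
small single-threaded user-space model. The struct names, the `warned`
flag and replay_switch() are hypothetical stand-ins; the only thing
taken from the thread is that sched_out and sched_in each re-read the
task's cgroup pointer, and that the switch-in path warns when the
per-CPU cgroup was never cleared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's types. */
struct cgrp { const char *name; };
struct task { struct cgrp *cgroups; };
struct cpu_ctx {
	struct cgrp *cgrp;	/* cgroup currently active on this CPU */
	bool warned;		/* models WARN_ON_ONCE(cpuctx->cgrp) firing */
};

/* Switch-out: clear the per-CPU cgroup only if prev and next differ. */
static void sched_out(struct cpu_ctx *ctx, const struct task *prev,
		      const struct task *next)
{
	if (prev->cgroups != next->cgroups)
		ctx->cgrp = NULL;
}

/* Switch-in: re-reads next->cgroups, which may have changed meanwhile. */
static void sched_in(struct cpu_ctx *ctx, const struct task *prev,
		     const struct task *next)
{
	if (prev->cgroups != next->cgroups) {
		if (ctx->cgrp)		/* switch-out was skipped earlier */
			ctx->warned = true;
		ctx->cgrp = next->cgroups;
	}
}

/*
 * Replay one context switch. When migrate_in_between is true, the
 * "other CPU" rewrites next->cgroups between the two calls, as in the
 * diagram. Returns true if the modelled warning fired.
 */
bool replay_switch(bool migrate_in_between)
{
	struct cgrp cgrp1 = { "cgrp1" }, cgrp3 = { "cgrp3" };
	struct task prev = { &cgrp1 }, next = { &cgrp1 };
	struct cpu_ctx ctx = { &cgrp1, false };

	sched_out(&ctx, &prev, &next);		/* cgrp1 == cgrp2: no-op */
	if (migrate_in_between)
		next.cgroups = &cgrp3;		/* CPU2: next->cgroups = cgrp3 */
	sched_in(&ctx, &prev, &next);		/* cgrp1 != cgrp3: mismatch */

	return ctx.warned;
}
```

Calling replay_switch(true) follows the racy ordering and
replay_switch(false) follows the undisturbed one; the model is
sequential, so it shows the ordering problem but says nothing about
the locking a real fix would need.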

I think I remember this race; in the scheduler we fixed it by not using
task_css to track the active cgroup and using the various cgroup_subsys
hooks to keep an internally consistent set of state.
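
As a rough sketch of that idea, and again a toy user-space model rather
than the scheduler's actual code: the context-switch paths compare a
privately cached pointer instead of re-reading the task's cgroup, and a
single attach hook is the only place that cache changes. The `cached`
field, on_attach() and replay_with_cached_state() are hypothetical
names invented for this example, and the model assumes the hook is
serialised against context switches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's types. */
struct cgrp { const char *name; };

struct task {
	struct cgrp *cgroups;	/* may be rewritten by a migrating CPU */
	struct cgrp *cached;	/* private copy, changed only in on_attach() */
};

struct cpu_ctx {
	struct cgrp *active;	/* cgroup currently active on this CPU */
	bool warned;		/* models the inconsistency warning */
};

/* Both switch paths consult the cached pointer, so they always agree. */
static void sched_out(struct cpu_ctx *ctx, const struct task *prev,
		      const struct task *next)
{
	if (prev->cached != next->cached)
		ctx->active = NULL;
}

static void sched_in(struct cpu_ctx *ctx, const struct task *prev,
		     const struct task *next)
{
	if (prev->cached != next->cached) {
		if (ctx->active)
			ctx->warned = true;
		ctx->active = next->cached;
	}
}

/*
 * Hypothetical attach hook for a task that is currently running: update
 * the cache and the per-CPU state together, in one place.
 */
static void on_attach(struct cpu_ctx *ctx, struct task *running)
{
	running->cached = running->cgroups;
	ctx->active = running->cached;
}

/* Same interleaving as in the diagram, with the hook running last. */
bool replay_with_cached_state(void)
{
	struct cgrp cgrp1 = { "cgrp1" }, cgrp3 = { "cgrp3" };
	struct task prev = { &cgrp1, &cgrp1 }, next = { &cgrp1, &cgrp1 };
	struct cpu_ctx ctx = { &cgrp1, false };

	sched_out(&ctx, &prev, &next);	/* cached pointers equal: no-op */
	next.cgroups = &cgrp3;		/* migration changes the raw pointer */
	sched_in(&ctx, &prev, &next);	/* cached pointers still equal */
	on_attach(&ctx, &next);		/* hook applies the move consistently */

	return !ctx.warned && ctx.active == &cgrp3;
}
```

The point of the design is that sched_out and sched_in can never see
different answers for the same switch, because the value they compare
does not change underneath them.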

But let me go look at what you did in this new light.
