Message-ID: <79ec0073-2db7-a7a2-ec60-265a617d463a@linaro.org>
Date: Fri, 3 Jun 2022 11:06:17 -0700
From: Tadeusz Struk <tadeusz.struk@...aro.org>
To: Tejun Heo <tj@...nel.org>
Cc: Michal Koutny <mkoutny@...e.com>,
Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>,
Christian Brauner <brauner@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, cgroups@...r.kernel.org,
netdev@...r.kernel.org, bpf@...r.kernel.org,
stable@...r.kernel.org, linux-kernel@...r.kernel.org,
syzbot+e42ae441c3b10acf9e9d@...kaller.appspotmail.com
Subject: Re: [PATCH] cgroup: serialize css kill and release paths
On 6/3/22 10:34, Tadeusz Struk wrote:
> Syzbot found a corrupted list bug scenario that can be triggered from
> cgroup_subtree_control_write(cgrp). The reproducer writes to the
> cgroup.subtree_control file, which invokes:
> cgroup_apply_control_enable()->css_create()->css_populate_dir(), which
> then fails with a fault-injected -ENOMEM.
> In such a scenario css_killed_work_fn() will be enqueued via
> cgroup_apply_control_disable(cgrp)->kill_css(css), and the write will
> bail out to cgroup_kn_unlock(). Then cgroup_kn_unlock() will call:
> cgroup_put(cgrp)->css_put(&cgrp->self), which will try to enqueue
> css_release_work_fn() for the same css instance, causing a list_add
> corruption bug, as can be seen in the syzkaller report [1].
>
> Fix this by synchronizing the css ref_kill and css_release jobs.
> The css_release() function will check whether css_killed_work_fn() has
> been scheduled for the css and only enqueue css_release_work_fn()
> if css_killed_work_fn() wasn't already enqueued. Otherwise css_release()
> will set the CSS_REL_LATER flag for that css. This will cause the
> css_release_work_fn() work to be executed after css_killed_work_fn()
> has finished.
>
> Two css flags have been introduced to implement this serialization mechanism:
>
> * CSS_KILL_ENQED, which will be set when css_killed_work_fn() is enqueued, and
> * CSS_REL_LATER, which, if set, will cause css_release_work_fn() to be
> scheduled after css_killed_work_fn() has finished.
>
> There is also a new lock, which will protect the integrity of the css flags.
>
> [1]https://syzkaller.appspot.com/bug?id=e26e54d6eac9d9fb50b221ec3e4627b327465dbd
>
> Cc: Tejun Heo <tj@...nel.org>
> Cc: Michal Koutny <mkoutny@...e.com>
> Cc: Zefan Li <lizefan.x@...edance.com>
> Cc: Johannes Weiner <hannes@...xchg.org>
> Cc: Christian Brauner <brauner@...nel.org>
> Cc: Alexei Starovoitov <ast@...nel.org>
> Cc: Daniel Borkmann <daniel@...earbox.net>
> Cc: Andrii Nakryiko <andrii@...nel.org>
> Cc: Martin KaFai Lau <kafai@...com>
> Cc: Song Liu <songliubraving@...com>
> Cc: Yonghong Song <yhs@...com>
> Cc: John Fastabend <john.fastabend@...il.com>
> Cc: KP Singh <kpsingh@...nel.org>
> Cc: <cgroups@...r.kernel.org>
> Cc: <netdev@...r.kernel.org>
> Cc: <bpf@...r.kernel.org>
> Cc: <stable@...r.kernel.org>
> Cc: <linux-kernel@...r.kernel.org>
>
> Reported-and-tested-by: syzbot+e42ae441c3b10acf9e9d@...kaller.appspotmail.com
> Fixes: 8f36aaec9c92 ("cgroup: Use rcu_work instead of explicit rcu and work item")
> Signed-off-by: Tadeusz Struk <tadeusz.struk@...aro.org>
I just spotted an issue with this: I'm holding an invalid lock in
css_killed_work_fn(). I will follow up with a v2 of the patch soon.
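
For readers following the thread, the flag handshake described in the quoted
commit message can be modeled roughly as below. This is only a single-threaded
user-space sketch, not the patch itself: struct css_model and the model_*()
helpers are hypothetical names, the flag bit values are assumed, the clearing
of the flags in the kill worker is an assumption, and the new lock that
protects the flags is represented only by comments.

```c
#include <assert.h>

/* Flag names come from the commit message; the bit values are assumed. */
enum {
	CSS_KILL_ENQED = 1 << 0,	/* css_killed_work_fn() has been enqueued */
	CSS_REL_LATER  = 1 << 1,	/* release must wait for the kill work */
};

/* Hypothetical stand-in for the relevant parts of a css. */
struct css_model {
	unsigned int flags;	/* the real code would guard this with the new lock */
	int release_queued;	/* how many times the release work was enqueued */
};

/* Models kill_css(): the kill work is enqueued and the flag records it. */
void model_kill_css(struct css_model *css)
{
	css->flags |= CSS_KILL_ENQED;
}

/* Models css_release(): enqueue the release work now, or defer it. */
void model_css_release(struct css_model *css)
{
	if (css->flags & CSS_KILL_ENQED)
		css->flags |= CSS_REL_LATER;	/* kill work pending: defer */
	else
		css->release_queued++;		/* nothing pending: enqueue now */
}

/* Models the end of css_killed_work_fn(): run any deferred release. */
void model_css_killed_work_fn(struct css_model *css)
{
	css->flags &= ~CSS_KILL_ENQED;	/* assumed: flag is cleared here */
	if (css->flags & CSS_REL_LATER) {
		css->flags &= ~CSS_REL_LATER;
		css->release_queued++;
	}
}

/* Replays the failure path from the report: kill, then release, then worker. */
int model_failure_path(void)
{
	struct css_model css = { 0, 0 };

	model_kill_css(&css);
	model_css_release(&css);
	assert(css.release_queued == 0);	/* release was deferred */
	model_css_killed_work_fn(&css);
	return css.release_queued;
}
```

In this model the release work is enqueued exactly once, and only after the
kill work has run, which is the ordering the commit message aims for.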
--
Thanks,
Tadeusz