Date:   Wed, 18 May 2022 09:48:21 -0700
From:   Tadeusz Struk <tadeusz.struk@...aro.org>
To:     Michal Koutný <mkoutny@...e.com>,
        Tejun Heo <tj@...nel.org>
Cc:     cgroups@...r.kernel.org, Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Christian Brauner <brauner@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>, netdev@...r.kernel.org,
        bpf@...r.kernel.org, stable@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        syzbot+e42ae441c3b10acf9e9d@...kaller.appspotmail.com
Subject: Re: [PATCH] cgroup: don't queue css_release_work if one already
 pending

On 4/22/22 04:05, Michal Koutný wrote:
> On Thu, Apr 21, 2022 at 02:00:56PM -1000, Tejun Heo <tj@...nel.org> wrote:
>> If this is the case, we need to hold an extra reference to be put by the
>> css_killed_work_fn(), right?
> 
> I looked into it a bit more lately and found that there already is such
> a fuse in kill_css() [1].
> 
> At the same time, syzbot's stack trace demonstrates that the fuse is
> ineffective:
> 
>> css_release+0xae/0xc0 kernel/cgroup/cgroup.c:5146                    (**)
>> percpu_ref_put_many include/linux/percpu-refcount.h:322 [inline]
>> percpu_ref_put include/linux/percpu-refcount.h:338 [inline]
>> percpu_ref_call_confirm_rcu lib/percpu-refcount.c:162 [inline]        (*)
>> percpu_ref_switch_to_atomic_rcu+0x5a2/0x5b0 lib/percpu-refcount.c:199
>> rcu_do_batch+0x4f8/0xbc0 kernel/rcu/tree.c:2485
>> rcu_core+0x59b/0xe30 kernel/rcu/tree.c:2722
>> rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2735
>> __do_softirq+0x27e/0x596 kernel/softirq.c:305
> 
> (*) this calls css_killed_ref_fn() as the confirm_switch callback
> (**) zero references after confirmed kill?
> 
> So, I was also looking at the possible race with css_free_rwork_fn()
> (from failed css_create()) but that would likely emit a warning from
> __percpu_ref_exit().
> 
> So, I still think there's something fishy (so far possible only via
> artificial ENOMEM injection) that needs an explanation...

I can't reliably reproduce this issue on either mainline or v5.10, where
syzbot originally found it. It still triggers for syzbot though.
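
For reference, the "fuse" idea discussed above (take an extra reference
when killing, and drop it only from the deferred killed-work handler) can
be modelled in userspace roughly as below. This is only a sketch under
stated assumptions: struct obj, obj_get(), obj_put(), obj_kill() and
obj_killed_work() are hypothetical names for illustration, a plain int
stands in for the percpu refcount, and none of it is the actual
kernel/cgroup/cgroup.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as a css. */
struct obj {
	int refs;        /* plain counter; the kernel uses a percpu_ref */
	bool offlined;   /* set by the deferred "killed" work */
	bool released;   /* set when the last reference is dropped */
};

static void obj_get(struct obj *o)
{
	o->refs++;
}

static void obj_put(struct obj *o)
{
	assert(o->refs > 0);
	if (--o->refs == 0)
		o->released = true;   /* release must happen exactly once */
}

/* Kill: drop the base reference, but hold the fuse reference first. */
static void obj_kill(struct obj *o)
{
	obj_get(o);   /* fuse: keeps the object alive until the work runs */
	obj_put(o);   /* drop the base reference */
}

/* Deferred work run after the kill is confirmed: offline, then drop fuse. */
static void obj_killed_work(struct obj *o)
{
	o->offlined = true;
	obj_put(o);   /* drop the fuse; only now may release happen */
}
```

In this model the release cannot happen between obj_kill() and
obj_killed_work(), which is why a release reached directly from the
confirm callback path in the trace above looks unexpected.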

-- 
Thanks,
Tadeusz
