Message-ID: <YoR8yrgorv8QssX6@zeniv-ca.linux.org.uk>
Date:   Wed, 18 May 2022 04:57:46 +0000
From:   Al Viro <viro@...iv.linux.org.uk>
To:     syzbot <syzbot+5b1e53987f858500ec00@...kaller.appspotmail.com>
Cc:     hdanton@...a.com, linux-kernel@...r.kernel.org,
        syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] WARNING in mntput_no_expire (3)

On Wed, May 18, 2022 at 04:38:53AM +0000, Al Viro wrote:
> On Wed, May 18, 2022 at 01:58:40AM +0000, Al Viro wrote:
> > On Wed, May 18, 2022 at 01:10:20AM +0000, Al Viro wrote:
> > > On Wed, May 18, 2022 at 12:59:46AM +0000, Al Viro wrote:
> > > > On Tue, May 17, 2022 at 10:58:15PM +0000, Al Viro wrote:
> > > > > On Tue, May 17, 2022 at 03:49:07PM -0700, syzbot wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > > > > WARNING in mntput_no_expire
> > > > > 
> > > > > Obvious question: which filesystem is it?
> > > > 
> > > > FWIW, can't reproduce here - at least not with C reproducer +
> > > > -rc7^ kernel + .config from report + debian kvm image (bullseye,
> > > > with systemd shite replaced with sysvinit, which might be relevant).
> > > > 
> > > > In case systemd-specific braindamage is needed to reproduce it...
> > > > Hell knows; at least mount --make-rshared / doesn't seem to suffice.
> > > 
> > > ... doesn't reproduce with genuine systemd either.  FWIW, 4-way SMP
> > > setup here.
> > 
> > OK, reproduced...
> 
> FWIW, it smells like something (cgroup?) fucking up percpu allocation/freeing.
> Note that struct mount has both refcount and writers count held in percpu;
> replacing the refcount with atomic_t gets rid of seeing negative refcount
> in mntput_no_expire(), but leaves negative writers count caught in
> cleanup_mnt(); turn that from WARN_ON into printk and we get past that,
> only to see
> 	percpu ref (css_release) <= 0 (-4294967294)
> immediately afterwards.
> 
> IOW, it looks like what we are getting is not messed-up refcounting on
> either side, but the same refcount physically shared by unrelated objects.

Gotcha.
percpu_ref_init():
        ref->percpu_count_ptr = (unsigned long)
                __alloc_percpu_gfp(sizeof(unsigned long), align, gfp);
        if (!ref->percpu_count_ptr)
                return -ENOMEM;
        data = kzalloc(sizeof(*ref->data), gfp);
        if (!data) {
                free_percpu((void __percpu *)ref->percpu_count_ptr);
                return -ENOMEM;
        }

css_create():
        err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
        if (err)
                goto err_free_css;

        err = cgroup_idr_alloc(&ss->css_idr, NULL, 2, 0, GFP_KERNEL);
        if (err < 0)
                goto err_free_css;

Now note that we end up on the same path whether percpu_ref_init() has
succeeded or failed, with no way to tell whether css->refcnt.percpu_count_ptr
points to an already freed object or still needs to be freed.  And sure
enough, we have

err_free_css:
        list_del_rcu(&css->rstat_css_node);
        INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
        queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);

with css_free_rwork_fn() starting with
        percpu_ref_exit(&css->refcnt);

which will give that double free.  That might not be the only cause of
trouble, but this looks like a bug and a plausible source of the
symptoms observed here.  Let's see if this helps:

diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index af9302141bcf..e5c5315da274 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -76,6 +76,7 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release,
 	data = kzalloc(sizeof(*ref->data), gfp);
 	if (!data) {
 		free_percpu((void __percpu *)ref->percpu_count_ptr);
+		ref->percpu_count_ptr = 0;
 		return -ENOMEM;
 	}
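
For illustration only, here is a userspace sketch of the pattern the patch relies on: an init function that frees its first allocation on failure also clears the pointer, so a cleanup routine shared by the success and failure paths cannot free it a second time. Everything named toy_* is hypothetical and not kernel code; calloc()/free() stand in for the percpu allocator, and the sketch assumes the cleanup side skips a zero pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct percpu_ref; not the kernel layout. */
struct toy_ref {
	void *count;	/* plays the role of percpu_count_ptr */
	void *data;	/* plays the role of ref->data */
};

static int toy_frees;	/* how many times a count buffer was released */

/* Same shape as the quoted percpu_ref_init(); fail_data simulates kzalloc() failing. */
static int toy_ref_init(struct toy_ref *ref, int fail_data)
{
	ref->data = NULL;
	ref->count = calloc(1, sizeof(unsigned long));
	if (!ref->count)
		return -1;
	ref->data = fail_data ? NULL : calloc(1, 16);
	if (!ref->data) {
		free(ref->count);
		toy_frees++;
		ref->count = NULL;	/* the fix: leave no dangling pointer behind */
		return -1;
	}
	return 0;
}

/* Cleanup that may run after either a successful or a failed init. */
static void toy_ref_exit(struct toy_ref *ref)
{
	if (ref->count) {	/* assumption: a zero pointer is skipped */
		free(ref->count);
		toy_frees++;
		ref->count = NULL;
	}
	free(ref->data);	/* free(NULL) is a no-op */
	ref->data = NULL;
}

/* Counts the frees across a failed init followed by the shared cleanup. */
static int toy_failed_init_then_exit(void)
{
	struct toy_ref ref;

	toy_frees = 0;
	if (toy_ref_init(&ref, 1) == 0)
		return -1;	/* unexpected: init was told to fail */
	toy_ref_exit(&ref);	/* shared error path, as with err_free_css */
	return toy_frees;
}
```

With the pointer cleared in the failure branch, toy_failed_init_then_exit() sees the count buffer released exactly once. Drop the "ref->count = NULL" line in toy_ref_init() and the same call sequence frees it twice, which is the situation described above.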