linux-kernel - Re: [PATCH] x86/resctrl: Fix memory leak on kernfs dir removal


Message-ID: <CA+FuTScL9bwxmKKSvPmUaN+Kmo_cwMocHURZt4u9EqjMJ0_kfw@mail.gmail.com>
Date:   Mon, 26 Oct 2020 12:48:28 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Reinette Chatre <reinette.chatre@...el.com>
Cc:     Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>, mingo@...hat.com,
        bp@...en8.de, x86@...nel.org, HPA <hpa@...or.com>,
        Xiaochen Shen <xiaochen.shen@...el.com>
Subject: Re: [PATCH] x86/resctrl: Fix memory leak on kernfs dir removal

On Mon, Oct 26, 2020 at 12:24 PM Reinette Chatre
<reinette.chatre@...el.com> wrote:
>
> +Xiaochen
>
> Hi Willem,
>
> As you described in the report you sent directly to us there are indeed
> more issues than the one described here surrounding the kernfs node
> reference counting in resctrl. Xiaochen is actively working on patch(es)
> for all the issues and you could continue working with him ... now
> externally?

Great to hear. I wasn't aware of that. Of course. Externally or
off-list first, whichever you prefer.

For reference, one other issue occurs on mount/umount:

    for i in {1..200000}; do
        mount -t resctrl resctrl /sys/fs/resctrl;
        umount /sys/fs/resctrl;
    done

> On 10/26/2020 8:09 AM, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@...gle.com>
> >
> > Resctrl takes an extra kernfs ref on directory entries, to access
> > the entry on cleanup in rdtgroup_kn_unlock after removing the entire
> > subtree with kernfs_remove.
> >
> > But the path takes an extra ref both on mkdir and on rmdir.
> >
>
> On resource group (control as well as monitoring) creation via a mkdir
> an extra kernfs node reference is obtained to ensure that the rdtgroup
> structure remains accessible for the rdtgroup_kn_unlock() calls where it
> is removed on deletion. This symmetry ties the resource group's lifetime
> with the kernfs node. The extra kernfs node reference count is dropped
> by kernfs_put() in rdtgroup_kn_unlock() as is documented in the comment
> removed by this patch.
>
> As you state there is an extra reference obtained in rmdir, that is
> unnecessary.
>
> > The kernfs_get on mkdir causes a memleak in the unlikely exit with
> > error in the same function, as no extra kernfs_put exists and no extra
> > rdtgroup_kn_unlock occurs.
>
> This is a bug.
>
> >
> > More importantly, essentially the same happens in the normal path, as
> > this simple program demonstrates:
> >
> >      for i in {1..200000}; do
> >        mkdir /sys/fs/resctrl/task1
> >        rmdir /sys/fs/resctrl/task1
> >      done
> >      slabtop
> >
> > When taking an extra ref for the duration of kernfs_remove, the code
> > is easiest to reason about if that ref is held as briefly as
> > possible. For that reason, and because of the refcnt on the error
> > path and the free on umount (rmdir_all_sub), remove the first
> > kernfs_get on mkdir, leaving the other on rmdir.
>
> rmdir_all_sub() may be prevented from just removing the resource group
> if there are any waiters. In this case the resource group would be
> removed by rdtgroup_kn_unlock() by the last waiter at which point a
> reference would be dropped. With this patch there would be no reference
> to drop.

Ah, indeed. Would it be easier to reason about if rdtgroup_kn_lock_live
took an extra ref that rdtgroup_kn_unlock releases? Either way, I had
certainly missed that path.

> Indeed, there is another issue where the kfree(rdtgrp) in
> rmdir_all_sub() (the case when there are no waiters) is missing a
> kernfs_put(). Xiaochen is meticulously working through all of this.
>
> >
> > As the caller of rdtgroup_rmdir, kernfs_iop_rmdir, itself takes a
> > reference on the kernfs object, the extra reference is possibly not
> > needed at all.
>
> This is not obvious to me. Are you referring to
> kernfs_iop_rmdir()->kernfs_get_active(kn)? That is a different reference
> (kn->active as opposed to kn->count)?

I thought that would have the same effect of ensuring that kn can
be dereferenced safely throughout rdtgroup_rmdir. But judging from the
WARN_ONCE in kernfs_put, the rules on count vs. active are not quite
that simple.