linux-kernel - Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 02 Jul 2018 20:22:17 +1000
From:   Benjamin Herrenschmidt <benh@...nel.crashing.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Joel Stanley <joel@....id.au>
Subject: Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier

On Sun, 2018-07-01 at 10:04 -0700, Linus Torvalds wrote:
> On Sun, Jul 1, 2018 at 12:16 AM Benjamin Herrenschmidt
> <benh@...nel.crashing.org> wrote:
> > 
> > I suspect you didn't read it my entire argument or I wasn't clear
> > enough :-) This is actually the crux of the problem:
> > 
> > Yes the object continues to exist. However, the *last* kobject_put to
> > it will no longer be done under whatever higher level locking the
> > subsystem provides (whatever prevents for example concurrent add and
> > removes).
> 
> Well, yes and no.
> 
> Why "no"?
> 
> The last dropping is actually not necessarily that interesting.
> Especially with the sysfs interface, we basically know that you can
> look up the object using RCU (since that's what the filesystem lookup
> does), and that basically means that the refcount is always the final
> serialization mechanism.There is nothing else that can possibly lock
> it.
> 
> So this is where we disagree:
> 
> > Thus in that scenario the "last minute" kobject_release() done by the
> > last kobject_put() will be effectively unprotected from for example the
> > gdp_mutex (in the case of the gluedirs) or whatever other locking the
> > subsystem owning the kobject is using to avoid making that "refount 0"
> > object "discoverable".
> 
> No. *Fundamentally*, there is only one thing that protects that
> object: the refcount.
> 
> And my argument is that anything that has this model (which means
> anything that has any sysfs linkage, which pretty much means any
> kobject) absolutely *must* use "kobject_get_unless_zero()" unless it
> had an existing stable pointer and just wants to increase the refcount
> (ie "already got a reference throuigh one of my data structures that
> use the lock")
> 
> But if you have that model, that means that the "last drop" is
> actually almost totally irrelevant.  Because it doesn't matter if it's
> done inside the lock or not - you know that once it has been done,
> that object is entirely gone. It's not discoverable any more -
> regardless of locking. The discoverability is basically controlled
> entirely by the refcount.
> 
> So what happens then?
> 
> The locking isn't important for the last release, but it *is*
> important for new object *creation*.
> 
> Why?
> 
> The refcount means that once an object is gone, it's gone for
> *everyone*. It's a one-way thing, and it's thread-safe. So the code
> that does *creation* can do this:
> 
>  - get subsystem lock
>  - look up object (using the "unless_zero()" model)
>  - if you got the object, re-use it, and you're done: drop lock and return
>  - otherwise, you know that nobody else can get it either
>  - create new object and instantiate it
>  - drop lock
> 
> and this means that you create a new object IFF the old object had its
> refcount drop to zero. So you always have exactly one copy (or no copy
> at all, in between the last drop and the creation of the new one).
> 
> See? The lack of locking at drop time didn't matter.  The refcount
> itself serialized things.
> 
> So the above is what I *think* the "glue_dir" logic should be. No need
> for any other count. Just re-use the old glue dir if you can find it,
> and create a new one if you can't.
> 
> The one important thing is that the object lookup above needs to use
> the lookup needs to find the object all the way until the refcount has
> become zero. And that actually means that the object MUST NOT be
> removed from the object lists until *after* the refcount has been
> decremented to zero. Which is actually why that "automatic cleanup"
> that you hate is actually an integral and important part of the
> process: removing the object *before* the refcount went to zero is
> broken, because that means that the "look up object" phase can now
> miss an object that still has a non-zero refcount.
> 
> > So my patch 1/2 prevents us from finding the old dying object (and thus
> > from crashing) but replaces this with the duplicate name problem.
> 
> So I absolutely agree with your patch 1/2. My argument against it is
> actually that I think the "unless_zero" thing needs to be more
> universal.
> 
> > My patch 2/2 removes that second problem by ensuring we remove the
> > object from sysfs synchronously in device_del when it no longer
> > contains any children, explicitely rather than implicitely by the
> > virtue of doing the "last" kobject_put.
> 
> No. See above. The reason I think your patch 2/2 is wrong is that is
> actually *breaks* the above model, exactly because of that thing that
> you hatre.
> 
> The explicit removal is actively wrong for the "I want to reuse the
> object" model, exactly because it happens before the refcount has gone
> to zero.
> 
> > > No.  That the zero kobject_get() will not result in a warning. It just
> > > does a kref_get(), no warnings anywhere.
> > 
> > It's there but it's in refcount:
> > 
> > void refcount_inc(refcount_t *r)
> > {
> >         WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
> > }
> > EXPORT_SYMBOL(refcount_inc);
> > 
> > In fact that's how I started digging into that problem in the first place :-)
> 
> Hey, you are again hitting this because of extra config options.
> 
> Because the refcount_inc() that I found looks like this:
> 
>   static inline void refcount_inc(refcount_t *r)
>   {
>           atomic_inc(&r->refs);
>   }
> 
> and has no warning.
> 
> I wonder how many people actually run with REFCOUNT_FULL that warns -
> because it's way too expensive. It's not set in the default Fedora
> config, for example, and that's despite how Fedora tends to set all
> the other debug options.
> 
> So no, it really doesn't warn normally.
> 
>                 Linus