linux-kernel - Re: kobject_init_and

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190510023538.GA10356@eros.localdomain>
Date:   Fri, 10 May 2019 12:35:38 +1000
From:   "Tobin C. Harding" <me@...in.cc>
To:     "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Tyrel Datwyler <tyreld@...ux.vnet.ibm.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Petr Mladek <pmladek@...e.com>,
        Miroslav Benes <mbenes@...e.cz>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: kobject_init_and_add() confusion

On Wed, May 01, 2019 at 09:54:16AM +0200, Rafael J. Wysocki wrote:
> On Wed, May 1, 2019 at 1:38 AM Tobin C. Harding <me@...in.cc> wrote:
> >
> > Hi,
> >
> > Looks like I've created a bit of confusion trying to fix memleaks in
> > calls to kobject_init_and_add().  Its spread over various patches and
> > mailing lists so I'm starting a new thread and CC'ing anyone that
> > commented on one of those patches.
> >
> > If there is a better way to go about this discussion please do tell me.
> >
> > The problem
> > -----------
> >
> > Calls to kobject_init_and_add() are leaking memory throughout the kernel
> > because of how the error paths are handled.
> >
> > The solution
> > ------------
> >
> > Write the error path code correctly.
> >
> > Example
> > -------
> >
> > We have samples/kobject/kobject-example.c but it uses
> > kobject_create_and_add().  I thought of adding another example file here
> > but could not think of how to do it off the top of my head without being
> > super contrived.  Can add this to the TODO list if it will help.
> >
> > Here is an attempted canonical usage of kobject_init_and_add() typical
> > of the code that currently is getting it wrong.  This is the second time
> > I've written this and the first time it was wrong even after review (you
> > know who you are, you are definitely buying the next round of drinks :)
> >
> >
> > Assumes we have an object in memory already that has the kobject
> > embedded in it. Variable 'kobj' below would typically be &ptr->kobj
> >
> >
> >         void fn(void)
> >         {
> >                 int ret;
> >
> >                 ret = kobject_init_and_add(kobj, ktype, NULL, "foo");
> >                 if (ret) {
> >                         /*
> >                          * This means kobject_init() has succeeded
> >                          * but kobject_add() failed.
> >                          */
> >                         goto err_put;
> >                 }
> >
> >                 ret = some_init_fn();
> >                 if (ret) {
> >                         /*
> >                          * We need to wind back kobject_add() AND kobject_put().
> 
> kobject_add() and kobject_init() I suppose?
> 
> >                          * kobject_add() incremented the refcount in
> >                          * kobj->parent, that needs to be decremented THEN we need
> >                          * the call to kobject_put() to decrement the refcount of kobj.
> >                          */
> 
> So actually, if you look at kobject_cleanup(), it calls kobject_del()
> if kobj->state_in_sysfs is set.
> 
> Now, if you look at kobject_add_internal(), it sets
> kobj->state_in_sysfs when about to return 0 (success).
> 
> Therefore calling kobject_put() without the preceding kobject_del() is
> not a bug technically, even though it will trigger the "auto cleanup
> kobject_del" message with debug enabled.
> 
> >                         goto err_del;
> >                 }
> >
> >                 ret = some_other_init_fn();
> >                 if (ret)
> >                         goto other_err;
> >
> >                 kobject_uevent(kobj, KOBJ_ADD);
> >                 return 0;
> >
> >         other_err:
> >                 other_clean_up_fn();
> >         err_del:
> >                 kobject_del(kobj);
> >         err_put:
> >                 kobject_put(kobj);
> >
> >                 return ret;
> >         }
> >
> >
> > Have I got this correct?
> >
> > TODO
> > ----
> >
> > - Fix all the callsites to kobject_init_and_add()
> > - Further clarify the function docstring for kobject_init_and_add() [perhaps]
> > - Add a section to Documentation/kobject.txt [optional]
> > - Add a sample usage file under samples/kobject [optional]
> 
> The plan sounds good to me, but there is one thing to note IMO:
> kobject_cleanup() invokes the ->release() callback for the ktype, so
> these callbacks need to be able to cope with kobjects after a failing
> kobject_add() which may not be entirely obvious to developers
> introducing them ATM.

It has taken a while for this to soak in.  This is actually quite an
insidious issue.  If I give an example and perhaps we can come to a
solution.  This example is based on the code (and assumptions) in
mm/slub.c

If a developer has an object that they wish to add to sysfs they go
ahead and embed a kobject in it.  Correctly set up a ktype including
release function that just frees the object (using container of).  Now
assume that the object is already set up and in use when we go to set up
the sysfs entry.  If kobject_init_and_add() fails and we correctly call
kobject_put() the containing object will be free'd.  Yet the calling
code may not be done with the object, more to the point just because
sysfs setup fails the object is now unusable.  Besides the interesting
theoretical discussion this means we cannot just go and willy-nilly add
calls to kobject_put() in the error path of kobject_init_and_add() if
the original code was not written under the assumption that the release
method could be called during the error path (I have found 2 places at
least where behaviour of calling the release method is non-trivial to
ascertain).

I guess, as Greg said, its just a matter that reference counting within
the kernel is a hard problem.  So we fix the easy ones and then look a
bit harder at the hard ones ...

Any better suggestion?

thanks,
Tobin.