[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202106231555.871D23D50@keescook>
Date:   Wed, 23 Jun 2021 15:59:37 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Luis Chamberlain <mcgrof@...nel.org>
Cc:     gregkh@...uxfoundation.org, rafael@...nel.org, davem@...emloft.net,
        kuba@...nel.org, ast@...nel.org, andriin@...com,
        daniel@...earbox.net, atenart@...nel.org, alobakin@...me,
        weiwan@...gle.com, ap420073@...il.com, jeyu@...nel.org,
        ngupta@...are.org, sergey.senozhatsky.work@...il.com,
        minchan@...nel.org, axboe@...nel.dk, mbenes@...e.com,
        jpoimboe@...hat.com, tglx@...utronix.de, jikos@...nel.org,
        rostedt@...dmis.org, peterz@...radead.org,
        linux-block@...r.kernel.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] sysfs: fix kobject refcount to address races with
 kobject removal
On Wed, Jun 23, 2021 at 02:50:07PM -0700, Luis Chamberlain wrote:
> It's possible today to have a device attribute read or store
> race against device removal. This is known to happen as follows:
> 
> write system call -->
>   ksys_write () -->
>     vfs_write() -->
>       __vfs_write() -->
>         kernfs_fop_write_iter() -->
>           sysfs_kf_write() -->
>             dev_attr_store() -->
>               null reference
> 
> This happens because the dev_attr->store() callback can be
> removed prior to its call, after dev_attr_store() was initiated.
> The null dereference is possible because the sysfs ops can be
> removed on module removal, for instance, when device_del() is
> called, and a sysfs read / store is not doing any kobject reference
> bumps either. This allows a read/store call to initiate, a
> device_del() to kick off, and then the read/store call can be
> gone by the time to execute it.
> 
> The sysfs filesystem is not doing any kobject reference bumps during a
> read / store ops to prevent this.
> 
> To fix this in a simplified way, just bump the kobject reference when
> we create a directory and remove it on directory removal.
> 
> The big unfortunate eye-sore is addressing the manual kobject reference
> assumption on the networking code, which leads me to believe we should
> end up replacing that eventually with another sort of check.
> 
> Suggested-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> Signed-off-by: Luis Chamberlain <mcgrof@...nel.org>
> ---
> 
> This v4 moves to fixing the race condition on dev_attr_store() and
> dev_attr_read() to sysfs by bumping the kobject reference count
> on directory creation / deletion as suggested by Greg.
> 
> Unfortunately at least the networking core has a manual refcount
> assumption, which needs to be adjusted to account for this change.
> This should also mean there is runtime for other kobjects which may
> not be explored yet which may need fixing as well. We may want to
> change the check to something else on the networking front, but its
> not clear to me yet what to use.
> 
>  fs/sysfs/dir.c | 3 +++
>  net/core/dev.c | 4 ++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 59dffd5ca517..6c47aa4af6f5 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -56,12 +56,14 @@ int sysfs_create_dir_ns(struct kobject *kobj, const void *ns)
>  
>  	kobject_get_ownership(kobj, &uid, &gid);
>  
> +	kobject_get(kobj);
>  	kn = kernfs_create_dir_ns(parent, kobject_name(kobj),
>  				  S_IRWXU | S_IRUGO | S_IXUGO, uid, gid,
>  				  kobj, ns);
>  	if (IS_ERR(kn)) {
>  		if (PTR_ERR(kn) == -EEXIST)
>  			sysfs_warn_dup(parent, kobject_name(kobj));
> +		kobject_put(kobj);
>  		return PTR_ERR(kn);
>  	}
>  
> @@ -100,6 +102,7 @@ void sysfs_remove_dir(struct kobject *kobj)
>  	if (kn) {
>  		WARN_ON_ONCE(kernfs_type(kn) != KERNFS_DIR);
>  		kernfs_remove(kn);
> +		kobject_put(kobj);
>  	}
>  }
Shouldn't this be taken on open() not sysfs creation, or is the problem
that the kobject is held the module memory rather than duplicated by
sysfs?
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 222b1d322c96..3a0ffa603d14 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -10429,7 +10429,7 @@ static void netdev_wait_allrefs(struct net_device *dev)
>  	rebroadcast_time = warning_time = jiffies;
>  	refcnt = netdev_refcnt_read(dev);
>  
> -	while (refcnt != 1) {
> +	while (refcnt != 3) {
>  		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
>  			rtnl_lock();
>  
> @@ -10544,7 +10544,7 @@ void netdev_run_todo(void)
>  		netdev_wait_allrefs(dev);
>  
>  		/* paranoia */
> -		BUG_ON(netdev_refcnt_read(dev) != 1);
> +		BUG_ON(netdev_refcnt_read(dev) != 3);
And surely there are things besides netdevs that would suffer from this
change?
>  		BUG_ON(!list_empty(&dev->ptype_all));
>  		BUG_ON(!list_empty(&dev->ptype_specific));
>  		WARN_ON(rcu_access_pointer(dev->ip_ptr));
> -- 
> 2.27.0
> 
-- 
Kees Cook
Powered by blists - more mailing lists
 
