linux-kernel - Re: [PATCH-next] block: fix null-deref in percpu_ref

Message-ID: <Y4/mzMd4evRg9yDi@fedora>
Date:   Tue, 6 Dec 2022 17:05:16 -0800
From:   Dennis Zhou <dennis@...nel.org>
To:     Zhong Jinghua <zhongjinghua@...wei.com>
Cc:     tj@...nel.org, cl@...ux.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, yi.zhang@...wei.com,
        yukuai3@...wei.com
Subject: Re: [PATCH-next] block: fix null-deref in percpu_ref_put

Hello,

On Tue, Dec 06, 2022 at 05:09:39PM +0800, Zhong Jinghua wrote:
> A problem was found in stable 5.10; the root cause is described below.
> 
> For the q_usage_counter of a request_queue, blk_cleanup_queue uses
> "wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter))"
> to wait for q_usage_counter to become zero. However, if q_usage_counter
> reaches zero quickly, percpu_ref_exit may run and free ref->data while
> another process is still inside percpu_ref_put, which can cause a
> null-deref like the one below:
> 
> 	CPU0                             CPU1
> blk_mq_destroy_queue
>  blk_freeze_queue
>   blk_mq_freeze_queue_wait
> 				scsi_end_request
> 				 percpu_ref_get
> 				 ...
> 				 percpu_ref_put
> 				  atomic_long_sub_and_test
>  blk_put_queue
>   kobject_put
>    kref_put
>     blk_release_queue
>      percpu_ref_exit
>       ref->data -> NULL
>    				   ref->data->release(ref) -> null-deref
> 

I remember thinking about this a while ago. I don't think this fix works
as nicely as it may seem. Please correct me if I'm wrong.

q->q_usage_counter has the oddity that the lifetime of the percpu_ref
object isn't managed by the release function. The freeing is handled by
a separate path where it depends on the percpu_ref hitting 0. So here we
have two concurrent paths racing, one of which destroys the object. We
probably need blk_release_queue() to wait on percpu_ref's release
finishing, not starting.
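
To make "wait on release finishing" concrete, here is a minimal
userspace sketch, not kernel code. It assumes a plain atomic counter in
place of percpu_ref and a pthread mutex/condvar in place of a kernel
completion; struct demo_ref, demo_ref_put(), demo_ref_wait_released()
and demo_run() are hypothetical names made up for illustration. The
destroying path only frees the object after release() has returned.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted object (not kernel API). */
struct demo_ref {
	atomic_long count;
	void (*release)(struct demo_ref *ref);
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool release_done;	/* set only after release() has returned */
};

static int released_calls;

static void demo_release(struct demo_ref *ref)
{
	(void)ref;
	released_calls++;
}

/* Drop one reference; the final put runs release() and then signals. */
static void demo_ref_put(struct demo_ref *ref)
{
	if (atomic_fetch_sub(&ref->count, 1) == 1) {
		ref->release(ref);
		pthread_mutex_lock(&ref->lock);
		ref->release_done = true;
		pthread_cond_signal(&ref->cond);
		pthread_mutex_unlock(&ref->lock);
	}
}

/* Destroyer side: block until release() has finished, not merely started. */
static void demo_ref_wait_released(struct demo_ref *ref)
{
	pthread_mutex_lock(&ref->lock);
	while (!ref->release_done)
		pthread_cond_wait(&ref->cond, &ref->lock);
	pthread_mutex_unlock(&ref->lock);
}

static void *putter(void *arg)
{
	demo_ref_put(arg);
	return NULL;
}

/* Runs one put/destroy race; returns how many times release() ran, or -1. */
static int demo_run(void)
{
	struct demo_ref *ref = malloc(sizeof(*ref));
	pthread_t t;

	if (!ref)
		return -1;
	atomic_init(&ref->count, 1);
	ref->release = demo_release;
	ref->release_done = false;
	pthread_mutex_init(&ref->lock, NULL);
	pthread_cond_init(&ref->cond, NULL);
	released_calls = 0;

	if (pthread_create(&t, NULL, putter, ref) != 0) {
		free(ref);
		return -1;
	}

	demo_ref_wait_released(ref);
	assert(ref->release_done);	/* release() has fully returned */

	/* Only now is it safe to tear the object down. */
	pthread_cond_destroy(&ref->cond);
	pthread_mutex_destroy(&ref->lock);
	free(ref);

	pthread_join(t, NULL);
	return released_calls;
}
```

Calling demo_run() from a small main() (built with -pthread) exercises
the put and destroy paths concurrently. The point of the design is that
the destroyer never looks at the counter value to decide when to free;
it waits for an explicit signal raised after release() returns.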

I think the above works in this specific case because there is a
call_rcu() in blk_release_queue(). If there wasn't a call_rcu(),
then by the same logic we could delay ref->data->release(ref) further
and that could potentially lead to a use after free.

Ideally, I think fixing the race in q->q_usage_counter's pattern is
better than masking it here as I think we're being saved by the
call_rcu() call further down the object release path.

Thanks,
Dennis

> As suggested by Ming Lei, fix it by fetching the release method before
> the reference count is decremented to 0.
> 
> Suggested-by: Ming Lei <ming.lei@...hat.com>
> Signed-off-by: Zhong Jinghua <zhongjinghua@...wei.com>
> ---
>  include/linux/percpu-refcount.h | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> index d73a1c08c3e3..11e717c95acb 100644
> --- a/include/linux/percpu-refcount.h
> +++ b/include/linux/percpu-refcount.h
> @@ -331,8 +331,11 @@ static inline void percpu_ref_put_many(struct percpu_ref *ref, unsigned long nr)
>  
>  	if (__ref_is_percpu(ref, &percpu_count))
>  		this_cpu_sub(*percpu_count, nr);
> -	else if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
> -		ref->data->release(ref);
> +	else {
> +		percpu_ref_func_t *release = ref->data->release;
> +		if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
> +			release(ref);
> +	}
>  
>  	rcu_read_unlock();
>  }
> -- 
> 2.31.1
>