linux-kernel - Re: [PATCH] block: fix io hung by block throttle

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <1c3a2874-35bd-3ff0-3088-1a6ea968d016@oracle.com>
Date:   Sun, 18 Apr 2021 23:09:41 -0700
From:   Junxiao Bi <junxiao.bi@...cle.com>
To:     Hillf Danton <hdanton@...a.com>
Cc:     linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        axboe@...nel.dk
Subject: Re: [PATCH] block: fix io hung by block throttle


On 4/18/21 5:33 AM, Hillf Danton wrote:
> On Sat, 17 Apr 2021 14:37:57  Junxiao Bi  wrote:
>> On 4/17/21 3:10 AM, Hillf Danton wrote:
>>> +	if (acquire_inflight_cb(rqw, private_data))
>> This function is to increase atomic variable rq_wait->inflight.
> You are right.
>
>> What's the mutex for?
> It cuts the race between we peek at the sleepers on rqw->wait while they are
> coming and going, and we cant update rqw->inflight without making sure there
> are no sleepers.

Why? I think checking the sleeper in original code is for a fast path.

For wbt, acquire_inflight_cb is wbt_inflight_cb where atomic_inc_below 
is used to update rqw->inflight. I don't see why a mutex is needed for 
this atomic operation.

>
> With the mutex in place, in addition to the certainty of !sleepers, we can
> avoid the race between us and waker in terms of updating inflight by removing
> the invokation of acquire_inflight_cb in the wakeup callback, and the bonus is
> we no longer need the wakeup cb and the rq_qos_wait_data because the more
> traditional wait_event() can do the job.
>
> Finally we can dump the cleanup_cb_t.
>
> +++ b/block/blk-rq-qos.c
> @@ -200,96 +200,24 @@ bool rq_depth_scale_down(struct rq_depth
>   	return true;
>   }
>   
> -struct rq_qos_wait_data {
> -	struct wait_queue_entry wq;
> -	struct task_struct *task;
> -	struct rq_wait *rqw;
> -	acquire_inflight_cb_t *cb;
> -	void *private_data;
> -	bool got_token;
> -};
> -
> -static int rq_qos_wake_function(struct wait_queue_entry *curr,
> -				unsigned int mode, int wake_flags, void *key)
> -{
> -	struct rq_qos_wait_data *data = container_of(curr,
> -						     struct rq_qos_wait_data,
> -						     wq);
> -
> -	/*
> -	 * If we fail to get a budget, return -1 to interrupt the wake up loop
> -	 * in __wake_up_common.
> -	 */
> -	if (!data->cb(data->rqw, data->private_data))
> -		return -1;
> -
> -	data->got_token = true;
> -	smp_wmb();
> -	list_del_init(&curr->entry);
> -	wake_up_process(data->task);
> -	return 1;
> -}
> -
>   /**
>    * rq_qos_wait - throttle on a rqw if we need to
>    * @rqw: rqw to throttle on
>    * @private_data: caller provided specific data
>    * @acquire_inflight_cb: inc the rqw->inflight counter if we can
> - * @cleanup_cb: the callback to cleanup in case we race with a waker
>    *
>    * This provides a uniform place for the rq_qos users to do their throttling.
>    * Since you can end up with a lot of things sleeping at once, this manages the
>    * waking up based on the resources available.  The acquire_inflight_cb should
>    * inc the rqw->inflight if we have the ability to do so, or return false if not
>    * and then we will sleep until the room becomes available.
> - *
> - * cleanup_cb is in case that we race with a waker and need to cleanup the
> - * inflight count accordingly.
>    */
>   void rq_qos_wait(struct rq_wait *rqw, void *private_data,
> -		 acquire_inflight_cb_t *acquire_inflight_cb,
> -		 cleanup_cb_t *cleanup_cb)
> +		 acquire_inflight_cb_t *acquire_inflight_cb)
>   {
> -	struct rq_qos_wait_data data = {
> -		.wq = {
> -			.func	= rq_qos_wake_function,
> -			.entry	= LIST_HEAD_INIT(data.wq.entry),
> -		},
> -		.task = current,
> -		.rqw = rqw,
> -		.cb = acquire_inflight_cb,
> -		.private_data = private_data,
> -	};
> -	bool has_sleeper;
> -
> -	has_sleeper = wq_has_sleeper(&rqw->wait);
> -	if (!has_sleeper && acquire_inflight_cb(rqw, private_data))
> -		return;
> -
> -	prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
> -	has_sleeper = !wq_has_single_sleeper(&rqw->wait);
> -	do {
> -		/* The memory barrier in set_task_state saves us here. */
> -		if (data.got_token)
> -			break;
> -		if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
> -			finish_wait(&rqw->wait, &data.wq);
> -
> -			/*
> -			 * We raced with wbt_wake_function() getting a token,
> -			 * which means we now have two. Put our local token
> -			 * and wake anyone else potentially waiting for one.
> -			 */
> -			smp_rmb();
> -			if (data.got_token)
> -				cleanup_cb(rqw, private_data);
> -			break;
> -		}
> -		io_schedule();
> -		has_sleeper = true;
> -		set_current_state(TASK_UNINTERRUPTIBLE);
> -	} while (1);
> -	finish_wait(&rqw->wait, &data.wq);
> +	mutex_lock(&rqw->throttle_mutex);
> +	wait_event(rqw->wait, acquire_inflight_cb(rqw, private_data));
> +	mutex_unlock(&rqw->throttle_mutex);

This will break the throttle? There is a inflight io limitation. With 
this change, there can be only one io inflight whatever the limit is.

Thanks,

Junxiao.

>   }
>   
>   void rq_qos_exit(struct request_queue *q)