linux-kernel - Re: [PATCH] block: Limit number of items taken from the I/O scheduler in one go

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <6499bf21-df25-3881-4252-8c51a9785bb7@acm.org>
Date:   Mon, 3 Feb 2020 19:47:02 -0800
From:   Bart Van Assche <bvanassche@....org>
To:     Salman Qazi <sqazi@...gle.com>, Jens Axboe <axboe@...nel.dk>,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Cc:     Jesse Barnes <jsbarnes@...gle.com>,
        Gwendal Grignou <gwendal@...gle.com>,
        Christoph Hellwig <hch@....de>, Ming Lei <ming.lei@...hat.com>,
        Hannes Reinecke <hare@...e.com>
Subject: Re: [PATCH] block: Limit number of items taken from the I/O scheduler
 in one go

On 2020-02-03 12:59, Salman Qazi wrote:
> We observed that it is possible for a flush to bypass the I/O
> scheduler and get added to hctx->dispatch in blk_mq_sched_bypass_insert.
> This can happen while a kworker is running blk_mq_do_dispatch_sched call
> in blk_mq_sched_dispatch_requests.
> 
> However, the blk_mq_do_dispatch_sched call doesn't end in bounded time.
> As a result, the flush can sit there indefinitely, as the I/O scheduler
> feeds an arbitrary number of requests to the hardware.
> 
> The solution is to periodically poll hctx->dispatch in
> blk_mq_do_dispatch_sched, to put a bound on the latency of the commands
> sitting there.

(added Christoph, Ming and Hannes to the Cc-list)

Thank you for having posted a patch; that really helps.

I see that my name occurs first in the "To:" list. Since Jens is the
block layer maintainer I think Jens should have been mentioned first.

In version v4.20 of the Linux kernel I found the following in the legacy
block layer code:
* From blk_insert_flush():
	list_add_tail(&rq->queuelist, &q->queue_head);
* From elv_next_request():
	list_for_each_entry(rq, &q->queue_head, queuelist)

I think this means that the legacy block layer sent flush requests to
the scheduler instead of directly to the block driver. How about
modifying the blk-mq code such that it mimics that approach? I'm asking
this because this patch, although the code looks clean, doesn't seem the
best solution to me.

> +		if (count > 0 && count % q->max_sched_batch == 0 &&
> +		    !list_empty_careful(&hctx->dispatch))
> +			break;

A modulo operation in the hot path? Please don't do that.

> +static ssize_t queue_max_sched_batch_store(struct request_queue *q,
> +					   const char *page,
> +					   size_t count)
> +{
> +	int err, val;
> +
> +	if (!q->mq_ops)
> +		return -EINVAL;
> +
> +	err = kstrtoint(page, 10, &val);
> +	if (err < 0)
> +		return err;
> +
> +	if (val <= 0)
> +		return -EINVAL;
> +
> +	q->max_sched_batch = val;
> +
> +	return count;
> +}

Has it been considered to use kstrtouint() instead of checking whether
the value returned by kstrtoint() is positive?

> +	int			max_sched_batch;

unsigned int?

Thanks,

Bart.