linux-kernel - Re: [PATCH RFC v2 1/5] xen, blkfront: port to the the multi-queue block layer API

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141001201851.GB12581@laptop.dumpdata.com>
Date:	Wed, 1 Oct 2014 16:18:51 -0400
From:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To:	Arianna Avanzini <avanzini.arianna@...il.com>
Cc:	boris.ostrovsky@...cle.com, david.vrabel@...rix.com,
	xen-devel@...ts.xenproject.org, linux-kernel@...r.kernel.org,
	hch@...radead.org, bob.liu@...cle.com, felipe.franciosi@...rix.com,
	axboe@...com
Subject: Re: [PATCH RFC v2 1/5] xen, blkfront: port to the the multi-queue
 block layer API

On Fri, Sep 12, 2014 at 01:57:20AM +0200, Arianna Avanzini wrote:
> This commit introduces support for the multi-queue block layer API,
> and at the same time removes the existing request_queue API support.
> The changes are only structural, and the number of supported hardware
> contexts is forcedly set to one.

Hey Arianna,

Thank you for posting them and sorry for the long time in reviewing it.

> 
> Signed-off-by: Arianna Avanzini <avanzini.arianna@...il.com>
> ---
>  drivers/block/xen-blkfront.c | 171 ++++++++++++++++++++-----------------------
>  1 file changed, 80 insertions(+), 91 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 5deb235..109add6 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -37,6 +37,7 @@
>  
>  #include <linux/interrupt.h>
>  #include <linux/blkdev.h>
> +#include <linux/blk-mq.h>
>  #include <linux/hdreg.h>
>  #include <linux/cdrom.h>
>  #include <linux/module.h>
> @@ -134,6 +135,8 @@ struct blkfront_info
>  	unsigned int feature_persistent:1;
>  	unsigned int max_indirect_segments;
>  	int is_ready;
> +	/* Block layer tags. */
> +	struct blk_mq_tag_set tag_set;
>  };
>  
>  static unsigned int nr_minors;
> @@ -582,66 +585,69 @@ static inline void flush_requests(struct blkfront_info *info)
>  		notify_remote_via_irq(info->irq);
>  }
>  
> -/*
> - * do_blkif_request
> - *  read a block; request is in a request queue
> - */
> -static void do_blkif_request(struct request_queue *rq)

> +static int blkfront_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
>  {
> -	struct blkfront_info *info = NULL;
> -	struct request *req;
> -	int queued;
> -
> -	pr_debug("Entered do_blkif_request\n");
> -
> -	queued = 0;
> -

This loop allowed us to queue up on the ring up to 32 requests
(or more depending depending on the options), and then when done
(or when no more could be fitted), call the 'flush_request'
which would update the producer index and kick the backend (if needed).

With the removal of the loop we could be called for each
I/O and do a bunch of : write in ring, update producer index, kick backend,
write in ring, update producer ring, kick backend, etc.


> -	while ((req = blk_peek_request(rq)) != NULL) {
> -		info = req->rq_disk->private_data;
> +	struct blkfront_info *info = req->rq_disk->private_data;
>  
> -		if (RING_FULL(&info->ring))
> -			goto wait;
> +	spin_lock_irq(&info->io_lock);
> +	if (RING_FULL(&info->ring))
> +		goto wait;
>  
> -		blk_start_request(req);
> +	if ((req->cmd_type != REQ_TYPE_FS) ||
> +			((req->cmd_flags & (REQ_FLUSH | REQ_FUA)) &&
> +			 !info->flush_op)) {

You had an earlier patch (xen, blkfront: factor out flush-related checks from do_blkif_request())

Which made that 'if' statement much nicer. I put said patch on the 3.18 train
for Jens - so once v3.18-rc1 is out you can rebase on that and pick this
up.

> +		req->errors = -EIO;
> +		blk_mq_complete_request(req);
> +		spin_unlock_irq(&info->io_lock);
> +		return BLK_MQ_RQ_QUEUE_ERROR;
> +	}
>  
> -		if ((req->cmd_type != REQ_TYPE_FS) ||
> -		    ((req->cmd_flags & (REQ_FLUSH | REQ_FUA)) &&
> -		    !info->flush_op)) {
> -			__blk_end_request_all(req, -EIO);
> -			continue;
> -		}
> +	if (blkif_queue_request(req)) {
> +		blk_mq_requeue_request(req);
> +		goto wait;
> +	}
>  
> -		pr_debug("do_blk_req %p: cmd %p, sec %lx, "
> -			 "(%u/%u) [%s]\n",
> -			 req, req->cmd, (unsigned long)blk_rq_pos(req),
> -			 blk_rq_cur_sectors(req), blk_rq_sectors(req),
> -			 rq_data_dir(req) ? "write" : "read");
> +	flush_requests(info);
> +	spin_unlock_irq(&info->io_lock);
> +	return BLK_MQ_RQ_QUEUE_OK;
>  
> -		if (blkif_queue_request(req)) {
> -			blk_requeue_request(rq, req);
>  wait:
> -			/* Avoid pointless unplugs. */
> -			blk_stop_queue(rq);
> -			break;
> -		}
> -
> -		queued++;
> -	}
> -
> -	if (queued != 0)
> -		flush_requests(info);
> +	/* Avoid pointless unplugs. */
> +	blk_mq_stop_hw_queue(hctx);
> +	spin_unlock_irq(&info->io_lock);
> +	return BLK_MQ_RQ_QUEUE_BUSY;
>  }
>  
> +static struct blk_mq_ops blkfront_mq_ops = {
> +	.queue_rq = blkfront_queue_rq,
> +	.map_queue = blk_mq_map_queue,
> +};
> +
>  static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>  				unsigned int physical_sector_size,
>  				unsigned int segments)
>  {
>  	struct request_queue *rq;
>  	struct blkfront_info *info = gd->private_data;
> +	int ret;
> +
> +	memset(&info->tag_set, 0, sizeof(info->tag_set));

You can ditch that. In 'blkfront_probe' we use 'kzalloc' to setup
the 'info' structure so it is already set to zero values.

> +	info->tag_set.ops = &blkfront_mq_ops;
> +	info->tag_set.nr_hw_queues = 1;
> +	info->tag_set.queue_depth = BLK_RING_SIZE;

Could you add an simple comment (or I can do it when I pick v3):
/* TODO: Need to figure out how to square this with 'xen_blkif_max_segments' */

> +	info->tag_set.numa_node = NUMA_NO_NODE;
> +	info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
> +	info->tag_set.cmd_size = 0;

Could you add an comment about why we have zero here? My recollection is
that we do not need it as we keep track of the request internally (for
migration purposes and if we did persistent mapping).

/*
 * And the reason we want it set to zero is because we want to be responsible
 * for page recycling (in case it is persistent and we need to manually
 * deal with unmapping it, etc).
 */

> +	info->tag_set.driver_data = info;

Or perhaps mention that since we have 'driver_data' we can access 'info'
and do all internal mapping/unmapping/etc on that.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/