[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9F2C4E7DFB7839489C89757A66C5AD629EB997@AMSPEX01CL03.citrite.net>
Date: Wed, 18 Feb 2015 18:08:07 +0000
From: Felipe Franciosi <felipe.franciosi@...rix.com>
To: 'Konrad Rzeszutek Wilk' <konrad.wilk@...cle.com>,
Roger Pau Monne <roger.pau@...rix.com>,
Bob Liu <bob.liu@...cle.com>
CC: "xen-devel@...ts.xen.org" <xen-devel@...ts.xen.org>,
David Vrabel <david.vrabel@...rix.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"axboe@...com" <axboe@...com>,
"hch@...radead.org" <hch@...radead.org>,
"avanzini.arianna@...il.com" <avanzini.arianna@...il.com>
Subject: RE: [PATCH 04/10] xen/blkfront: separate ring information to an new
struct
> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@...cle.com]
> Sent: 18 February 2015 17:38
> To: Roger Pau Monne
> Cc: Bob Liu; xen-devel@...ts.xen.org; David Vrabel; linux-
> kernel@...r.kernel.org; Felipe Franciosi; axboe@...com; hch@...radead.org;
> avanzini.arianna@...il.com
> Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an new
> struct
>
> On Wed, Feb 18, 2015 at 06:28:49PM +0100, Roger Pau Monné wrote:
> > El 15/02/15 a les 9.18, Bob Liu ha escrit:
> > > A ring is the representation of a hardware queue, this patch
> > > separate ring information from blkfront_info to an new struct
> > > blkfront_ring_info to make preparation for real multi hardware queues
> supporting.
> > >
> > > Signed-off-by: Arianna Avanzini <avanzini.arianna@...il.com>
> > > Signed-off-by: Bob Liu <bob.liu@...cle.com>
> > > ---
> > > drivers/block/xen-blkfront.c | 403
> > > +++++++++++++++++++++++--------------------
> > > 1 file changed, 218 insertions(+), 185 deletions(-)
> > >
> > > diff --git a/drivers/block/xen-blkfront.c
> > > b/drivers/block/xen-blkfront.c index 5a90a51..aaa4a0e 100644
> > > --- a/drivers/block/xen-blkfront.c
> > > +++ b/drivers/block/xen-blkfront.c
> > > @@ -102,23 +102,15 @@ MODULE_PARM_DESC(max, "Maximum amount
> of
> > > segments in indirect requests (default #define BLK_RING_SIZE
> > > __CONST_RING_SIZE(blkif, PAGE_SIZE)
> > >
> > > /*
> > > - * We have one of these per vbd, whether ide, scsi or 'other'.
> > > They
> > > - * hang in private_data off the gendisk structure. We may end up
> > > - * putting all kinds of interesting stuff here :-)
> > > + * Per-ring info.
> > > + * A blkfront_info structure can associate with one or more
> > > + blkfront_ring_info,
> > > + * depending on how many hardware queues supported.
> > > */
> > > -struct blkfront_info
> > > -{
> > > +struct blkfront_ring_info {
> > > spinlock_t io_lock;
> > > - struct mutex mutex;
> > > - struct xenbus_device *xbdev;
> > > - struct gendisk *gd;
> > > - int vdevice;
> > > - blkif_vdev_t handle;
> > > - enum blkif_state connected;
> > > int ring_ref;
> > > struct blkif_front_ring ring;
> > > unsigned int evtchn, irq;
> > > - struct request_queue *rq;
> > > struct work_struct work;
> > > struct gnttab_free_callback callback;
> > > struct blk_shadow shadow[BLK_RING_SIZE]; @@ -126,6 +118,22 @@
> > > struct blkfront_info
> > > struct list_head indirect_pages;
> > > unsigned int persistent_gnts_c;
> > > unsigned long shadow_free;
> > > + struct blkfront_info *info;
> >
> > AFAICT you seem to have a list of persistent grants, indirect pages
> > and a grant table callback for each ring, isn't this supposed to be
> > shared between all rings?
> >
> > I don't think we should be going down that route, or else we can hoard
> > a large amount of memory and grants.
>
> It does remove the lock that would have to be accessed by each ring thread to
> access those. Those values (grants) can be limited to be a smaller value such
> that the overall number is the same as it was with the previous version. As in:
> each ring has = MAX_GRANTS / nr_online_cpus().
> >
We should definitely be concerned with the amount of memory consumed on the backend for each plugged virtual disk. We have faced several problems in XenServer around this area before; it drastically affects VBD scalability per host.
This makes me think that all the persistent grants work was done as a workaround while we were facing several performance problems around concurrent grant un/mapping operations. Given all the recent submissions made around this (grant ops) area, is this something we should perhaps revisit and discuss whether we want to continue offering persistent grants as a feature?
Thanks,
Felipe
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists