[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130305214950.GE8235@phenom.dumpdata.com>
Date: Tue, 5 Mar 2013 16:49:50 -0500
From: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To: Roger Pau Monné <roger.pau@...rix.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"xen-devel@...ts.xen.org" <xen-devel@...ts.xen.org>
Subject: Re: [PATCH RFC 06/12] xen-blkback: implement LRU mechanism for
persistent grants
On Tue, Mar 05, 2013 at 07:10:04PM +0100, Roger Pau Monné wrote:
> On 04/03/13 21:10, Konrad Rzeszutek Wilk wrote:
> > On Thu, Feb 28, 2013 at 11:28:49AM +0100, Roger Pau Monne wrote:
> >> This mechanism allows blkback to change the number of grants
> >> persistently mapped at run time.
> >>
> >> The algorithm uses a simple LRU mechanism that removes (if needed) the
> >> persistent grants that have not been used since the last LRU run, or
> >> if all grants have been used it removes the first grants in the list
> >> (that are not in use).
> >>
> >> The algorithm has several parameters that can be tuned by the user
> >> from sysfs:
> >>
> >> * max_persistent_grants: maximum number of grants that will be
> >> persistently mapped.
> >> * lru_interval: minimum interval (in ms) at which the LRU should be
> >> run
> >> * lru_num_clean: number of persistent grants to remove when executing
> >> the LRU.
> >>
> >> Signed-off-by: Roger Pau Monné <roger.pau@...rix.com>
> >> Cc: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
> >> Cc: xen-devel@...ts.xen.org
> >> ---
> >> drivers/block/xen-blkback/blkback.c | 207 +++++++++++++++++++++++++++--------
> >> drivers/block/xen-blkback/common.h | 4 +
> >> drivers/block/xen-blkback/xenbus.c | 1 +
> >> 3 files changed, 166 insertions(+), 46 deletions(-)
> >
> > You also should add a Documentation/sysfs/ABI/stable/sysfs-bus-xen-backend
>
> OK
>
> >>
> >> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> >> index 415a0c7..c14b736 100644
> >> --- a/drivers/block/xen-blkback/blkback.c
> >> +++ b/drivers/block/xen-blkback/blkback.c
> >> @@ -63,6 +63,44 @@ static int xen_blkif_reqs = 64;
> >> module_param_named(reqs, xen_blkif_reqs, int, 0);
> >> MODULE_PARM_DESC(reqs, "Number of blkback requests to allocate");
> >>
> >> +/*
> >> + * Maximum number of grants to map persistently in blkback. For maximum
> >> + * performance this should be the total numbers of grants that can be used
> >> + * to fill the ring, but since this might become too high, specially with
> >> + * the use of indirect descriptors, we set it to a value that provides good
> >> + * performance without using too much memory.
> >> + *
> >> + * When the list of persistent grants is full we clean it using a LRU
> >> + * algorithm.
> >> + */
> >> +
> >> +static int xen_blkif_max_pgrants = 352;
> >
> > And a little blurb saying why 352.
>
> Yes, this is (as you probably already realized) RING_SIZE *
> BLKIF_MAX_SEGMENTS_PER_REQUEST
>
> >
> >> +module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
> >> +MODULE_PARM_DESC(max_persistent_grants,
> >> + "Maximum number of grants to map persistently");
> >> +
> >> +/*
> >> + * The LRU mechanism to clean the lists of persistent grants needs to
> >> + * be executed periodically. The time interval between consecutive executions
> >> + * of the purge mechanism is set in ms.
> >> + */
> >> +
> >> +static int xen_blkif_lru_interval = 100;
> >
> > So every second? What is the benefit of having the user modify this? Would
> > it better if there was an watermark system in xen-blkfront to automatically
> > figure this out? (This could be a TODO of course)
>
> Every 100ms, so every 0.1 seconds. This can have an impact on
> performance as implemented right now (if we move the purge to a separate
> thread, it's not going to have such an impact), but anyway I feel we can
> let the user tune it.
Why not automatically tune it in the backend? So it can do this by itself?
>
> >> +module_param_named(lru_interval, xen_blkif_lru_interval, int, 0644);
> >> +MODULE_PARM_DESC(lru_interval,
> >> +"Execution interval (in ms) of the LRU mechanism to clean the list of persistent grants");
> >> +
> >> +/*
> >> + * When the persistent grants list is full we will remove unused grants
> >> + * from the list. The number of grants to be removed at each LRU execution
> >> + * can be set dynamically.
> >> + */
> >> +
> >> +static int xen_blkif_lru_num_clean = BLKIF_MAX_SEGMENTS_PER_REQUEST;
> >> +module_param_named(lru_num_clean, xen_blkif_lru_num_clean, int, 0644);
> >> +MODULE_PARM_DESC(lru_num_clean,
> >> +"Number of persistent grants to unmap when the list is full");
> >
> > Again, what does that mean to the system admin? Why would they need to modify
> > the contents of that value?
> >
>
> Well if you set the maximum number of grants to 1024 you might want to
> increase this to 64 maybe, or on the other hand, if you set the maximum
> number of grants to 10, you may wish to set this to 1, so I think it is
> indeed relevant for system admins.
So why not make this automatic? A value blkback can automatically
adjust as there are less or more grants. This of course does not have
to be part of this patch.
>
> > Now if this is a debug related one for developing, then this could all be
> > done in DebugFS.
> >
> >> +
> >> /* Run-time switchable: /sys/module/blkback/parameters/ */
> >> static unsigned int log_stats;
> >> module_param(log_stats, int, 0644);
> >> @@ -81,7 +119,7 @@ struct pending_req {
> >> unsigned short operation;
> >> int status;
> >> struct list_head free_list;
> >> - DECLARE_BITMAP(unmap_seg, BLKIF_MAX_SEGMENTS_PER_REQUEST);
> >> + struct persistent_gnt *persistent_gnts[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> >> };
> >>
> >> #define BLKBACK_INVALID_HANDLE (~0)
> >> @@ -102,36 +140,6 @@ struct xen_blkbk {
> >> static struct xen_blkbk *blkbk;
> >>
> >> /*
> >> - * Maximum number of grant pages that can be mapped in blkback.
> >> - * BLKIF_MAX_SEGMENTS_PER_REQUEST * RING_SIZE is the maximum number of
> >> - * pages that blkback will persistently map.
> >> - * Currently, this is:
> >> - * RING_SIZE = 32 (for all known ring types)
> >> - * BLKIF_MAX_SEGMENTS_PER_REQUEST = 11
> >> - * sizeof(struct persistent_gnt) = 48
> >> - * So the maximum memory used to store the grants is:
> >> - * 32 * 11 * 48 = 16896 bytes
> >> - */
> >> -static inline unsigned int max_mapped_grant_pages(enum blkif_protocol protocol)
> >> -{
> >> - switch (protocol) {
> >> - case BLKIF_PROTOCOL_NATIVE:
> >> - return __CONST_RING_SIZE(blkif, PAGE_SIZE) *
> >> - BLKIF_MAX_SEGMENTS_PER_REQUEST;
> >> - case BLKIF_PROTOCOL_X86_32:
> >> - return __CONST_RING_SIZE(blkif_x86_32, PAGE_SIZE) *
> >> - BLKIF_MAX_SEGMENTS_PER_REQUEST;
> >> - case BLKIF_PROTOCOL_X86_64:
> >> - return __CONST_RING_SIZE(blkif_x86_64, PAGE_SIZE) *
> >> - BLKIF_MAX_SEGMENTS_PER_REQUEST;
> >> - default:
> >> - BUG();
> >> - }
> >> - return 0;
> >> -}
> >> -
> >> -
> >> -/*
> >> * Little helpful macro to figure out the index and virtual address of the
> >> * pending_pages[..]. For each 'pending_req' we have have up to
> >> * BLKIF_MAX_SEGMENTS_PER_REQUEST (11) pages. The seg would be from 0 through
> >> @@ -251,6 +259,76 @@ static void free_persistent_gnts(struct rb_root *root, unsigned int num)
> >> BUG_ON(num != 0);
> >> }
> >>
> >> +static int purge_persistent_gnt(struct rb_root *root, int num)
> >> +{
> >> + struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> >> + struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> >> + struct persistent_gnt *persistent_gnt;
> >> + struct rb_node *n;
> >> + int ret, segs_to_unmap = 0;
> >> + int requested_num = num;
> >> + int preserve_used = 1;
> >
> > Boolean? And perhaps 'scan_dirty' ?
>
> Sure
>
> >
> >
> >> +
> >> + pr_debug("Requested the purge of %d persistent grants\n", num);
> >> +
> >> +purge_list:
> >
> > This could be written a bit differently to also run outside the xen_blkif_schedule
> > (so a new thread). This would require using the lock mechanism and converting
> > this big loop to two smaller loops:
> > 1) - one quick that holds the lock - to take the items of the list,
> > 2) second one to do the grant_set_unmap_op operations and all the heavy
> > free_xenballooned_pages call.
>
> Yes, I could add a list_head to persistent_gnt, so we can take them out
> of the red-black tree and queue them in a list to be processed (unmap +
> free) after we have looped thought the list, without holding the lock.
>
> >
> > .. As this function ends up (presumarily?) causing xen_blkif_schedule to be doing
> > this for some time every second. Irregardless of how utilized the ring is - so
> > if we are 100% busy - we should not need to call this function. But if we do,
> > then we end up walking the persistent_gnt twice - once with preserve_used set
> > to true, and the other with it set to false.
> >
> > We don't really want that - so is there a way for xen_blkif_schedule to
> > do a quick determintion of this caliber:
> >
> >
> > if (RING_HAS_UNCONSUMED_REQUESTS(x) >= some_value)
> > wait_up(blkif->purgarator)
>
> It's not possible to tell if all grants will be in use just by looking
> at the number of active requests, since all requests might just be using
> one segment, and thus the list of persistent grants could be purged
> without problems. We could keep a count of the number of active grants
> at each time and use that to check if we can kick the purge or not.
>
> if (grants_in_use > (persistent_gnt_c - num_purge))
> wait(...)
Sure.
>
> > And the thread would just sit there until kicked in action?
>
> And when a request frees some grants it could be kicked back to action.
OK.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists