Date: Mon, 07 Jul 2008 13:51:12 +1000
From: Aaron Carroll <aaronc@....unsw.edu.au>
To: ngupta@...gle.com
CC: linux-kernel@...r.kernel.org, jens.axboe@...cle.com,
akpm@...ux-foundation.org
Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler
Hi Naveen,
I have a few observations after a quick read through your patch.
ngupta@...gle.com wrote:
> Modifications to the Anticipatory I/O scheduler to add multiple priority
> levels. It makes use of anticipation and batching in the current
> anticipatory scheduler to implement priorities.
>
> - Minimizes the latency of the highest priority level.
> - Low priority requests wait for high priority requests.
> - Higher priority requests break any anticipating low priority request.
> - If a single priority level is used, the scheduler behaves as the
> anticipatory scheduler, so there is no change for existing users.
>
> With this change, it is possible for a latency-sensitive job to coexist
> with a background job.
>
> Another possible use of this patch is in the context of an I/O subsystem
> controller. It can add another dimension to the parameters controlling a
> particular cgroup. While we can easily divide bandwidth among existing
> cgroups, setting a bound on latency is not a feasible solution. Hence, in
> the context of storage devices, bandwidth and priority can be the two
> parameters controlling I/O.
>
> In this patch I have added a new class, IOPRIO_CLASS_LATENCY, to differentiate
> the notion of absolute priority from the existing uses of the various
> time-slice based priority classes in cfq, though internally within the
> anticipatory scheduler all of them map to best-effort levels. Hence, one can
> also use the various best-effort priority levels.
I don't see the point of this new priority class; I think ``latency sensitive''
is a reasonable definition of real-time, especially since you don't actually
use it.
> @@ -21,6 +21,14 @@ config IOSCHED_AS
> deadline I/O scheduler, it can also be slower in some cases
> especially some database loads.
>
> +config IOPRIO_AS_MAX
> + int "Number of valid i/o priority levels"
> + depends on IOSCHED_AS
> + default "4"
> + help
> +	  This option controls the number of priority levels in the
> +	  anticipatory I/O scheduler.
Does this need to be configurable? There are two ``natural'' choices for this
value; the number of iopriorities (10), or the number of priority classes (3).
Why is any intermediate value useful?
> /*
> * rb tree support functions
> */
> -#define RQ_RB_ROOT(ad, rq) (&(ad)->sort_list[rq_is_sync((rq))])
> +static inline struct rb_root *rq_rb_root(struct as_data *ad,
> + struct request *rq)
> +{
> + return (&(ad)->prio_q[rq_prio_level(rq)].sort_list[rq_is_sync(rq)]);
> +}
This change (and the related ones below) is a separate patch which could
also be applied to deadline.
> @@ -996,6 +1074,31 @@ static void as_move_to_dispatch(struct a
> ad->nr_dispatched++;
> }
>
> +static unsigned int select_priority_level(struct as_data *ad)
> +{
> + unsigned int i, best_ioprio = 0, ioprio, found_alt = 0;
> +
> + for (ioprio = 0; ioprio < IOPRIO_AS_MAX; ioprio++) {
> + if (!as_has_request_at_priority(ad, ioprio)) {
> + continue;
> + }
Unnecessary braces.
> @@ -1022,21 +1126,32 @@ static int as_dispatch_request(struct re
> ad->changed_batch = 0;
> ad->new_batch = 0;
>
> - while (ad->next_rq[REQ_SYNC]) {
> - as_move_to_dispatch(ad, ad->next_rq[REQ_SYNC]);
> - dispatched++;
> - }
> - ad->last_check_fifo[REQ_SYNC] = jiffies;
> -
> - while (ad->next_rq[REQ_ASYNC]) {
> - as_move_to_dispatch(ad, ad->next_rq[REQ_ASYNC]);
> - dispatched++;
> + for (ioprio = 0; ioprio < IOPRIO_AS_MAX; ioprio++) {
> + while (ad->prio_q[ioprio].next_rq[REQ_SYNC]) {
> + as_move_to_dispatch(ad,
> + ad->prio_q[ioprio].next_rq[REQ_SYNC]);
> + dispatched++;
> + }
> + ad->last_check_fifo[REQ_SYNC] = jiffies;
> +
> + while (ad->prio_q[ioprio].next_rq[REQ_ASYNC]) {
> + as_move_to_dispatch(ad,
> + ad->prio_q[ioprio].next_rq[REQ_ASYNC]);
> + dispatched++;
> + }
> + ad->last_check_fifo[REQ_ASYNC] = jiffies;
> }
> - ad->last_check_fifo[REQ_ASYNC] = jiffies;
>
> return dispatched;
> }
>
> + ioprio = select_priority_level(ad);
> + if (ioprio >= IOPRIO_AS_MAX)
> + return 0;
Why should this ever happen?
> @@ -1049,14 +1164,16 @@ static int as_dispatch_request(struct re
> || ad->changed_batch)
> return 0;
>
> + changed_ioprio = (ad->batch_ioprio != ioprio)?1:0;
Redundant conditional.
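Since `!=' in C already evaluates to 0 or 1, the ?: adds nothing. A
stand-alone illustration (untested, outside the kernel tree, with the
as_data field reduced to a plain parameter):

```c
#include <assert.h>

/* The != operator already yields 0 or 1, so the ?: is a no-op. */
static int changed_ioprio(unsigned int batch_ioprio, unsigned int ioprio)
{
	return batch_ioprio != ioprio;	/* same value as (a != b) ? 1 : 0 */
}
```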
> @@ -1216,9 +1341,39 @@ static void as_deactivate_request(struct
> static int as_queue_empty(struct request_queue *q)
> {
> struct as_data *ad = q->elevator->elevator_data;
> + unsigned short ioprio;
>
> - return list_empty(&ad->fifo_list[REQ_ASYNC])
> - && list_empty(&ad->fifo_list[REQ_SYNC]);
> + for (ioprio = 0; ioprio < IOPRIO_AS_MAX; ioprio++) {
> + if (as_has_request_at_priority(ad, ioprio))
> + return 0;
> + }
> + return 1;
> +}
> +
> +static unsigned short as_mapped_priority(unsigned short ioprio)
> +{
> + unsigned short class = IOPRIO_PRIO_CLASS(ioprio);
> + unsigned short data = IOPRIO_PRIO_DATA(ioprio);
> +
> + if (class == IOPRIO_CLASS_BE)
> + return ((data < IOPRIO_AS_MAX)? ioprio:
Doesn't this mean that requests in the BE class with prio level 0 will map to
the same queues as RT requests?
> + IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE,
> + (IOPRIO_AS_MAX - 1)));
> + else if (class == IOPRIO_CLASS_LATENCY)
> + return ((data < IOPRIO_AS_MAX)?
> + IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, data):
Likewise.
> + IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE,
> + (IOPRIO_AS_MAX - 1)));
> + else if (class == IOPRIO_CLASS_RT)
> + return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 0);
> + else if (class == IOPRIO_CLASS_IDLE)
> + return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, (IOPRIO_AS_MAX - 1));
> + else if (class == IOPRIO_CLASS_NONE) {
> + return IOPRIO_AS_DEFAULT;
> + } else {
> + WARN_ON(1);
> + return IOPRIO_AS_DEFAULT;
> + }
It looks like you're mapping all ioprios to a prio level in the BE class, and using
that internally. It would be simpler to use integers in [0, IOPRIO_AS_MAX) internally,
and convert back to ``real'' ioprios where necessary; you do a lot of conversions...
This would also be better expressed as a switch statement.
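Something like the following untested sketch, say (the ioprio macros are
stubbed here with the values from include/linux/ioprio.h so it builds
stand-alone; the IOPRIO_CLASS_LATENCY value and the `AS_DEFAULT_LEVEL'
name are assumptions):

```c
#include <assert.h>

/* Stand-ins for the kernel's ioprio macros, for illustration only. */
#define IOPRIO_CLASS_SHIFT	13
#define IOPRIO_PRIO_CLASS(p)	((p) >> IOPRIO_CLASS_SHIFT)
#define IOPRIO_PRIO_DATA(p)	((p) & ((1 << IOPRIO_CLASS_SHIFT) - 1))
#define IOPRIO_PRIO_VALUE(c, d)	(((c) << IOPRIO_CLASS_SHIFT) | (d))

enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE,
       IOPRIO_CLASS_IDLE, IOPRIO_CLASS_LATENCY };

#define IOPRIO_AS_MAX		4
#define AS_DEFAULT_LEVEL	2	/* hypothetical default level */

/*
 * Map any ioprio to an internal level in [0, IOPRIO_AS_MAX).
 * Note this keeps the behaviour of the patch: BE level 0 still
 * shares level 0 with RT.
 */
static unsigned short as_prio_level(unsigned short ioprio)
{
	unsigned short data = IOPRIO_PRIO_DATA(ioprio);

	switch (IOPRIO_PRIO_CLASS(ioprio)) {
	case IOPRIO_CLASS_RT:
		return 0;
	case IOPRIO_CLASS_BE:
	case IOPRIO_CLASS_LATENCY:
		return data < IOPRIO_AS_MAX ? data : IOPRIO_AS_MAX - 1;
	case IOPRIO_CLASS_IDLE:
		return IOPRIO_AS_MAX - 1;
	case IOPRIO_CLASS_NONE:
		return AS_DEFAULT_LEVEL;
	default:
		return AS_DEFAULT_LEVEL;	/* WARN_ON(1) in the kernel */
	}
}
```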
> @@ -1351,10 +1512,20 @@ static void *as_init_queue(struct reques
> init_timer(&ad->antic_timer);
> INIT_WORK(&ad->antic_work, as_work_handler);
>
> - INIT_LIST_HEAD(&ad->fifo_list[REQ_SYNC]);
> - INIT_LIST_HEAD(&ad->fifo_list[REQ_ASYNC]);
> - ad->sort_list[REQ_SYNC] = RB_ROOT;
> - ad->sort_list[REQ_ASYNC] = RB_ROOT;
> + for (i = IOPRIO_AS_MAX - 1; i >= 0; i--) {
> + INIT_LIST_HEAD(&ad->prio_q[i].fifo_list[REQ_SYNC]);
> + INIT_LIST_HEAD(&ad->prio_q[i].fifo_list[REQ_ASYNC]);
> + ad->prio_q[i].sort_list[REQ_SYNC] = RB_ROOT;
> + ad->prio_q[i].sort_list[REQ_ASYNC] = RB_ROOT;
> + ad->prio_q[i].serviced = 0;
> + if (i == 0)
> + ad->prio_q[i].ioprio_wt = 100;
> + else if (i == 1)
> + ad->prio_q[i].ioprio_wt = 5;
> + else
> + ad->prio_q[i].ioprio_wt = 1;
This seems a bit arbitrary, and means IOPRIO_AS_MAX > 3 is useless unless the weights are
changed manually.
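For example, deriving the weight from the level index would at least
scale with IOPRIO_AS_MAX (untested sketch; the 5x ratio is as arbitrary
as the current 100/5/1 values):

```c
#include <assert.h>

#define IOPRIO_AS_MAX	4

/*
 * Hypothetical alternative: a geometric progression, so that every
 * level gets a distinct share no matter how many levels are
 * configured.  With IOPRIO_AS_MAX == 4 this gives 125, 25, 5, 1.
 */
static unsigned int ioprio_weight(unsigned int level)
{
	unsigned int i, wt = 1;

	for (i = level; i < IOPRIO_AS_MAX - 1; i++)
		wt *= 5;	/* each level weighs 5x the next one */
	return wt;
}
```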
Thanks,
-- Aaron