Message-ID: <x49mxtntikn.fsf@segfault.boston.devel.redhat.com>
Date:	Mon, 19 Jul 2010 14:47:20 -0400
From:	Jeff Moyer <jmoyer@...hat.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	linux-kernel@...r.kernel.org, axboe@...nel.dk, nauman@...gle.com,
	dpshah@...gle.com, guijianfeng@...fujitsu.com, czoccolo@...il.com
Subject: Re: [PATCH 1/3] cfq-iosched: Improve time slice charging logic

Vivek Goyal <vgoyal@...hat.com> writes:

> - Currently in CFQ there are many situations where we don't
>   know how much time slice has been consumed by a queue. For
>   example, all the random reader/writer queues where we don't
>   idle on individual queues and expire the queue immediately
>   after the request dispatch.
>
> - In this case the time consumed by a queue is little more
>   than a memory copy operation. Time measurement is really
>   possible only if we idle on a queue and allow dispatch from
>   it for a significant amount of time.
>
> - As of today, in such cases we calculate the time since
>   dispatch from the queue started and charge all of that time.
>   Generally this rounds to 1 jiffy, but in some cases it can
>   be more. For example, if we are driving a high request queue
>   depth and the driver is too busy to ask for new requests for
>   8-10 jiffies, the active queue gets charged very unfairly.
>
> - So fundamentally, the whole notion of charging for time
>   slice is valid only if we have been idling on the queue.
>   Otherwise, on an NCQ drive, there may be requests from other
>   queues in flight and we cannot do the time slice calculation.
>
> - This patch tweaks the slice charging logic so that in the
>   cases where we can't know the amount of time, we start
>   charging in terms of the number of requests dispatched
>   (IOPS). This effectively switches the CFQ fairness model to
>   fairness in terms of IOPS when slice_idle=0.
>
> - As of today this will primarily be useful only with the
>   group_idle patches, so that we get fairness in terms of
>   IOPS across groups. The idea is that on fast storage one
>   can run CFQ with slice_idle=0 and still have the IO
>   controller working without losing too much throughput.
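In other words, the proposed rule is: on a depth-1 (non-NCQ) drive, charge the elapsed jiffies since dispatch start; on a drive with a deep hardware queue, charge the number of requests dispatched instead. A minimal user-space sketch of that decision, with illustrative names of my own (the real logic is in cfq_cfqq_slice_usage() in the patch below):

```c
/*
 * Hypothetical simplification of the charging decision described
 * above. struct queue_model and charge_slice() are illustrative
 * names, not the kernel's.
 */
struct queue_model {
	unsigned long dispatch_start;	/* jiffy when dispatch began */
	unsigned int slice_dispatch;	/* requests dispatched this slice */
	int hw_tag;			/* non-zero if drive queues deeply (NCQ) */
};

static unsigned int charge_slice(const struct queue_model *q,
				 unsigned long now)
{
	if (q->hw_tag == 0) {
		/* Depth 1: elapsed time is meaningful; charge at least 1 jiffy. */
		unsigned long elapsed = now - q->dispatch_start;
		return elapsed > 1 ? (unsigned int)elapsed : 1;
	}
	/* Deep queue: disk time is unknowable, so charge the request count
	 * instead, i.e. fairness in terms of IOPS. */
	return q->slice_dispatch;
}
```

The unit of the return value is therefore jiffies in one branch and requests in the other; the accounting only has to be consistent across queues being compared, which is what raises the question below.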

I'm not fluent in the cgroup code, my apologies for that.  However, just
trying to make sense of this is giving me a headache.  In some cases you
are now using IOPS *in place of* jiffies.  How are we to know which is
which, and in which cases?

It sounds like this is addressing an important problem, but I'm having a
hard time picking out what that problem is.  Is the problem noticeable
for competing sync-noidle workloads (competing between groups, that is)?
If not, then what?

Thanks,
Jeff

> Signed-off-by: Vivek Goyal <vgoyal@...hat.com>
> ---
>  block/cfq-iosched.c |   24 +++++++++++++++++++++---
>  1 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 7982b83..f44064c 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -896,16 +896,34 @@ static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
>  		 * if there are multiple queues in the group, each can dispatch
>  		 * a single request on seeky media and cause lots of seek time
>  		 * and group will never know it.
> +		 *
> +		 * If the drive is NCQ and we are driving deep queue depths,
> +		 * it is not reasonable to charge for the time since dispatch
> +		 * started, because that time includes time taken by all the
> +		 * other requests in the queue.
> +		 *
> +		 * Actually there is no reasonable way to know the disk time
> +		 * here, and we need to come up with some approximation. If
> +		 * the disk is non-NCQ, we should be driving request queue
> +		 * depth 1; then charge for time since dispatch start, which
> +		 * accounts for seek time properly on seeky media. If the
> +		 * request queue depth is high, then charge for the number
> +		 * of requests dispatched from the queue. This effectively
> +		 * becomes charging in terms of IOPS.
>  		 */
> -		slice_used = max_t(unsigned, (jiffies - cfqq->dispatch_start),
> -					1);
> +		if (cfqq->cfqd->hw_tag == 0)
> +			slice_used = max_t(unsigned,
> +					(jiffies - cfqq->dispatch_start), 1);
> +		else
> +			slice_used = cfqq->slice_dispatch;
>  	} else {
>  		slice_used = jiffies - cfqq->slice_start;
>  		if (slice_used > cfqq->allocated_slice)
>  			slice_used = cfqq->allocated_slice;
>  	}
>  
> -	cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u", slice_used);
> +	cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u, sl_disp=%u", slice_used,
> +			cfqq->slice_dispatch);
>  	return slice_used;
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/