[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100406151621.GB3029@redhat.com>
Date: Tue, 6 Apr 2010 11:16:21 -0400
From: Vivek Goyal <vgoyal@...hat.com>
To: Divyesh Shah <dpshah@...gle.com>
Cc: jens.axboe@...cle.com, linux-kernel@...r.kernel.org,
nauman@...gle.com, ctalbott@...gle.com
Subject: Re: [PATCH 2/3][v3] blkio: Add io controller stats like
On Mon, Apr 05, 2010 at 08:37:01PM -0700, Divyesh Shah wrote:
> - io_service_time (the time between request dispatch and completion for IOs
> in the cgroup)
> - io_wait_time (the time spent waiting in the IO shceduler queues before
> getting serviced)
> - io_serviced (number of IOs serviced from this blkio_group)
> - io_service_bytes (Number of bytes served for this cgroup)
>
> These stats are accumulated per operation type helping us to distinguish between
> read and write, and sync and async IO. This patch does not increment any of
> these stats.
>
> Signed-off-by: Divyesh Shah<dpshah@...gle.com>
> ---
>
> Documentation/cgroups/blkio-controller.txt | 40 +++++++
> block/blk-cgroup.c | 166 +++++++++++++++++++++++++---
> block/blk-cgroup.h | 60 ++++++++--
> block/cfq-iosched.c | 3 -
> 4 files changed, 239 insertions(+), 30 deletions(-)
>
> diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
> index 630879c..087925b 100644
> --- a/Documentation/cgroups/blkio-controller.txt
> +++ b/Documentation/cgroups/blkio-controller.txt
> @@ -77,7 +77,6 @@ Details of cgroup files
> =======================
> - blkio.weight
> - Specifies per cgroup weight.
> -
> Currently allowed range of weights is from 100 to 1000.
>
> - blkio.time
> @@ -92,6 +91,41 @@ Details of cgroup files
> third field specifies the number of sectors transferred by the
> group to/from the device.
>
> +- blkio.io_service_bytes
> + - Number of bytes transferred to/from the disk by the group. These
> + are further divided by the type of operation - read or write, sync
> + or async. First two fields specify the major and minor number of the
> + device, third field specifies the operation type and the fourth field
> + specifies the number of bytes.
> +
> +- blkio.io_serviced
> + - Number of IOs completed to/from the disk by the group. These
> + are further divided by the type of operation - read or write, sync
> + or async. First two fields specify the major and minor number of the
> + device, third field specifies the operation type and the fourth field
> + specifies the number of IOs.
> +
Hi Divyesh,
V3 looks much better. Just couple of minor nits.
> +- blkio.io_service_time
> + - Total amount of time between request dispatch and request completion
> + for the IOs done by this cgroup. This is in nanoseconds to make it
> + meaningful for flash devices too. For devices with queue depth of 1,
> + this time represents the actual service time. When queue_depth > 1,
> + that is no longer true as requests may be served out of order.
Like io_wait_time, can you also mention here that this time is cumulative
time and on NCQ disks it can be more than actual time elapsed.
I did a quick run with your patches. I ran a sequential workload for 30s
in two groups of weight 100 and 200. Following is the output of
blkio.io_service_time (I have kept stats for only disk 253:3 in output).
# cat test1/blkio.io_service_time
253:3 Read 18019970625
253:3 Write 0
253:3 Sync 18019970625
253:3 Async 0
253:3 Total 18019970625
# cat test2/blkio.io_service_time
253:3 Read 35479070171
253:3 Write 0
253:3 Sync 35479070171
253:3 Async 0
This storage supports NCQ. We see that though I ran test only for
30 seconds, total service ime for cgroup is close to 18+35=53 seconds.
> + This time is further divided by the type of operation -
> + read or write, sync or async. First two fields specify the major and
> + minor number of the device, third field specifies the operation type
> + and the fourth field specifies the io_service_time in ns.
> +
> +- blkio.io_wait_time
> + - Total amount of time the IO spent waiting in the scheduler queues for
> + service. This can be greater than the total time elapsed since it is
> + cumulative io_wait_time for all IOs. This is in nanoseconds to make it
> + meaningful for flash devices too. This time is further divided by the
> + type of operation - read or write, sync or async. First two fields
> + specify the major and minor number of the device, third field
> + specifies the operation type and the fourth field specifies the
> + io_wait_time in ns.
> +
> - blkio.dequeue
> - Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This
> gives the statistics about how many a times a group was dequeued
> @@ -99,6 +133,10 @@ Details of cgroup files
> and minor number of the device and third field specifies the number
> of times a group was dequeued from a particular device.
>
> +- blkio.reset_stats
> + - Writing an int to this file will result in resetting all the stats
> + for that cgroup.
> +
Personally, I like adding a separate file to reset the stats. Now one does
not get surprised by the fact that writting to blkio.io_service_time, also
reset rest of the stats.
> CFQ sysfs tunable
> =================
> /sys/block/<disk>/queue/iosched/group_isolation
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 5be3981..d585a05 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -17,6 +17,8 @@
> #include <linux/err.h>
> #include "blk-cgroup.h"
>
> +#define MAX_KEY_LEN 100
> +
> static DEFINE_SPINLOCK(blkio_list_lock);
> static LIST_HEAD(blkio_list);
>
> @@ -55,12 +57,21 @@ struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup)
> }
> EXPORT_SYMBOL_GPL(cgroup_to_blkio_cgroup);
>
> -void blkiocg_update_blkio_group_stats(struct blkio_group *blkg,
> - unsigned long time)
> +void blkio_group_init(struct blkio_group *blkg)
> +{
> + spin_lock_init(&blkg->stats_lock);
> +}
> +EXPORT_SYMBOL_GPL(blkio_group_init);
> +
> +void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time)
> {
> - blkg->time += time;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&blkg->stats_lock, flags);
> + blkg->stats.time += time;
> + spin_unlock_irqrestore(&blkg->stats_lock, flags);
> }
> -EXPORT_SYMBOL_GPL(blkiocg_update_blkio_group_stats);
> +EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used);
>
> void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
> struct blkio_group *blkg, void *key, dev_t dev)
> @@ -170,13 +181,107 @@ blkiocg_weight_write(struct cgroup *cgroup, struct cftype *cftype, u64 val)
> return 0;
> }
>
> -#define SHOW_FUNCTION_PER_GROUP(__VAR) \
> +static int
> +blkiocg_reset_write(struct cgroup *cgroup, struct cftype *cftype, u64 val)
> +{
I guess we can rename this function to blkiocg_reset_stats().
[..]
> @@ -217,16 +331,36 @@ struct cftype blkio_files[] = {
> },
> {
> .name = "time",
> - .read_seq_string = blkiocg_time_read,
> + .read_map = blkiocg_time_read,
> },
> {
> .name = "sectors",
> - .read_seq_string = blkiocg_sectors_read,
> + .read_map = blkiocg_sectors_read,
> + },
> + {
> + .name = "io_service_bytes",
> + .read_map = blkiocg_io_service_bytes_read,
> + },
> + {
> + .name = "io_serviced",
> + .read_map = blkiocg_io_serviced_read,
> + },
> + {
> + .name = "io_service_time",
> + .read_map = blkiocg_io_service_time_read,
> + },
> + {
> + .name = "io_wait_time",
> + .read_map = blkiocg_io_wait_time_read,
> + },
> + {
> + .name = "reset_stats",
> + .write_u64 = blkiocg_reset_write,
use blkiocg_reset_stats?
Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists