Message-ID: <20090513171734.GA18371@redhat.com>
Date: Wed, 13 May 2009 13:17:34 -0400
From: Vivek Goyal <vgoyal@...hat.com>
To: Gui Jianfeng <guijianfeng@...fujitsu.com>
Cc: nauman@...gle.com, dpshah@...gle.com, lizf@...fujitsu.com,
mikew@...gle.com, fchecconi@...il.com, paolo.valente@...more.it,
jens.axboe@...cle.com, ryov@...inux.co.jp, fernando@....ntt.co.jp,
s-uchida@...jp.nec.com, taka@...inux.co.jp, jmoyer@...hat.com,
dhaval@...ux.vnet.ibm.com, balbir@...ux.vnet.ibm.com,
linux-kernel@...r.kernel.org,
containers@...ts.linux-foundation.org, righi.andrea@...il.com,
agk@...hat.com, dm-devel@...hat.com, snitzer@...hat.com,
m-ikeda@...jp.nec.com, akpm@...ux-foundation.org
Subject: Re: [PATCH] IO Controller: Add per-device weight and ioprio_class
handling
On Wed, May 13, 2009 at 10:00:21AM +0800, Gui Jianfeng wrote:
> Hi Vivek,
>
> This patch enables per-cgroup per-device weight and ioprio_class handling.
> A new cgroup interface "policy" is introduced. You can make use of this
> file to configure weight and ioprio_class for each device in a given cgroup.
> The original "weight" and "ioprio_class" files are still available. If you
> don't do special configuration for a particular device, "weight" and
> "ioprio_class" are used as default values in this device.
>
> You can use the following format to play with the new interface.
> #echo DEV:weight:ioprio_class > /path/to/cgroup/policy
> weight=0 means removing the policy for DEV.
>
> Examples:
> Configure weight=300 ioprio_class=2 on /dev/hdb in this cgroup
> # echo /dev/hdb:300:2 > io.policy
> # cat io.policy
> dev weight class
> /dev/hdb 300 2
>
> Configure weight=500 ioprio_class=1 on /dev/hda in this cgroup
> # echo /dev/hda:500:1 > io.policy
> # cat io.policy
> dev weight class
> /dev/hda 500 1
> /dev/hdb 300 2
>
> Remove the policy for /dev/hda in this cgroup
> # echo /dev/hda:0:1 > io.policy
> # cat io.policy
> dev weight class
> /dev/hdb 300 2
>
> Signed-off-by: Gui Jianfeng <guijianfeng@...fujitsu.com>
> ---
> block/elevator-fq.c | 239 +++++++++++++++++++++++++++++++++++++++++++++++++-
> block/elevator-fq.h | 11 +++
> 2 files changed, 245 insertions(+), 5 deletions(-)
>
> diff --git a/block/elevator-fq.c b/block/elevator-fq.c
> index 69435ab..7c95d55 100644
> --- a/block/elevator-fq.c
> +++ b/block/elevator-fq.c
> @@ -12,6 +12,9 @@
> #include "elevator-fq.h"
> #include <linux/blktrace_api.h>
> #include <linux/biotrack.h>
> +#include <linux/seq_file.h>
> +#include <linux/genhd.h>
> +
>
> /* Values taken from cfq */
> const int elv_slice_sync = HZ / 10;
> @@ -1045,12 +1048,30 @@ struct io_group *io_lookup_io_group_current(struct request_queue *q)
> }
> EXPORT_SYMBOL(io_lookup_io_group_current);
>
> -void io_group_init_entity(struct io_cgroup *iocg, struct io_group *iog)
> +static struct policy_node *policy_search_node(const struct io_cgroup *iocg,
> + void *key);
> +
> +void io_group_init_entity(struct io_cgroup *iocg, struct io_group *iog,
> + void *key)
> {
> struct io_entity *entity = &iog->entity;
> + struct policy_node *pn;
> +
> + spin_lock_irq(&iocg->lock);
> + pn = policy_search_node(iocg, key);
> + if (pn) {
> + entity->weight = pn->weight;
> + entity->new_weight = pn->weight;
> + entity->ioprio_class = pn->ioprio_class;
> + entity->new_ioprio_class = pn->ioprio_class;
> + } else {
> + entity->weight = iocg->weight;
> + entity->new_weight = iocg->weight;
> + entity->ioprio_class = iocg->ioprio_class;
> + entity->new_ioprio_class = iocg->ioprio_class;
> + }
> + spin_unlock_irq(&iocg->lock);
>
I think we need to use the spin_lock_irqsave() and spin_unlock_irqrestore()
variants above, because this function can be called with the request queue
lock held and we don't want to enable interrupts unconditionally here.
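Something along these lines (untested sketch; "flags" is a local I'm
introducing for illustration):

	unsigned long flags;

	spin_lock_irqsave(&iocg->lock, flags);
	pn = policy_search_node(iocg, key);
	/* ... set entity weight/ioprio_class as in the patch ... */
	spin_unlock_irqrestore(&iocg->lock, flags);

That way the caller's interrupt state is saved and restored instead of
being unconditionally re-enabled on unlock.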
I hit the following lock validator warning:
[ 81.521242] =================================
[ 81.522127] [ INFO: inconsistent lock state ]
[ 81.522127] 2.6.30-rc4-ioc #47
[ 81.522127] ---------------------------------
[ 81.522127] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[ 81.522127] io-group-bw-tes/4138 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 81.522127] (&q->__queue_lock){+.?...}, at: [<ffffffff811d7b2e>] __make_request+0x35/0x396
[ 81.522127] {IN-SOFTIRQ-W} state was registered at:
[ 81.522127] [<ffffffffffffffff>] 0xffffffffffffffff
[ 81.522127] irq event stamp: 1006
[ 81.522127] hardirqs last enabled at (1005): [<ffffffff810c1198>] kmem_cache_alloc+0x9d/0x105
[ 81.522127] hardirqs last disabled at (1006): [<ffffffff8150343f>] _spin_lock_irq+0x12/0x3e
[ 81.522127] softirqs last enabled at (286): [<ffffffff81042039>] __do_softirq+0x17a/0x187
[ 81.522127] softirqs last disabled at (271): [<ffffffff8100ccfc>] call_softirq+0x1c/0x34
[ 81.522127]
[ 81.522127] other info that might help us debug this:
[ 81.522127] 3 locks held by io-group-bw-tes/4138:
[ 81.522127] #0: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<ffffffff810cfd2c>] do_lookup+0x82/0x15f
[ 81.522127] #1: (&q->__queue_lock){+.?...}, at: [<ffffffff811d7b2e>] __make_request+0x35/0x396
[ 81.522127] #2: (rcu_read_lock){.+.+..}, at: [<ffffffff811e55bb>] __rcu_read_lock+0x0/0x30
[ 81.522127]
[ 81.522127] stack backtrace:
[ 81.522127] Pid: 4138, comm: io-group-bw-tes Not tainted 2.6.30-rc4-ioc #47
[ 81.522127] Call Trace:
[ 81.522127] [<ffffffff8105edad>] valid_state+0x17c/0x18f
[ 81.522127] [<ffffffff8105eb8a>] ? check_usage_backwards+0x0/0x52
[ 81.522127] [<ffffffff8105ee9b>] mark_lock+0xdb/0x1ff
[ 81.522127] [<ffffffff8105f00c>] mark_held_locks+0x4d/0x6b
[ 81.522127] [<ffffffff8150331a>] ? _spin_unlock_irq+0x2b/0x31
[ 81.522127] [<ffffffff8105f13e>] trace_hardirqs_on_caller+0x114/0x138
[ 81.522127] [<ffffffff8105f16f>] trace_hardirqs_on+0xd/0xf
[ 81.522127] [<ffffffff8150331a>] _spin_unlock_irq+0x2b/0x31
[ 81.522127] [<ffffffff811e5534>] ? io_group_init_entity+0x2a/0xb1
[ 81.522127] [<ffffffff811e5597>] io_group_init_entity+0x8d/0xb1
[ 81.522127] [<ffffffff811e688e>] ? io_group_chain_alloc+0x49/0x167
[ 81.522127] [<ffffffff811e68fe>] io_group_chain_alloc+0xb9/0x167
[ 81.522127] [<ffffffff811e6a04>] io_find_alloc_group+0x58/0x85
[ 81.522127] [<ffffffff811e6aec>] io_get_io_group+0x6e/0x94
[ 81.522127] [<ffffffff811e6d8c>] io_group_get_request_list+0x10/0x21
[ 81.522127] [<ffffffff811d7021>] blk_get_request_list+0x9/0xb
[ 81.522127] [<ffffffff811d7ab0>] get_request_wait+0x132/0x17b
[ 81.522127] [<ffffffff811d7dc1>] __make_request+0x2c8/0x396
[ 81.522127] [<ffffffff811d6806>] generic_make_request+0x1f2/0x28c
[ 81.522127] [<ffffffff810e9ee7>] ? bio_init+0x18/0x32
[ 81.522127] [<ffffffff811d8019>] submit_bio+0xb1/0xbc
[ 81.522127] [<ffffffff810e61c1>] submit_bh+0xfb/0x11e
[ 81.522127] [<ffffffff8111f554>] __ext3_get_inode_loc+0x263/0x2c2
[ 81.522127] [<ffffffff81122286>] ext3_iget+0x69/0x399
[ 81.522127] [<ffffffff81125b92>] ext3_lookup+0x81/0xd0
[ 81.522127] [<ffffffff810cfd81>] do_lookup+0xd7/0x15f
[ 81.522127] [<ffffffff810d15c2>] __link_path_walk+0x319/0x67f
[ 81.522127] [<ffffffff810d1976>] path_walk+0x4e/0x97
[ 81.522127] [<ffffffff810d1b48>] do_path_lookup+0x115/0x15a
[ 81.522127] [<ffffffff810d0fec>] ? getname+0x19d/0x1bf
[ 81.522127] [<ffffffff810d252a>] user_path_at+0x52/0x8c
[ 81.522127] [<ffffffff811ee668>] ? __up_read+0x1c/0x8c
[ 81.522127] [<ffffffff8150379b>] ? _spin_unlock_irqrestore+0x3f/0x47
[ 81.522127] [<ffffffff8105f13e>] ? trace_hardirqs_on_caller+0x114/0x138
[ 81.522127] [<ffffffff810cb6c1>] vfs_fstatat+0x35/0x62
[ 81.522127] [<ffffffff811ee6d0>] ? __up_read+0x84/0x8c
[ 81.522127] [<ffffffff810cb7bb>] vfs_stat+0x16/0x18
[ 81.522127] [<ffffffff810cb7d7>] sys_newstat+0x1a/0x34
[ 81.522127] [<ffffffff8100c5e9>] ? retint_swapgs+0xe/0x13
[ 81.522127] [<ffffffff8105f13e>] ? trace_hardirqs_on_caller+0x114/0x138
[ 81.522127] [<ffffffff8107f771>] ? audit_syscall_entry+0xfe/0x12a
[ 81.522127] [<ffffffff8100bb2b>] system_call_fastpath+0x16/0x1b
Thanks
Vivek