Message-ID: <23904f641003111121i15550251v471ccf2446492ef5@mail.gmail.com>
Date: Thu, 11 Mar 2010 11:21:50 -0800
From: Manuel Benitez <rickyb@...gle.com>
To: Vivek Goyal <vgoyal@...hat.com>
Cc: Chad Talbott <ctalbott@...gle.com>,
Gui Jianfeng <guijianfeng@...fujitsu.com>,
Nauman Rafique <nauman@...gle.com>, jens.axboe@...cle.com,
linux-kernel@...r.kernel.org, Li Zefan <lizf@...fujitsu.com>
Subject: Re: [PATCH 1/2 V3] io-controller: Add a new interface "weight_device"
for IO-Controller
On a closely related topic, I've just recently made a change to one of
my branches that exposes the blkio.time and blkio.sectors information
for the root cgroup. These stats would not show because the major and
minor information for the root blkio_cgroup structures is zero. This
information is not available at the time the root blkio_cgroup
structures are instantiated, so they are left without major and minor
information.
I have a simple fix that updates the major and minor information for
the root structures at a later time. It looks something like this:
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index cd79be0..b34c952 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -956,6 +956,11 @@ cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct cgroup *c
 		return NULL;;
 	cfqg = cfqg_of_blkg(blkiocg_lookup_group(blkcg, key));
+	if (cfqg && !cfqg->blkg.dev && bdi->dev && dev_name(bdi->dev)) {
+		sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
+		cfqg->blkg.dev = MKDEV(major, minor);
+		goto done;
+	}
 	if (cfqg || !create)
 		goto done;
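
To illustrate the parsing step in the hunk above, here is a minimal userspace sketch, assuming the name really has the "major:minor" form that the sscanf() call expects. parse_bdi_name() is a hypothetical helper, not a kernel function, and it uses glibc's makedev() in place of the kernel's MKDEV(). The one difference from the hunk is that it checks the sscanf() return value, so a name in some other format is rejected instead of producing an undefined dev_t.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/*
 * Hypothetical helper: parse a "major:minor" string into a dev_t.
 * Returns 0 on success, -1 if the string does not hold two unsigned
 * numbers separated by a colon.
 */
int parse_bdi_name(const char *name, dev_t *out)
{
	unsigned int major_nr, minor_nr;

	/* A NULL output pointer is a caller bug, not a parse error. */
	assert(out != NULL);

	if (!name)
		return -1;

	/* sscanf() returns the number of converted fields; require both. */
	if (sscanf(name, "%u:%u", &major_nr, &minor_nr) != 2)
		return -1;

	/* Userspace counterpart of the kernel's MKDEV(major, minor). */
	*out = makedev(major_nr, minor_nr);
	return 0;
}
```

Calling parse_bdi_name("8:16", &dev) fills dev with major 8 and minor 16, while a name without the colon-separated pair returns -1 and leaves dev untouched.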
If folks think that this would be of interest, I can submit a formal
patch. If someone can suggest a better way to do it that doesn't
require extensive changes elsewhere, I'm open to working that up as
well.
-Ricky
On Wed, Mar 10, 2010 at 12:31 PM, Vivek Goyal <vgoyal@...hat.com> wrote:
> On Wed, Mar 10, 2010 at 01:03:36PM -0500, Vivek Goyal wrote:
>> On Wed, Mar 10, 2010 at 09:38:35AM -0800, Chad Talbott wrote:
>> > On Wed, Mar 10, 2010 at 7:30 AM, Vivek Goyal <vgoyal@...hat.com> wrote:
>> > > This still leaves the issue of reaching a gendisk object from request
>> > > queue. Looking into it.
>> >
>> > It looks like we have that pairing way back in blk_register_queue()
>> > which takes a gendisk. Is there any reason we don't hold onto the
>> > gendisk there? Eyeballing add_disk() and unlink_gendisk() seems to
>> > confirm that gendisk lifetime spans request_queue.
>> >
>>
>> Yes, looking at the code, it looks like the gendisk and request_queue objects
>> have the same lifetime, so we can probably store a pointer to gendisk in
>> request_queue at blk_register_queue() time, and then use this pointer to
>> retrieve gendisk->disk_name to report stats.
>>
>
> Well, gendisk and request_queue have slightly different life spans. The
> following seems to be the sequence a block driver follows.
>
> blk_init_queue()
> alloc_disk() and add_disk()
> device_removed
> del_gendisk()
> blk_cleanup_queue()
>
> So first we clean up the gendisk structure, and later the driver cleans up
> the request queue.
>
>> > Nauman and I were also wondering why blkio_group and blkio_policy_node
>> > store a dev_t, rather than a direct pointer to gendisk. dev_t seems
>> > more like a userspace<->kernel interface than an inside-the-kernel
>> > interface.
>> >
>>
>> blkio_policy_node currently can't store a pointer to gendisk because there
>> is no mechanism to call back into blkio if device is removed. So if we
>> implement something so that once device is removed, blkio layer gets a
>> callback and we cleanup any state/rules associated with that device, then
>> I think we should be able to store the pointer to gendisk.
>>
>> I am still trying to figure out how elevator/ioscheduler state is cleaned
>> up if a device is removed while some IO is happening to it.
>>
>
> So blk_cleanup_queue() will do this. That means a few things.
>
> - We can't store pointers to gendisk in blkio_policy_node or blkio_group
>   because the gendisk might have gone away while the request queue is still
>   there. Maybe one can try saving a pointer and taking a reference, but I
>   guess that becomes a little complicated.
>
> - If we are using the disk name for rules and reporting stats, then we also
>   need to make sure that these rules are cleared from cgroups once the device
>   has disappeared. Otherwise, the following might happen.
>
>   - Create a rule for sda (x,y) for cgroup test1. x,y are the major and
>     minor numbers.
>   - sda goes away. The rule still remains in the blkio cgroup.
>   - Another device gets plugged in, and I guess the following can happen.
>     - The device name is different but the dev_t is the same as sda's.
>     - The device name is the same (sda) but the device number is
>       different.
>
>   In both cases a user will be confused by stale rules
>   in cgroups.
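
For the first point in the list above, "saving a pointer and taking a reference" could look roughly like the following userspace model. This is only a sketch under stated assumptions: struct disk_model and the disk_*() functions are hypothetical stand-ins, not the kernel's gendisk API, and locking is left out entirely. It shows that a holder who took its own reference can still read the name after the device has been removed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a reference-counted gendisk. */
struct disk_model {
	int refs;	/* number of outstanding references */
	int removed;	/* set once the device has gone away */
	char name[32];
};

/* Allocate a disk holding one reference on behalf of its owner. */
struct disk_model *disk_alloc(const char *name)
{
	struct disk_model *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->refs = 1;
	/* calloc() zeroed the buffer, so the copy stays NUL-terminated. */
	strncpy(d->name, name, sizeof(d->name) - 1);
	return d;
}

/* Take an extra reference, e.g. before storing the pointer in a rule. */
struct disk_model *disk_get(struct disk_model *d)
{
	assert(d->refs > 0);
	d->refs++;
	return d;
}

/* Drop one reference; frees the disk and returns 1 if it was the last. */
int disk_put(struct disk_model *d)
{
	assert(d->refs > 0);
	if (--d->refs > 0)
		return 0;
	free(d);
	return 1;
}

/* Models device removal: mark the disk dead and drop the owner's reference. */
int disk_remove(struct disk_model *d)
{
	d->removed = 1;
	return disk_put(d);
}
```

The extra reference only keeps the memory valid. The holder still has to check the removed flag and drop its reference at some point, which is part of what makes this approach more complicated than storing a dev_t.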
>
> Cleaning up cgroup rules can get a little complicated. I guess we need to
> create a function in blkio-cgroup.c to traverse all the cgroups
> and clean up any blkio_policy_nodes belonging to the device going away.
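
The traversal described above can be sketched in userspace as follows. All names here are hypothetical: struct policy_node and struct cgroup_model only imitate blkio_policy_node and blkio_cgroup, and purge_rules_for_dev() is not an existing function in blkio-cgroup.c. Real kernel code would also need locking against concurrent rule updates, which this model omits.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/* Hypothetical stand-in for blkio_policy_node: one per-device rule. */
struct policy_node {
	dev_t dev;
	unsigned int weight;
	struct policy_node *next;
};

/* Hypothetical stand-in for blkio_cgroup: a list of rules, chained globally. */
struct cgroup_model {
	struct policy_node *rules;
	struct cgroup_model *next;
};

/* Add a per-device weight rule to one cgroup; returns 0 on success. */
int add_rule(struct cgroup_model *cg, dev_t dev, unsigned int weight)
{
	struct policy_node *pn;

	assert(cg != NULL);
	pn = malloc(sizeof(*pn));
	if (!pn)
		return -1;
	pn->dev = dev;
	pn->weight = weight;
	pn->next = cg->rules;
	cg->rules = pn;
	return 0;
}

/* Count the rules currently attached to one cgroup. */
unsigned int count_rules(const struct cgroup_model *cg)
{
	unsigned int n = 0;

	for (const struct policy_node *pn = cg->rules; pn; pn = pn->next)
		n++;
	return n;
}

/*
 * Walk every cgroup in the chain and drop all rules that refer to dev.
 * Returns the number of rules removed.
 */
unsigned int purge_rules_for_dev(struct cgroup_model *all, dev_t dev)
{
	unsigned int removed = 0;

	for (struct cgroup_model *cg = all; cg; cg = cg->next) {
		/* link points at the pointer that references the current node. */
		struct policy_node **link = &cg->rules;

		while (*link) {
			struct policy_node *pn = *link;

			if (pn->dev == dev) {
				*link = pn->next;	/* unlink, then free */
				free(pn);
				removed++;
			} else {
				link = &pn->next;
			}
		}
	}
	return removed;
}
```

A device-removal hook would call purge_rules_for_dev() with the dev_t of the departing disk, so that a later device reusing the same number does not inherit stale weights.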
>
> In a nutshell, it probably is doable. You are welcome to write a patch. At
> the same time I am not against a device major/minor number based interface,
> because it keeps things a little simpler.
>
> Thanks
> Vivek
>
> -
>> OTOH, Gui, maybe one can use blk_lookup_devt() to look up the dev_t of a
>> device using the disk name (sda). I just noticed it while reading the
>> code.
>>
>> Thanks
>> Vivek
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>