Message-ID: <g2haf41c7c41004121137x955b8ecfpac848e477657db12@mail.gmail.com>
Date:	Mon, 12 Apr 2010 11:37:18 -0700
From:	Divyesh Shah <dpshah@...gle.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	jens.axboe@...cle.com, linux-kernel@...r.kernel.org,
	nauman@...gle.com, ctalbott@...gle.com
Subject: Re: [PATCH 3/3] blkio: Add more debug-only per-cgroup stats

On Mon, Apr 12, 2010 at 7:04 AM, Vivek Goyal <vgoyal@...hat.com> wrote:
> On Fri, Apr 09, 2010 at 05:09:07PM -0700, Divyesh Shah wrote:
>> On Fri, Apr 9, 2010 at 12:21 PM, Vivek Goyal <vgoyal@...hat.com> wrote:
>> > On Thu, Apr 08, 2010 at 09:15:35PM -0700, Divyesh Shah wrote:
>> >> 1) group_wait_time - This is the amount of time the cgroup had to wait to get a
>> >>    timeslice for one of its queues from when it became busy, i.e., went from 0
>> >>    to 1 request queued. This is different from the io_wait_time which is the
>> >>    cumulative total of the amount of time spent by each IO in that cgroup waiting
>> >>    in the scheduler queue. This stat is a great way to find out any jobs in the
>> >>    fleet that are being starved or waiting for longer than what is expected (due
>> >>    to an IO controller bug or any other issue).
>> >> 2) empty_time - This is the amount of time a cgroup spends w/o any pending
>> >>    requests. This stat is useful when a job does not seem to be able to use its
>> >>    assigned disk share by helping check if that is happening due to an IO
>> >>    controller bug or because the job is not submitting enough IOs.
>> >
>> > Divyesh,
>> >
>> > This empty_time looks like a redundant stat. One can just look at
>> > group_wait_time. If group_wait_time is not increasing, then the application
>> > in the cgroup is not doing any IO. If it is increasing, then one can look
>> > at many other stats like blkio.time, blkio.sectors etc. to find out
>> > whether the cgroup is making any progress or not.
>>
>> Vivek,
>>         group_wait_time may not be increasing even when the cgroup
>> is dominating the disk by sending tons of IO all the time, so you
>> can't conclude that the cgroup is not doing any IO.
>
> If the group is dominating the disk, then blkio.time, blkio.sectors and
> other stats are increasing continuously, and we know the group is being
> served, which in turn implies that the application is sending tons of IO.
>
>> Even when it is
>> increasing, none of the other stats give what we are looking for
>> with this stat, which is how busy the application is really able to
>> keep its IO queues over a given period of time.
>
> will blkio.avg_queue_size not tell you how an application is keeping
> a group busy over a period of time?

It's an indicator of how deeply the application is able to drive its IO
queues, but samples are taken only on timeslice start, so it doesn't
cover the empty case.

> When group_wait_time is increasing (which means the application is doing IO
> but the group has not yet been scheduled in), then will blkio.io_queued
> give you a good idea how busy a group is?

It will tell you how deep the IO queue is.

>
> If due to IO controller bug, group is not being scheduled, then
> avg_queue_size will not have enough samples, but blkio.io_queued will
> still be increasing or will be a big number and will tell you that
> group is busy but is not making any progress.
>
> But I do realize that the rest of the stats don't exactly tell how long the
> group was empty. So let's keep it, as you seem to be finding it useful. Also,
> avg_queue_size does not account for time you are not doing any IO.

Thanks! Yes, you're right. I wanted to avoid any periodic timer-based
accounting for all cgroups and only do event-based accounting.

>
> [..]
>> >>  void blkiocg_update_set_active_queue_stats(struct blkio_group *blkg)
>> >>  {
>> >>       unsigned long flags;
>> >> @@ -116,9 +186,14 @@ void blkiocg_update_set_active_queue_stats(struct blkio_group *blkg)
>> >>                       stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_READ] +
>> >>                       stats->stat_arr[BLKIO_STAT_QUEUED][BLKIO_STAT_WRITE];
>> >>       stats->avg_queue_size_samples++;
>> >> +     blkio_update_group_wait_time(stats);
>> >
>> > Would cfq_choose_cfqg() be a better place to call this function? Why should
>> > we call it when an active queue is set? Instead, when a group is chosen
>> > to dispatch a request from, we can update the wait time.
>>
>> Does that cover pre-emption cases too?
>>
>
> That's a good point. In the preemption path we don't use cfq_choose_cfqg(). So
> it is fine.
>
> [..]
>> >>  struct blkio_cgroup {
>> >>       struct cgroup_subsys_state css;
>> >>       unsigned int weight;
>> >> @@ -74,6 +84,21 @@ struct blkio_group_stats {
>> >>       uint64_t avg_queue_size_samples;
>> >>       /* How many times this group has been removed from service tree */
>> >>       unsigned long dequeue;
>> >> +
>> >> +     /* Total time spent waiting for it to be assigned a timeslice. */
>> >> +     uint64_t group_wait_time;
>> >> +     uint64_t start_group_wait_time;
>> >> +
>> >> +     /* Time spent idling for this blkio_group */
>> >> +     uint64_t idle_time;
>> >> +     uint64_t start_idle_time;
>> >> +     /*
>> >> +      * Total time when we have requests queued and do not contain the
>> >> +      * current active queue.
>> >> +      */
>> >> +     uint64_t empty_time;
>> >> +     uint64_t start_empty_time;
>> >> +     uint16_t flags;
>> >
>> > Does this flags field have to be inside blkio_group_stats? Maybe a better
>> > place is inside blkio_group, as it represents the blkio_group status.
>> > It's a different matter that as of today all three flags are helping out
>> > only for stats purposes, but in the future we could add more flags.
>>
>> I agree with you in principle. Would it make sense to move this to
>> blkg when we have a use-case for it beyond stats?
>>
>
> Ok, that's fine with me.
>
> [..]
>> >>  static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
>> >> -                             struct cfq_queue *cfqq)
>> >> +                             struct cfq_queue *cfqq, bool forced)
>> >>  {
>> >>       struct cfq_rb_root *st = &cfqd->grp_service_tree;
>> >>       unsigned int used_sl, charge_sl;
>> >> @@ -916,6 +916,7 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
>> >>       cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
>> >>                                        st->min_vdisktime);
>> >>       blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
>> >> +     blkiocg_set_start_empty_time(&cfqg->blkg, forced);
>> >
>> > Why does this need to be called from CFQ? Can't we just call this
>> > from blkiocg_update_request_remove_stats() internally? So when we remove
>> > a request, we update the stats in blkio. At that time blkio can check if
>> > the group became empty and start the timer.
>>
>> From Documentation/cgroups/blkio-controller.txt:
>> This is the amount of time a cgroup spends without any pending
>> requests when not being served, i.e., it does not include any time
>> spent idling for one of the queues of the cgroup.
>>
>> Calling this from remove_stats will add the idle time in here too.
>
> Ok, so to get a real sense of how long a group was empty, I need to add up
> empty_time and idle_time.

Yes that is correct.

>
> Thanks
> Vivek
>
