Message-ID: <538FD933.5000202@kernel.dk>
Date: Wed, 04 Jun 2014 20:42:59 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Shaohua Li <shli@...nel.org>
CC: Matias Bjørling <m@...rling.me>,
"Sam Bradshaw (sbradshaw)" <sbradshaw@...ron.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] block: per-cpu counters for in-flight IO accounting
On 2014-06-04 20:33, Shaohua Li wrote:
> On Wed, Jun 04, 2014 at 08:16:32PM -0600, Jens Axboe wrote:
>> On 2014-06-04 20:09, Shaohua Li wrote:
>>> On Wed, Jun 04, 2014 at 02:08:46PM -0600, Jens Axboe wrote:
>>>> On 06/04/2014 05:29 AM, Matias Bjørling wrote:
>>>>> It's in
>>>>>
>>>>> blk_account_io_start
>>>>> part_round_stats
>>>>> part_round_stats_single
>>>>> part_in_flight
>>>>>
>>>>> I like the granularity idea.
>>>>
>>>> And similarly from blk_account_io_done(), which makes it even worse,
>>>> since it's at both ends of the IO chain.
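Why this chain is hot: the patch under discussion (going by its subject line) makes the in-flight count per-cpu. That makes increments and decrements cheap, but reading the total in part_in_flight() then presumably has to sum over every CPU, and that read happens on both submission and completion. Below is a minimal user-space sketch of that trade-off. It assumes a plain array stands in for kernel per-cpu storage. `struct inflight_pcpu`, `inflight_inc()`, `inflight_dec()`, `inflight_read()` and the value 64 for `NR_CPUS` are hypothetical and are not the kernel's API.

```c
#include <assert.h>

/* Hypothetical CPU count, for illustration only. */
#define NR_CPUS 64

/* Hypothetical per-cpu in-flight counter: one slot per CPU, so issue and
 * completion each touch only the local CPU's slot (no shared cacheline). */
struct inflight_pcpu {
	long count[NR_CPUS];
};

static void inflight_inc(struct inflight_pcpu *c, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	c->count[cpu]++;
}

static void inflight_dec(struct inflight_pcpu *c, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	c->count[cpu]--;
}

/*
 * Reading the total has to visit every slot, so each call costs O(NR_CPUS).
 * An IO may complete on a different CPU than the one it was issued on, so a
 * single slot can go negative; only the sum is meaningful.
 */
static long inflight_read(const struct inflight_pcpu *c)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->count[cpu];
	return sum;
}
```

For example, issuing IOs on CPUs 0 and 5 and completing one of them on CPU 3 leaves slot 3 at -1, while inflight_read() still reports 1.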
>>>
>>> But part_round_stats_single is supposed to call part_in_flight only once
>>> per jiffy. Maybe we need something like the following:
>>> 1. update part->stamp immediately
>>> 2. use a fixed granularity
>>> Untested, though.
>>>
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 40d6548..5f0acaa 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -1270,17 +1270,19 @@ static void part_round_stats_single(int cpu, struct hd_struct *part,
>>>  				    unsigned long now)
>>>  {
>>>  	int inflight;
>>> +	unsigned long old_stamp;
>>>
>>> -	if (now == part->stamp)
>>> +	if (time_before(now, part->stamp + msecs_to_jiffies(10)))
>>>  		return;
>>> +	old_stamp = part->stamp;
>>> +	part->stamp = now;
>>>
>>>  	inflight = part_in_flight(part);
>>>  	if (inflight) {
>>>  		__part_stat_add(cpu, part, time_in_queue,
>>> -				inflight * (now - part->stamp));
>>> -		__part_stat_add(cpu, part, io_ticks, (now - part->stamp));
>>> +				inflight * (now - old_stamp));
>>> +		__part_stat_add(cpu, part, io_ticks, (now - old_stamp));
>>>  	}
>>> -	part->stamp = now;
>>>  }
>>>
>>> /**
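The effect of the proposed granularity can be modelled in user space. This is a sketch, not the kernel code. With a 10 ms window, the expensive in-flight read happens at most once per window instead of once per jiffy, and the elapsed jiffies are still charged at the next rounding, using the in-flight count sampled at that moment. `struct part_stats`, `round_stats()`, `time_before_ul()` and the `reads` counter are hypothetical names. The in-flight count is passed in rather than read from per-cpu counters, and `gran` stands in for `msecs_to_jiffies(10)`. At HZ=100 one jiffy is already 10 ms, so `gran` is 1 and, for a non-decreasing `now`, the check behaves like the old `now == part->stamp` test.

```c
#include <assert.h>

/* Hypothetical user-space stand-ins for the hd_struct fields the patch touches. */
struct part_stats {
	unsigned long stamp;          /* jiffies value of the last rounding */
	unsigned long time_in_queue;  /* sum of inflight * elapsed jiffies */
	unsigned long io_ticks;       /* jiffies during which IO was in flight */
	unsigned long reads;          /* hypothetical: how often in_flight was read */
};

/* Same wrap-safe comparison as the kernel's time_before(a, b). */
static int time_before_ul(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Mirrors the patched part_round_stats_single(): skip the in-flight read
 * unless at least `gran` jiffies have passed, and move the stamp forward
 * before doing the accounting.
 */
static void round_stats(struct part_stats *p, unsigned long now,
			unsigned long gran, int inflight)
{
	unsigned long old_stamp;

	assert(inflight >= 0);
	if (time_before_ul(now, p->stamp + gran))
		return;
	old_stamp = p->stamp;
	p->stamp = now;

	p->reads++;  /* models the part_in_flight() call */
	if (inflight) {
		p->time_in_queue += inflight * (now - old_stamp);
		p->io_ticks += now - old_stamp;
	}
}
```

If this is driven once per jiffy for 100 jiffies with 2 IOs constantly in flight, a `gran` of 10 (HZ=1000) does 10 reads and a `gran` of 1 (HZ=100) does 100. Both end with `io_ticks == 100` and `time_in_queue == 200`, so with a constant load only the number of reads changes.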
>>
>> It'd be a good improvement, and one we should be able to do without
>> screwing anything up. It'd be identical for anyone running at HZ==100
>> right now, since one jiffy is already 10 ms there.
>>
>> So the above we can easily do, and arguably should just do. We won't
>> see real scaling in the IO stats path before we fix up the hd_struct
>> referencing as well, however.
>
> That's true. Maybe a percpu_ref would work here.

Maybe, but it would require more than a direct replacement. The
hd_struct referencing currently relies on things like atomic_inc_not_zero(),
which would not be cheap to do with a percpu_ref. And this happens for
every new IO, so it can't be amortized over time like the part stats
rounding.
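To see why this cost is paid per IO, it helps to model atomic_inc_not_zero() in user space. It takes a reference only if the count is still non-zero, i.e. the object is not being torn down. That is a conditional read-modify-write, so every submitting CPU hits the same shared counter. The loop below is a sketch built on C11 `<stdatomic.h>`, not the kernel's implementation. `ref_get_not_zero()` and `ref_put()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Model of atomic_inc_not_zero() semantics: increment the count unless it
 * is zero. Each caller performs a compare-and-swap on the same shared
 * counter, so the cacheline bounces between CPUs on every call.
 */
static bool ref_get_not_zero(atomic_int *ref)
{
	int old = atomic_load_explicit(ref, memory_order_relaxed);

	do {
		if (old == 0)
			return false;  /* object is already being torn down */
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

static void ref_put(atomic_int *ref)
{
	int old = atomic_fetch_sub(ref, 1);

	assert(old > 0);  /* catch a put without a matching get */
	(void)old;        /* keep NDEBUG builds warning-free */
}
```

On failure, atomic_compare_exchange_weak() reloads `old` with the current value, so the loop rechecks for zero before each retry. This zero check is the part that keeps the operation on one shared counter.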
--
Jens Axboe