linux-kernel - Re: [PATCH] block: per-cpu counters for in-flight IO accounting

Message-ID: <538FD933.5000202@kernel.dk>
Date:	Wed, 04 Jun 2014 20:42:59 -0600
From:	Jens Axboe <axboe@...nel.dk>
To:	Shaohua Li <shli@...nel.org>
CC:	Matias Bjørling <m@...rling.me>,
	"Sam Bradshaw (sbradshaw)" <sbradshaw@...ron.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] block: per-cpu counters for in-flight IO accounting

On 2014-06-04 20:33, Shaohua Li wrote:
> On Wed, Jun 04, 2014 at 08:16:32PM -0600, Jens Axboe wrote:
>> On 2014-06-04 20:09, Shaohua Li wrote:
>>> On Wed, Jun 04, 2014 at 02:08:46PM -0600, Jens Axboe wrote:
>>>> On 06/04/2014 05:29 AM, Matias Bjørling wrote:
>>>>> It's in
>>>>>
>>>>> blk_io_account_start
>>>>>    part_round_stats
>>>>>      part_round_stats_single
>>>>>        part_in_flight
>>>>>
>>>>> I like the granularity idea.
>>>>
>>>> And similarly from blk_io_account_done() - which makes it even worse,
>>>> since it's at both ends of the IO chain.
>>>
>>> But part_round_stats_single is supposed to only call part_in_flight once
>>> per jiffy. Maybe we need something like the below:
>>> 1. set part->stamp immediately
>>> 2. fixed granularity
>>> Untested though.
>>>
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 40d6548..5f0acaa 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -1270,17 +1270,19 @@ static void part_round_stats_single(int cpu, struct hd_struct *part,
>>>   				    unsigned long now)
>>>   {
>>>   	int inflight;
>>> +	unsigned long old_stamp;
>>>
>>> -	if (now == part->stamp)
>>> +	if (time_before(now, part->stamp + msecs_to_jiffies(10)))
>>>   		return;
>>> +	old_stamp = part->stamp;
>>> +	part->stamp = now;
>>>
>>>   	inflight = part_in_flight(part);
>>>   	if (inflight) {
>>>   		__part_stat_add(cpu, part, time_in_queue,
>>> -				inflight * (now - part->stamp));
>>> -		__part_stat_add(cpu, part, io_ticks, (now - part->stamp));
>>> +				inflight * (now - old_stamp));
>>> +		__part_stat_add(cpu, part, io_ticks, (now - old_stamp));
>>>   	}
>>> -	part->stamp = now;
>>>   }
>>>
>>>   /**
>>
>> It'd be a good improvement, and one we should be able to do without
>> screwing anything up. It'd be identical to anyone running at HZ==100
>> right now.
>>
>> So the above we can easily do, and arguably should just do. We won't
>> see real scaling in the IO stats path before we fixup the hd_struct
>> referencing as well, however.
>
> That's true. Maybe a percpu_ref works here.

Maybe, but it would require more than a direct replacement. The
hd_struct stuff currently relies on things like atomic_inc_not_zero(),
which would not be cheap to do with a per-cpu reference count. And this
happens for every new IO, so it can't be amortized over time like the
part stats rounding.
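
To illustrate the semantics at issue, here is a minimal userspace sketch
in C11 of an "increment unless zero" reference grab. The names struct obj,
ref_get_unless_zero() and ref_put() are hypothetical, and this is not the
kernel's implementation. It only shows that each grab has to observe the
single shared count, which is presumably why a plain per-cpu counter is
not a drop-in substitute.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; a stand-in, not the kernel's hd_struct. */
struct obj {
	atomic_int ref;
};

/*
 * Take a reference only if the count is not already zero.
 * Returns true on success, false if the object is already dead.
 */
static bool ref_get_unless_zero(struct obj *o)
{
	int old = atomic_load_explicit(&o->ref, memory_order_relaxed);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(&o->ref, &old, old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool ref_put(struct obj *o)
{
	return atomic_fetch_sub_explicit(&o->ref, 1,
					 memory_order_acq_rel) == 1;
}
```

Every caller does a compare-and-swap on the same cache line, so the cost
is paid per IO rather than once per rounding interval.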

-- 
Jens Axboe
