linux-kernel - Re: [PATCH v6 3/3] blk-cgroup: Optimize blkcg_rstat

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <9272d9ba-1b34-991b-78b5-beca5e170a23@redhat.com>
Date:   Sun, 5 Jun 2022 22:58:15 -0400
From:   Waiman Long <longman@...hat.com>
To:     Ming Lei <ming.lei@...hat.com>
Cc:     Tejun Heo <tj@...nel.org>, Jens Axboe <axboe@...nel.dk>,
        cgroups@...r.kernel.org, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 3/3] blk-cgroup: Optimize blkcg_rstat_flush()

On 6/5/22 22:23, Ming Lei wrote:
> On Sun, Jun 05, 2022 at 09:59:50PM -0400, Waiman Long wrote:
>> On 6/5/22 21:39, Ming Lei wrote:
>>> On Sun, Jun 05, 2022 at 07:15:27PM -0400, Waiman Long wrote:
>>>> On 6/3/22 23:58, Ming Lei wrote:
>>>>
>>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>>> index abec50f31fe6..8c4f204dbf5b 100644
>>>>> --- a/mm/memcontrol.c
>>>>> +++ b/mm/memcontrol.c
>>>>> @@ -622,7 +622,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
>>>>>     {
>>>>>     	unsigned int x;
>>>>> -	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
>>>>> +	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id(), NULL);
>>>>>     	x = __this_cpu_add_return(stats_updates, abs(val));
>>>>>     	if (x > MEMCG_CHARGE_BATCH) {
>>>> I think the rstat set of functions are doing that already. So flush will
>>>> only call CPUs that have called cgroup_rstat_updated() before. However, one
>>> Yeah, I guess the detail is in cgroup_rstat_cpu_pop_updated(), but the
>>> percpu lock(raw_spin_lock_irqsave) is still required, and cgroup_rstat_cpu_pop_updated()
>>> is still called even through there isn't any update on this CPU.
>> Yes, I think we may need to add a bitmask of what controllers have updates
>> in cgroup_rstat_cpu structure.
>>>> deficiency that I am aware of is that there is no bitmap of which controller
>>>> have update. The problem that I saw in cgroup v2 is that in a cgroup with
>>>> both memory controller and block controller enabled, a
>>>> cgroup_rstat_updated() call from memory cgroup later causes the rstat
>>>> function to call into block cgroup flush method even though there is no
>>>> update in the block controller. This is an area that needs improvement.
>>>>
>>>> Your code does allow the block controller to be aware of that and avoid
>>>> further action, but I think it has to be done in the rstat code to be
>>>> applicable to all controllers instead of just specific to block controller.
>>> I guess it can be done by adding one percpu variable to 'struct cgroup'.
>>>
>>>> There is another problem that this approach. Suppose the system have 20
>>>> block devices and one of them has an IO operation. Now the flush method
>>>> still needs to iterate all the 20 blkg's to do an update. The block
>>>> controller is kind of special that the number of per-cgroup IO stats depends
>>>> on the number of block devices present. Other controllers just have one set
>>>> of stats per cgroup.
>>> Yeah, and this one is really blkio specific issue, and your patch does
>>> cover this one. Maybe you can add one callback to
>>> cgroup_rstat_updated(), so the "blkg_iostat_set" instance is added into
>>> percpu list under percpu lock of cgroup_rstat_cpu_lock, then the lockless
>>> list isn't needed.
>> The rstat API is generic. It may not be a good idea to put controller
>> specific information into it. Yes, cgroup_rstat_cpu_lock is taken at the
>> read side (flush). It may not taken on the write side (update). So it may
> Both cgroup_rstat_flush_locked()/cgroup_rstat_updated() take the percpu
> cgroup_rstat_cpu_lock, so the new invented lockless list can be
> replaced with plain list.

cgroup_rstat_updated() should only take the percpu cgroup_rstat_cpu_lock 
the first time it transition from "!updated" to "updated". After that, 
it returns immediately without the the lock. With a regular list, you 
will have to take the lock every time a new block device has an update. 
So there isn't much saving on the update side. In general, the 
lock/unlock sequence has a bit more overhead than the lockless 
insertion. On the flush side, there may be a bit of saving, but it is 
not the fast path.

Cheers,
Longman