linux-kernel - Re: [PATCH v2 2/2] thermal: power_allocator: update once cooling devices when temp is low


Message-ID: <0fc57590-cc7c-9e04-16bc-13b7b787ad2f@arm.com>
Date:   Tue, 20 Apr 2021 21:01:50 +0100
From:   Lukasz Luba <lukasz.luba@....com>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>
Cc:     linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        amitk@...nel.org, rui.zhang@...el.com
Subject: Re: [PATCH v2 2/2] thermal: power_allocator: update once cooling
 devices when temp is low



On 4/20/21 4:24 PM, Daniel Lezcano wrote:
> On 20/04/2021 16:21, Lukasz Luba wrote:
>> Hi Daniel,
>>
>> On 4/20/21 2:30 PM, Daniel Lezcano wrote:
>>> On 19/04/2021 10:45, Lukasz Luba wrote:
>>
>> [snip]
>>
>>>> -        instance->cdev->updated = false;
>>>> +        if (update)
>>>> +            instance->cdev->updated = false;
>>>> +
>>>>            mutex_unlock(&instance->cdev->lock);
>>>> -        thermal_cdev_update(instance->cdev);
>>>> +
>>>> +        if (update)
>>>> +            thermal_cdev_update(instance->cdev);
>>>
>>> This cdev update has something bad IMHO. It is protected by a mutex but
>>> the 'updated' field is left unprotected before calling
>>> thermal_cdev_update().
>>>
>>> It is not the fault of this code but how the cooling devices are updated
>>> and how it interacts with the thermal instances.
>>>
>>> IMO, part of the core code needs to be revisited.
>>
>> I agree, but please check my comments below.
>>
>>>
>>> This change tightens the knot a bit more.
>>>
>>> Would it make sense to you if we create a function eg.
>>> __thermal_cdev_update()
>>
>> I'm not sure whether I'm right to assume that the function would only have the:
>> list_for_each_entry(instance, &cdev->thermal_instances, cdev_node)
>>
>> loop from the thermal_cdev_update(). But if it has only this loop then
>> it's too little.
>>
>>>
>>> And then we have:
>>>
>>> void thermal_cdev_update(struct thermal_cooling_device *cdev)
>>> {
>>>           mutex_lock(&cdev->lock);
>>>           /* cooling device is updated*/
>>>           if (cdev->updated) {
>>>                   mutex_unlock(&cdev->lock);
>>>                   return;
>>>           }
>>>
>>>           __thermal_cdev_update(cdev);
>>>
>>>           thermal_cdev_set_cur_state(cdev, target);
>>
>> Here we are actually setting the 'target' state via:
>> cdev->ops->set_cur_state(cdev, target)
>>
>> then we notify, then update the stats.
>>
>>>
>>>           cdev->updated = true;
>>>           mutex_unlock(&cdev->lock);
>>>           trace_cdev_update(cdev, target);
>>
>> Also this trace is something that I'm using in my tests...
> 
> Yeah, I noticed right after sending the comments. All that should be
> moved in the lockless function.

Agree

> 
> So this function becomes:
> 
> void thermal_cdev_update(struct thermal_cooling_device *cdev)
> {
> 	mutex_lock(&cdev->lock);
> 	if (!cdev->updated) {
> 		__thermal_cdev_update(cdev);
> 		cdev->updated = true;
> 	}
> 	mutex_unlock(&cdev->lock);
> 
> 	dev_dbg(&cdev->device, "set to state %lu\n", target);
> }
> 
> We end up with the trace_cdev_update(cdev, target) inside the mutex
> section but that should be fine.

True, this shouldn't be an issue.

> 
>>>           dev_dbg(&cdev->device, "set to state %lu\n", target);
>>> }
>>>
>>> And in this file we do instead:
>>>
>>> -        instance->cdev->updated = false;
>>> +        if (update)
>>> +            __thermal_cdev_update(instance->cdev);
>>>             mutex_unlock(&instance->cdev->lock);
>>> -        thermal_cdev_update(instance->cdev);
>>
>> Without the line above, we are not un-throttling the devices.
> 
> Is it still true with the amended function thermal_cdev_update() ?
> 
> 

That new approach should work. I can test your patch with these new
functions and rebase my work on top of it.
Or would you like me to write such a patch and send it?
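
The locked/lockless split discussed above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
kernel implementation: struct cdev_model, cdev_update_locked(),
cdev_update() and governor_set() are hypothetical names, a pthread mutex
stands in for cdev->lock, and a plain field assignment stands in for
cdev->ops->set_cur_state(). The debug print sits in the lockless helper
because that is the only place where 'target' is known.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct thermal_cooling_device. */
struct cdev_model {
	pthread_mutex_t lock;
	bool updated;              /* true once the requested state was applied */
	unsigned long requested;   /* state requested by the governor */
	unsigned long cur_state;   /* state last applied to the "device" */
	unsigned int apply_count;  /* how many times the state was applied */
};

/* Lockless helper: the caller must already hold cdev->lock. */
static void cdev_update_locked(struct cdev_model *cdev)
{
	unsigned long target;

	assert(cdev != NULL);
	target = cdev->requested;

	/* Models set_cur_state(), the notification and the trace/debug output. */
	cdev->cur_state = target;
	cdev->apply_count++;
	printf("set to state %lu\n", target);
}

/* Locked wrapper: applies the request only if it is not yet up to date. */
static void cdev_update(struct cdev_model *cdev)
{
	pthread_mutex_lock(&cdev->lock);
	if (!cdev->updated) {
		cdev_update_locked(cdev);
		cdev->updated = true;
	}
	pthread_mutex_unlock(&cdev->lock);
}

/*
 * Governor-side path: change the request and, if asked, apply it while
 * still holding the same lock, so 'updated' is never touched unlocked.
 */
static void governor_set(struct cdev_model *cdev, unsigned long target,
			 bool update)
{
	pthread_mutex_lock(&cdev->lock);
	cdev->requested = target;
	if (update)
		cdev_update_locked(cdev);
	pthread_mutex_unlock(&cdev->lock);
}
```

In this model a call such as governor_set(&c, 0, true) applies the new
state even when c.updated is already true, which corresponds to the
un-throttling case raised in the thread, while cdev_update() keeps its
"skip if already updated" behaviour.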