linux-kernel - Re: timer list corruption in devfreq

Message-ID: <364309dc-60d6-3250-b77b-f27935ab41a0@quicinc.com>
Date:   Tue, 14 Nov 2023 18:00:58 +0530
From:   Mukesh Ojha <quic_mojha@...cinc.com>
To:     Tejun Heo <tj@...nel.org>
CC:     <myungjoo.ham@...sung.com>, <kyungmin.park@...sung.com>,
        <cw00.choi@...sung.com>, <jstultz@...gle.com>,
        <tglx@...utronix.de>, <sboyd@...nel.org>, <jiangshanlai@...il.com>,
        <linux-kernel@...r.kernel.org>, <linux-pm@...r.kernel.org>
Subject: Re: timer list corruption in devfreq



On 11/10/2023 12:38 AM, Tejun Heo wrote:
> Hello,
> 
> On Wed, Nov 08, 2023 at 09:39:57PM +0530, Mukesh Ojha wrote:
>> We are facing an issue on 6.1 kernel while using devfreq framework
>> and looks like the devfreq_monitor_stop()/devfreq_monitor_start is
>> vulnerable if frequent governor change is being done from user space
>> in a loop.
>>
>> echo simple_ondemand > /sys/class/devfreq/1d84000.ufshc/governor
>> echo performance > /sys/class/devfreq/1d84000.ufshc/governor
>>
>> Here, we are using ufs device, but could be any device.
>>
>> Issue is because same instance of timer is being queued from two
>> places one from devfreq_monitor() and one from devfreq_monitor_start() as
>> cancel_delayed_work_sync() from devfreq_monitor_stop() was not
>> able to delete the delayed work time completely due to which
>> devfreq_monitor() work rearmed the same timer.
>>
>> But there looks to be issue in the timer framework where
>> it was initially discussed in [1] and later fixed in [2]
>> but not sure being whether is it issue in cancel_delayed_work_sync()
>> where del_timer() inside try_to_grab_pending() need to be replaced
>> with timer_delete[_sync]() or devfreq_monitor_stop() need to use
>> this api's and then delete the work.
> 
> So, having shutdown can be more convenient in some cases and that'd be a
> useful addition to workqueue both for immediate and delayed work items. That
> said, that's usually not essential in fixing these issues - e.g. Can't you
> just synchronize devfreq_monitor_start() and stop()?

Thanks for the feedback.

This issue can be fixed by synchronizing devfreq_monitor_start() and
devfreq_monitor_stop(). A patch is posted here:
https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojha@quicinc.com/

However, that forces the client to add a check in the delayed work
callback so that it does not queue the delayed work timer again. The
same problem is possible whenever del_timer() in the sequence below [1]
returns 'false': the caller does not want another instance of the timer
to be queued after a call to cancel_delayed_work_sync(). That can be
achieved with a timer_shutdown() version of __cancel_work_timer(), or
maybe by introducing a separate __cancel_work_timer_shutdown().

[1]
__cancel_work_timer() => try_to_grab_pending() => del_timer()
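
To illustrate the difference I mean, here is a minimal userspace sketch.
It is not kernel code: struct model_timer and every model_*() function
are hypothetical names made up for this example, it is single-threaded,
and it ignores the flushing that cancel_delayed_work_sync() performs.
It only models the case where the delete finds the timer not pending
and a late callback then tries to re-arm it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names are kernel APIs. */
struct model_timer {
	bool pending;   /* timer is currently armed */
	bool shutdown;  /* once set, arming is silently refused */
	int arm_count;  /* number of times arming actually succeeded */
};

/* Arm the timer unless it has been shut down. Returns true if armed. */
bool model_timer_arm(struct model_timer *t)
{
	assert(t != NULL);
	if (t->shutdown)
		return false;
	t->pending = true;
	t->arm_count++;
	return true;
}

/* Plain delete: clears a pending timer but still allows re-arming.
 * Returns true only if the timer was pending. */
bool model_timer_delete(struct model_timer *t)
{
	bool was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

/* Shutdown: delete the timer and refuse any later attempt to arm it. */
bool model_timer_shutdown(struct model_timer *t)
{
	t->shutdown = true;
	return model_timer_delete(t);
}

/* Periodic work callback that re-arms its own timer unconditionally. */
void model_monitor_work(struct model_timer *t)
{
	t->pending = false;   /* the timer has fired */
	model_timer_arm(t);   /* try to queue the next period */
}

/* Timer has expired but the callback has not run yet, so the delete
 * returns false. Reports whether the late callback re-armed the timer. */
bool late_rearm_after(bool use_shutdown)
{
	struct model_timer t = {0};
	bool deleted;

	model_timer_arm(&t);
	t.pending = false;    /* expired; callback still outstanding */

	deleted = use_shutdown ? model_timer_shutdown(&t)
			       : model_timer_delete(&t);
	assert(!deleted);     /* nothing was pending to delete */

	model_monitor_work(&t);
	return t.pending;
}
```

Calling late_rearm_after(false) from a main() reports that the timer is
pending again after the stop, while late_rearm_after(true) reports that
it stays idle, without the callback needing its own check.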

Let me know if anything is wrong with my understanding.

-Mukesh



> 
> Thanks.
>