lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 9 May 2022 23:06:01 +0800
From:   Schspa Shi <schspa@...il.com>
To:     Viresh Kumar <viresh.kumar@...aro.org>
Cc:     rafael@...nel.org, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cpufreq: fix race on cpufreq online

Viresh Kumar <viresh.kumar@...aro.org> writes:

> I had to dig the old patch first before starting to review what your
> next one does.
>
> On 21-04-22, 03:15, Schspa Shi wrote:
>> When cpufreq online failed, policy->cpus are not empty while
>> cpufreq sysfs file available, we may access some data freed.
>>
>> Take policy->clk as an example:
>>
>> static int cpufreq_online(unsigned int cpu)
>> {
>>   ...
>>   // policy->cpus != 0 at this time
>>   down_write(&policy->rwsem);
>>   ret = cpufreq_add_dev_interface(policy);
>
> Please keep some code to help understand where we went from here. I am
> sure you meant that we will error out in this case, but you removed
> the relevant code.
>

Yes, I will add this to the next version of patch.

>>   up_write(&policy->rwsem);
>>
>>   return 0;
>>
>> out_destroy_policy:
>>      for_each_cpu(j, policy->real_cpus)
>>              remove_cpu_dev_symlink(policy, get_cpu_device(j));
>>     up_write(&policy->rwsem);
>> ...
>> out_exit_policy:
>>   if (cpufreq_driver->exit)
>>     cpufreq_driver->exit(policy);
>>       clk_put(policy->clk);
>>       // policy->clk is a wild pointer
>> ...
>>                                     ^
>>                                     |
>>                             Another process access
>>                             __cpufreq_get
>>                               cpufreq_verify_current_freq
>>                                 cpufreq_generic_get
>>                                   // acces wild pointer of policy->clk;
>>                                     |
>>                                     |
>> out_offline_policy:                 |
>>   cpufreq_policy_free(policy);      |
>>     // deleted here, and will wait for no body reference
>>     cpufreq_policy_put_kobj(policy);
>> }
>>
>> Signed-off-by: Schspa Shi <schspa@...il.com>
>> ---
>>  drivers/cpufreq/cpufreq.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index 80f535cc8a75..0d58b0f8f3af 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1533,8 +1533,6 @@ static int cpufreq_online(unsigned int cpu)
>>      for_each_cpu(j, policy->real_cpus)
>>              remove_cpu_dev_symlink(policy, get_cpu_device(j));
>>
>> -    up_write(&policy->rwsem);
>> -
>>  out_offline_policy:
>>      if (cpufreq_driver->offline)
>>              cpufreq_driver->offline(policy);
>> @@ -1543,6 +1541,9 @@ static int cpufreq_online(unsigned int cpu)
>>      if (cpufreq_driver->exit)
>>              cpufreq_driver->exit(policy);
>>
>> +    cpumask_clear(policy->cpus);
>> +    up_write(&policy->rwsem);
>
> This is simply buggy as now an error out to out_offline_policy or
> out_exit_policy will try to release a semaphore which was never taken
> in the first place. This works fine only if we failed late, i.e. via
> out_destroy_policy.
>

I am very sorry for this oversight.

To fix this issue, there is no need to move cpufreq_driver->exit(policy)
and cpufreq_driver->offline(policy) to inside of &policy->rwsem.
I made this change because they are inside of &policy->rwsem write lock
at cpufreq_offline. I think we should keep offline & exit call inside of
policy->rwsem for better symmetry.

static int cpufreq_offline(unsigned int cpu)
{
        ...
        down_write(&policy->rwsem);
    ...
        /*
         * Perform the ->offline() during light-weight tear-down, as
         * that allows fast recovery when the CPU comes back.
         */
        if (cpufreq_driver->offline) {
                cpufreq_driver->offline(policy);
        } else if (cpufreq_driver->exit) {
                cpufreq_driver->exit(policy);
                policy->freq_table = NULL;
        }

unlock:
        up_write(&policy->rwsem);
        return 0;
}

> The very first thing we need to do now is revert this patch. Lemme
> send a patch for that and you can send a fresh fix over that once you
> have a stable fix.

For the next version of the stable fix, I'd be willing to keep exit and
offline calls inside of policy->rwsem. But it's OK for me to keep offline
& exit calls outside of policy->rwsem.

---
BRs


Schspa Shi

Powered by blists - more mailing lists