[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0j6b=JF7SuQGsBiYTZp7hCt8BfiOUNwxX-=TTSQOzZgvQ@mail.gmail.com>
Date: Wed, 11 May 2022 15:42:31 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Schspa Shi <schspa@...il.com>,
Viresh Kumar <viresh.kumar@...aro.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux PM <linux-pm@...r.kernel.org>,
"Rafael J. Wysocki" <rafael@...nel.org>
Subject: Re: [PATCH v3] cpufreq: fix race on cpufreq online
On Wed, May 11, 2022 at 3:19 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Wed, May 11, 2022 at 2:59 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
> >
> > On Wed, May 11, 2022 at 2:21 PM Viresh Kumar <viresh.kumar@...aro.org> wrote:
> > >
> > > On 11-05-22, 16:10, Schspa Shi wrote:
> > > > Viresh Kumar <viresh.kumar@...aro.org> writes:
> > > > > I am not sure, but maybe there were issues in calling init() with rwsem held, as
> > > > > it may want to call some API from there.
> > > > >
> > > >
> > > > I have checked all the init() implement of the fellowing files, It should be OK.
> > > > Function find command:
> > > > ag "init[\s]+=" drivers/cpufreq
> > > >
> > > > All the init() implement only initialize policy object without holding this lock
> > > > and won't call cpufreq APIs need to hold this lock.
> > >
> > > Okay, we can see if someone complains later then :)
> > >
> > > > > I don't think you can do that safely. offline() or exit() may depend on
> > > > > policy->cpus being set to all CPUs.
> > > > OK, I will move this after exit(). and there will be no effect with those
> > > > two APIs. But policy->cpus must be clear before release policy->rwsem.
> > >
> > > Hmm, I don't think depending on the values of policy->cpus is a good idea to be
> > > honest. This design is inviting bugs to come in at another place. We need a
> > > clear flag for this, a new flag or something like policy_list.
>
> Why?
>
> > > Also I see the same bug happening while the policy is removed. The kobject is
> > > put after the rwsem is dropped.
>
> This shouldn't be a problem because of the wait_for_completion() in
> cpufreq_policy_put_kobj(). It is known that cpufreq_sysfs_release()
> has run when cpufreq_policy_put_kobj() returns, so it is safe to free
> the policy then.
>
> > > > > static inline bool policy_is_inactive(struct cpufreq_policy *policy)
> > > > > {
> > > > > - return cpumask_empty(policy->cpus);
> > > > > + return unlikely(cpumask_empty(policy->cpus) ||
> > > > > + list_empty(&policy->policy_list));
> > > > > }
> > > > >
> > > >
> > > > I don't think this fully solves my problem.
> > > > 1. There is some case which cpufreq_online failed after the policy is added to
> > > > cpufreq_policy_list.
> > >
> > > And I missed that :(
> > >
> > > > 2. policy->policy_list is not protected by &policy->rwsem, and we
> > > > can't relay on this to
> > > > indict the policy is fine.
> > >
> > > Ahh..
> > >
> > > > >From this point of view, we can fix this problem through the state of
> > > > this linked list.
> > > > But the above two problems need to be solved first.
> > >
> > > I feel overriding policy_list for this is going to make it complex/messy.
> > >
> > > Maybe something like this then:
> >
> > There are two things.
> >
> > One is the possible race with respect to the sysfs access occurring
> > during failing initialization and the other is that ->offline() or
> > ->exit() can be called with or without holding the policy rwsem
> > depending on the code path.
> >
> > Namely, cpufreq_offline() calls them under the policy rwsem, but
> > cpufreq_remove_dev() calls ->exit() outside the rwsem. Also they are
> > called outside the rwsem in cpufreq_online().
> >
> > Moreover, ->offline() and ->exit() cannot expect policy->cpus to be
> > populated, because they are called when it is empty from
> > cpufreq_offline().
> >
> > So the $subject patch is correct AFAICS even though it doesn't address
> > all of the above.
Actually, I meant the last version of it, that is:
https://patchwork.kernel.org/project/linux-pm/patch/20220510154236.88753-1-schspa@gmail.com/
> TBH, I'm not sure why show() doesn't check policy_is_inactive() under the rwsem.
>
> Moreover, I'm not sure why the locking dance in store() is necessary.
Powered by blists - more mailing lists