Message-ID: <3866963.E19P3EGEHd@vostro.rjw.lan>
Date: Fri, 05 Feb 2016 04:54:44 +0100
From: "Rafael J. Wysocki" <rjw@...ysocki.net>
To: Saravana Kannan <skannan@...eaurora.org>,
Viresh Kumar <viresh.kumar@...aro.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
Shilpa Bhat <shilpabhatppc@...il.com>,
Juri Lelli <juri.lelli@....com>,
Lists linaro-kernel <linaro-kernel@...ts.linaro.org>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Michael Turquette <mturquette@...libre.com>,
Steve Muckle <steve.muckle@...aro.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Morten Rasmussen <morten.rasmussen@....com>,
dietmar.eggemann@....com,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
On Thursday, February 04, 2016 07:18:32 PM Rafael J. Wysocki wrote:
> On Thu, Feb 4, 2016 at 6:44 PM, Saravana Kannan <skannan@...eaurora.org> wrote:
> > On 02/04/2016 09:43 AM, Saravana Kannan wrote:
> >>
> >> On 02/04/2016 03:09 AM, Viresh Kumar wrote:
> >>>
> >>> On 04-02-16, 00:50, Rafael J. Wysocki wrote:
> >>>>
> >>>> This is exactly right. We've avoided one deadlock only to trip into
> >>>> another one.
> >>>>
> >>>> This happens because update_sampling_rate() acquires
> >>>> od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
> >>>> cpufreq_governor_dbs().
> >>>>
> >>>> Worse yet, a deadlock can still happen without (the new)
> >>>> dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
> >>>> update_sampling_rate() runs in parallel with
> >>>> cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
> >>>> the race.
> >>>>
> >>>> It looks like we need to drop the governor mutex before putting the
> >>>> kobject in cpufreq_governor_exit().
> >>>
>
> [cut]
>
> >>
> >> No no no no! Let's not open up this can of worms of queuing up the work
> >> to handle a write to a sysfs file. It *MIGHT* work for this specific
> >> tunable (I haven't bothered to analyze), but this makes it impossible to
> >> return a useful/proper error value.
> >
> >
> > Sent too soon. Not only that, but it can also cause the writes to the sysfs
> > files to get processed in a different order and I don't know what other
> > issues/races THAT will open up.
>
> Well, I don't like this too.
>
> I actually do have an idea about how to fix these deadlocks, but it is
> on top of my cleanup series.
>
> I'll write more about it later today.

Having actually posted that series again after cleaning it up, I can now say
what I'm thinking about, hopefully without confusing anyone too much. Please
bear in mind that I'm going to refer to this series below:
http://marc.info/?l=linux-pm&m=145463901630950&w=4

Also, this is more of a brain dump than an actual design description, so there
may be holes in it. Please let me know if you can see any.
The problem at hand is that policy->rwsem needs to be held around *all*
operations in cpufreq_set_policy(). In particular, it cannot be dropped
around invocations of __cpufreq_governor() with the event arg equal to
_EXIT as that leads to interesting races.
Unfortunately, we know that holding policy->rwsem in those places leads
to a deadlock with governor sysfs attributes removal in cpufreq_governor_exit().
Viresh attempted to fix this by not acquiring policy->rwsem for governor
attribute access (holding it is not necessary for those attributes in principle). That
was a nice try, but it turned out to be insufficient because of another deadlock
scenario uncovered by it. Namely, since the ondemand governor's update_sampling_rate()
acquires the governor mutex (called dbs_data_mutex after my patches mentioned
above), it may deadlock with exactly the same piece of code in cpufreq_governor_exit()
in almost exactly the same way.
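
To make the ordering explicit, my understanding of that second deadlock is
roughly the following (using the dbs_data_mutex name from the series above; in
the current code the lock is od_dbs_cdata.mutex):

Task A (sysfs write to a governor tunable):
  sysfs/kernfs active reference taken for the attribute
  update_sampling_rate()
    mutex_lock(&dbs_data_mutex)      <- blocks, the lock is held by task B

Task B (governor exit):
  cpufreq_governor_dbs()
    mutex_lock(&dbs_data_mutex)
    cpufreq_governor_exit()
      removal of the governor sysfs attributes
        waits for the active reference held by task A   <- deadlock
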
To avoid that other deadlock, we'd either need to drop dbs_data_mutex from
update_sampling_rate(), or drop it around the removal of the governor sysfs
attributes in cpufreq_governor_exit(). I don't think the former is an option
at least at this point, so it looks like we pretty much have to do the latter.
With that in mind, I'd start with the changes made by Viresh (maybe without the
first patch which really isn't essential here). That is, introduce a separate
kobject type for the governor attributes kobject and register that in
cpufreq_governor_init(). The show/store callbacks for that kobject type won't
acquire policy->rwsem, so the first deadlock will be avoided.
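
For illustration only, a rough sketch of what that kobject type could look
like (all of the names below, such as governor_attr and to_dbs_data(), are made
up for the sketch and assume a kobject embedded in struct dbs_data; this is not
code from any posted patch):

struct governor_attr {
	struct attribute attr;
	ssize_t (*show)(struct dbs_data *dbs_data, char *buf);
	ssize_t (*store)(struct dbs_data *dbs_data, const char *buf,
			 size_t count);
};

/* Assumes struct dbs_data grows a struct kobject kobj member. */
static inline struct dbs_data *to_dbs_data(struct kobject *kobj)
{
	return container_of(kobj, struct dbs_data, kobj);
}

static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
			     char *buf)
{
	struct dbs_data *dbs_data = to_dbs_data(kobj);
	struct governor_attr *gattr = container_of(attr, struct governor_attr, attr);

	/* Note: no policy->rwsem anywhere in this path. */
	return gattr->show(dbs_data, buf);
}

static ssize_t governor_store(struct kobject *kobj, struct attribute *attr,
			      const char *buf, size_t count)
{
	struct dbs_data *dbs_data = to_dbs_data(kobj);
	struct governor_attr *gattr = container_of(attr, struct governor_attr, attr);

	return gattr->store(dbs_data, buf, count);
}

static const struct sysfs_ops governor_sysfs_ops = {
	.show	= governor_show,
	.store	= governor_store,
};

static struct kobj_type dbs_ktype = {
	.sysfs_ops	= &governor_sysfs_ops,
	/* .default_attrs supplied by each governor */
};

cpufreq_governor_init() would then do something like
kobject_init_and_add(&dbs_data->kobj, &dbs_ktype, parent, ...) with
get_governor_parent_kobj(policy) as the parent, instead of
sysfs_create_group(), and cpufreq_governor_exit() would end with a
kobject_put() as discussed below.
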
But in addition to that, I'd drop dbs_data_mutex before the removal of governor
sysfs attributes. That actually happens in two places, in cpufreq_governor_exit()
and in the error path of cpufreq_governor_init().
To that end, I'd move the locking from cpufreq_governor_dbs() to the functions
called by it. That should be readily doable and they can do all of the
necessary checks themselves. cpufreq_governor_dbs() would become a pure mux then,
but that's not such a big deal.
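
In other words, the mux could end up looking more or less like this (argument
list simplified; the helper names just follow the existing split in
cpufreq_governor.c):

int cpufreq_governor_dbs(struct cpufreq_policy *policy, unsigned int event)
{
	/*
	 * No locking here any more: each helper takes dbs_data_mutex (or
	 * whatever lock it ends up needing) and does its own sanity checks.
	 */
	switch (event) {
	case CPUFREQ_GOV_POLICY_INIT:
		return cpufreq_governor_init(policy);
	case CPUFREQ_GOV_POLICY_EXIT:
		return cpufreq_governor_exit(policy);
	case CPUFREQ_GOV_START:
		return cpufreq_governor_start(policy);
	case CPUFREQ_GOV_STOP:
		return cpufreq_governor_stop(policy);
	case CPUFREQ_GOV_LIMITS:
		return cpufreq_governor_limits(policy);
	default:
		return -EINVAL;
	}
}
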
With that, cpufreq_governor_exit() may just drop the lock before it does the
final kobject_put(). The danger here is that the sysfs show/store callbacks of
the governor attributes kobject may see invalid dbs_data for a while, after the
lock has been dropped and before the kobject is deleted. That may be addressed
by checking, for example, whether the dbs_data's "tuners" pointer is still set in
those callbacks. If it is NULL, they can simply return -EAGAIN or similar.
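
On the exit side that would amount to something along these lines (again a
hypothetical shape, reusing the kobject sketched earlier; the show/store side
is sketched after the next paragraph):

	/* Tail of cpufreq_governor_exit(), once the last policy using this
	 * dbs_data is gone: */
	/* ... per-governor ->exit() has run and freed dbs_data->tuners ... */
	dbs_data->tuners = NULL;	/* mark it dead for show/store */
	mutex_unlock(&dbs_data_mutex);	/* drop the lock first ... */
	kobject_put(&dbs_data->kobj);	/* ... then wait for any sysfs users */
	/* dbs_data itself would go away from the kobject's ->release(). */
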
That means, though, that they need to acquire the same lock as
cpufreq_governor_exit(), or they may see things go away while they are running.
The simplest approach here would be to take dbs_data_mutex in them too, although
that's a bit of a sledgehammer. It might be better to have a per-policy lock
in struct policy_dbs_info for that, for example, but then the governor attribute
sysfs callbacks would need to get that object instead of dbs_data.
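
With the sledgehammer variant, the store wrapper from the earlier sketch would
turn into something like the below (a per-policy lock in struct policy_dbs_info
would look the same, except that the kobject would have to give us that object
rather than dbs_data):

static ssize_t governor_store(struct kobject *kobj, struct attribute *attr,
			      const char *buf, size_t count)
{
	struct dbs_data *dbs_data = to_dbs_data(kobj);
	struct governor_attr *gattr = container_of(attr, struct governor_attr, attr);
	ssize_t ret = -EAGAIN;

	mutex_lock(&dbs_data_mutex);
	/* Bail out if cpufreq_governor_exit() has already torn this down. */
	if (dbs_data->tuners)
		ret = gattr->store(dbs_data, buf, count);
	mutex_unlock(&dbs_data_mutex);

	return ret;
}
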
On the flip side, it might be possible to migrate update_sampling_rate() to
that lock too. And maybe we can get rid of dbs_data_mutex even, who knows?
Thanks,
Rafael