linux-kernel - Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups

Message-ID: <20160204055108.GY3469@vireshk>
Date:	Thu, 4 Feb 2016 11:21:08 +0530
From:	Viresh Kumar <viresh.kumar@...aro.org>
To:	"Rafael J. Wysocki" <rafael@...nel.org>
Cc:	Shilpa Bhat <shilpabhatppc@...il.com>,
	Juri Lelli <juri.lelli@....com>,
	Rafael Wysocki <rjw@...ysocki.net>,
	Lists linaro-kernel <linaro-kernel@...ts.linaro.org>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
	Saravana Kannan <skannan@...eaurora.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Michael Turquette <mturquette@...libre.com>,
	Steve Muckle <steve.muckle@...aro.org>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Morten Rasmussen <morten.rasmussen@....com>,
	dietmar.eggemann@....com,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups

On 04-02-16, 00:50, Rafael J. Wysocki wrote:
> On Thu, Feb 4, 2016 at 12:31 AM, Shilpa Bhat <shilpabhatppc@...il.com> wrote:
> > Sorry for the delayed report. But I see the below backtrace on Power8 box. It
> > has 4 chips with 128 cpus.

Honestly, I wasn't expecting you to test this stuff, but I really
appreciate you doing that.

Thanks a lot.

> > [  906.765768]  Possible unsafe locking scenario:
> >
> > [  906.765880]        CPU0                    CPU1
> > [  906.765969]        ----                    ----

This race scenario is incomplete, and hard to understand, without the
lines below:

                          Governor's EXIT         Update sampling rate from sysfs

                                                  lock(s_active#91);

> > [  906.766058]   lock(od_dbs_cdata.mutex);
> > [  906.766170]                                lock(&dbs_data->mutex);
> > [  906.766304]                                lock(od_dbs_cdata.mutex);
> > [  906.766461]   lock(s_active#91);
> > [  906.766572]
> >  *** DEADLOCK ***
> 
> This is exactly right.  We've avoided one deadlock only to trip into
> another one.

As we discussed on IRC, this deadlock wasn't introduced by the current
series. It is the same one Juri reported a few days back while testing
linus/master on TC2.
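To make the ABBA pattern in the quoted scenario concrete, here is a toy lock-order checker in C, loosely in the spirit of lockdep. It is only a sketch under simplifying assumptions: the lock "classes", toy_lock(), toy_unlock_all() and replay_splat() are hypothetical names made up for illustration. Also, unlike lockdep, which keeps a full dependency graph and searches it for cycles, this toy only catches direct two-lock inversions. Replaying the sysfs path and then the EXIT path, as in the splat, flags the s_active / od_dbs_cdata.mutex inversion.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock "classes", named after the locks in the splat. */
enum { S_ACTIVE, OD_DBS_CDATA_MUTEX, DBS_DATA_MUTEX, NR_LOCKS };

static const char *const lock_name[NR_LOCKS] = {
	"s_active", "od_dbs_cdata.mutex", "dbs_data->mutex",
};

/* order_seen[a][b] is true once lock b has been taken while a was held. */
static bool order_seen[NR_LOCKS][NR_LOCKS];

/* Locks currently held by one simulated CPU, in acquisition order. */
struct cpu_ctx {
	int held[NR_LOCKS];
	int depth;
};

/*
 * Record an acquisition of @lock. Returns false if taking it while holding
 * an earlier lock inverts an order recorded before (a direct ABBA pair).
 */
static bool toy_lock(struct cpu_ctx *cpu, int lock)
{
	bool ok = true;

	for (int i = 0; i < cpu->depth; i++) {
		int held = cpu->held[i];

		if (order_seen[lock][held]) {
			printf("possible ABBA: %s -> %s vs. earlier %s -> %s\n",
			       lock_name[held], lock_name[lock],
			       lock_name[lock], lock_name[held]);
			ok = false;
		}
		order_seen[held][lock] = true;
	}
	cpu->held[cpu->depth++] = lock;
	return ok;
}

static void toy_unlock_all(struct cpu_ctx *cpu)
{
	cpu->depth = 0;
}

/* Replay both paths from the splat; returns the number of inversions seen. */
static int replay_splat(void)
{
	struct cpu_ctx sysfs = { .depth = 0 }, gov_exit = { .depth = 0 };
	int inversions = 0;

	/* CPU1: update sampling rate from sysfs */
	inversions += !toy_lock(&sysfs, S_ACTIVE);
	inversions += !toy_lock(&sysfs, DBS_DATA_MUTEX);
	inversions += !toy_lock(&sysfs, OD_DBS_CDATA_MUTEX);
	toy_unlock_all(&sysfs);

	/* CPU0: governor EXIT, holding the governor mutex */
	inversions += !toy_lock(&gov_exit, OD_DBS_CDATA_MUTEX);
	inversions += !toy_lock(&gov_exit, S_ACTIVE);
	toy_unlock_all(&gov_exit);

	return inversions;
}
```

Calling replay_splat() once reports a single inversion: s_active -> od_dbs_cdata.mutex was recorded on the sysfs path, and the EXIT path then takes them in the opposite order.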

> This happens because update_sampling_rate() acquires
> od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
> cpufreq_governor_dbs().
> 
> Worse yet, a deadlock can still happen without (the new)
> dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
> update_sampling_rate() runs in parallel with
> cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
> the race.
> 
> It looks like we need to drop the governor mutex before putting the
> kobject in cpufreq_governor_exit().

As we discussed, that wouldn't be trivial to implement.
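For reference, the reordering Rafael describes (drop the governor mutex before the step that waits on s_active) can be modelled in user space with two pthreads. This is a simplified sketch, not the cpufreq code. It stands in a plain mutex for s_active; in the kernel, removing the kobject roughly amounts to draining active kernfs references. store_thread(), exit_thread() and run_model() are hypothetical. It shows only the lock ordering. The hard part in the real code, keeping governor state consistent once the mutex is dropped, is not modelled.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins: s_active is modelled as a plain mutex. */
static pthread_mutex_t s_active = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t gov_mutex = PTHREAD_MUTEX_INITIALIZER;

static unsigned int sampling_rate = 10000;
static int store_calls, exit_calls;

#define ITERATIONS 100000

/* "sysfs store" path: s_active first, then the governor mutex. */
static void *store_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&s_active);
		pthread_mutex_lock(&gov_mutex);
		sampling_rate += 1;	/* stand-in for update_sampling_rate() */
		store_calls++;
		pthread_mutex_unlock(&gov_mutex);
		pthread_mutex_unlock(&s_active);
	}
	return NULL;
}

/*
 * "governor exit" path, reordered: release the governor mutex before the
 * step that waits for s_active (the kobject put in the real code).
 */
static void *exit_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&gov_mutex);
		exit_calls++;		/* stand-in for tearing down state */
		pthread_mutex_unlock(&gov_mutex);

		pthread_mutex_lock(&s_active);	/* models waiting on sysfs users */
		pthread_mutex_unlock(&s_active);
	}
	return NULL;
}

/* Run both paths concurrently; returns true if both loops completed. */
static bool run_model(void)
{
	pthread_t a, b;

	if (pthread_create(&a, NULL, store_thread, NULL) != 0)
		return false;
	if (pthread_create(&b, NULL, exit_thread, NULL) != 0) {
		pthread_join(a, NULL);
		return false;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return store_calls == ITERATIONS && exit_calls == ITERATIONS;
}
```

Because the exit path never holds gov_mutex while waiting for s_active, the two paths cannot form the cycle from the splat, and both loops run to completion.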

Okay, here is a proposal for the current series and the series you
have posted, Rafael:

- Firstly, I would like to clarify that I don't have any issues with
  rebasing on top of your series; it should be easy enough.

- One thing is certain: nothing from these three series is getting
  merged in 4.5, as we aren't fixing the real issue Shilpa/Juri have
  reported.

- I think the first 4 patches here are just fine and don't need any
  updates. They actually do the right thing and make the code much
  cleaner.

- So, can we apply the first 4 patches (which you have already
  applied to bleeding-edge) now and do all further work on top of that?

Again, I can rebase if you merge your patches first; no issues at
all :)

-- 
viresh