lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 31 Aug 2013 02:55:57 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Stephen Boyd <sboyd@...eaurora.org>
Cc:	Viresh Kumar <viresh.kumar@...aro.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"cpufreq@...r.kernel.org" <cpufreq@...r.kernel.org>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
	Kukjin Kim <kgene.kim@...sung.com>
Subject: Re: mutex warning in cpufreq + RFC patch

On Friday, August 30, 2013 05:36:41 PM Stephen Boyd wrote:
> On 08/29, Viresh Kumar wrote:
> > On 28 August 2013 22:22, Stephen Boyd <sboyd@...eaurora.org> wrote:
> > >
> > > I've applied these patches on top of v3.10
> > >
> > > f51e1eb63d9c28cec188337ee656a13be6980cfd (cpufreq: Fix cpufreq regression after suspend/resume
> > > aae760ed21cd690fe8a6db9f3a177ad55d7e12ab (cpufreq: Revert commit a66b2e to fix suspend/resume regression)
> > > e8d05276f236ee6435e78411f62be9714e0b9377 (cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression)
> > > 2a99859932281ed6c2ecdd988855f8f6838f6743 (cpufreq: Fix cpufreq driver module refcount balance after suspend/resume)
> > > 419e172145cf6c51d436a8bf4afcd17511f0ff79 (cpufreq: don't leave stale policy pointer in cdbs->cur_policy)
> > > 95731ebb114c5f0c028459388560fc2a72fe5049 (cpufreq: Fix governor start/stop race condition)
> > >
> > > That second to last one causes a NULL pointer exception after the mutex
> > > warning above because the limits case does
> > >
> > >     if (policy->max < cpu_cdbs->cur_policy->cur)
> > >
> > > and that dereferences a NULL cur_policy pointer.
> > 
> > I have seen something similar and the error checking patch that
> > I mentioned earlier came as solution to that only..
> 
> Yes that patch may reduce the chance of the race condition but I
> don't believe it removes it entirely. I believe this bug still
> exists in linux-next. Consider the scenario where CPU1 is going
> down.
> 
> __cpufreq_remove_dev()
>  ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
>   __cpufreq_governor()
>    policy->governor->governor(policy, CPUFREQ_GOV_STOP);
>     cpufreq_governor_dbs()
>      case CPUFREQ_GOV_STOP:
>       mutex_destroy(&cpu_cdbs->timer_mutex)
>       cpu_cdbs->cur_policy = NULL;
>   <PREEMPT>
> store()
>  __cpufreq_set_policy()
>   ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
>    __cpufreq_governor()
>     policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
>      case CPUFREQ_GOV_LIMITS:
>       mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
>        if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
> 
> Once we stop the governor I don't see how another thread can't
> race in and get all the way down into the GOV_LIMITS case. Even
> if we wanted to lock out that thread with some mutex or semaphore
> it will have to continue running eventually and so we really need
> to wait until all the sysfs files are gone before we stop the
> governor (in the case of the last cpu for the policy) or we need
> to stop and start the governor while holding the policy semaphore
> to prevent a race.
> 
> > 
> > > Are there any fixes that I'm missing? I see that some things are
> > > changing in linux-next but they don't look like fixes, more like
> > > optimizations.
> > 
> > Getting patches over 3.10 would be tricky.. You are two kernel
> > version back and that's not going to help much.. There are too many
> > patches in between linux-next and 3.10..
> >
> > 
> > I really can't tell you which specific ones to include, as I am lost in them :)
> 
> That's a problem. 3.10 is the next long term stable kernel and so we need to
> backport any fixes to 3.10 for the next two years. Hopefully these bugs I'm
> finding in the 3.10 stable kernel's cpufreq code aren't known issues on
> 3.11/next.

No, they aren't.

Well, that's the main reason why I've been pushing back against more churn in
the cpuidle subsystem recently.  I think we went too far with changes that
were not entirely understood and now we're seeing the fallout.

It would be great if you could identify the 3.11 changes that fix problems
you're seeing in 3.10.y (doing a reverse bisect of drivers/cpufreq/ changes
might help, since you have reproducers it seems).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ