lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 09 Jul 2013 13:51:19 +0200
From:	Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>
To:	Michael Wang <wangyun@...ux.vnet.ibm.com>
Cc:	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Borislav Petkov <bp@...en8.de>, Jiri Kosina <jkosina@...e.cz>,
	Tomasz Figa <t.figa@...sung.com>, linux-kernel@...r.kernel.org,
	linux-pm@...r.kernel.org
Subject: Re: Re: [v3.10 regression] deadlock on cpu hotplug


Hi,

On Tuesday, July 09, 2013 10:15:43 AM Michael Wang wrote:
> Hi, Bartlomiej
> 
> On 07/08/2013 11:26 PM, Bartlomiej Zolnierkiewicz wrote:
> [snip]
> > 
> > # echo 0 > /sys/devices/system/cpu/cpu3/online
> > # echo 0 > /sys/devices/system/cpu/cpu2/online
> > # echo 0 > /sys/devices/system/cpu/cpu1/online
> > # while true;do echo 1 > /sys/devices/system/cpu/cpu1/online;echo 0 > /sys/devices/system/cpu/cpu1/online;done
> > 
> > The commit in question (2f7021a8) was merged in v3.10-rc5 as a fix for
> > commit 031299b ("cpufreq: governors: Avoid unnecessary per cpu timer
> > interrupts") which was causing a kernel warning to show up.
> > 
> > Michael/Viresh: do you have some idea how to fix the issue?
> 
> Thanks for the report :) would you like to take a try
> on below patch and see whether it solve the issue?

It doesn't help and unfortunately it just can't help as it only
addresses lockdep functionality while the issue is not a lockdep
problem but a genuine locking problem. CPU hot-unplug invokes
_cpu_down() which calls cpu_hotplug_begin() which in turn takes
&cpu_hotplug.lock. The lock is then hold during __cpu_notify()
call. Notifier chain goes up to cpufreq_governor_dbs() which for
CPUFREQ_GOV_STOP event does gov_cancel_work(). This function
flushes pending work and waits for it to finish. The all above
happens in one kernel thread. At the same time the other kernel
thread is doing the work we are waiting to complete and it also
happens to do gov_queue_work() which calls get_online_cpus().
Then the code tries to take &cpu_hotplug.lock which is already
held by the first thread and deadlocks.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 5af40ad..aa05eaa 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -229,6 +229,8 @@ static void set_sampling_rate(struct dbs_data *dbs_data,
>         }
>  }
>  
> +static struct lock_class_key j_cdbs_key;
> +
>  int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>                 struct common_dbs_data *cdata, unsigned int event)
>  {
> @@ -366,6 +368,8 @@ int (struct cpufreq_policy *policy,
>                                         kcpustat_cpu(j).cpustat[CPUTIME_NICE];
>  
>                         mutex_init(&j_cdbs->timer_mutex);
> +                       lockdep_set_class(&j_cdbs->timer_mutex, &j_cdbs_key);
> +
>                         INIT_DEFERRABLE_WORK(&j_cdbs->work,
>                                              dbs_data->cdata->gov_dbs_timer);
>                 }
> 
> Regards,
> Michael Wang

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ