linux-kernel - Re: [RFC PATCH] cpu hotplug: rework cpu_hotplug locking (was [LOCKDEP] cpufreq: possible circular locking dependency detected)

Message-ID: <20130628100426.GA2228@swordfish.minsk.epam.com>
Date:	Fri, 28 Jun 2013 13:04:26 +0300
From:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
To:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Cc:	Viresh Kumar <viresh.kumar@...aro.org>,
	Michael Wang <wangyun@...ux.vnet.ibm.com>,
	Jiri Kosina <jkosina@...e.cz>, Borislav Petkov <bp@...en8.de>,
	"Rafael J. Wysocki" <rjw@...k.pl>, linux-kernel@...r.kernel.org,
	cpufreq@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: [RFC PATCH] cpu hotplug: rework cpu_hotplug locking (was
 [LOCKDEP] cpufreq: possible circular locking dependency detected)

On (06/28/13 15:01), Srivatsa S. Bhat wrote:
> On 06/28/2013 01:14 PM, Sergey Senozhatsky wrote:
> > On (06/28/13 10:13), Viresh Kumar wrote:
> >> On 26 June 2013 02:45, Sergey Senozhatsky <sergey.senozhatsky@...il.com> wrote:
> >>>
> >>> [   60.277396] ======================================================
> >>> [   60.277400] [ INFO: possible circular locking dependency detected ]
> >>> [   60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 Not tainted
> >>> [   60.277411] -------------------------------------------------------
> >>> [   60.277417] bash/2225 is trying to acquire lock:
> >>> [   60.277422]  ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280
> >>> [   60.277444]
> >>> but task is already holding lock:
> >>> [   60.277449]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
> >>> [   60.277465]
> >>> which lock already depends on the new lock.
> >>
> >> Hi Sergey,
> >>
> >> Can you try reverting this patch?
> >>
> >> commit 2f7021a815f20f3481c10884fe9735ce2a56db35
> >> Author: Michael Wang <wangyun@...ux.vnet.ibm.com>
> >> Date:   Wed Jun 5 08:49:37 2013 +0000
> >>
> >>     cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()
> >>
> > 
> > Hello,
> > Yes, this helps, of course, but at the same time it brings back the previous
> > problem -- preventing cpu_hotplug in some places.
> > 
> > 
> > I have a slightly different (perhaps naive) RFC patch and would like to hear
> > comments.
> > 
> > 
> > 
> > The idea is to break the existing lock dependency chain by not holding
> > the cpu_hotplug lock mutex across the calls. In order to detect active
> > refcount readers or an active writer, refcount may now have the following
> > values:
> > 
> > -1: active writer -- only one writer may be active, readers are blocked
> >  0: no readers/writer
> >  >0: active readers -- many readers may be active, writer is blocked
> > 
> > A "blocked" reader or writer goes to the wait_queue. As soon as the writer
> > finishes (refcount becomes 0), it wakes up all existing processes in the
> > wait_queue. A reader performs a wakeup call only when it sees that a pending
> > writer is present (active_writer is not NULL).
> > 
> > The cpu_hotplug lock is now only required to protect the refcount cmp, inc
> > and dec operations, so it can be changed to a spinlock.
> > 
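The refcount scheme described above can be modelled roughly as follows. This is a userspace sketch only, not the RFC patch itself: a pthread mutex and condition variable stand in for the kernel spinlock and wait_queue, every hp_* name is hypothetical, and the writer_waiting counter is an assumed stand-in for the "active_writer is not NULL" check. Waking only when the last reader leaves is also an assumption of this sketch.

```c
/* Userspace model of the -1 / 0 / >0 refcount scheme (hypothetical names). */
#include <assert.h>
#include <pthread.h>

struct hp_lock {
	pthread_mutex_t lock;   /* protects refcount cmp/inc/dec only */
	pthread_cond_t  wq;     /* stand-in for the kernel wait_queue */
	int refcount;           /* -1: writer, 0: idle, >0: readers */
	int writer_waiting;     /* stand-in for active_writer != NULL */
};

static struct hp_lock hp = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0
};

/* Reader entry: wait while a writer is active, then take a reference. */
static void hp_get(void)
{
	pthread_mutex_lock(&hp.lock);
	while (hp.refcount < 0)
		pthread_cond_wait(&hp.wq, &hp.lock);
	hp.refcount++;
	pthread_mutex_unlock(&hp.lock);
}

/* Reader exit: wake sleepers only if a writer is pending. */
static void hp_put(void)
{
	pthread_mutex_lock(&hp.lock);
	assert(hp.refcount > 0);
	if (--hp.refcount == 0 && hp.writer_waiting)
		pthread_cond_broadcast(&hp.wq);
	pthread_mutex_unlock(&hp.lock);
}

/* Writer entry: wait until there are no readers and no other writer. */
static void hp_begin(void)
{
	pthread_mutex_lock(&hp.lock);
	hp.writer_waiting++;
	while (hp.refcount != 0)
		pthread_cond_wait(&hp.wq, &hp.lock);
	hp.writer_waiting--;
	hp.refcount = -1;
	pthread_mutex_unlock(&hp.lock);
}

/* Writer exit: refcount returns to 0 and everyone waiting is woken. */
static void hp_done(void)
{
	pthread_mutex_lock(&hp.lock);
	assert(hp.refcount == -1);
	hp.refcount = 0;
	pthread_cond_broadcast(&hp.wq);
	pthread_mutex_unlock(&hp.lock);
}
```

Note that the mutex is dropped while a task sleeps on the condition variable, so it is held only around the refcount updates; that is the property that would let the kernel version use a spinlock.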
> 
> It's best to avoid changing the core infrastructure in order to fix some
> call-site, unless that scenario is really impossible to handle with the
> current infrastructure.
> 
> I have a couple of suggestions below, to solve this issue, without touching
> the core hotplug code:
> 
> You can perhaps try cancelling the work item in two steps:
>   a. using cancel_delayed_work() under CPU_DOWN_PREPARE
>   b. using cancel_delayed_work_sync() under CPU_POST_DEAD
> 
> And of course, destroy the resources associated with that work (like
> the timer_mutex) only after the full tear-down.
> 
> Or perhaps you might find a way to perform the tear-down in just one step
> at the CPU_POST_DEAD stage. Whatever works correctly.
> 
> The key point here is that the core CPU hotplug code provides us with the
> CPU_POST_DEAD stage, where the hotplug lock is _not_ held. Which is exactly
> what you want in solving the issue with cpufreq.
> 
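The two-step cancellation suggested above can be illustrated with a small mock. This is a userspace model under stated assumptions, not kernel code: the HP_* events mirror CPU_DOWN_PREPARE and CPU_POST_DEAD, mock_cancel() and mock_cancel_sync() are hypothetical stand-ins for cancel_delayed_work() and cancel_delayed_work_sync(), and the hotplug_lock_held flag imitates the core holding the hotplug lock for every stage except the last.

```c
/* Mock of the two-step tear-down; all names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum hp_event { HP_DOWN_PREPARE, HP_DEAD, HP_POST_DEAD };

struct mock_work {
	bool pending;        /* work item is still queued */
	bool resources_ok;   /* e.g. timer_mutex not yet destroyed */
};

static bool hotplug_lock_held;

/* Non-blocking cancel: fine to call while the hotplug lock is held. */
static void mock_cancel(struct mock_work *w)
{
	w->pending = false;
}

/* Blocking cancel: may wait for a handler that needs the hotplug lock,
 * so it must never run while that lock is held. */
static void mock_cancel_sync(struct mock_work *w)
{
	assert(!hotplug_lock_held);
	w->pending = false;
}

static void gov_hotplug_callback(struct mock_work *w, enum hp_event ev)
{
	switch (ev) {
	case HP_DOWN_PREPARE:
		mock_cancel(w);            /* step a: no waiting */
		break;
	case HP_POST_DEAD:
		mock_cancel_sync(w);       /* step b: safe to wait now */
		w->resources_ok = false;   /* destroy only after full tear-down */
		break;
	default:
		break;
	}
}

/* Mock of the hotplug core: the lock is dropped before HP_POST_DEAD. */
static void mock_cpu_down(struct mock_work *w)
{
	hotplug_lock_held = true;
	gov_hotplug_callback(w, HP_DOWN_PREPARE);
	gov_hotplug_callback(w, HP_DEAD);
	hotplug_lock_held = false;
	gov_hotplug_callback(w, HP_POST_DEAD);
}
```

Moving mock_cancel_sync() into the HP_DOWN_PREPARE case would trip the assertion, which is the mock's equivalent of the lockdep report earlier in the thread.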

Thanks for your ideas, I'll take a look.

The cpu_hotplug mutex seems to be a trouble spot in several places, not only
in cpufreq. For example:
	https://lkml.org/lkml/2012/12/20/357


	-ss

> Regards,
> Srivatsa S. Bhat
> 