linux-kernel - Re: [Bug #13475] suspend/hibernate lockdep warning

Message-Id: <1244727561.5350.32.camel@odie.local>
Date:	Thu, 11 Jun 2009 15:39:21 +0200
From:	Simon Holm Thøgersen <odie@...aau.dk>
To:	Dave Jones <davej@...hat.com>
Cc:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
	Pekka Enberg <penberg@...helsinki.fi>,
	Dave Young <hidave.darkstar@...il.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	cpufreq@...r.kernel.org, Rusty Russell <rusty@...tcorp.com.au>,
	trenn@...e.de, sven.wegener@...aler.net,
	Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>
Subject: Re: [Bug #13475] suspend/hibernate lockdep warning

On Mon, 08 Jun 2009 at 10:32 -0400, Dave Jones wrote:
> On Mon, Jun 08, 2009 at 08:48:45AM -0400, Mathieu Desnoyers wrote:
>  
>  > > > >> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13475
>  > > > >> Subject         : suspend/hibernate lockdep warning
>  > > > >> References      : http://marc.info/?l=linux-kernel&m=124393723321241&w=4
>  > > > 
>  > > > I suspect the following commit; after reverting this patch I tested 5 times
>  > > > without lockdep warnings.
>  > > > 
>  > > > commit b14893a62c73af0eca414cfed505b8c09efc613c
>  > > > Author: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
>  > > > Date:   Sun May 17 10:30:45 2009 -0400
>  > > > 
>  > > > 	[CPUFREQ] fix timer teardown in ondemand governor
>  > > 
>  > > The patch is probably not at fault here. I suspect it's some latent bug
>  > > that simply got exposed by the change to cancel_delayed_work_sync(). In
>  > > any case, Mathieu, can you take a look at this please?
>  > 
>  > Yes, it's been looked at and discussed on the cpufreq ML. The short
>  > answer is that they plan to re-engineer cpufreq and remove the policy
>  > rwlock taken around almost every operation at the cpufreq level.
>  > 
>  > The short-term solution, which is recognised as ugly, would be to do the
>  > following around the cancel_delayed_work_sync():
>  > 
>  > unlock policy rwlock write lock
>  > cancel_delayed_work_sync()
>  > lock policy rwlock write lock
>  > 
>  > It basically works because this rwlock is unneeded for teardown, hence
>  > the future re-work planned.
>  > 
>  > I'm sorry I cannot prepare a patch currently... I've got quite a few pages
>  > of Ph.D. thesis due for the beginning of July.
>  
> I'm kinda scared to touch this code at all for .30 due to the number of
> unexpected gotchas we seem to run into every time we touch something
> locking related.  So I'm inclined to just live with the lockdep warning
> for .30, and see how the real fixes look for .31, and push them back
> as -stable updates if they work out.
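The short-term workaround Mathieu outlines (release the policy write lock, wait synchronously for the pending work, then re-take the lock) can be sketched as a userspace analogue with POSIX threads. This is not the cpufreq code: policy_rwsem, dbs_worker(), governor_stop_locked() and run_teardown_with_workaround() are hypothetical stand-ins, and pthread_join() plays the role of cancel_delayed_work_sync() waiting for a work item that is already running.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace analogue, NOT kernel code. All names are hypothetical:
 * policy_rwsem stands in for the cpufreq per-policy rwsem,
 * dbs_worker() for the ondemand work function (do_dbs_timer()),
 * and pthread_join() for cancel_delayed_work_sync(). */
static pthread_rwlock_t policy_rwsem = PTHREAD_RWLOCK_INITIALIZER;
static int samples_taken;

static void *dbs_worker(void *arg)
{
	(void)arg;
	/* The worker needs the write lock, like do_dbs_timer() does. */
	pthread_rwlock_wrlock(&policy_rwsem);
	samples_taken++;
	pthread_rwlock_unlock(&policy_rwsem);
	return NULL;
}

/* Called with policy_rwsem held for writing, like the governor stop path. */
static void governor_stop_locked(pthread_t worker)
{
	/* Drop the write lock so a worker blocked on it can finish... */
	pthread_rwlock_unlock(&policy_rwsem);
	/* ...wait synchronously for it to complete... */
	pthread_join(worker, NULL);
	/* ...then re-take the lock the caller still expects to hold. */
	pthread_rwlock_wrlock(&policy_rwsem);
}

/* Returns the number of completed worker runs (1 if teardown succeeded). */
static int run_teardown_with_workaround(void)
{
	pthread_t worker;
	int rc;

	samples_taken = 0;
	pthread_rwlock_wrlock(&policy_rwsem);   /* lock held by the caller */
	rc = pthread_create(&worker, NULL, dbs_worker, NULL);
	assert(rc == 0);
	(void)rc;
	governor_stop_locked(worker);
	pthread_rwlock_unlock(&policy_rwsem);
	return samples_taken;                   /* safe: read after the join */
}
```

run_teardown_with_workaround() should return 1, because the worker can take the lock once the stopping thread has dropped it. Dropping a lock that the caller believes it holds is what makes this ugly: anything the lock protects may change between the unlock and the re-lock, which is tolerable here only because, as Mathieu notes, the rwlock is not needed for teardown.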

Unfortunately I don't think it is just theoretical; I've actually hit the
following hang (which has nothing to do with suspend/hibernate):

INFO: task cpufreqd:4676 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 cpufreqd      D eee2ac60     0  4676      1
  ee01bd68 00000086 eee2aad0 eee2ac60 00000533 eee2aad0 eee2ac60 0002b16f
  00000000 eee2ac60 7fffffff 7fffffff eee2ac60 7fffffff 7fffffff 00000000
  ee01bd70 c03117ee ee01bdbc c0311c0c eee2aad0 eecf6900 eee2aad0 eecf6900
 Call Trace:
  [<c03117ee>] schedule+0x12/0x24
  [<c0311c0c>] schedule_timeout+0x17/0x170
  [<c011a4f7>] ? __wake_up+0x2b/0x51
  [<c0311afd>] wait_for_common+0xc4/0x135
  [<c011a694>] ? default_wake_function+0x0/0xd
  [<c0311be0>] wait_for_completion+0x12/0x14
  [<c012bc6a>] __cancel_work_timer+0xfe/0x129
  [<c012b635>] ? wq_barrier_func+0x0/0xd
  [<c012bca0>] cancel_delayed_work_sync+0xb/0xd
  [<f20948f9>] cpufreq_governor_dbs+0x22e/0x291 [cpufreq_ondemand]
  [<c02af857>] __cpufreq_governor+0x65/0x9d
  [<c02af960>] __cpufreq_set_policy+0xd1/0x11f
  [<c02b02ae>] store_scaling_governor+0x18a/0x1b2
  [<c02b09a5>] ? handle_update+0x0/0xd
  [<c02b0124>] ? store_scaling_governor+0x0/0x1b2
  [<c02b08c9>] store+0x48/0x61
  [<c01acbf4>] sysfs_write_file+0xb4/0xdf
  [<c01acb40>] ? sysfs_write_file+0x0/0xdf
  [<c0175535>] vfs_write+0x8a/0x104
  [<c0175648>] sys_write+0x3b/0x60
  [<c0103110>] sysenter_do_call+0x12/0x2c
 INFO: task kondemand/0:4956 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kondemand/0   D 00000533     0  4956      2
  ee1d9efc 00000046 c011815f 00000533 071148de ee1e0080 ee1e0210 00000000
  c03ff478 9189e633 00000082 c03ff478 ee1e0210 c04159f4 c04159f0 00000000
  ee1d9f04 c03117ee ee1d9f28 c0313104 ee1d9f30 c04159f4 ee1e0080 c01183be
 Call Trace:
  [<c011815f>] ? update_curr+0x6c/0x14b
  [<c03117ee>] schedule+0x12/0x24
  [<c0313104>] rwsem_down_failed_common+0x150/0x16e
  [<c01183be>] ? dequeue_task_fair+0x51/0x56
  [<c031313d>] rwsem_down_write_failed+0x1b/0x23
  [<c031317e>] call_rwsem_down_write_failed+0x6/0x8
  [<c03125dd>] ? down_write+0x14/0x16
  [<c02b0460>] lock_policy_rwsem_write+0x1d/0x33
  [<f20944aa>] do_dbs_timer+0x45/0x266 [cpufreq_ondemand]
  [<c012b8f7>] worker_thread+0x165/0x212
  [<f2094465>] ? do_dbs_timer+0x0/0x266 [cpufreq_ondemand]
  [<c012e639>] ? autoremove_wake_function+0x0/0x33
  [<c012b792>] ? worker_thread+0x0/0x212
  [<c012e278>] kthread+0x42/0x67
  [<c012e236>] ? kthread+0x0/0x67
  [<c01038eb>] kernel_thread_helper+0x7/0x10
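The two hung tasks above form a cycle: cpufreqd is in cancel_delayed_work_sync(), waiting for the ondemand work to finish, while kondemand/0 is running do_dbs_timer() and is blocked in lock_policy_rwsem_write(). Assuming, as the unlock/relock workaround discussed earlier in the thread implies, that the sysfs store path already holds the policy write lock at that point, neither task can make progress. The sketch below is a userspace analogue with POSIX threads (hypothetical names, not kernel code); the worker uses pthread_rwlock_timedwrlock() with a one-second deadline so the sketch reports the deadlock instead of hanging like the real tasks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Userspace analogue of the two hung tasks (hypothetical names):
 * the "cpufreqd" thread holds the policy write lock while waiting for
 * the worker, and the "kondemand" worker waits for that same lock.
 * The timed lock attempt turns the deadlock into a reportable error. */
static pthread_rwlock_t policy_rwsem = PTHREAD_RWLOCK_INITIALIZER;

static void *dbs_worker_timed(void *arg)
{
	int *result = arg;
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 1;                   /* give up after about 1 s */
	*result = pthread_rwlock_timedwrlock(&policy_rwsem, &deadline);
	if (*result == 0)
		pthread_rwlock_unlock(&policy_rwsem);
	return NULL;
}

/* Returns ETIMEDOUT when the worker could not get the lock, i.e. when
 * the same code without a timeout would have deadlocked. */
static int reproduce_teardown_deadlock(void)
{
	pthread_t worker;
	int result = -1;
	int rc;

	pthread_rwlock_wrlock(&policy_rwsem);   /* held across the wait */
	rc = pthread_create(&worker, NULL, dbs_worker_timed, &result);
	assert(rc == 0);
	(void)rc;
	pthread_join(worker, NULL);             /* ~cancel_delayed_work_sync() */
	pthread_rwlock_unlock(&policy_rwsem);
	return result;                          /* safe: read after the join */
}
```

reproduce_teardown_deadlock() should return ETIMEDOUT after about a second. In the kernel there is no such timeout, so both tasks stay in state D until the hung-task detector reports them after 120 seconds, as in the log.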

I've only seen it once in 5 boots, and CONFIG_PROVE_LOCKING does not give any
warnings about this, though it does yell when switching governors, as reported
by others in bug #13493.

Let's hope Mathieu nails it, though I know he's busy with his thesis.


Simon Holm Thøgersen

View attachment "config-non-debug" of type "text/plain" (56981 bytes)