lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <57162b7c-cf34-2b55-a17c-40d96a0ab105@linaro.org>
Date:   Tue, 3 Dec 2019 18:15:55 +0100
From:   Daniel Lezcano <daniel.lezcano@...aro.org>
To:     Wei Wang <wvw@...gle.com>
Cc:     wei.vince.wang@...il.com, Zhang Rui <rui.zhang@...el.com>,
        Eduardo Valentin <edubezval@...il.com>,
        Amit Kucheria <amit.kucheria@...durent.com>,
        linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] thermal: Fix deadlock in thermal
 thermal_zone_device_check

On 12/11/2019 21:42, Wei Wang wrote:
> commit [2] changed cancel_delayed_work to cancel_delayed_work_sync to
> avoid a use-after-free issue. However, cancel_delayed_work_sync could be
> called insides the WQ causing deadlock [1].
> 
> [1]
> [54109.642398] c0   1162 kworker/u17:1   D    0 11030      2 0x00000000
> [54109.642437] c0   1162 Workqueue: thermal_passive_wq thermal_zone_device_check
> [54109.642447] c0   1162 Call trace:
> [54109.642456] c0   1162  __switch_to+0x138/0x158
> [54109.642467] c0   1162  __schedule+0xba4/0x1434
> [54109.642480] c0   1162  schedule_timeout+0xa0/0xb28
> [54109.642492] c0   1162  wait_for_common+0x138/0x2e8
> [54109.642511] c0   1162  flush_work+0x348/0x40c
> [54109.642522] c0   1162  __cancel_work_timer+0x180/0x218
> [54109.642544] c0   1162  handle_thermal_trip+0x2c4/0x5a4
> [54109.642553] c0   1162  thermal_zone_device_update+0x1b4/0x25c
> [54109.642563] c0   1162  thermal_zone_device_check+0x18/0x24
> [54109.642574] c0   1162  process_one_work+0x3cc/0x69c
> [54109.642583] c0   1162  worker_thread+0x49c/0x7c0
> [54109.642593] c0   1162  kthread+0x17c/0x1b0
> [54109.642602] c0   1162  ret_from_fork+0x10/0x18
> [54109.643051] c0   1162 kworker/u17:2   D    0 16245      2 0x00000000
> [54109.643067] c0   1162 Workqueue: thermal_passive_wq thermal_zone_device_check
> [54109.643077] c0   1162 Call trace:
> [54109.643085] c0   1162  __switch_to+0x138/0x158
> [54109.643095] c0   1162  __schedule+0xba4/0x1434
> [54109.643104] c0   1162  schedule_timeout+0xa0/0xb28
> [54109.643114] c0   1162  wait_for_common+0x138/0x2e8
> [54109.643122] c0   1162  flush_work+0x348/0x40c
> [54109.643131] c0   1162  __cancel_work_timer+0x180/0x218
> [54109.643141] c0   1162  handle_thermal_trip+0x2c4/0x5a4
> [54109.643150] c0   1162  thermal_zone_device_update+0x1b4/0x25c
> [54109.643159] c0   1162  thermal_zone_device_check+0x18/0x24
> [54109.643167] c0   1162  process_one_work+0x3cc/0x69c
> [54109.643177] c0   1162  worker_thread+0x49c/0x7c0
> [54109.643186] c0   1162  kthread+0x17c/0x1b0
> [54109.643195] c0   1162  ret_from_fork+0x10/0x18
> [54109.644500] c0   1162 cat             D    0  7766      1 0x00000001
> [54109.644515] c0   1162 Call trace:
> [54109.644524] c0   1162  __switch_to+0x138/0x158
> [54109.644536] c0   1162  __schedule+0xba4/0x1434
> [54109.644546] c0   1162  schedule_preempt_disabled+0x80/0xb0
> [54109.644555] c0   1162  __mutex_lock+0x3a8/0x7f0
> [54109.644563] c0   1162  __mutex_lock_slowpath+0x14/0x20
> [54109.644575] c0   1162  thermal_zone_get_temp+0x84/0x360
> [54109.644586] c0   1162  temp_show+0x30/0x78
> [54109.644609] c0   1162  dev_attr_show+0x5c/0xf0
> [54109.644628] c0   1162  sysfs_kf_seq_show+0xcc/0x1a4
> [54109.644636] c0   1162  kernfs_seq_show+0x48/0x88
> [54109.644656] c0   1162  seq_read+0x1f4/0x73c
> [54109.644664] c0   1162  kernfs_fop_read+0x84/0x318
> [54109.644683] c0   1162  __vfs_read+0x50/0x1bc
> [54109.644692] c0   1162  vfs_read+0xa4/0x140
> [54109.644701] c0   1162  SyS_read+0xbc/0x144
> [54109.644708] c0   1162  el0_svc_naked+0x34/0x38
> [54109.845800] c0   1162 D 720.000s 1->7766->7766 cat [panic]
> 
> Fixes commit 1851799e1d29 ("thermal: Fix use-after-free when
> unregistering thermal zone device") [2]

It does not fix the problem actually. It is preferable to revert the
commit 1851799e1d29.

The cancel delayed work sync takes the mutex while a cat
/sys/class/thermal/thermal_zone0/temp happens which takes the mutex also
and at the same moment a thermal_zone_device_update() call is done in
the workqueue context. This one blocks because cancel_delayed_work_sync
is owning the lock which in turn waits for the work to end.

IMO, that deserves a deeper investigation with the mutex. Probably the
fix would be to refcount the thermal zone device and see if we can get
rid of the mutex in some places. If the refcounting is used, the
function thermal_zone_device_unregister() will be called with the
guarantee nobody will use it and the mutex can be removed.

> Signed-off-by: Wei Wang <wvw@...gle.com>
> ---
>  drivers/thermal/thermal_core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index d4481cc8958f..c28271817e43 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -304,7 +304,7 @@ static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
>  				 &tz->poll_queue,
>  				 msecs_to_jiffies(delay));
>  	else
> -		cancel_delayed_work_sync(&tz->poll_queue);
> +		cancel_delayed_work(&tz->poll_queue);
>  }
>  
>  static void monitor_thermal_zone(struct thermal_zone_device *tz)
> @@ -1414,7 +1414,7 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
>  
>  	mutex_unlock(&thermal_list_lock);
>  
> -	thermal_zone_device_set_polling(tz, 0);
> +	cancel_delayed_work_sync(&tz->poll_queue);
>  
>  	thermal_set_governor(tz, NULL);
>  
> 


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ