linux-kernel - Re: [PATCH] PM: sleep: Avoid calling put_device() under dpm_list

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <77c95e21-747a-87d8-4145-f37f75e76e18@linux.intel.com>
Date:   Thu, 16 Dec 2021 14:27:29 +0100
From:   Thomas Hellström 
        <thomas.hellstrom@...ux.intel.com>
To:     "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Intel Graphics Development <intel-gfx@...ts.freedesktop.org>
Subject: Re: [PATCH] PM: sleep: Avoid calling put_device() under dpm_list_mtx

Hi, Rafael,

On 11/4/21 18:26, Rafael J. Wysocki wrote:
> It is generally unsafe to call put_device() with dpm_list_mtx held,
> because the given device's release routine may carry out an action
> depending on that lock which then may deadlock, so modify the
> system-wide suspend and resume of devices to always drop dpm_list_mtx
> before calling put_device() (and adjust white space somewhat while
> at it).
>
> For instance, this prevents the following splat from showing up in
> the kernel log after a system resume in certain configurations:


<snip>


> @@ -1748,21 +1769,27 @@ int dpm_suspend(pm_message_t state)
>   		struct device *dev = to_device(dpm_prepared_list.prev);
>   
>   		get_device(dev);
> +
>   		mutex_unlock(&dpm_list_mtx);
>   
>   		error = device_suspend(dev);
>   
>   		mutex_lock(&dpm_list_mtx);
> +
>   		if (error) {
>   			pm_dev_err(dev, state, "", error);
>   			dpm_save_failed_dev(dev_name(dev));
> -			put_device(dev);
> -			break;
> -		}
> -		if (!list_empty(&dev->power.entry))
> +		} else if (!list_empty(&dev->power.entry)) {
>   			list_move(&dev->power.entry, &dpm_suspended_list);
> +		}
> +
> +		mutex_unlock(&dpm_list_mtx);
> +
>   		put_device(dev);
> -		if (async_error)
> +
> +		mutex_lock(&dpm_list_mtx);
> +
> +		if (error || async_error)
>   			break;
>   	}
>   	mutex_unlock(&dpm_list_mtx);
> @@ -1879,6 +1906,7 @@ int dpm_prepare(pm_message_t state)
>   		struct device *dev = to_device(dpm_list.next);
>   
>   		get_device(dev);
> +
>   		mutex_unlock(&dpm_list_mtx);
>   
>   		trace_device_pm_callback_start(dev, "", state.event);
> @@ -1886,21 +1914,23 @@ int dpm_prepare(pm_message_t state)
>   		trace_device_pm_callback_end(dev, error);
>   
>   		mutex_lock(&dpm_list_mtx);
> -		if (error) {
> -			if (error == -EAGAIN) {
> -				put_device(dev);
> -				error = 0;
> -				continue;
> -			}
> +
> +		if (!error) {
> +			dev->power.is_prepared = true;
> +			if (!list_empty(&dev->power.entry))
> +				list_move_tail(&dev->power.entry, &dpm_prepared_list);
> +		} else if (error == -EAGAIN) {
> +			error = 0;
> +		} else {
>   			dev_info(dev, "not prepared for power transition: code %d\n",
>   				 error);
> -			put_device(dev);
> -			break;

It appears the above break disappeared.


>   		}
> -		dev->power.is_prepared = true;
> -		if (!list_empty(&dev->power.entry))
> -			list_move_tail(&dev->power.entry, &dpm_prepared_list);
> +
> +		mutex_unlock(&dpm_list_mtx);
> +
>   		put_device(dev);

Should be

                  if (error)

                         break;

Here?

Symptoms is if we return an error from the device prepare callback, we 
end up spinning forever with little clue what's going on.


> +
> +		mutex_lock(&dpm_list_mtx);
>   	}
>   	mutex_unlock(&dpm_list_mtx);
>   	trace_suspend_resume(TPS("dpm_prepare"), state.event, false);

Thanks,

Thomas