linux-kernel - RE: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <65d111eb87115_6c745294ac@dwillia2-xfh.jf.intel.com.notmuch>
Date: Sat, 17 Feb 2024 12:07:07 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Ira Weiny <ira.weiny@...el.com>, "Rafael J. Wysocki" <rafael@...nel.org>,
	Dan Williams <dan.j.williams@...el.com>, Jonathan Cameron
	<jonathan.cameron@...wei.com>, Smita Koralahalli
	<Smita.KoralahalliChannabasappa@....com>
CC: <linux-acpi@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, Dan Carpenter <dan.carpenter@...aro.org>,
	"Ira Weiny" <ira.weiny@...el.com>
Subject: RE: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held

Ira Weiny wrote:
> Smatch caught that cxl_cper_post_event() is called with a spinlock held
> or preemption disabled.[1]  The callback takes the device lock to
> perform address translation and therefore might sleep.  The record data
> is released back to BIOS in ghes_clear_estatus() which requires it to be
> copied for use in the workqueue.
> 
> Copy the record to a lockless list and schedule a work item to process
> the record outside of atomic context.
> 
> [1] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/
> 
> Reported-by: Dan Carpenter <dan.carpenter@...aro.org>
> Signed-off-by: Ira Weiny <ira.weiny@...el.com>
> ---
> Changes in v2:
> - djbw: device_lock() sleeps so we need to call the callback in process context
> - iweiny: create work queue to handle processing the callback
> - Link to v1: https://lore.kernel.org/r/20240202-cxl-cper-smatch-v1-1-7a4103c7f5a0@intel.com
> ---
>  drivers/acpi/apei/ghes.c | 44 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 41 insertions(+), 3 deletions(-)
> 
[..]
> +static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
> +
>  static void cxl_cper_post_event(enum cxl_event_type event_type,
>  				struct cxl_cper_event_rec *rec)
>  {
> +	struct cxl_cper_work_item *wi;
> +
>  	if (rec->hdr.length <= sizeof(rec->hdr) ||
>  	    rec->hdr.length > sizeof(*rec)) {
>  		pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
> @@ -721,9 +752,16 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
>  		return;
>  	}
>  
> -	guard(rwsem_read)(&cxl_cper_rw_sem);
> -	if (cper_callback)
> -		cper_callback(event_type, rec);

Given a work function can be set atomically there is no need to create /
manage a registration lock. Set a 'struct work' instance to a CXL
provided routine on cxl_pci module load and restore it to a nop function
+ cancel_work_sync() on cxl_pci module exit.

> +	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);

The system is already under distress trying to report an error it should
not dip into emergency memory reserves to report errors. Use a kfifo()
similar to how memory_failure_queue() avoids memory allocation in the
error reporting path.