Message-ID: <65d111eb87115_6c745294ac@dwillia2-xfh.jf.intel.com.notmuch>
Date: Sat, 17 Feb 2024 12:07:07 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Ira Weiny <ira.weiny@...el.com>, "Rafael J. Wysocki" <rafael@...nel.org>,
Dan Williams <dan.j.williams@...el.com>, Jonathan Cameron
<jonathan.cameron@...wei.com>, Smita Koralahalli
<Smita.KoralahalliChannabasappa@....com>
CC: <linux-acpi@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Dan Carpenter <dan.carpenter@...aro.org>,
"Ira Weiny" <ira.weiny@...el.com>
Subject: RE: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held
Ira Weiny wrote:
> Smatch caught that cxl_cper_post_event() is called with a spinlock held
> or preemption disabled.[1] The callback takes the device lock to
> perform address translation and therefore might sleep. The record data
> is released back to BIOS in ghes_clear_estatus() which requires it to be
> copied for use in the workqueue.
>
> Copy the record to a lockless list and schedule a work item to process
> the record outside of atomic context.
>
> [1] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/
>
> Reported-by: Dan Carpenter <dan.carpenter@...aro.org>
> Signed-off-by: Ira Weiny <ira.weiny@...el.com>
> ---
> Changes in v2:
> - djbw: device_lock() sleeps so we need to call the callback in process context
> - iweiny: create work queue to handle processing the callback
> - Link to v1: https://lore.kernel.org/r/20240202-cxl-cper-smatch-v1-1-7a4103c7f5a0@intel.com
> ---
> drivers/acpi/apei/ghes.c | 44 +++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 41 insertions(+), 3 deletions(-)
>
[..]
> +static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
> +
>  static void cxl_cper_post_event(enum cxl_event_type event_type,
>  				struct cxl_cper_event_rec *rec)
>  {
> +	struct cxl_cper_work_item *wi;
> +
>  	if (rec->hdr.length <= sizeof(rec->hdr) ||
>  	    rec->hdr.length > sizeof(*rec)) {
>  		pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
> @@ -721,9 +752,16 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
>  		return;
>  	}
>
> -	guard(rwsem_read)(&cxl_cper_rw_sem);
> -	if (cper_callback)
> -		cper_callback(event_type, rec);
Given that a work function can be set atomically, there is no need to
create and manage a registration lock. Set a 'struct work_struct' instance
to a CXL-provided routine on cxl_pci module load. On cxl_pci module exit,
restore it to a nop function and then call cancel_work_sync().
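The following is a userspace model of that suggestion, not kernel code. It uses C11 <stdatomic.h> in place of kernel primitives. All names (cper_work_func, cxl_handle(), model_module_load(), model_module_exit(), model_run_work()) are hypothetical. The only point it illustrates is that one atomic pointer store and load can replace an rwsem-protected callback registration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model: the slot stands in for the work item's function. */
typedef void (*cper_work_fn)(int event);

/* Default handler: does nothing when no CXL driver is loaded. */
static void cper_nop(int event) { (void)event; }

/* One atomic pointer replaces the registration lock plus callback pointer. */
static _Atomic(cper_work_fn) cper_work_func = cper_nop;

static int cxl_events_seen; /* counts events handled by the "driver" */

/* Models the CXL-provided routine. */
static void cxl_handle(int event) { (void)event; cxl_events_seen++; }

/* Models cxl_pci module load: publish the real handler. */
static void model_module_load(void) { atomic_store(&cper_work_func, cxl_handle); }

/*
 * Models cxl_pci module exit: restore the nop. In the kernel this would be
 * followed by cancel_work_sync() so that an instance already running the
 * old handler finishes before the module goes away.
 */
static void model_module_exit(void) { atomic_store(&cper_work_func, cper_nop); }

/* Models the work item executing: load the pointer once, then call it. */
static void model_run_work(int event)
{
	cper_work_fn fn = atomic_load(&cper_work_func);
	fn(event);
}
```

A worker that loaded the old pointer just before the exit store can still run the CXL handler once. That window is what cancel_work_sync() closes in the kernel, because it waits for an executing instance to finish.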
> +	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);
The system is already under distress when it is reporting an error, so it
should not dip into emergency memory reserves (which GFP_ATOMIC allows) to
report errors. Use a kfifo instead, similar to how memory_failure_queue()
avoids memory allocation in the error reporting path.
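The following is a userspace model of the kfifo approach, not the kernel API. It uses a statically sized, power-of-two ring of fixed-size records. The reporting path copies each record into preallocated storage and drops it when the ring is full, instead of allocating. The names (struct rec, rec_fifo_put(), rec_fifo_get(), REC_FIFO_SIZE) are hypothetical.

The model assumes a single producer and a single consumer with no concurrency. The kernel's kfifo (DECLARE_KFIFO(), kfifo_put(), kfifo_get()) provides the same kind of ring, and memory_failure_queue() pairs its per-CPU kfifo with a spinlock for concurrent access.

```c
#include <assert.h>
#include <stdbool.h>

#define REC_FIFO_SIZE 4u /* must be a power of two, as with kfifo */

/* Hypothetical fixed-size record; stands in for the CPER event record. */
struct rec {
	unsigned int len;
	unsigned char data[16];
};

/* Statically allocated storage: nothing is allocated on the error path. */
static struct rec rec_fifo[REC_FIFO_SIZE];
/* Free-running indices; unsigned wraparound keeps in - out correct. */
static unsigned int fifo_in, fifo_out;

/* Producer (models the GHES path): copy the record, or drop it if full. */
static bool rec_fifo_put(const struct rec *r)
{
	if (fifo_in - fifo_out == REC_FIFO_SIZE)
		return false; /* full: caller reports overflow, no allocation */
	rec_fifo[fifo_in & (REC_FIFO_SIZE - 1)] = *r;
	fifo_in++;
	return true;
}

/* Consumer (models the work function running in process context). */
static bool rec_fifo_get(struct rec *r)
{
	if (fifo_in == fifo_out)
		return false; /* empty */
	*r = rec_fifo[fifo_out & (REC_FIFO_SIZE - 1)];
	fifo_out++;
	return true;
}
```

Under memory pressure, a full ring loses only the newest records. A failing GFP_ATOMIC allocation would also lose records, and it would consume reserves on the way.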