[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <SJ0PR11MB5008B5F59CA94D2C3B0E50CDD76B9@SJ0PR11MB5008.namprd11.prod.outlook.com>
Date: Fri, 28 Apr 2023 14:24:13 +0000
From: "Kumar, M Chetan" <m.chetan.kumar@...el.com>
To: Samuel Wein PhD <sam@...wein.com>
CC: Jakub Kicinski <kuba@...nel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
linuxwwan <linuxwwan@...el.com>
Subject: RE: NULL pointer dereference when removing xmm7360 PCI device
> -----Original Message-----
> From: Samuel Wein PhD <sam@...wein.com>
> Sent: Friday, April 28, 2023 3:18 PM
> To: Kumar, M Chetan <m.chetan.kumar@...el.com>
> Cc: Jakub Kicinski <kuba@...nel.org>; netdev@...r.kernel.org; linuxwwan
> <linuxwwan@...el.com>
> Subject: RE: NULL pointer dereference when removing xmm7360 PCI device
>
> I've added the full set of logs to the gist. The sequence was:
> boot PC
> suspend PC
> resume PC
> repeat the following several (it varies) times until the error:
> `sudo echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove`
> `echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/rescan`
>
> The XMM7360 card doesn't resume properly from suspend, with the AT
> devices not responding to any commands. I've been working off of some of
> Shane Parslow's code to try to get ModemManager control of the device and
> was basically trying a bunch of ways to try to reset the card, this one clearly
> didn't work, but it also clearly exposed some weirdness in how the kernel
> handled the attempt. The fact that there is not a proper MBIM interface for
> this card really makes managing it difficult.
>
>
> ------- Original Message -------
> On Friday, April 28th, 2023 at 11:21 AM, Kumar, M Chetan
> <m.chetan.kumar@...el.com> wrote:
>
>
> >
> >
> > > -----Original Message-----
> >
> > > From: Jakub Kicinski kuba@...nel.org
> > > Sent: Friday, April 28, 2023 2:38 AM
> > > To: Kumar, M Chetan m.chetan.kumar@...el.com
> > > Cc: Samuel Wein PhD sam@...wein.com; netdev@...r.kernel.org;
> > > linuxwwan linuxwwan@...el.com
> > > Subject: Re: NULL pointer dereference when removing xmm7360 PCI
> > > device
> > >
> > > On Thu, 27 Apr 2023 10:31:29 +0000 Samuel Wein PhD wrote:
> > >
> > > > Hi Folks,
> > > > I've been trying to get the xmm7360 working with IOSM and the
> > > > ModemManager. This has been what my highschool advisor would call
> > > > a "learning process".
> > > > When trying `echo 1 >
> > > > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove` I get a
> > > > variety of errors. One of these is a kernel error
> > > > `2023-04-27T12:23:38.937223+02:00 Nase kernel: [ 587.997430] BUG:
> > > > kernel NULL pointer dereference, address: 0000000000000048
> > > > 2023-04-27T12:23:38.937237+02:00 Nase kernel: [ 587.997447] #PF:
> > > > supervisor read access in kernel mode
> > > > 2023-04-27T12:23:38.937238+02:00 Nase kernel: [ 587.997455] #PF:
> > > > error_code(0x0000) - not-present page
> > > > 2023-04-27T12:23:38.937241+02:00 Nase kernel: [ 587.997463] PGD 0
> > > > P4D 0 2023-04-27T12:23:38.937242+02:00 Nase kernel: [ 587.997476]
> > > > Oops: 0000 [#1] PREEMPT SMP NOPTI 2023-04-
> 27T12:23:38.937242+02:00
> > > > Nase kernel: [ 587.997489] CPU: 1 PID: 4767 Comm: bash Not tainted
> > > > 6.3.0-060300-generic #202304232030 ...` the full log is available
> > > > at
> > > > https://gist.github.com/poshul/0c5ffbde6106a71adcbc132d828dbcd7
> > > >
> > > > Steps to reproduce: Boot device with xmm7360 installed and in PCI
> > > > mode, place into suspend. Resume, and start issuing reset/remove
> > > > commands to the PCI interface (without properly unloading the IOSM
> module first).
> > > >
> > > > I'm not sure how widely applicable this is but wanted to at least report
> it.
> > >
> > > Intel folks, PTAL.
> >
> >
> > I tried reproducing the issue by following the steps you mentioned but
> > so far could not reproduce it. Could you please share the logs from boot-up
> and procedure you carried out in steps.
> >
> > Once you boot-up the laptop, driver will be in working condition why
> device removal ?
There seems to be some issue at device side due to which communication is not
getting restored after suspend exit.
At some point device communication is broken and wwan is not initialized.
In remove path, accessing wwan to release resource is leading to NULL pointer
exception.
Error handling need to be corrected. Try the patch [1] it should fix.
Another thing for restoring the communication from such state, remove and rescan is
not enough you may have to follow below procedure for a graceful recover.
echo 1 > /sys/bus/pci/drivers/iosm/0000:01:00.0/reset -> replace BDF
echo 1 > /sys/bus/pci/drivers/iosm/0000:01:00.0/remove
echo 1 > /sys/bus/pci/rescan
[1]
diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem.c b/drivers/net/wwan/iosm/iosm_ipc_imem.c
index c066b0040a3f..0b4efd2571dd 100644
--- a/drivers/net/wwan/iosm/iosm_ipc_imem.c
+++ b/drivers/net/wwan/iosm/iosm_ipc_imem.c
@@ -565,24 +565,32 @@ static void ipc_imem_run_state_worker(struct work_struct *instance)
struct ipc_mux_config mux_cfg;
struct iosm_imem *ipc_imem;
u8 ctrl_chl_idx = 0;
+ int ret;
ipc_imem = container_of(instance, struct iosm_imem, run_state_worker);
if (ipc_imem->phase != IPC_P_RUN) {
dev_err(ipc_imem->dev,
"Modem link down. Exit run state worker.");
- return;
+ goto err_cp_phase;
}
if (test_and_clear_bit(IOSM_DEVLINK_INIT, &ipc_imem->flag))
ipc_devlink_deinit(ipc_imem->ipc_devlink);
- if (!ipc_imem_setup_cp_mux_cap_init(ipc_imem, &mux_cfg))
- ipc_imem->mux = ipc_mux_init(&mux_cfg, ipc_imem);
+ ret = ipc_imem_setup_cp_mux_cap_init(ipc_imem, &mux_cfg);
+ if (ret < 0)
+ goto err_cap_init;
+
+ ipc_imem->mux = ipc_mux_init(&mux_cfg, ipc_imem);
+ if (!ipc_imem->mux)
+ goto err_mux_init;
+
+ ret = ipc_imem_wwan_channel_init(ipc_imem, mux_cfg.protocol);
+ if (ret < 0)
+ goto err_channel_init;
- ipc_imem_wwan_channel_init(ipc_imem, mux_cfg.protocol);
- if (ipc_imem->mux)
- ipc_imem->mux->wwan = ipc_imem->wwan;
+ ipc_imem->mux->wwan = ipc_imem->wwan;
while (ctrl_chl_idx < IPC_MEM_MAX_CHANNELS) {
if (!ipc_chnl_cfg_get(&chnl_cfg_port, ctrl_chl_idx)) {
@@ -622,6 +630,15 @@ static void ipc_imem_run_state_worker(struct work_struct *instance)
/* Complete all memory stores after setting bit */
smp_mb__after_atomic();
+
+ return;
+
+err_channel_init:
+ ipc_mux_deinit(ipc_imem->mux);
+err_mux_init:
+err_cap_init:
+err_cp_phase:
+ ipc_uevent_send(ipc_imem->dev, UEVENT_CD_READY_LINK_DOWN);
}
static void ipc_imem_handle_irq(struct iosm_imem *ipc_imem, int irq)
diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
index 66b90cc4c346..109cf8930488 100644
--- a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
+++ b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
@@ -77,8 +77,8 @@ int ipc_imem_sys_wwan_transmit(struct iosm_imem *ipc_imem,
}
/* Initialize wwan channel */
-void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
- enum ipc_mux_protocol mux_type)
+int ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
+ enum ipc_mux_protocol mux_type)
{
struct ipc_chnl_cfg chnl_cfg = { 0 };
@@ -87,7 +87,7 @@ void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
/* If modem version is invalid (0xffffffff), do not initialize WWAN. */
if (ipc_imem->cp_version == -1) {
dev_err(ipc_imem->dev, "invalid CP version");
- return;
+ return -EIO;
}
ipc_chnl_cfg_get(&chnl_cfg, ipc_imem->nr_of_channels);
@@ -104,9 +104,13 @@ void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
/* WWAN registration. */
ipc_imem->wwan = ipc_wwan_init(ipc_imem, ipc_imem->dev);
- if (!ipc_imem->wwan)
+ if (!ipc_imem->wwan) {
dev_err(ipc_imem->dev,
"failed to register the ipc_wwan interfaces");
+ return -ENOMEM;
+ }
+
+ return 0;
}
/* Map SKB to DMA for transfer */
diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
index f8afb217d9e2..026c5bd0f999 100644
--- a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
+++ b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
@@ -91,9 +91,11 @@ int ipc_imem_sys_wwan_transmit(struct iosm_imem *ipc_imem, int if_id,
* MUX.
* @ipc_imem: Pointer to iosm_imem struct.
* @mux_type: Type of mux protocol.
+ *
+ * Return: 0 on success and failure value on error
*/
-void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
- enum ipc_mux_protocol mux_type);
+int ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
+ enum ipc_mux_protocol mux_type);
/**
* ipc_imem_sys_devlink_open - Open a Flash/CD Channel link to CP
Powered by blists - more mailing lists