lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <XLBOI5Re2TlLQ3z7cM_QQNNUm7LPesjdaoQ0xUS7mZbWkbQo72lVm8Re2Lmh-8RxBPNprLJD1KZb-bfzOkfuvN1FVgqTpbLk3JHjkaXpe44=@samwein.com>
Date:   Mon, 01 May 2023 16:00:19 +0000
From:   Samuel Wein PhD <sam@...wein.com>
To:     "Kumar, M Chetan" <m.chetan.kumar@...el.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        linuxwwan <linuxwwan@...el.com>
Subject: RE: NULL pointer dereference when removing xmm7360 PCI device

This appears to do the trick. I'm still having issues bringing the device back after suspend (even with the proper reset->reload->rescan sequence), so I will let you know if I find any more kernel issues.

Thanks!

Sam 




------- Original Message -------
On Friday, April 28th, 2023 at 4:24 PM, Kumar, M Chetan <m.chetan.kumar@...el.com> wrote:


> 
> 
> > -----Original Message-----
> 
> > From: Samuel Wein PhD sam@...wein.com
> > Sent: Friday, April 28, 2023 3:18 PM
> > To: Kumar, M Chetan m.chetan.kumar@...el.com
> > Cc: Jakub Kicinski kuba@...nel.org; netdev@...r.kernel.org; linuxwwan
> > linuxwwan@...el.com
> > Subject: RE: NULL pointer dereference when removing xmm7360 PCI device
> > 
> > I've added the full set of logs to the gist. The sequence was:
> > boot PC
> > suspend PC
> > resume PC
> > repeat the following several (it varies) times until the error:
> > `sudo echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove`
> > `echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/rescan`
> > 
> > The XMM7360 card doesn't resume properly from suspend, with the AT
> > devices not responding to any commands. I've been working off of some of
> > Shane Parslow's code to try to get ModemManager control of the device and
> > was basically trying a bunch of ways to try to reset the card, this one clearly
> > didn't work, but it also clearly exposed some weirdness in how the kernel
> > handled the attempt. The fact that there is not a proper MBIM interface for
> > this card really makes managing it difficult.
> > 
> > ------- Original Message -------
> > On Friday, April 28th, 2023 at 11:21 AM, Kumar, M Chetan
> > m.chetan.kumar@...el.com wrote:
> > 
> > > > -----Original Message-----
> > > 
> > > > From: Jakub Kicinski kuba@...nel.org
> > > > Sent: Friday, April 28, 2023 2:38 AM
> > > > To: Kumar, M Chetan m.chetan.kumar@...el.com
> > > > Cc: Samuel Wein PhD sam@...wein.com; netdev@...r.kernel.org;
> > > > linuxwwan linuxwwan@...el.com
> > > > Subject: Re: NULL pointer dereference when removing xmm7360 PCI
> > > > device
> > > > 
> > > > On Thu, 27 Apr 2023 10:31:29 +0000 Samuel Wein PhD wrote:
> > > > 
> > > > > Hi Folks,
> > > > > I've been trying to get the xmm7360 working with IOSM and the
> > > > > ModemManager. This has been what my highschool advisor would call
> > > > > a "learning process".
> > > > > When trying `echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove` I get a
> > > > > variety of errors. One of these is a kernel error
> > > > > `2023-04-27T12:23:38.937223+02:00 Nase kernel: [ 587.997430] BUG: kernel NULL pointer dereference, address: 0000000000000048 2023-04-27T12:23:38.937237+02:00 Nase kernel: [ 587.997447] #PF: supervisor read access in kernel mode 2023-04-27T12:23:38.937238+02:00 Nase kernel: [ 587.997455] #PF: error_code(0x0000) - not-present page 2023-04-27T12:23:38.937241+02:00 Nase kernel: [ 587.997463] PGD 0 P4D 0 2023-04-27T12:23:38.937242+02:00 Nase kernel: [ 587.997476] Oops: 0000 [#1] PREEMPT SMP NOPTI 2023-04- 27T12:23:38.937242+02:00 Nase kernel: [ 587.997489] CPU: 1 PID: 4767 Comm: bash Not tainted 6.3.0-060300-generic #202304232030 ...` the full log is available
> > > > > at
> > > > > https://gist.github.com/poshul/0c5ffbde6106a71adcbc132d828dbcd7
> > > > > 
> > > > > Steps to reproduce: Boot device with xmm7360 installed and in PCI
> > > > > mode, place into suspend. Resume, and start issuing reset/remove
> > > > > commands to the PCI interface (without properly unloading the IOSM
> > > > > module first).
> > > > > 
> > > > > I'm not sure how widely applicable this is but wanted to at least report
> > > > > it.
> > > > 
> > > > Intel folks, PTAL.
> > > 
> > > I tried reproducing the issue by following the steps you mentioned but
> > > so far could not reproduce it. Could you please share the logs from boot-up
> > > and procedure you carried out in steps.
> > > 
> > > Once you boot-up the laptop, driver will be in working condition why
> > > device removal ?
> 
> 
> There seems to be some issue at device side due to which communication is not
> getting restored after suspend exit.
> 
> At some point device communication is broken and wwan is not initialized.
> In remove path, accessing wwan to release resource is leading to NULL pointer
> exception.
> 
> Error handling need to be corrected. Try the patch [1] it should fix.
> 
> Another thing for restoring the communication from such state, remove and rescan is
> not enough you may have to follow below procedure for a graceful recover.
> 
> echo 1 > /sys/bus/pci/drivers/iosm/0000:01:00.0/reset -> replace BDF
> 
> echo 1 > /sys/bus/pci/drivers/iosm/0000:01:00.0/remove
> 
> echo 1 > /sys/bus/pci/rescan
> 
> 
> [1]
> diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem.c b/drivers/net/wwan/iosm/iosm_ipc_imem.c
> index c066b0040a3f..0b4efd2571dd 100644
> --- a/drivers/net/wwan/iosm/iosm_ipc_imem.c
> +++ b/drivers/net/wwan/iosm/iosm_ipc_imem.c
> @@ -565,24 +565,32 @@ static void ipc_imem_run_state_worker(struct work_struct *instance)
> struct ipc_mux_config mux_cfg;
> struct iosm_imem *ipc_imem;
> u8 ctrl_chl_idx = 0;
> + int ret;
> 
> ipc_imem = container_of(instance, struct iosm_imem, run_state_worker);
> 
> if (ipc_imem->phase != IPC_P_RUN) {
> 
> dev_err(ipc_imem->dev,
> 
> "Modem link down. Exit run state worker.");
> - return;
> + goto err_cp_phase;
> }
> 
> if (test_and_clear_bit(IOSM_DEVLINK_INIT, &ipc_imem->flag))
> 
> ipc_devlink_deinit(ipc_imem->ipc_devlink);
> 
> 
> - if (!ipc_imem_setup_cp_mux_cap_init(ipc_imem, &mux_cfg))
> - ipc_imem->mux = ipc_mux_init(&mux_cfg, ipc_imem);
> 
> + ret = ipc_imem_setup_cp_mux_cap_init(ipc_imem, &mux_cfg);
> + if (ret < 0)
> + goto err_cap_init;
> +
> + ipc_imem->mux = ipc_mux_init(&mux_cfg, ipc_imem);
> 
> + if (!ipc_imem->mux)
> 
> + goto err_mux_init;
> +
> + ret = ipc_imem_wwan_channel_init(ipc_imem, mux_cfg.protocol);
> + if (ret < 0)
> + goto err_channel_init;
> 
> - ipc_imem_wwan_channel_init(ipc_imem, mux_cfg.protocol);
> - if (ipc_imem->mux)
> 
> - ipc_imem->mux->wwan = ipc_imem->wwan;
> 
> + ipc_imem->mux->wwan = ipc_imem->wwan;
> 
> 
> while (ctrl_chl_idx < IPC_MEM_MAX_CHANNELS) {
> if (!ipc_chnl_cfg_get(&chnl_cfg_port, ctrl_chl_idx)) {
> @@ -622,6 +630,15 @@ static void ipc_imem_run_state_worker(struct work_struct instance)
> 
> / Complete all memory stores after setting bit */
> smp_mb__after_atomic();
> +
> + return;
> +
> +err_channel_init:
> + ipc_mux_deinit(ipc_imem->mux);
> 
> +err_mux_init:
> +err_cap_init:
> +err_cp_phase:
> + ipc_uevent_send(ipc_imem->dev, UEVENT_CD_READY_LINK_DOWN);
> 
> }
> 
> static void ipc_imem_handle_irq(struct iosm_imem *ipc_imem, int irq)
> diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
> index 66b90cc4c346..109cf8930488 100644
> --- a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
> +++ b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
> @@ -77,8 +77,8 @@ int ipc_imem_sys_wwan_transmit(struct iosm_imem ipc_imem,
> }
> 
> / Initialize wwan channel */
> -void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
> - enum ipc_mux_protocol mux_type)
> +int ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
> + enum ipc_mux_protocol mux_type)
> {
> struct ipc_chnl_cfg chnl_cfg = { 0 };
> 
> @@ -87,7 +87,7 @@ void ipc_imem_wwan_channel_init(struct iosm_imem ipc_imem,
> / If modem version is invalid (0xffffffff), do not initialize WWAN. */
> if (ipc_imem->cp_version == -1) {
> 
> dev_err(ipc_imem->dev, "invalid CP version");
> 
> - return;
> + return -EIO;
> }
> 
> ipc_chnl_cfg_get(&chnl_cfg, ipc_imem->nr_of_channels);
> 
> @@ -104,9 +104,13 @@ void ipc_imem_wwan_channel_init(struct iosm_imem ipc_imem,
> 
> / WWAN registration. */
> ipc_imem->wwan = ipc_wwan_init(ipc_imem, ipc_imem->dev);
> 
> - if (!ipc_imem->wwan)
> 
> + if (!ipc_imem->wwan) {
> 
> dev_err(ipc_imem->dev,
> 
> "failed to register the ipc_wwan interfaces");
> + return -ENOMEM;
> + }
> +
> + return 0;
> }
> 
> /* Map SKB to DMA for transfer */
> diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
> index f8afb217d9e2..026c5bd0f999 100644
> --- a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
> +++ b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
> @@ -91,9 +91,11 @@ int ipc_imem_sys_wwan_transmit(struct iosm_imem *ipc_imem, int if_id,
> * MUX.
> * @ipc_imem: Pointer to iosm_imem struct.
> * @mux_type: Type of mux protocol.
> + *
> + * Return: 0 on success and failure value on error
> */
> -void ipc_imem_wwan_channel_init(struct iosm_imem ipc_imem,
> - enum ipc_mux_protocol mux_type);
> +int ipc_imem_wwan_channel_init(struct iosm_imem ipc_imem,
> + enum ipc_mux_protocol mux_type);
> 
> /
> * ipc_imem_sys_devlink_open - Open a Flash/CD Channel link to CP

Powered by blists - more mailing lists