Message-ID:
<DB9SPRMB00226E19F9426865663A83A28CA6A@DB9SPRMB0022.eurprd04.prod.outlook.com>
Date: Thu, 4 Dec 2025 11:21:14 +0000
From: Xu Yang <xu.yang_2@....com>
To: Michael Nazzareno Trimarchi <michael@...rulasolutions.com>
CC: "badhri@...gle.com" <badhri@...gle.com>, "heikki.krogerus@...ux.intel.com"
<heikki.krogerus@...ux.intel.com>, "gregkh@...uxfoundation.org"
<gregkh@...uxfoundation.org>, "linux@...ck-us.net" <linux@...ck-us.net>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Jun Li
<jun.li@....com>, "imx@...ts.linux.dev" <imx@...ts.linux.dev>
Subject: RE: [EXT] Re: [PATCH] usb: typec: tcpm: make kthread worker freezable
> Hi
>
> On Tue, Dec 2, 2025 at 9:44 AM Xu Yang <xu.yang_2@....com> wrote:
> >
> > It's observed that tcpm kthread worker may execute some works at the
> > very end of system suspend or the very beginning of system resume stage.
>
> Please clarify whether this work needs to be completed before system suspend/resume.
In this case, the work sends a single Source Capabilities message. According to the Type-C PD
spec, a source port sends a Source Capabilities message every 150 ms, at least 50 times,
until it receives a GoodCRC message. In my opinion, once the worker starts a work item, it should
be completed as soon as possible. If it gets scheduled before system suspend, I'd expect it to be
completed before the system suspends.
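To show why a suspend can land in the middle of this sequence, here is a minimal model of the retry loop. It is not tcpm code: `send_source_caps_sim()` and `SRC_CAPS_INTERVAL_MS` are hypothetical names, the 150 ms interval is the value stated above, and the attempt limit is a parameter so it does not pin down the exact spec count.

```c
#include <assert.h>
#include <stddef.h>

/* Retry interval as stated above; the name is hypothetical. */
#define SRC_CAPS_INTERVAL_MS 150

/*
 * Model of the Source Capabilities retry loop (not tcpm code).
 * max_count:  number of attempts to model (e.g. 50).
 * goodcrc_at: 1-based attempt the sink acknowledges with GoodCRC,
 *             or 0 if it never does.
 * elapsed_ms: out: time from the first to the last transmit.
 * Returns the number of Source Capabilities messages transmitted; in
 * the reported setup each one is a separate transmit over I2C.
 */
static int send_source_caps_sim(int max_count, int goodcrc_at, int *elapsed_ms)
{
    int sent = 0;

    assert(elapsed_ms != NULL && max_count > 0);
    *elapsed_ms = 0;
    while (sent < max_count) {
        if (sent > 0)
            *elapsed_ms += SRC_CAPS_INTERVAL_MS; /* wait before the retry */
        sent++;                                  /* one transmit */
        if (goodcrc_at > 0 && sent == goodcrc_at)
            break;                               /* sink acknowledged */
    }
    return sent;
}
```

With no GoodCRC and 50 attempts, the sequence spans 49 × 150 ms = 7350 ms, and every attempt is a fresh hardware access, so a suspend request arriving partway through is entirely plausible.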
>
> > The kthread work itself doesn't cause any issues, but if it accesses some
> > HW resource during this period, the system may hang, because almost
> > all resources are inaccessible at this point.
> >
> > Take the kernel dump below as an example: if the source port hasn't finished
> > sending the Source Capabilities message when the system enters suspend, it
> > will continue doing so whenever it gets scheduled. However, the I2C
> > resource is inaccessible until the system resumes, so the system hangs.
> >
> > Fix it by making the kthread worker freezable.
> >
> > $ echo mem > /sys/power/state
> > [ 37.605215] PM: suspend entry (deep)
> > [ 37.616067] Filesystems sync: 0.007 seconds
> > [ 37.633106] Freezing user space processes
> > [ 37.639444] Freezing user space processes completed (elapsed 0.001 seconds)
> > [ 37.646496] OOM killer disabled.
> > [ 37.649745] Freezing remaining freezable tasks
> > [ 37.655695] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> > [ 37.689794] fec 5b040000.ethernet eth0: Link is Down
> > [ 37.713391] PM: suspend devices took 0.052 seconds
> > [ 37.741175] Disabling non-boot CPUs ...
> > [ 37.747120] psci: CPU5 killed (polled 0 ms)
> > [ 37.754129] psci: CPU4 killed (polled 0 ms)
> > [ 37.762217] psci: CPU3 killed (polled 0 ms)
> > [ 37.770037] psci: CPU2 killed (polled 0 ms)
> > [ 37.776936] psci: CPU1 killed (polled 4 ms)
> > [ 37.782481] Enabling non-boot CPUs ...
> > [ 37.787991] Detected VIPT I-cache on CPU1
> > [ 37.788043] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> > [ 37.788093] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
> > [ 37.789587] CPU1 is up
> > [ 37.810632] Detected VIPT I-cache on CPU2
> > [ 37.810661] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
> > [ 37.810689] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
> > [ 37.811714] CPU2 is up
> > [ 37.833013] Detected VIPT I-cache on CPU3
> > [ 37.833042] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
> > [ 37.833071] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
> > [ 37.834201] CPU3 is up
> > [ 37.856437] Detected PIPT I-cache on CPU4
> > [ 37.856469] GICv3: CPU4: found redistributor 100 region 0:0x0000000051b80000
> > [ 37.856501] CPU4: Booted secondary processor 0x0000000100 [0x410fd082]
> > [ 37.857651] CPU4 is up
> > [ 37.872890] SError Interrupt on CPU2, code 0x00000000bf000002 -- SError
> > [ 37.872902] CPU: 2 UID: 0 PID: 147 Comm: 2-0051 Tainted: G M 6.18.0-rc7-06207-gee9dedcfd432-dirty #396 PREEMPT
> > [ 37.872912] Tainted: [M]=MACHINE_CHECK
> > [ 37.872915] Hardware name: Freescale i.MX8QM MEK (DT)
> > [ 37.872919] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 37.872926] pc : lpi2c_imx_xfer_common+0x150/0xff0
> > [ 37.872942] lr : lpi2c_imx_xfer_common+0x54/0xff0
> > [ 37.872949] sp : ffff80008358b960
> > [ 37.872952] x29: ffff80008358b9e0 x28: 0000000000000001 x27: ffff000810bcc080
> > [ 37.872964] x26: 0000000000000000 x25: ffff8000820e9000 x24: 0000000000000000
> > [ 37.872973] x23: 0000000000000001 x22: ffff8000820e7000 x21: 0000000000000001
> > [ 37.872981] x20: ffff80008358bae8 x19: ffff000810b4c010 x18: 000000000000000a
> > [ 37.872990] x17: ffff00081ab43f00 x16: 0000000000000002 x15: 0000000000000000
> > [ 37.872999] x14: 0000000000000001 x13: 00000000ffff0a10 x12: 0000000000000006
> > [ 37.873008] x11: ffff00081a38db07 x10: 0000000000000000 x9 : 0000000000000004
> > [ 37.873016] x8 : 0000000022b63cbf x7 : 00000000016e3600 x6 : 0000000000000000
> > [ 37.873025] x5 : 0000000000000002 x4 : 00000000000186a0 x3 : 00000000000000c0
> > [ 37.873033] x2 : 0000000000000002 x1 : 0000000000000018 x0 : 0000000000000023
> > [ 37.873044] Kernel panic - not syncing: Asynchronous SError Interrupt
> > [ 37.873050] CPU: 2 UID: 0 PID: 147 Comm: 2-0051 Tainted: G M 6.18.0-rc7-06207-gee9dedcfd432-dirty #396 PREEMPT
> > [ 37.873058] Tainted: [M]=MACHINE_CHECK
> > [ 37.873061] Hardware name: Freescale i.MX8QM MEK (DT)
> > [ 37.873064] Call trace:
> > [ 37.873068] show_stack+0x18/0x30 (C)
> > [ 37.873081] dump_stack_lvl+0x60/0x80
> > [ 37.873091] dump_stack+0x18/0x24
> > [ 37.873100] vpanic+0xf8/0x2dc
> > [ 37.873108] abort+0x0/0x4
> > [ 37.873115] nmi_panic+0x64/0x70
> > [ 37.873125] arm64_serror_panic+0x70/0x80
> > [ 37.873134] do_serror+0x34/0x74
> > [ 37.873143] el1h_64_error_handler+0x38/0x60
> > [ 37.873156] el1h_64_error+0x6c/0x70
> > [ 37.873163] lpi2c_imx_xfer_common+0x150/0xff0 (P)
> > [ 37.873172] lpi2c_imx_xfer+0x14/0x20
> > [ 37.873179] __i2c_transfer+0x1b8/0x3c8
> > [ 37.873190] i2c_transfer+0x6c/0xf8
> > [ 37.873199] i2c_transfer_buffer_flags+0x5c/0xa0
> > [ 37.873208] regmap_i2c_write+0x20/0x60
> > [ 37.873221] _regmap_raw_write_impl+0x5cc/0x660
> > [ 37.873230] _regmap_bus_raw_write+0x60/0x80
> > [ 37.873238] _regmap_write+0x58/0xc0
> > [ 37.873246] regmap_write+0x48/0x74
> > [ 37.873254] tcpci_pd_transmit+0x10c/0x1a8
> > [ 37.873264] tcpm_pd_transmit+0x60/0x164
> > [ 37.873273] tcpm_pd_send_source_caps+0x12c/0x1c4
> > [ 37.873280] tcpm_state_machine_work+0xb10/0x3574
> > [ 37.873288] kthread_worker_fn+0xc4/0x178
> > [ 37.873300] kthread+0x12c/0x204
> > [ 37.873310] ret_from_fork+0x10/0x20
> > [ 37.873322] SMP: stopping secondary CPUs
> > [ 37.875528] Kernel Offset: disabled
> > [ 37.875531] CPU features: 0x080000,04105800,40004001,0400421b
> > [ 37.875536] Memory Limit: none
> > [ 38.148805] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
> >
> > Fixes: 3ed8e1c2ac99 ("usb: typec: tcpm: Migrate workqueue to RT priority for processing events")
>
> I don't think this fixes that SHA; most probably it just moves the failure window.
How do you know that? Which commit do you think this should fix?
>
> > Cc: stable@...r.kernel.org
> > Signed-off-by: Xu Yang <xu.yang_2@....com>
> > ---
> > drivers/usb/typec/tcpm/tcpm.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
> > index 4ca2746ce16b..2196de231b9b 100644
> > --- a/drivers/usb/typec/tcpm/tcpm.c
> > +++ b/drivers/usb/typec/tcpm/tcpm.c
> > @@ -7836,7 +7836,7 @@ struct tcpm_port *tcpm_register_port(struct device *dev, struct tcpc_dev *tcpc)
> > mutex_init(&port->lock);
> > mutex_init(&port->swap_lock);
> >
> > - port->wq = kthread_run_worker(0, dev_name(dev));
> > + port->wq = kthread_run_worker(KTW_FREEZABLE, dev_name(dev));
>
> As far as I can see, this flag has no users anywhere in the Linux kernel, which makes me think this
> general problem is addressed differently by other drivers.
Yes, I don't see any users of this flag either. If this issue is caused by the other drivers tcpm depends on, do you
have any better suggestions? Should all of those dependent drivers be fixed? I think tcpm should stop starting
further work when the system is about to suspend. :)
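A minimal userspace model of that behaviour, assuming (as in the kernel's kthread_worker_fn()) that a freezable worker reaches a try_to_freeze() point between work items. The names `worker_model` and `worker_step()` are hypothetical and are not the kthread or tcpm API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a kthread worker; not the kernel API. */
struct worker_model {
    bool freezable; /* models kthread_run_worker(KTW_FREEZABLE, ...) */
    int pending;    /* queued work items, e.g. a state machine step */
    int executed;   /* work items that actually ran */
};

/*
 * One pass of the worker loop while the system may be freezing.
 * A freezable worker parks at its freeze point instead of picking up
 * new work; a non-freezable one keeps running work and may touch
 * hardware that is already suspended.
 * Returns true if a work item ran.
 */
static bool worker_step(struct worker_model *w, bool system_freezing)
{
    assert(w != NULL);
    if (w->freezable && system_freezing)
        return false; /* frozen until the system thaws */
    if (w->pending == 0)
        return false; /* idle: would sleep */
    w->pending--;
    w->executed++;    /* e.g. an I2C write via regmap */
    return true;
}
```

In this model a freezable worker with queued work runs nothing while `system_freezing` is true and picks the work up again once it is false, whereas a non-freezable worker runs the item regardless, which corresponds to the transfer in the reported trace.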
Thanks,
Xu Yang
>
> Michael
>
>
> > if (IS_ERR(port->wq))
> > return ERR_CAST(port->wq);
> > sched_set_fifo(port->wq->task);
> > --
> > 2.34.1
> >
> >