linux-kernel - Re: [EXT] Re: [PATCH] usb: typec: tcpm: make kthread worker freezable

Message-ID: <CAOf5uwkf77vA=R21tDHSqFU_ZFd_HB4mW8r-neLF_9pXqw-mNw@mail.gmail.com>
Date: Fri, 5 Dec 2025 17:01:01 +0100
From: Michael Nazzareno Trimarchi <michael@...rulasolutions.com>
To: Xu Yang <xu.yang_2@....com>
Cc: "badhri@...gle.com" <badhri@...gle.com>, 
	"heikki.krogerus@...ux.intel.com" <heikki.krogerus@...ux.intel.com>, 
	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>, "linux@...ck-us.net" <linux@...ck-us.net>, 
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Jun Li <jun.li@....com>, 
	"imx@...ts.linux.dev" <imx@...ts.linux.dev>
Subject: Re: [EXT] Re: [PATCH] usb: typec: tcpm: make kthread worker freezable

Hi

On Thu, Dec 4, 2025 at 12:21 PM Xu Yang <xu.yang_2@....com> wrote:
>
>
> > Hi
> >
> > On Tue, Dec 2, 2025 at 9:44 AM Xu Yang <xu.yang_2@....com> wrote:
> > >
> > > It's observed that tcpm kthread worker may execute some works at the
> > > very end of system suspend or the very beginning of system resume stage.
> >
> > Please clarify if this works is needed to be completed before the system suspend/resume
>
> In this issue, the work is sending a single source capabilities message. According to Type-C PD
> Spec, the source port will send source capabilities message every 150ms for at least 50 times
> until it received a GoodCRC message.  In my opinion, if the worker starts the work, it needs to
> be completed as soon as possible. I suppose it should be completed before system suspend if
> it gets scheduled before system suspend.

Yes, but what this patch does is make the thread freezable, so it is
frozen before the devices are suspended. The fault you saw can then no
longer happen, because the thread is stopped and cannot access the
driver, which is suspended later. The question now is whether the
thread should finish its current job and go idle in the state machine
before the machine is allowed to suspend.
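
To make the difference concrete, here is a rough userspace model (not
kernel code; model_run() and struct model_result are hypothetical names
made up for this sketch). It assumes that a freezable worker only parks
between work items, so the item in flight when suspend starts still runs
to completion, and that the devices are suspended right after that item
returns.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only; none of these names exist in the kernel.
 * It mimics a worker loop that may check for a freeze request
 * between work items, never in the middle of one.
 */
struct model_result {
	int completed; /* items that ran while the hardware was usable */
	int faults;    /* items that ran after the hardware was suspended */
	int deferred;  /* items left queued until after resume */
};

/*
 * Run `queued` work items. Suspend is requested while item number
 * `suspend_at` (0-based) is executing: that item still sees working
 * hardware, and the devices are suspended as soon as it returns.
 */
static struct model_result model_run(bool freezable, int queued, int suspend_at)
{
	struct model_result r = { 0, 0, 0 };
	bool freeze_requested = false;
	bool hw_suspended = false;

	for (int i = 0; i < queued; i++) {
		/* A freezable worker parks here, between items. */
		if (freezable && freeze_requested) {
			r.deferred = queued - i;
			break;
		}

		if (hw_suspended)
			r.faults++;    /* stands in for the SError in the I2C transfer */
		else
			r.completed++;

		if (i == suspend_at) {
			freeze_requested = true;
			hw_suspended = true; /* devices suspend after the freezer phase */
		}
	}
	return r;
}
```

In this model, a non-freezable worker with five queued items and suspend
starting during the second one keeps running and touches suspended
hardware three times, while a freezable worker defers those three items
until after resume. Whether deferring the remaining Source Capabilities
retries is acceptable for the PD state machine is exactly the open
question above; the model does not answer it.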

>
> >
> > > The kthread work itself won't bring any issues, but if it access some
> > > HW resource during this period, the system may hung there because almost
> > > all of the resources are inaccessible at this point.
> > >
> > > Take below kernel dump as example, if the source port hasn't finished
> > > sending Source Capabilities message when system enters into suspend, it
> > > will continue do the thing as long as it gets scheduled. However, the i2c
> > > resource is inaccessible before system resume. Then the system is hung.
> > >
> > > Fix it by making kthread worker freezable.
> > >
> > > $ echo mem > /sys/power/state
> > > [   37.605215] PM: suspend entry (deep)
> > > [   37.616067] Filesystems sync: 0.007 seconds
> > > [   37.633106] Freezing user space processes
> > > [   37.639444] Freezing user space processes completed (elapsed 0.001 seconds)
> > > [   37.646496] OOM killer disabled.
> > > [   37.649745] Freezing remaining freezable tasks
> > > [   37.655695] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> > > [   37.689794] fec 5b040000.ethernet eth0: Link is Down
> > > [   37.713391] PM: suspend devices took 0.052 seconds
> > > [   37.741175] Disabling non-boot CPUs ...
> > > [   37.747120] psci: CPU5 killed (polled 0 ms)
> > > [   37.754129] psci: CPU4 killed (polled 0 ms)
> > > [   37.762217] psci: CPU3 killed (polled 0 ms)
> > > [   37.770037] psci: CPU2 killed (polled 0 ms)
> > > [   37.776936] psci: CPU1 killed (polled 4 ms)
> > > [   37.782481] Enabling non-boot CPUs ...
> > > [   37.787991] Detected VIPT I-cache on CPU1
> > > [   37.788043] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> > > [   37.788093] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
> > > [   37.789587] CPU1 is up
> > > [   37.810632] Detected VIPT I-cache on CPU2
> > > [   37.810661] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
> > > [   37.810689] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
> > > [   37.811714] CPU2 is up
> > > [   37.833013] Detected VIPT I-cache on CPU3
> > > [   37.833042] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
> > > [   37.833071] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
> > > [   37.834201] CPU3 is up
> > > [   37.856437] Detected PIPT I-cache on CPU4
> > > [   37.856469] GICv3: CPU4: found redistributor 100 region 0:0x0000000051b80000
> > > [   37.856501] CPU4: Booted secondary processor 0x0000000100 [0x410fd082]
> > > [   37.857651] CPU4 is up
> > > [   37.872890] SError Interrupt on CPU2, code 0x00000000bf000002 -- SError
> > > [   37.872902] CPU: 2 UID: 0 PID: 147 Comm: 2-0051 Tainted: G   M                6.18.0-rc7-06207-gee9dedcfd432-dirty #396 PREEMPT
> > > [   37.872912] Tainted: [M]=MACHINE_CHECK
> > > [   37.872915] Hardware name: Freescale i.MX8QM MEK (DT)
> > > [   37.872919] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > [   37.872926] pc : lpi2c_imx_xfer_common+0x150/0xff0
> > > [   37.872942] lr : lpi2c_imx_xfer_common+0x54/0xff0
> > > [   37.872949] sp : ffff80008358b960
> > > [   37.872952] x29: ffff80008358b9e0 x28: 0000000000000001 x27: ffff000810bcc080
> > > [   37.872964] x26: 0000000000000000 x25: ffff8000820e9000 x24: 0000000000000000
> > > [   37.872973] x23: 0000000000000001 x22: ffff8000820e7000 x21: 0000000000000001
> > > [   37.872981] x20: ffff80008358bae8 x19: ffff000810b4c010 x18: 000000000000000a
> > > [   37.872990] x17: ffff00081ab43f00 x16: 0000000000000002 x15: 0000000000000000
> > > [   37.872999] x14: 0000000000000001 x13: 00000000ffff0a10 x12: 0000000000000006
> > > [   37.873008] x11: ffff00081a38db07 x10: 0000000000000000 x9 : 0000000000000004
> > > [   37.873016] x8 : 0000000022b63cbf x7 : 00000000016e3600 x6 : 0000000000000000
> > > [   37.873025] x5 : 0000000000000002 x4 : 00000000000186a0 x3 : 00000000000000c0
> > > [   37.873033] x2 : 0000000000000002 x1 : 0000000000000018 x0 : 0000000000000023
> > > [   37.873044] Kernel panic - not syncing: Asynchronous SError Interrupt
> > > [   37.873050] CPU: 2 UID: 0 PID: 147 Comm: 2-0051 Tainted: G   M                6.18.0-rc7-06207-gee9dedcfd432-dirty #396 PREEMPT
> > > [   37.873058] Tainted: [M]=MACHINE_CHECK
> > > [   37.873061] Hardware name: Freescale i.MX8QM MEK (DT)
> > > [   37.873064] Call trace:
> > > [   37.873068]  show_stack+0x18/0x30 (C)
> > > [   37.873081]  dump_stack_lvl+0x60/0x80
> > > [   37.873091]  dump_stack+0x18/0x24
> > > [   37.873100]  vpanic+0xf8/0x2dc
> > > [   37.873108]  abort+0x0/0x4
> > > [   37.873115]  nmi_panic+0x64/0x70
> > > [   37.873125]  arm64_serror_panic+0x70/0x80
> > > [   37.873134]  do_serror+0x34/0x74
> > > [   37.873143]  el1h_64_error_handler+0x38/0x60
> > > [   37.873156]  el1h_64_error+0x6c/0x70
> > > [   37.873163]  lpi2c_imx_xfer_common+0x150/0xff0 (P)
> > > [   37.873172]  lpi2c_imx_xfer+0x14/0x20
> > > [   37.873179]  __i2c_transfer+0x1b8/0x3c8
> > > [   37.873190]  i2c_transfer+0x6c/0xf8
> > > [   37.873199]  i2c_transfer_buffer_flags+0x5c/0xa0
> > > [   37.873208]  regmap_i2c_write+0x20/0x60
> > > [   37.873221]  _regmap_raw_write_impl+0x5cc/0x660
> > > [   37.873230]  _regmap_bus_raw_write+0x60/0x80
> > > [   37.873238]  _regmap_write+0x58/0xc0
> > > [   37.873246]  regmap_write+0x48/0x74
> > > [   37.873254]  tcpci_pd_transmit+0x10c/0x1a8
> > > [   37.873264]  tcpm_pd_transmit+0x60/0x164
> > > [   37.873273]  tcpm_pd_send_source_caps+0x12c/0x1c4
> > > [   37.873280]  tcpm_state_machine_work+0xb10/0x3574
> > > [   37.873288]  kthread_worker_fn+0xc4/0x178
> > > [   37.873300]  kthread+0x12c/0x204
> > > [   37.873310]  ret_from_fork+0x10/0x20
> > > [   37.873322] SMP: stopping secondary CPUs
> > > [   37.875528] Kernel Offset: disabled
> > > [   37.875531] CPU features: 0x080000,04105800,40004001,0400421b
> > > [   37.875536] Memory Limit: none
> > > [   38.148805] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
> > >
> > > Fixes: 3ed8e1c2ac99 ("usb: typec: tcpm: Migrate workqueue to RT priority for processing events")
> >
> > I think that this does not Fix this sha because most probably it moves the failure window,
>
> How do you know that? Do you think which commit should this fix?
>

According to its commit message, that commit moves the workqueue to RT
priority. So it really depends on whether the problem was already
present before that commit.

> >
> > > Cc: stable@...r.kernel.org
> > > Signed-off-by: Xu Yang <xu.yang_2@....com>
> > > ---
> > >  drivers/usb/typec/tcpm/tcpm.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
> > > index 4ca2746ce16b..2196de231b9b 100644
> > > --- a/drivers/usb/typec/tcpm/tcpm.c
> > > +++ b/drivers/usb/typec/tcpm/tcpm.c
> > > @@ -7836,7 +7836,7 @@ struct tcpm_port *tcpm_register_port(struct device *dev, struct tcpc_dev *tcpc)
> > >         mutex_init(&port->lock);
> > >         mutex_init(&port->swap_lock);
> > >
> > > -       port->wq = kthread_run_worker(0, dev_name(dev));
> > > +       port->wq = kthread_run_worker(KTW_FREEZABLE, dev_name(dev));
> >
> > This flags as far I can see has no user in all the linux kernel, so this let me think that this general
> > problem is addressed differently by other drivers
>
> Yes, I do see no user to user this flag. For this issue, if it's caused by other drivers it depends on, do you have
> any better suggestions? Should all dependent drivers be fixed? I suppose tcpm should stop further jobs when
> the system is going to suspend. :)
>

No, right now I have no better idea, but I think the workqueue was not
WQ_FREEZABLE before either.

Michael



> Thanks,
> Xu Yang
>
> >
> > Michael
> >
> >
> > >         if (IS_ERR(port->wq))
> > >                 return ERR_CAST(port->wq);
> > >         sched_set_fifo(port->wq->task);
> > > --
> > > 2.34.1
> > >
> > >