linux-kernel - Re: [PATCH] USB:ohci:fix ohci interruption problem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <df7d8692-9c6e-3bb1-2b41-924df36469ce@huawei.com>
Date:   Tue, 6 Apr 2021 21:12:29 +0800
From:   liulongfang <liulongfang@...wei.com>
To:     Alan Stern <stern@...land.harvard.edu>
CC:     <gregkh@...uxfoundation.org>, <mathias.nyman@...el.com>,
        <linux-usb@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <kong.kongxinwei@...ilicon.com>, <yisen.zhuang@...wei.com>
Subject: Re: [PATCH] USB:ohci:fix ohci interruption problem

 2021/4/2 23:26, Alan Stern 写道:
> On Fri, Apr 02, 2021 at 05:27:59PM +0800, Longfang Liu wrote:
>> The operating method of the system entering S4 sleep mode:
>> echo disk > /sys/power/state
> 
> This discussion is still not right.
> 
The operating method is:
echo reboot > /sys/power/disk
echo disk > /sys/power/state

>> When OHCI enters the S4 sleep state,
> 
> To start with, you should be talking about hibernation (also known as 
> suspend-to-disk), not S4.  When the system enters hibernation -- for 
> example, when you write "disk" to /sys/power/state -- the controller may 
> go into S4 or it may go into some other power-saving state.
> 
>>  the USB sleep process will call
>> check_root_hub_suspend() and ohci_bus_suspend() instead of
>> ohci_suspend() and ohci_bus_suspend(), this causes the OHCI interrupt
>> to not be closed.
> 
> This isn't true.  The procedure _does_ call ohci_suspend, through the 
> .poweroff callback in hcd-pci.c.  That callback goes to the 
> hcd_pci_suspend routine, which calls suspend_common and then 
> ohci_suspend.
> 
> However, these calls happen after the kernel image has be written to the 
> storage area on the disk.  As a result, any log messages produced during 
> the calls do not get saved, so they don't get reloaded when the system 
> resumes from hibernation, and they aren't present in the log after the 
> system wakes up.  That's why they didn't appear in the log you included 
> in an earlier email.  The only way to observe them is to use a remote > console, such as a network console.
>After adding dump_stack to ohci_suspend, do hibernation test,
.poweroff is not called, but .freeze is called, and these logs are
presented in dmesg:
[root@...alhost power]# echo reboot > disk
[root@...alhost power]# echo disk > state
[ 1883.631163] PM: hibernation: hibernation entry
[ 1883.701199] Filesystems sync: 0.058 seconds
[ 1883.705443] Freezing user space processes ... (elapsed 0.004 seconds) done.
[ 1883.717094] OOM killer disabled.
[ 1883.730258] PM: hibernation: Preallocating image memory
[ 1889.162453] PM: hibernation: Allocated 1020044 pages for snapshot
[ 1889.168564] PM: hibernation: Allocated 4080176 kbytes in 5.42 seconds (752.80 MB/s)
[ 1889.176215] Freezing remaining freezable tasks ... (elapsed 0.099 seconds) done.
[ 1889.285477] printk: Suspending console(s) (use no_console_suspend to debug)
...
[ 1889.325720] Call trace:
[ 1889.325734]  dump_backtrace+0x0/0x1e0
[ 1889.325742]  show_stack+0x2c/0x48
[ 1889.325766]  dump_stack+0xcc/0x104
[ 1889.325789]  ohci_suspend+0x38/0xd8 [ohci_hcd]
[ 1889.325823]  suspend_common+0xe0/0x160
[ 1889.325835]  hcd_pci_freeze+0x38/0x48
[ 1889.325853]  pci_pm_freeze+0x68/0x110
[ 1889.325881]  dpm_run_callback+0x4c/0x230
[ 1889.325891]  __device_suspend+0x108/0x4d8
[ 1889.325900]  async_suspend+0x34/0xb8
[ 1889.325907]  async_run_entry_fn+0x4c/0x118
[ 1889.325919]  process_one_work+0x1f0/0x4a0
[ 1889.325926]  worker_thread+0x48/0x460
[ 1889.325936]  kthread+0x160/0x168
[ 1889.325947]  ret_from_fork+0x10/0x18
...
[ 1895.297836] Call trace:
[ 1895.297846]  dump_backtrace+0x0/0x1e0
[ 1895.297880]  show_stack+0x2c/0x48
[ 1895.297925]  dump_stack+0xcc/0x104
[ 1895.297973] usb usb3: root hub lost power or was reset
[ 1895.297997]  ohci_resume+0x50/0x1a0 [ohci_hcd]
[ 1895.298057]  resume_common+0xa0/0x120
[ 1895.298071]  hcd_pci_restore+0x24/0x30
[ 1895.298084]  pci_pm_restore+0x64/0xb0
[ 1895.298101]  dpm_run_callback+0x4c/0x230
[ 1895.298113]  device_resume+0xdc/0x1c8
[ 1895.298125]  async_resume+0x30/0x60
[ 1895.298132]  async_run_entry_fn+0x4c/0x118
[ 1895.298141]  process_one_work+0x1f0/0x4a0
[ 1895.298148]  worker_thread+0x48/0x460
[ 1895.298159]  kthread+0x160/0x168
[ 1895.298171]  ret_from_fork+0x10/0x18
...
[ 1900.939779] OOM killer enabled.
[ 1900.942930] Restarting tasks ... done.
[ 1900.962630] PM: hibernation: hibernation exit

> In fact, that's pretty much the only way to debug problems that occur 
> during a hibernation transition.
> 
>> At this time, if just one device interrupt is reported. Since rh_state
>> has been changed to OHCI_RH_SUSPENDED after ohci_bus_suspend(), the
>> driver will not process and close this device interrupt.
> 
> That's not true either.  The ohci_irq routine does indeed process 
> interrupts even when rh_state is set to OHCI_RH_SUSPENDED.  How else 
> could it handle a device's wakeup request?
> 
>> It will cause
>> the entire system to be stuck during sleep, causing the device to
>> fail to respond.
> 
> During hibernation, the system is powered off.  Obviously the kernel is 
> not capable of handling interrupts at this time.
> 
> Also, why would a device interrupt be reported at this time?  What 
> causes the interrupt request?
> 
>> When the abnormal interruption reaches 100,000 times, the system will
>> forcibly close the interruption and make the device unusable.
>>
>> Because the root cause of the problem is that ohci_suspend is not
>> called to perform normal interrupt shutdown operations when the system
>> enters S4 sleep mode.
>>
>> Therefore, our solution is to specify freeze interface in this mode to
>> perform normal suspend_common() operations, and call ohci_suspend()
>> after check_root_hub_suspend() is executed through the suspend_common()
>> operation.
> 
> No.  The freeze interface does not need to power-down the controller.  
> All it needs to do is make sure that no communication between the 
> computer and the attached USB devices takes place, and this is handled 
> by ohci_bus_suspend.
> 
> Furthermore, it is a mistake for the freeze routine to change anything 
> unless the thaw routine reverses the change.  Your patch leaves the thaw 
> callback pointer set to NULL.
> 
>> After using this solution, it is verified by the stress test of sleep
>> wake up in S4 mode for a long time that this problem no longer occurs.
> 
> Something else must be happeneing, something you don't understand.
> 
> Alan Stern
> 
>> Signed-off-by: Longfang Liu <liulongfang@...wei.com>
>> ---
>>  drivers/usb/core/hcd-pci.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
>> index 1547aa6..c5844a3 100644
>> --- a/drivers/usb/core/hcd-pci.c
>> +++ b/drivers/usb/core/hcd-pci.c
>> @@ -509,6 +509,11 @@ static int resume_common(struct device *dev, int event)
>>  
>>  #ifdef	CONFIG_PM_SLEEP
>>  
>> +static int hcd_pci_freeze(struct device *dev)
>> +{
>> +	return suspend_common(dev, device_may_wakeup(dev));
>> +}
>> +
>>  static int hcd_pci_suspend(struct device *dev)
>>  {
>>  	return suspend_common(dev, device_may_wakeup(dev));
>> @@ -605,7 +610,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = {
>>  	.suspend_noirq	= hcd_pci_suspend_noirq,
>>  	.resume_noirq	= hcd_pci_resume_noirq,
>>  	.resume		= hcd_pci_resume,
>> -	.freeze		= check_root_hub_suspended,
>> +	.freeze		= hcd_pci_freeze,
>>  	.freeze_noirq	= check_root_hub_suspended,
>>  	.thaw_noirq	= NULL,
>>  	.thaw		= NULL,
>> -- 
>> 2.8.1
>>
> .
> 
Thanks,
Longfang.