lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 26 Aug 2022 09:53:31 +0200
From:   Lukas Wunner <lukas@...ner.de>
To:     Marek Szyprowski <m.szyprowski@...sung.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
        linux-usb@...r.kernel.org,
        Steve Glendinning <steve.glendinning@...well.net>,
        UNGLinuxDriver@...rochip.com, Oliver Neukum <oneukum@...e.com>,
        Andre Edich <andre.edich@...rochip.com>,
        Oleksij Rempel <linux@...pel-privat.de>,
        Martyn Welch <martyn.welch@...labora.com>,
        Gabriel Hojda <ghojda@...urs.ro>,
        Christoph Fritz <chf.fritz@...glemail.com>,
        Lino Sanfilippo <LinoSanfilippo@....de>,
        Philipp Rosenberger <p.rosenberger@...bus.com>,
        Heiner Kallweit <hkallweit1@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        Russell King <linux@...linux.org.uk>,
        Ferry Toth <fntoth@...il.com>,
        Krzysztof Kozlowski <krzk@...nel.org>,
        'Linux Samsung SOC' <linux-samsung-soc@...r.kernel.org>
Subject: Re: [PATCH net-next v3 5/7] usbnet: smsc95xx: Forward PHY interrupts
 to PHY driver to avoid polling

On Fri, Aug 26, 2022 at 09:41:46AM +0200, Marek Szyprowski wrote:
> On 26.08.2022 09:19, Lukas Wunner wrote:
> > On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote:
> >> On 19.05.2022 23:22, Marek Szyprowski wrote:
> >>> On 19.05.2022 21:08, Lukas Wunner wrote:
> >>>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote:
> >>>>> This patch landed in the recent linux next-20220516 as commit
> >>>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY
> >>>>> driver to
> >>>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet
> >>>>> operation
> >>>>> after system suspend-resume cycle. On the Odroid XU3 board I got the
> >>>>> following warning in the kernel log:
> >>>>>
> >>>>> # time rtcwake -s10 -mmem
> >>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 09:16:07 2022
> >>>>> PM: suspend entry (deep)
> >>>>> Filesystems sync: 0.001 seconds
> >>>>> Freezing user space processes ... (elapsed 0.002 seconds) done.
> >>>>> OOM killer disabled.
> >>>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >>>>> printk: Suspending console(s) (use no_console_suspend to debug)
> >>>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode
> >>>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113
> >>>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS
> >>>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy
> >>>>> ------------[ cut here ]------------
> >>>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946
> >>>>> phy_state_machine+0x98/0x28c
> >>>> [...]
> >>>>> It looks that the driver's suspend/resume operations might need some
> >>>>> adjustments. After the system suspend/resume cycle the driver is not
> >>>>> operational anymore. Reverting the $subject patch on top of linux
> >>>>> next-20220516 restores ethernet operation after system suspend/resume.
> >>>> Thanks a lot for the report. It seems the PHY is signaling a link
> >>>> change
> >>>> shortly before system sleep and by the time the phy_state_machine()
> >>>> worker
> >>>> gets around to handle it, the device has already been suspended and thus
> >>>> refuses any further USB requests with -EHOSTUNREACH (-113):
> > [...]
> >>>> Assuming the above theory is correct, calling phy_stop_machine()
> >>>> after usbnet_suspend() would be sufficient to fix the issue.
> >>>> It cancels the phy_state_machine() worker.
> >>>>
> >>>> The small patch below does that. Could you give it a spin?
> >>> That's it. Your analysis is right and the patch fixes the issue. Thanks!
> >>>
> >>> Feel free to add:
> >>>
> >>> Reported-by: Marek Szyprowski <m.szyprowski@...sung.com>
> >>>
> >>> Tested-by: Marek Szyprowski <m.szyprowski@...sung.com>
> >> Gentle ping for the final patch...
> > Hm?  Actually this issue is supposed to be fixed by mainline commit
> > 1758bde2e4aa ("net: phy: Don't trigger state machine while in suspend").
> >
> > The initial fix attempt that you're replying to should not be necessary
> > with that commit.
> >
> > Are you still seeing issues even with 1758bde2e4aa applied?
> > Or are you maybe using a custom downstream tree which is missing that commit?
> 
> On Linux next-20220825 I still get the following warning during 
> suspend/resume cycle:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1483 at drivers/net/phy/phy_device.c:323 
> mdio_bus_phy_resume+0x10c/0x110
> Modules linked in: exynos_gsc s5p_jpeg s5p_mfc videobuf2_dma_contig 
> v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev 
> mc s5p_cec
> CPU: 0 PID: 1483 Comm: rtcwake Not tainted 6.0.0-rc2-next-20220825 #5482
> Hardware name: Samsung Exynos (Flattened Device Tree)
>   unwind_backtrace from show_stack+0x10/0x14
>   show_stack from dump_stack_lvl+0x58/0x70
>   dump_stack_lvl from __warn+0xc8/0x220
>   __warn from warn_slowpath_fmt+0x5c/0xb4
>   warn_slowpath_fmt from mdio_bus_phy_resume+0x10c/0x110
>   mdio_bus_phy_resume from dpm_run_callback+0x94/0x208
>   dpm_run_callback from device_resume+0x124/0x21c
>   device_resume from dpm_resume+0x108/0x278
>   dpm_resume from dpm_resume_end+0xc/0x18
>   dpm_resume_end from suspend_devices_and_enter+0x208/0x70c
>   suspend_devices_and_enter from pm_suspend+0x364/0x430
>   pm_suspend from state_store+0x68/0xc8
>   state_store from kernfs_fop_write_iter+0x110/0x1d4
>   kernfs_fop_write_iter from vfs_write+0x1c4/0x2ac
>   vfs_write from ksys_write+0x5c/0xd4
>   ksys_write from ret_fast_syscall+0x0/0x1c
> Exception stack(0xf2ee5fa8 to 0xf2ee5ff0)
> 5fa0:                   00000004 0002b438 00000004 0002b438 00000004 
> 00000000
> 5fc0: 00000004 0002b438 000291b0 00000004 0002b438 00000004 befd9c1c 
> 00028160
> 5fe0: 0000006c befd9ae8 b6eb4148 b6f118a4
> irq event stamp: 58381
> hardirqs last  enabled at (58393): [<c019ff28>] vprintk_emit+0x320/0x344
> hardirqs last disabled at (58400): [<c019fedc>] vprintk_emit+0x2d4/0x344
> softirqs last  enabled at (58258): [<c0101694>] __do_softirq+0x354/0x618
> softirqs last disabled at (58247): [<c012dd18>] __irq_exit_rcu+0x140/0x1ec
> ---[ end trace 0000000000000000 ]---
> 
> The mentioned patch fixes it.

Color me confused.

With "the mentioned patch", are you referring to 1758bde2e4aa
or are you referring to the little test patch in this e-mail:
https://lore.kernel.org/netdev/20220519190841.GA30869@wunner.de/

There's a Tested-by from you on 1758bde2e4aa, so I assume the
commit fixed the issue at the time.  Does it still occur
intermittently?  Or does it occur every time?  In the latter case,
I'd assume some other change was made in the meantime and that
other change exposed the issue again...

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ