Date:   Wed, 22 Aug 2018 11:01:02 +0800
From:   Jian-Hong Pan <jian-hong@...lessm.com>
To:     Heiner Kallweit <hkallweit1@...il.com>
Cc:     Steve Dodd <steved424@...il.com>, Lou Reed <gogen@...root.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Linux Upstreaming Team <linux@...lessm.com>
Subject: Re: Experimental fix for MSI-X issue on r8169

2018-08-22 5:19 GMT+08:00 Heiner Kallweit <hkallweit1@...il.com>:
> On 20.08.2018 05:47, Jian-Hong Pan wrote:
>> 2018-08-20 4:34 GMT+08:00 Heiner Kallweit <hkallweit1@...il.com>:
>>> The three of you reported an MSI-X-related error when the system
>>> resumes from suspend. This has been fixed for now by disabling MSI-X
>>> on certain chip versions. However more versions may be affected.
>>>
>>> I checked with Realtek and they confirmed that on certain chip
>>> versions an MSI-X-related value in PCI config space is reset when
>>> resuming from S3.
>>>
>>> I would appreciate it if you could test the following experimental patch
>>> and check whether the warning "MSIX address lost, re-configuring" appears
>>> in your dmesg output after resume from suspend.
>>>
>>> Thanks a lot for your efforts.
>>
>> Tested with the experimental patch on the ASUS X441UAR.
>>
>> This is the information before suspend:
>>
>> dev@...less:~$ dmesg | grep r8169
>> [   10.279565] libphy: r8169: probed
>> [   10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127
>> [   10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0
>> [   15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
>> [   17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
>>
>> dev@...less:~$ ip addr show enp2s0
>> 4: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
>>     link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff
>>     inet 10.100.13.152/24 brd 10.100.13.255 scope global noprefixroute dynamic enp2s0
>>        valid_lft 86347sec preferred_lft 86347sec
>>     inet6 fe80::2873:a2a9:6ca1:c79d/64 scope link noprefixroute
>>        valid_lft forever preferred_lft forever
>>
>> This is the information after resume:
>>
>> dev@...less:~$ dmesg | grep r8169
>> [   10.279565] libphy: r8169: probed
>> [   10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127
>> [   10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0
>> [   15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
>> [   17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
>> [   95.594265] r8169 0000:02:00.0 enp2s0: Link is Down
>> [   96.242074] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
>>
>> dev@...less:~$ ip addr show enp2s0
>> 4: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
>>     link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff
>>
>> There is no "MSIX address lost, re-configuring" message in dmesg.
>> The Ethernet interface is still down after resume.
>>
>
> Thanks a lot for testing. Unfortunately I don't have test hardware
> affected by this MSI-X issue, so maybe you can help me to understand
> the issue a little better.
>
> Below is a patch that prints the MSI-X table entry in different contexts;
> it's not supposed to fix anything. Could you please let me know
> what the output is on your system?
> I want to find out whether the issue clears the complete entry or
> only corrupts certain parts of it.
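
The question of whether the whole entry is cleared or only parts of it are corrupted can be made mechanical. What follows is a minimal userspace sketch, not part of the r8169 patch: `msix_classify()` and its enum are hypothetical names, and the "plausible" test assumes an x86 host, where MSI writes target the 0xFEExxxxx window with a zero upper address dword. It deliberately does not flag every difference from an earlier snapshot, because the RTL8168E-VL log in this thread shows address and data legitimately changing between suspend and resume (fee02004/4028 to fee01004/402b) on a working system.

```c
#include <stdint.h>

/* Hypothetical classification of one MSI-X table entry, given as the four
 * dwords printed by the debug patch: lower address, upper address, data,
 * vector control. */
enum msix_entry_state {
	MSIX_ENTRY_PLAUSIBLE, /* looks like a valid x86 MSI target */
	MSIX_ENTRY_ALL_ONES,  /* every dword reads back as 0xffffffff */
	MSIX_ENTRY_CLEARED,   /* every dword reads back as 0 */
	MSIX_ENTRY_SUSPECT,   /* neither blank nor a plausible x86 address */
};

enum msix_entry_state msix_classify(const uint32_t e[4])
{
	int ones = 0, zeros = 0;

	for (int i = 0; i < 4; i++) {
		ones += (e[i] == 0xffffffffu);
		zeros += (e[i] == 0);
	}

	/* All four dwords share one value: the complete entry is affected,
	 * not just individual fields. */
	if (ones == 4)
		return MSIX_ENTRY_ALL_ONES;
	if (zeros == 4)
		return MSIX_ENTRY_CLEARED;

	/* Assumption: x86 host, where MSI writes go to 0xFEExxxxx and the
	 * upper address dword is 0. Data and vector control are not checked. */
	if ((e[0] & 0xfff00000u) == 0xfee00000u && e[1] == 0)
		return MSIX_ENTRY_PLAUSIBLE;

	return MSIX_ENTRY_SUSPECT;
}
```

Fed the values from this thread, the probe entry `fee01004 0 40ef 1` classifies as plausible, while the post-resume entry from the ASUS X441UAR (`ffffffff` in all four dwords) classifies as all ones.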

Here are the test results on the ASUS X441UAR with this patch:

dev@...less:~$ dmesg | grep -E "(r8169|enp2s0)"
[    8.980001] r8169 0000:02:00.0: MSI-X entry: context probe: fee01004 0 40ef 1
[    8.981594] libphy: r8169: probed
[    8.981769] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127
[    9.479848] r8169 0000:02:00.0 enp2s0: renamed from eth0
[   11.332834] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[   11.336350] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   11.574892] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[   11.581816] r8169 0000:02:00.0 enp2s0: Link is Down
[   13.190535] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
[   13.190548] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
[   56.227974] r8169 0000:02:00.0: MSI-X entry: context suspend: fee04004 0 4024 0
[   56.462464] r8169 0000:02:00.0: MSI-X entry: context resume: ffffffff ffffffff ffffffff ffffffff
[   58.406713] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
[   58.766740] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[   58.767331] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   59.003660] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready

Uh! The MSI-X entry seems to be lost after resume on this laptop: all four dwords read back as ffffffff.
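
To read an entry such as the probe-time `fee01004 0 40ef 1`, the four dwords can be split into fields. The sketch assumes the x86 MSI address/data layout (destination ID in address bits 19:12, redirection hint in bit 3, destination mode in bit 2; vector in data bits 7:0, delivery mode in bits 10:8, level in bit 14) and the PCI rule that bit 0 of the vector control dword is the per-vector mask. `msix_parse()`, `msix_decode()` and `struct msix_decoded` are hypothetical helpers, not kernel API. Decoding the resume line is not meaningful: an all-ones read is commonly what a CPU gets back when a PCI device does not complete a memory read, so these values may mean the table was unreachable at that moment rather than rewritten. That is an interpretation, not something this thread establishes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of one MSI-X entry under the x86 MSI address/data layout
 * (assumption: x86 host). Hypothetical helper type. */
struct msix_decoded {
	unsigned int dest_id;       /* address bits 19:12 */
	bool redir_hint;            /* address bit 3 */
	bool logical_dest;          /* address bit 2: 1 = logical mode */
	unsigned int vector;        /* data bits 7:0 */
	unsigned int delivery_mode; /* data bits 10:8, 0 = fixed */
	bool level_assert;          /* data bit 14 */
	bool masked;                /* vector control bit 0 (PCI mask bit) */
};

/* Parse the four hex dwords printed after "context <name>:" by the debug
 * patch, e.g. "fee01004 0 40ef 1". Returns false if fewer than four parse. */
bool msix_parse(const char *s, uint32_t e[4])
{
	unsigned int v[4];

	if (sscanf(s, "%x %x %x %x", &v[0], &v[1], &v[2], &v[3]) != 4)
		return false;
	for (int i = 0; i < 4; i++)
		e[i] = v[i];
	return true;
}

/* Split the dwords (lower addr, upper addr, data, vector control) into fields. */
struct msix_decoded msix_decode(const uint32_t e[4])
{
	struct msix_decoded d;

	d.dest_id = (e[0] >> 12) & 0xff;
	d.redir_hint = (e[0] >> 3) & 1;
	d.logical_dest = (e[0] >> 2) & 1;
	d.vector = e[2] & 0xff;
	d.delivery_mode = (e[2] >> 8) & 0x7;
	d.level_assert = (e[2] >> 14) & 1;
	d.masked = e[3] & 1;
	return d;
}
```

Under these assumptions the probe value gives destination ID 1, logical destination mode, vector 0xef, fixed delivery and the mask bit set; the suspend value `fee04004 0 4024 0` gives destination ID 4, vector 0x24, unmasked.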

Ethernet interface status after resume:
dev@...less:~$ ip addr show enp2s0
3: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff

Regards,
Jian-Hong Pan

> This is what I get on my system (RTL8168E-VL). In your case you'll
> only get as far as the first suspend.
>
> [    3.743404] r8169 0000:03:00.0: MSI-X entry: context probe: fee01004 0 40ef 1
> [   29.539250] r8169 0000:03:00.0: MSI-X entry: context suspend: fee02004 0 4028 0
> [   29.837457] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0
> [   36.921370] r8169 0000:03:00.0: MSI-X entry: context suspend: fee01004 0 402b 0
> [   37.239407] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 54f53c8c0..f32645119 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -11,6 +11,7 @@
>  #include <linux/module.h>
>  #include <linux/moduleparam.h>
>  #include <linux/pci.h>
> +#include <linux/msi.h>
>  #include <linux/netdevice.h>
>  #include <linux/etherdevice.h>
>  #include <linux/delay.h>
> @@ -6822,6 +6823,20 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>         pm_runtime_put_noidle(&pdev->dev);
>  }
>
> +static void rtl_print_msix_entry(struct rtl8169_private *tp, const char *context)
> +{
> +       struct msi_desc *desc = first_pci_msi_entry(tp->pci_dev);
> +       u32 data[4];
> +
> +       data[0] = readl(desc->mask_base + PCI_MSIX_ENTRY_LOWER_ADDR);
> +       data[1] = readl(desc->mask_base + PCI_MSIX_ENTRY_UPPER_ADDR);
> +       data[2] = readl(desc->mask_base + PCI_MSIX_ENTRY_DATA);
> +       data[3] = readl(desc->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL);
> +
> +       dev_info(tp_to_dev(tp), "MSI-X entry: context %s: %x %x %x %x\n",
> +                context, data[0], data[1], data[2], data[3]);
> +}
> +
>  static void rtl8169_net_suspend(struct net_device *dev)
>  {
>         struct rtl8169_private *tp = netdev_priv(dev);
> @@ -6846,9 +6861,12 @@ static int rtl8169_suspend(struct device *device)
>  {
>         struct pci_dev *pdev = to_pci_dev(device);
>         struct net_device *dev = pci_get_drvdata(pdev);
> +       struct rtl8169_private *tp = netdev_priv(dev);
>
>         rtl8169_net_suspend(dev);
>
> +       rtl_print_msix_entry(tp, "suspend");
> +
>         return 0;
>  }
>
> @@ -6875,6 +6893,9 @@ static int rtl8169_resume(struct device *device)
>  {
>         struct pci_dev *pdev = to_pci_dev(device);
>         struct net_device *dev = pci_get_drvdata(pdev);
> +       struct rtl8169_private *tp = netdev_priv(dev);
> +
> +       rtl_print_msix_entry(tp, "resume");
>
>         if (netif_running(dev))
>                 __rtl8169_resume(dev);
> @@ -7075,11 +7096,6 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
>                 RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
>                 RTL_W8(tp, Cfg9346, Cfg9346_Lock);
>                 flags = PCI_IRQ_LEGACY;
> -       } else if (tp->mac_version == RTL_GIGA_MAC_VER_40) {
> -               /* This version was reported to have issues with resume
> -                * from suspend when using MSI-X
> -                */
> -               flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI;
>         } else {
>                 flags = PCI_IRQ_ALL_TYPES;
>         }
> @@ -7354,6 +7370,8 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>                 return rc;
>         }
>
> +       rtl_print_msix_entry(tp, "probe");
> +
>         tp->saved_wolopts = __rtl8169_get_wol(tp);
>
>         mutex_init(&tp->wk.mutex);
> --
> 2.18.0
>
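
The last hunk of the patch removes the `RTL_GIGA_MAC_VER_40` special case, so that chip version falls through to `PCI_IRQ_ALL_TYPES` and may be given MSI-X again, which is presumably what allows the issue to show up. A rough userspace model of that choice follows. The flag values are local stand-ins for the kernel's `PCI_IRQ_*` constants, the condition of the first branch lies outside the hunk and is modeled as a plain boolean, and `pick_flags()`/`first_usable()` are hypothetical. The preference order (MSI-X, then MSI, then legacy) reflects the order in which `pci_alloc_irq_vectors()` tries interrupt types.

```c
#include <stdbool.h>

/* Local stand-ins for the kernel's PCI_IRQ_* flags; values are illustrative. */
#define IRQ_FLAG_LEGACY (1u << 0)
#define IRQ_FLAG_MSI    (1u << 1)
#define IRQ_FLAG_MSIX   (1u << 2)
#define IRQ_FLAG_ALL    (IRQ_FLAG_LEGACY | IRQ_FLAG_MSI | IRQ_FLAG_MSIX)

enum irq_kind { IRQ_KIND_MSIX, IRQ_KIND_MSI, IRQ_KIND_LEGACY, IRQ_KIND_NONE };

/* Mirrors the branch structure of rtl_alloc_irq() around the hunk.
 * legacy_only: stands in for the first branch's condition (not in the hunk).
 * avoid_msix:  the RTL_GIGA_MAC_VER_40 case that the patch removes. */
unsigned int pick_flags(bool legacy_only, bool avoid_msix)
{
	if (legacy_only)
		return IRQ_FLAG_LEGACY;
	if (avoid_msix)
		return IRQ_FLAG_LEGACY | IRQ_FLAG_MSI;
	return IRQ_FLAG_ALL;
}

/* Try MSI-X, then MSI, then legacy, skipping types the device lacks. */
enum irq_kind first_usable(unsigned int flags, unsigned int supported)
{
	unsigned int usable = flags & supported;

	if (usable & IRQ_FLAG_MSIX)
		return IRQ_KIND_MSIX;
	if (usable & IRQ_FLAG_MSI)
		return IRQ_KIND_MSI;
	if (usable & IRQ_FLAG_LEGACY)
		return IRQ_KIND_LEGACY;
	return IRQ_KIND_NONE;
}
```

With the special case in place, the modeled VER_40 device ends up on MSI; without it, the same device is offered MSI-X first.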
