lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAA85sZtez59wrAr8AVdbaZWvkPhVeurcznQ0kbL3UVx2FqLJ7w@mail.gmail.com>
Date:   Fri, 18 Dec 2020 00:37:16 +0100
From:   Ian Kumlien <ian.kumlien@...il.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Kai-Heng Feng <kai.heng.feng@...onical.com>,
        linux-pci <linux-pci@...r.kernel.org>,
        Alexander Duyck <alexander.duyck@...il.com>,
        "Saheed O. Bolarinwa" <refactormyself@...il.com>,
        Puranjay Mohan <puranjay12@...il.com>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Tony Nguyen <anthony.l.nguyen@...el.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Heiner Kallweit <hkallweit1@...il.com>,
        intel-wired-lan <intel-wired-lan@...ts.osuosl.org>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/3] PCI/ASPM: Use the path max in L1 ASPM latency check

On Thu, Dec 17, 2020 at 12:21 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Wed, Dec 16, 2020 at 12:20:53PM +0100, Ian Kumlien wrote:
> > On Wed, Dec 16, 2020 at 1:08 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
> > > On Tue, Dec 15, 2020 at 02:09:12PM +0100, Ian Kumlien wrote:
> > > > On Tue, Dec 15, 2020 at 1:40 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
> > > > > On Mon, Dec 14, 2020 at 11:56:31PM +0100, Ian Kumlien wrote:
> > > > > > On Mon, Dec 14, 2020 at 8:19 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
> > > > >
> > > > > > > If you're interested, you could probably unload the Realtek drivers,
> > > > > > > remove the devices, and set the PCI_EXP_LNKCTL_LD (Link Disable) bit
> > > > > > > in 02:04.0, e.g.,
> > > > > > >
> > > > > > >   # RT=/sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:04.0
> > > > > > >   # echo 1 > $RT/0000:04:00.0/remove
> > > > > > >   # echo 1 > $RT/0000:04:00.1/remove
> > > > > > >   # echo 1 > $RT/0000:04:00.2/remove
> > > > > > >   # echo 1 > $RT/0000:04:00.4/remove
> > > > > > >   # echo 1 > $RT/0000:04:00.7/remove
> > > > > > >   # setpci -s02:04.0 CAP_EXP+0x10.w=0x0010
> > > > > > >
> > > > > > > That should take 04:00.x out of the picture.
> > > > > >
> > > > > > Didn't actually change the behaviour, I'm suspecting an errata for AMD pcie...
> > > > > >
> > > > > > So did this, with unpatched kernel:
> > > > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > > > [  5]   0.00-1.00   sec  4.56 MBytes  38.2 Mbits/sec    0   67.9 KBytes
> > > > > > [  5]   1.00-2.00   sec  4.47 MBytes  37.5 Mbits/sec    0   96.2 KBytes
> > > > > > [  5]   2.00-3.00   sec  4.85 MBytes  40.7 Mbits/sec    0   50.9 KBytes
> > > > > > [  5]   3.00-4.00   sec  4.23 MBytes  35.4 Mbits/sec    0   70.7 KBytes
> > > > > > [  5]   4.00-5.00   sec  4.23 MBytes  35.4 Mbits/sec    0   48.1 KBytes
> > > > > > [  5]   5.00-6.00   sec  4.23 MBytes  35.4 Mbits/sec    0   45.2 KBytes
> > > > > > [  5]   6.00-7.00   sec  4.23 MBytes  35.4 Mbits/sec    0   36.8 KBytes
> > > > > > [  5]   7.00-8.00   sec  3.98 MBytes  33.4 Mbits/sec    0   36.8 KBytes
> > > > > > [  5]   8.00-9.00   sec  4.23 MBytes  35.4 Mbits/sec    0   36.8 KBytes
> > > > > > [  5]   9.00-10.00  sec  4.23 MBytes  35.4 Mbits/sec    0   48.1 KBytes
> > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > > > [  5]   0.00-10.00  sec  43.2 MBytes  36.2 Mbits/sec    0             sender
> > > > > > [  5]   0.00-10.00  sec  42.7 MBytes  35.8 Mbits/sec                  receiver
> > > > > >
> > > > > > and:
> > > > > > echo 0 > /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/link/l1_aspm
> > > > >
> > > > > BTW, thanks a lot for testing out the "l1_aspm" sysfs file.  I'm very
> > > > > pleased that it seems to be working as intended.
> > > >
> > > > It was nice to find it for easy disabling :)
> > > >
> > > > > > and:
> > > > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > > > [  5]   0.00-1.00   sec   113 MBytes   951 Mbits/sec  153    772 KBytes
> > > > > > [  5]   1.00-2.00   sec   109 MBytes   912 Mbits/sec  276    550 KBytes
> > > > > > [  5]   2.00-3.00   sec   111 MBytes   933 Mbits/sec  123    625 KBytes
> > > > > > [  5]   3.00-4.00   sec   111 MBytes   933 Mbits/sec   31    687 KBytes
> > > > > > [  5]   4.00-5.00   sec   110 MBytes   923 Mbits/sec    0    679 KBytes
> > > > > > [  5]   5.00-6.00   sec   110 MBytes   923 Mbits/sec  136    577 KBytes
> > > > > > [  5]   6.00-7.00   sec   110 MBytes   923 Mbits/sec  214    645 KBytes
> > > > > > [  5]   7.00-8.00   sec   110 MBytes   923 Mbits/sec   32    628 KBytes
> > > > > > [  5]   8.00-9.00   sec   110 MBytes   923 Mbits/sec   81    537 KBytes
> > > > > > [  5]   9.00-10.00  sec   110 MBytes   923 Mbits/sec   10    577 KBytes
> > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > > > [  5]   0.00-10.00  sec  1.08 GBytes   927 Mbits/sec  1056             sender
> > > > > > [  5]   0.00-10.00  sec  1.07 GBytes   923 Mbits/sec                  receiver
> > > > > >
> > > > > > But this only confirms that the fix i experience is a side effect.
> > > > > >
> > > > > > The original code is still wrong :)
> > > > >
> > > > > What exactly is this machine?  Brand, model, config?  Maybe you could
> > > > > add this and a dmesg log to the buzilla?  It seems like other people
> > > > > should be seeing the same problem, so I'm hoping to grub around on the
> > > > > web to see if there are similar reports involving these devices.
> > > >
> > > > ASUS Pro WS X570-ACE with AMD Ryzen 9 3900X
> > >
> > > Possible similar issues:
> > >
> > >   https://forums.unraid.net/topic/94274-hardware-upgrade-woes/
> > >   https://forums.servethehome.com/index.php?threads/upgraded-my-home-server-from-intel-to-amd-virtual-disk-stuck-in-degraded-unhealty-state.25535/ (Windows)
> >
> > Could be, I suspect that we need a workaround (is there a quirk for
> > "reporting wrong latency"?) and the patches.
>
> I don't think there's currently a quirk mechanism that would work for
> correcting latencies, but there should be, and we could add one if we
> can figure out for sure what's wrong.
>
> I found this:
>
>   https://www.reddit.com/r/VFIO/comments/hgk3cz/x570_pcieclassic_pci_bridge_woes/
>
> which looks like it should be the same hardware (if you can collect a
> dmesg log or "lspci -nnvv" output we could tell for sure) and is
> interesting because it includes some lspci output that shows different
> L1 exit latencies than what you see.

I'll send both of them separately to you, no reason to push that to
everyone i assume.. =)

> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=209725
> > > > >
> > > > > Here's one that is superficially similar:
> > > > > https://linux-hardware.org/index.php?probe=e5f24075e5&log=lspci_all
> > > > > in that it has a RP -- switch -- I211 path.  Interestingly, the switch
> > > > > here advertises <64us L1 exit latency instead of the <32us latency
> > > > > your switch advertises.  Of course, I can't tell if it's exactly the
> > > > > same switch.
> > > >
> > > > Same chipset it seems
> > > >
> > > > I'm running bios version:
> > > >         Version: 2206
> > > >         Release Date: 08/13/2020
> > > >
> > > > ANd latest is:
> > > > Version 3003
> > > > 2020/12/07
> > > >
> > > > Will test upgrading that as well, but it could be that they report the
> > > > incorrect latency of the switch - I don't know how many things AGESA
> > > > changes but... It's been updated twice since my upgrade.
> > >
> > > I wouldn't be surprised if the advertised exit latencies are writable
> > > by the BIOS because it probably depends on electrical characteristics
> > > outside the switch.  If so, it's possible ASUS just screwed it up.
> >
> > Not surprisingly, nothing changed.
> > (There was a lot of "stability improvements")
>
> I wouldn't be totally surprised if ASUS didn't test that I211 NIC
> under Linux, but I'm sure it must work well under Windows.  If you
> happen to have Windows, a free trial version of AIDA64 should be able
> to give us the equivalent of "lspci -vv".

I don't have windows, haven't had windows at home since '98 ;)

I'll check with some friends that dualboot om systems that might be
similar - will see what i can get


> Bjorn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ