Date:   Thu, 13 Oct 2016 14:02:15 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     "'Koehrer Mathias (ETAS/ESW5)'" <mathias.koehrer@...s.com>,
        "Julia Cartwright" <julia@...com>,
        "Williams, Mitch A" <mitch.a.williams@...el.com>,
        "Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>
CC:     "linux-rt-users@...r.kernel.org" <linux-rt-users@...r.kernel.org>,
        Sebastian Andrzej Siewior <sebastian.siewior@...utronix.de>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
        Greg <gvrose8192@...il.com>
Subject: RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge
 latencies in cyclictest

From: Koehrer Mathias
> Sent: 13 October 2016 11:57
..
> > The time between my trace points 700 and 701 is about 30us, the time between my
> > trace points 600 and 601 is even 37us!!
> > The code in between is
> >      tsyncrxctl = rd32(E1000_TSYNCRXCTL); resp.
> >      lvmmc = rd32(E1000_LVMMC);
> >
> > In both cases this is a single read from a register.
> > I have no idea why this single read could take that much time!
> > Is it possible that the igb hardware is in a state that delays the read access and this is
> > why the whole I/O system might be delayed?
> >
> 
> To have a proper comparison, I did the same with kernel 3.18.27-rt27.
> Also here, I instrumented the igb driver to get traces for the rd32 calls.
> However, here everything is generally much faster!
> In an idle system the maximum I got for a read was about 6us; most of the time it was 1-2us.

1-2us is probably about right; PCIe is high throughput but high latency.
You should see the latencies we get talking to an FPGA!
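
For reference, one way to see that number directly is to bracket the read
with ktime_get() and dump the delta via trace_printk(), something like the
sketch below (not your exact instrumentation, just the same idea; it assumes
struct e1000_hw *hw is in scope so the driver's rd32() macro resolves):

	/* Time a single 32-bit MMIO read, e.g. in igb_watchdog_task(). */
	ktime_t t0, t1;
	u32 tsyncrxctl;

	t0 = ktime_get();
	tsyncrxctl = rd32(E1000_TSYNCRXCTL);	/* one PCIe read */
	t1 = ktime_get();

	trace_printk("rd32(E1000_TSYNCRXCTL)=%08x took %lld ns\n",
		     tsyncrxctl, ktime_to_ns(ktime_sub(t1, t0)));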

> On the 4.8 kernel this is always much slower (see above).
> My question is now: Is there any kernel config option that has been introduced in the meantime
> that may lead to this effect and which is not set in my 4.8 config?

Have a look at the generated code for rd32().
Someone might have added a load of synchronisation instructions to it.
On x86 I don't think it needs any.
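
In case it helps, rd32() in igb boils down to a readl() of the ioremap()ed
BAR, roughly like this sketch (not a verbatim copy of the driver source,
field names from memory):

	/* Rough shape of the igb register read path. */
	static inline u32 example_rd32(struct e1000_hw *hw, u32 reg)
	{
		u8 __iomem *hw_addr = hw->hw_addr;	/* mapped BAR 0 */

		/* On x86 readl() should compile to a single MOV from the
		 * mapped address plus a compiler barrier; any added fence
		 * or serialising instructions would show up right here in
		 * the disassembly.
		 */
		return readl(hw_addr + reg);
	}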

It is also possible for other PCIe accesses to slow things down
(which might be why you see 6us).

I presume you are doing these comparisons on the same hardware?
Obscure bus topologies could slow things down.

	David
