Message-ID: <063D6719AE5E284EB5DD2968C1650D6DB01E7AC1@AcuExch.aculab.com>
Date: Thu, 13 Oct 2016 14:02:15 +0000
From: David Laight <David.Laight@...LAB.COM>
To: "'Koehrer Mathias (ETAS/ESW5)'" <mathias.koehrer@...s.com>,
"Julia Cartwright" <julia@...com>,
"Williams, Mitch A" <mitch.a.williams@...el.com>,
"Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>
CC: "linux-rt-users@...r.kernel.org" <linux-rt-users@...r.kernel.org>,
Sebastian Andrzej Siewior <sebastian.siewior@...utronix.de>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
Greg <gvrose8192@...il.com>
Subject: RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge
latencies in cyclictest
From: Koehrer Mathias
> Sent: 13 October 2016 11:57
..
> > The time between my trace points 700 and 701 is about 30us, the time between my
> > trace points 600 and 601 is even 37us!!
> > The code in between is
> > tsyncrxctl = rd32(E1000_TSYNCRXCTL); resp.
> > lvmmc = rd32(E1000_LVMMC);
> >
> > In both cases this is a single read from a register.
> > I have no idea why this single read could take that much time!
> > Is it possible that the igb hardware is in a state that delays the read access and this is
> > why the whole I/O system might be delayed?
> >
>
> To have a proper comparison, I did the same with kernel 3.18.27-rt27.
> Also here, I instrumented the igb driver to get traces for the rd32 calls.
> However, here everything is generally much faster!
> In the idle system the maximum I got for a read was about 6us, most times it was 1-2us.
1-2us is probably about right; PCIe is high-throughput but also high-latency.
You should see the latencies we get talking to an FPGA!
> On the 4.8 kernel this is always much slower (see above).
> My question is now: Is there any kernel config option that has been introduced in the meantime
> that may lead to this effect and which is not set in my 4.8 config?
Have a look at the generated code for rd32().
Someone might have added a load of synchronisation (memory barrier) instructions to it.
On x86 I don't think it needs any.
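As a rough illustration of what to look for, here is a user-space sketch using hypothetical helpers (not the kernel's readl()/rd32()). It assumes GCC or Clang builtins and contrasts a bare 32-bit volatile read with the same read wrapped in full fences. Compiling it with `gcc -O2 -S` and comparing the two functions shows the kind of extra instructions that would stand out when disassembling the driver's rd32() path (e.g. with `objdump -d`).

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * A bare 32-bit read: one volatile load followed by a compiler-only
 * barrier. On x86 this is a single mov; no CPU fence instruction is emitted.
 */
static inline uint32_t plain_read32(const volatile void *addr)
{
    uint32_t v = *(const volatile uint32_t *)addr;
    __asm__ __volatile__("" ::: "memory"); /* prevents compiler reordering only */
    return v;
}

/*
 * Hypothetical helper, for illustration only.
 * The same read bracketed by sequentially consistent fences. On x86 each
 * fence becomes a full memory-barrier instruction (mfence or a locked
 * instruction, depending on the compiler), i.e. extra synchronisation.
 */
static inline uint32_t fenced_read32(const volatile void *addr)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    uint32_t v = *(const volatile uint32_t *)addr;
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    return v;
}
```

Both return the same value. The only difference is the instructions around the load, which is why the generated code, not the source, is the thing to compare between the 3.18 and 4.8 builds.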
It is also possible for other PCIe accesses to slow things down
(which might be why you see 6us).
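To separate the cost of the read itself from interference by other PCIe traffic, it may help to time many reads and look at the spread rather than a single pair of trace points. The sketch below is a hypothetical harness: `time_read_ns()` and `summarise_ns()` are illustrative names, and it assumes `addr` points at a device register already mapped into user space (for example by mmap()ing the device's sysfs `resource0` file), which is not shown.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Summary of a batch of read-latency samples, in nanoseconds. */
struct lat_stats {
    uint64_t min_ns;
    uint64_t max_ns;
    uint64_t mean_ns;
};

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Time one 32-bit read from addr (assumed: a mapped device register). */
static uint64_t time_read_ns(const volatile uint32_t *addr)
{
    uint64_t t0 = now_ns();
    uint32_t v = *addr; /* volatile load: the access cannot be optimised away */
    uint64_t t1 = now_ns();
    (void)v;
    return t1 - t0;
}

/* Reduce n samples to min/max/mean so occasional outliers stand out. */
static void summarise_ns(const uint64_t *s, size_t n, struct lat_stats *out)
{
    uint64_t sum = 0;
    out->min_ns = UINT64_MAX;
    out->max_ns = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] < out->min_ns)
            out->min_ns = s[i];
        if (s[i] > out->max_ns)
            out->max_ns = s[i];
        sum += s[i];
    }
    out->mean_ns = n ? sum / n : 0;
}
```

A tight minimum with a long tail would point towards contention from other accesses on the bus. A uniformly high minimum would point back at the read path itself.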
I presume you are doing these comparisons on the same hardware?
Obscure bus topologies could slow things down.
David