Message-ID: <b77ea8ab312149d2b20a717d3ab0a8dc@FE-MBX1012.de.bosch.com>
Date: Thu, 13 Oct 2016 10:57:18 +0000
From: "Koehrer Mathias (ETAS/ESW5)" <mathias.koehrer@...s.com>
To: "Koehrer Mathias (ETAS/ESW5)" <mathias.koehrer@...s.com>,
Julia Cartwright <julia@...com>,
"Williams, Mitch A" <mitch.a.williams@...el.com>,
"Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>
CC: "linux-rt-users@...r.kernel.org" <linux-rt-users@...r.kernel.org>,
Sebastian Andrzej Siewior <sebastian.siewior@...utronix.de>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
Greg <gvrose8192@...il.com>
Subject: RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge
latencies in cyclictest
Hi all!
>
> Hi Julia,
>
> thanks for the detailed analysis!
> >
> > [...]
> > Okay, we finally received our wakeup event. We were expecting to be
> > woken up at 10024735653388ns, but were actually woken up at
> 10024735682387ns.
> >
> > 10024735682387 - 10024735653388 = 28999ns
> >
> > Our timer fired ~29us late! But why...?
> >
> > Sorry I don't have answers, just more questions. I do wonder what
> > kworker/0:3 was up to at the time the timer interrupt should have fired.
> >
> > Julia
> I have now instrumented the igb driver to generate trace points that allow me to
> identify in more detail what is going on.
> This delivered interesting results!
>
> Here are some places where I added traces:
> In file igb_ptp.c:
> void igb_ptp_rx_hang(struct igb_adapter *adapter)
> {
>         struct e1000_hw *hw = &adapter->hw;
>         unsigned long rx_event;
>         u32 tsyncrxctl;
>
>         trace_igb(700);
>         tsyncrxctl = rd32(E1000_TSYNCRXCTL);
>         trace_igb(701);
>
>         /* Other hardware uses per-packet timestamps */
>         if (hw->mac.type != e1000_82576)
>                 return;
>         ...
>
> In file igb_main.c:
> static void igb_check_lvmmc(struct igb_adapter *adapter)
> {
>         struct e1000_hw *hw = &adapter->hw;
>         u32 lvmmc;
>
>         trace_igb(600);
>         lvmmc = rd32(E1000_LVMMC);
>         trace_igb(601);
>
>         if (lvmmc) {
>         ...
>
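Side note on the instrumentation: trace_igb() is a custom tracepoint added only for
this debugging, not an upstream one.  A minimal sketch of how such a single-value
tracepoint header could look (the file name and details here are an assumption, not
necessarily the exact code used):

    /* include/trace/events/igb.h - hypothetical minimal tracepoint header.
     * Lets the driver mark code locations with trace_igb(<val>), which then
     * shows up in the trace as "igb: val: <val>".
     */
    #undef TRACE_SYSTEM
    #define TRACE_SYSTEM igb

    #if !defined(_TRACE_IGB_H) || defined(TRACE_HEADER_MULTI_READ)
    #define _TRACE_IGB_H

    #include <linux/tracepoint.h>

    TRACE_EVENT(igb,
            TP_PROTO(u32 val),
            TP_ARGS(val),
            TP_STRUCT__entry(
                    __field(u32, val)
            ),
            TP_fast_assign(
                    __entry->val = val;
            ),
            TP_printk("val: %u", __entry->val)
    );

    #endif /* _TRACE_IGB_H */

    /* This part must be outside the include guard */
    #include <trace/define_trace.h>

The driver sources then include this header, with CREATE_TRACE_POINTS defined in
exactly one .c file, so that the tracepoint gets instantiated.
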
> When I now run my test, I get the following trace:
> [...]
> kworker/-607 0....... 107315621us+: igb: val: 700
> kworker/-607 0d..h... 107315640us : irq_handler_entry: irq=47 name=eth2-rx-0
> kworker/-607 0d..h... 107315640us : irq_handler_exit: irq=47 ret=handled
> kworker/-607 0d..h1.. 107315640us : sched_waking: comm=irq/47-eth2-rx-
> pid=18009 prio=49 target_cpu=000
> kworker/-607 0dN.h2.. 107315641us : sched_wakeup: comm=irq/47-eth2-rx-
> pid=18009 prio=49 target_cpu=000
> kworker/-607 0dN.h1.. 107315643us : irq_handler_entry: irq=48 name=eth2-tx-0
> kworker/-607 0dN.h1.. 107315643us : irq_handler_exit: irq=48 ret=handled
> kworker/-607 0dN.h2.. 107315643us : sched_waking: comm=irq/48-eth2-tx-
> pid=18010 prio=49 target_cpu=000
> kworker/-607 0dN.h3.. 107315644us : sched_wakeup: comm=irq/48-eth2-tx-
> pid=18010 prio=49 target_cpu=000
> kworker/-607 0dN..1.. 107315644us : rcu_utilization: Start context switch
> kworker/-607 0dN..1.. 107315644us : rcu_utilization: End context switch
> kworker/-607 0dN..2.. 107315644us : sched_stat_runtime: comm=kworker/0:1
> pid=607 runtime=88996 [ns] vruntime=49754678074 [ns]
> kworker/-607 0d...2.. 107315645us : sched_switch: prev_comm=kworker/0:1
> prev_pid=607 prev_prio=120 prev_state=R+ ==> next_comm=irq/47-eth2-rx-
> next_pid=18009 next_prio=49
> irq/47-e-18009 0d....11 107315646us : softirq_raise: vec=3 [action=NET_RX]
> irq/47-e-18009 0.....12 107315646us : softirq_entry: vec=3 [action=NET_RX]
> irq/47-e-18009 0.....12 107315647us : napi_poll: napi poll on napi struct
> ffff88040ae58c50 for device eth2 work 0 budget 64
> irq/47-e-18009 0.....12 107315647us : softirq_exit: vec=3 [action=NET_RX]
> irq/47-e-18009 0d...1.. 107315648us : rcu_utilization: Start context switch
> irq/47-e-18009 0d...1.. 107315648us : rcu_utilization: End context switch
> irq/47-e-18009 0d...2.. 107315648us : sched_switch: prev_comm=irq/47-eth2-rx-
> prev_pid=18009 prev_prio=49 prev_state=S ==> next_comm=irq/48-eth2-tx-
> next_pid=18010 next_prio=49
> irq/48-e-18010 0d....11 107315649us : softirq_raise: vec=3 [action=NET_RX]
> irq/48-e-18010 0.....12 107315649us : softirq_entry: vec=3 [action=NET_RX]
> irq/48-e-18010 0.....12 107315650us : napi_poll: napi poll on napi struct
> ffff88040ae5f450 for device eth2 work 0 budget 64
> irq/48-e-18010 0.....12 107315650us : softirq_exit: vec=3 [action=NET_RX]
> irq/48-e-18010 0d...1.. 107315651us : rcu_utilization: Start context switch
> irq/48-e-18010 0d...1.. 107315651us : rcu_utilization: End context switch
> irq/48-e-18010 0d...2.. 107315651us : sched_switch: prev_comm=irq/48-eth2-tx-
> prev_pid=18010 prev_prio=49 prev_state=S ==> next_comm=kworker/0:1
> next_pid=607 next_prio=120
> kworker/-607 0....... 107315652us : igb: val: 701
> kworker/-607 0....... 107315652us : igb: val: 106
> kworker/-607 0....... 107315652us : igb: val: 107
> kworker/-607 0....... 107315652us+: igb: val: 600
> kworker/-607 0d..h... 107315689us : local_timer_entry: vector=239
> kworker/-607 0d..h1.. 107315689us : hrtimer_interrupt: cpu=0 offset=-34521
> curr=kworker/0:1[120] thread=cyclictest[19]
> kworker/-607 0d..h1.. 107315689us : hrtimer_cancel: hrtimer=ffff8803d42efe18
> kworker/-607 0d..h... 107315689us : hrtimer_expire_entry:
> hrtimer=ffff8803d42efe18 function=hrtimer_wakeup now=752735681960
> kworker/-607 0d..h1.. 107315689us : sched_waking: comm=cyclictest pid=18015
> prio=19 target_cpu=000
> kworker/-607 0dN.h2.. 107315690us : sched_wakeup: comm=cyclictest
> pid=18015 prio=19 target_cpu=000
> kworker/-607 0dN.h... 107315690us : hrtimer_expire_exit:
> hrtimer=ffff8803d42efe18
> kworker/-607 0dN.h1.. 107315690us : hrtimer_interrupt: cpu=0 offset=318040
> curr=kworker/0:1[120] thread=<none>[-1]
> kworker/-607 0dN.h... 107315690us : write_msr: 6e0, value 28096cdb9ce
> kworker/-607 0dN.h... 107315690us : local_timer_exit: vector=239
> kworker/-607 0dN..1.. 107315690us : rcu_utilization: Start context switch
> kworker/-607 0dN..1.. 107315691us : rcu_utilization: End context switch
> kworker/-607 0dN..2.. 107315691us : sched_stat_runtime: comm=kworker/0:1
> pid=607 runtime=38439 [ns] vruntime=49754716513 [ns]
> kworker/-607 0d...2.. 107315691us : sched_switch: prev_comm=kworker/0:1
> prev_pid=607 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=18015 next_prio=19
> kworker/-607 0d...2.. 107315691us : x86_fpu_regs_activated: x86/fpu:
> ffff8803f7f55940 fpregs_active: 1 fpstate_active: 1 counter: 99 xfeatures: 2
> xcomp_bv: 0
> kworker/-607 0d...2.. 107315691us : write_msr: c0000100, value 7ffff7400700
> cyclicte-18015 0....... 107315692us : sys_exit: NR 230 = 0
> cyclicte-18015 0....... 107315697us : sys_enter: NR 1 (5, 7ffff7400300, 1f,
> 7ffff77a5460, 2, 7ffff744c99a)
> cyclicte-18015 0.....11 107315698us : tracing_mark_write: hit latency threshold (37
> > 33)
> cyclicte-18015 0....... 107315699us : sys_exit: NR 1 = 31
> cyclicte-18015 0....... 107315699us : sys_enter: NR 1 (4, 4076b0, 2, 7ffff77a5460,
> 2, 7ffff744c99a)
>
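The "hit latency threshold (37 > 33)" mark above comes from cyclictest's break-trace
feature: cyclictest writes to trace_marker as soon as a measured latency exceeds the
threshold given with -b and then stops the test.  The run was started with something
along these lines (only the 33us threshold is implied by the trace, the remaining
parameters are assumptions):

    cyclictest -m -n -p 80 -i 250 -b 33
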
>
> Also very interesting is the trace that I get on an idle system - without cyclictest
> running.
> When I just enable my igb tracepoint, I get the following result:
> [...]
> kworker/-607 0....... 585779012us+: igb: val: 700
> kworker/-607 0....... 585779042us : igb: val: 701
> kworker/-607 0....... 585779042us : igb: val: 106
> kworker/-607 0....... 585779043us : igb: val: 107
> kworker/-607 0....... 585779043us+: igb: val: 600
> kworker/-607 0....... 585779080us : igb: val: 601
>
> The time between my trace points 700 and 701 is about 30us, and the time between my
> trace points 600 and 601 is even 37us!!
> The code in between is
>     tsyncrxctl = rd32(E1000_TSYNCRXCTL);
> and
>     lvmmc = rd32(E1000_LVMMC);
> respectively.
>
> In both cases this is a single read from a register.
> I have no idea why this single read could take that much time!
> Is it possible that the igb hardware is in a state that delays the read access, and that
> this is why the whole I/O system might be delayed?
>
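For context on what those two lines actually do: in the igb driver, rd32(reg) ends up
in a small helper (igb_rd32() in recent kernels) that performs a single uncached MMIO
read from the device's register BAR.  A rough sketch - not verbatim from the driver,
and details may differ between kernel versions:

    /* Sketch of what rd32(reg) boils down to: one readl() from the
     * memory-mapped register space.  MMIO reads are non-posted, i.e. the
     * CPU stalls until the device/PCIe fabric delivers the completion,
     * so a slow device response shows up directly as time spent on this
     * single line.
     */
    u32 igb_rd32(struct e1000_hw *hw, u32 reg)
    {
            u8 __iomem *hw_addr = READ_ONCE(hw->hw_addr);

            if (E1000_REMOVED(hw_addr))     /* surprise-removal check */
                    return ~0u;

            return readl(&hw_addr[reg]);    /* one register read over PCIe */
    }

So if trace points 700/701 or 600/601 are 30us apart, that time is spent waiting for
the read completion from the adapter, not in the driver code around it.
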
To have a proper comparison, I did the same with kernel 3.18.27-rt27.
There, too, I instrumented the igb driver to get traces for the rd32 calls.
However, there everything is generally much faster!
On the idle system the maximum I saw for a read was about 6us; most of the time it was 1-2us.
On the 4.8 kernel this is always much slower (see above).
My question now is: is there any kernel config option that was introduced in the meantime
that may lead to this effect and which is not set in my 4.8 config?
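For reference, the two configurations could be compared with the kernel's
scripts/diffconfig helper, which lists only the options that were added, removed or
changed between two builds, e.g. (the paths are just placeholders):

    scripts/diffconfig /boot/config-3.18.27-rt27 /boot/config-4.8-rt
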
Regards
Mathias