Message-ID: <9119f62fadaa4342a34882cac835c8b0@AcuMS.aculab.com>
Date:   Mon, 23 May 2022 16:01:00 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Pavan Chebbi' <pavan.chebbi@...adcom.com>
CC:     Paolo Abeni <pabeni@...hat.com>,
        Michael Chan <michael.chan@...adcom.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mchan@...adcom.com" <mchan@...adcom.com>,
        David Miller <davem@...emloft.net>
Subject: RE: tg3 dropping packets at high packet rates

OK, today's test is calling tracing_off() when the napi callback
finds more than 1000 packets queued.
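For reference, the check is roughly along these lines (a sketch of the
debug hack, not the actual diff; the helper name and the exact hook
point inside the tg3 rx poll loop are placeholders):

/*
 * Debug hack (sketch): stop the ftrace ring buffer the first time a
 * napi poll finds a large rx backlog, so the trace ends right at the
 * interesting moment.  'pending' is whatever count of queued rx
 * descriptors the poll loop computes.
 */
#include <linux/kernel.h>       /* tracing_off() */

static void stop_trace_on_backlog(unsigned int pending)
{
        if (pending > 1000)
                tracing_off();
}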

The result is rather informative.
Taking just the CPU 32 part of the trace and shortening the lines
somewhat gives the following.

<idle>-0  6.594695: irq_handler_entry: irq=87 name=em2-rx-2
<idle>-0  6.594695: napi_schedule_prep <-tg3_msi_1shot
<idle>-0  6.594695: __napi_schedule <-tg3_msi_1shot
<idle>-0  6.594695: irq_handler_exit: irq=87 ret=handled
<idle>-0  6.594695: softirq_entry: vec=3 [action=NET_RX]
<idle>-0  6.594696: napi_schedule_prep <-tg3_rx
<idle>-0  6.594696: __napi_schedule <-tg3_rx
<idle>-0  6.594697: napi_poll: napi poll on napi struct 06e44eda for device em2 work 1 budget 64
<idle>-0  6.594697: softirq_exit: vec=3 [action=NET_RX]
<idle>-0  6.594697: softirq_entry: vec=3 [action=NET_RX]
<idle>-0  6.594698: napi_poll: napi poll on napi struct 909def03 for device em2 work 0 budget 64
<idle>-0  6.594698: softirq_exit: vec=3 [action=NET_RX]
<idle>-0  6.594700: irq_handler_entry: irq=87 name=em2-rx-2
<idle>-0  6.594701: napi_schedule_prep <-tg3_msi_1shot
<idle>-0  6.594701: __napi_schedule <-tg3_msi_1shot
<idle>-0  6.594701: irq_handler_exit: irq=87 ret=handled
<idle>-0  6.594701: softirq_entry: vec=3 [action=NET_RX]
<idle>-0  6.594704: napi_schedule_prep <-tg3_rx
<idle>-0  6.594704: __napi_schedule <-tg3_rx
<idle>-0  6.594705: napi_poll: napi poll on napi struct 06e44eda for device em2 work 3 budget 64
<idle>-0  6.594706: softirq_exit: vec=3 [action=NET_RX]
<idle>-0  6.594710: irq_handler_entry: irq=87 name=em2-rx-2
<idle>-0  6.594710: napi_schedule_prep <-tg3_msi_1shot
<idle>-0  6.594710: __napi_schedule <-tg3_msi_1shot
<idle>-0  6.594710: irq_handler_exit: irq=87 ret=handled
<idle>-0  6.594712: sched_switch: prev_pid=0 prev_prio=120 prev_state=R ==> next_pid=2275 next_prio=49
pid-2275  6.594720: sys_futex(uaddr: 7fbd2bfe3a88, op: 81, val: 1, utime: 1064720, uaddr2: 0, val3: 27bea)
pid-2275  6.594721: sys_futex -> 0x0
pid-2275  6.598067: sys_epoll_wait(epfd: 61, events: 7fbd2bfe3300, maxevents: 80, timeout: 0)
pid-2275  6.598747: sched_switch: prev_pid=2275 prev_prio=49 prev_state=S ==> next_pid=175 next_prio=120
pid-175   6.598759: sched_switch: prev_pid=175 prev_prio=120 prev_state=R ==> next_pid=819 next_prio=120
pid-819   6.598763: sched_switch: prev_pid=819 prev_prio=120 prev_state=I ==> next_pid=175 next_prio=120
pid-175   6.598765: softirq_entry: vec=3 [action=NET_RX]
pid-175   6.598770: __napi_schedule <-napi_complete_done
pid-175   6.598771: napi_poll: napi poll on napi struct 909def03 for device em2 work 0 budget 64

The first few interrupts look fine.
Then there is a 4ms delay between the irq_handler calling napi_schedule()
and the NET_RX function actually being called.

Except that isn't the actual delay.
There are 3 relevant napi structures:
- 06e44eda (the horrid pointer hash makes this confusing) for the rx ring of irq=87.
- xxxxxxxx (not in the above trace) for the other active rx ring.
- 909def03 for em2-rx-1, which is also used to copy used rx descriptors
  back to the 'free buffer' ring.

So the normal sequence is: the hard irq schedules the 'napi' for that ring.
At most 64 packets are processed (RPS cross-schedules them).
Then the 'napi' for ring em2-rx-0 is scheduled;
it is this napi that is being reported complete.
Then the em2-rx-2 napi is called and finds 1000+ items on the status ring.
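For anyone following along, a NAPI poll callback has roughly this shape
(a generic sketch, not tg3's actual code; the ring helpers are made-up
placeholders).  It is where the 'work N budget 64' numbers in the
napi_poll trace lines come from:

#include <linux/netdevice.h>

/*
 * Generic NAPI poll shape (illustrative only): process at most
 * 'budget' packets, and only complete + re-enable the device
 * interrupt when the ring was drained within the budget; otherwise
 * the napi stays scheduled and gets polled again.
 */
static int example_poll(struct napi_struct *napi, int budget)
{
        int work = 0;

        while (work < budget && rx_ring_has_packets(napi))     /* placeholder */
                work += process_one_packet(napi);               /* placeholder */

        if (work < budget && napi_complete_done(napi, work))
                reenable_rx_irq(napi);                          /* placeholder */

        return work;
}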

During the 4ms gap em2-rx-4 is being interrupted and its napi is
processing receive packets - but the shared napi (909def03) isn't run.
(Earlier in the trace it gets scheduled by both ISRs.)

pid 175 is ksoftirqd/32 and pid 819 is kworker/32:2.
pid 2275 is part of our application.
It is possible that it looped in userspace for a few ms.
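That would fit with how softirqs end up in ksoftirqd: once __do_softirq()
exhausts its time/restart budget it wakes ksoftirqd, and ksoftirqd is an
ordinary SCHED_NORMAL kthread, so a runnable RT thread (prio 49 here vs
120) keeps it off the CPU.  A heavily simplified paraphrase of the logic
in kernel/softirq.c (the inner-loop helper is a placeholder):

#include <linux/interrupt.h>
#include <linux/jiffies.h>

static void softirq_deferral_sketch(void)
{
        unsigned long end = jiffies + msecs_to_jiffies(2);      /* MAX_SOFTIRQ_TIME */
        int restart = 10;                                       /* MAX_SOFTIRQ_RESTART */

        do {
                run_pending_softirq_handlers(); /* placeholder for the real loop */
        } while (local_softirq_pending() &&
                 time_before(jiffies, end) && --restart);

        if (local_softirq_pending())
                wakeup_softirqd();      /* kernel/softirq.c's own helper: hand off to ksoftirqd/NN */
}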

But the napi_schedule() at 6.594704 should have led to 909def03
being polled almost immediately.
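For reference, scheduling a napi doesn't do much more than queue it on
this CPU's poll list and raise NET_RX_SOFTIRQ, which is why I'd expect
the poll to follow on the next softirq pass.  Paraphrased from
____napi_schedule() in net/core/dev.c (simplified; newer kernels also
handle threaded NAPI here):

#include <linux/netdevice.h>
#include <linux/interrupt.h>

static void napi_schedule_sketch(struct softnet_data *sd,
                                 struct napi_struct *napi)
{
        /* Queue the napi for this CPU and mark NET_RX pending;
         * net_rx_action() should then poll it promptly. */
        list_add_tail(&napi->poll_list, &sd->poll_list);
        __raise_softirq_irqoff(NET_RX_SOFTIRQ);
}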

I've got a lot more ftrace - but it is large and mostly boring.

	David

