Message-ID: <CALs4sv13Y7CoMvrYm2c58vP6FKyK+_qrSp2UBCv0MURTAkv8hg@mail.gmail.com>
Date:   Mon, 23 May 2022 21:44:58 +0530
From:   Pavan Chebbi <pavan.chebbi@...adcom.com>
To:     David Laight <David.Laight@...lab.com>
Cc:     Paolo Abeni <pabeni@...hat.com>,
        Michael Chan <michael.chan@...adcom.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mchan@...adcom.com" <mchan@...adcom.com>,
        David Miller <davem@...emloft.net>
Subject: Re: tg3 dropping packets at high packet rates

From the register dump we can see that your device is a 5720-A0.
There is a patch that was applied a long time ago:
commit 4d95847381228639844c7197deb8b2211274ef22 ("tg3: Workaround rx_discards stat bug")

This patch may cause the rx_discards count to under-report the actual discards.
You can see the actual discards if you monitor the register at 0x2250,
which I think matches what your application reports. Can you test this and confirm?
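
For reference, a minimal sketch of one way to poll that offset from
userspace: mmap() BAR0 of the NIC through sysfs and read the 32-bit word
at 0x2250. The PCI address below is hypothetical, and this bypasses the
driver entirely, so treat it as a debugging aid rather than as the tg3
driver's own accessor.

    /*
     * Hedged sketch: poll a tg3 MMIO register by mapping PCI BAR0.
     * Replace the sysfs path with the NIC's real domain:bus:dev.fn.
     * Needs root; build with: cc -O2 -o tg3peek tg3peek.c
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MAP_LEN        0x10000   /* first 64K of the register window */
    #define RX_DISCARD_REG 0x2250    /* offset quoted above */

    int main(void)
    {
            /* Hypothetical device path -- adjust to the real PCI address. */
            int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0",
                          O_RDONLY);
            if (fd < 0) {
                    perror("open resource0");
                    return 1;
            }

            volatile uint32_t *bar0 = mmap(NULL, MAP_LEN, PROT_READ,
                                           MAP_SHARED, fd, 0);
            if (bar0 == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }

            for (;;) {      /* print the raw counter once a second */
                    printf("reg 0x%04x = %u\n", RX_DISCARD_REG,
                           (unsigned int)bar0[RX_DISCARD_REG / 4]);
                    sleep(1);
            }
    }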

We can also see from the register dump you shared earlier that the Flow
Attention Register at 0x3C48 has bit 6 set. That means the MBUF allocation
state machine has reached the MBUF low-water threshold.
In summary, at this point it appears that the host cannot keep up with the
incoming traffic.
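
The same mapping can be used to keep an eye on that attention bit; a small
helper along the same lines, with the bit-6 meaning taken from this thread
rather than from the register manual:

    #include <stdbool.h>
    #include <stdint.h>

    #define FLOW_ATTN_REG      0x3c48     /* Flow Attention Register */
    #define FLOW_ATTN_MBUF_LWM (1u << 6)  /* bit 6: MBUF low water reached */

    /* 'bar0' is the mapped BAR0 window from the previous sketch. */
    static bool tg3_mbuf_low_water(volatile const uint32_t *bar0)
    {
            return (bar0[FLOW_ATTN_REG / 4] & FLOW_ATTN_MBUF_LWM) != 0;
    }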


On Mon, May 23, 2022 at 9:31 PM David Laight <David.Laight@...lab.com> wrote:
>
> Ok, today's test calls tracing_off() when the napi callback
> has more than 1000 packets queued.
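>
> (A sketch of such a trigger, not the exact debug patch used here: in the
> tg3 napi rx handler, compare the hardware producer index on the rx return
> ring with the software consumer index and freeze tracing once the backlog
> exceeds 1000. The field names below follow the upstream driver's
> producer/consumer indices and should be treated as approximate.)
>
>     /* Hypothetical debug hunk in the tg3 rx poll path, not the actual
>      * patch: freeze the ftrace ring buffer at the interesting moment. */
>     u32 hw_idx  = *(tnapi->rx_rcb_prod_idx);   /* producer, written by hw */
>     u32 sw_idx  = tnapi->rx_rcb_ptr;           /* consumer, advanced by sw */
>     u32 backlog = (hw_idx - sw_idx) & tp->rx_ret_ring_mask;
>
>     if (backlog > 1000)
>             tracing_off();     /* stop tracing, keep the system running */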
>
> This is rather informative.
> Just taking the trace for cpu 32 and shortening the lines somewhat
> gives the following.
>
> <idle>-0  6.594695: irq_handler_entry: irq=87 name=em2-rx-2
> <idle>-0  6.594695: napi_schedule_prep <-tg3_msi_1shot
> <idle>-0  6.594695: __napi_schedule <-tg3_msi_1shot
> <idle>-0  6.594695: irq_handler_exit: irq=87 ret=handled
> <idle>-0  6.594695: softirq_entry: vec=3 [action=NET_RX]
> <idle>-0  6.594696: napi_schedule_prep <-tg3_rx
> <idle>-0  6.594696: __napi_schedule <-tg3_rx
> <idle>-0  6.594697: napi_poll: napi poll on napi struct 06e44eda for device em2 work 1 budget 64
> <idle>-0  6.594697: softirq_exit: vec=3 [action=NET_RX]
> <idle>-0  6.594697: softirq_entry: vec=3 [action=NET_RX]
> <idle>-0  6.594698: napi_poll: napi poll on napi struct 909def03 for device em2 work 0 budget 64
> <idle>-0  6.594698: softirq_exit: vec=3 [action=NET_RX]
> <idle>-0  6.594700: irq_handler_entry: irq=87 name=em2-rx-2
> <idle>-0  6.594701: napi_schedule_prep <-tg3_msi_1shot
> <idle>-0  6.594701: __napi_schedule <-tg3_msi_1shot
> <idle>-0  6.594701: irq_handler_exit: irq=87 ret=handled
> <idle>-0  6.594701: softirq_entry: vec=3 [action=NET_RX]
> <idle>-0  6.594704: napi_schedule_prep <-tg3_rx
> <idle>-0  6.594704: __napi_schedule <-tg3_rx
> <idle>-0  6.594705: napi_poll: napi poll on napi struct 06e44eda for device em2 work 3 budget 64
> <idle>-0  6.594706: softirq_exit: vec=3 [action=NET_RX]
> <idle>-0  6.594710: irq_handler_entry: irq=87 name=em2-rx-2
> <idle>-0  6.594710: napi_schedule_prep <-tg3_msi_1shot
> <idle>-0  6.594710: __napi_schedule <-tg3_msi_1shot
> <idle>-0  6.594710: irq_handler_exit: irq=87 ret=handled
> <idle>-0  6.594712: sched_switch: prev_pid=0 prev_prio=120 prev_state=R ==> next_pid=2275 next_prio=49
> pid-2275  6.594720: sys_futex(uaddr: 7fbd2bfe3a88, op: 81, val: 1, utime: 1064720, uaddr2: 0, val3: 27bea)
> pid-2275  6.594721: sys_futex -> 0x0
> pid-2275  6.598067: sys_epoll_wait(epfd: 61, events: 7fbd2bfe3300, maxevents: 80, timeout: 0)
> pid-2275  6.598747: sched_switch: prev_pid=2275 prev_prio=49 prev_state=S ==> next_pid=175 next_prio=120
> pid-175   6.598759: sched_switch: prev_pid=175 prev_prio=120 prev_state=R ==> next_pid=819 next_prio=120
> pid-819   6.598763: sched_switch: prev_pid=819 prev_prio=120 prev_state=I ==> next_pid=175 next_prio=120
> pid-175   6.598765: softirq_entry: vec=3 [action=NET_RX]
> pid-175   6.598770: __napi_schedule <-napi_complete_done
> pid-175   6.598771: napi_poll: napi poll on napi struct 909def03 for device em2 work 0 budget 64
>
> The first few interrupts look fine.
> Then there is a 4ms delay between the irq_handler calling napi_schedule()
> and the NET_RX function actually being called.
>
> Except that isn't the actual delay.
> There are 3 relevant napi structures:
> 06e44eda (the hashed pointer value makes this hard to follow) for the rx ring of irq=87,
> xxxxxxxx (not in the above trace) for the other active rx ring, and
> 909def03 for em2-rx-1, which is also used to copy used rx descriptors
> back to the 'free buffer' ring.
>
> So the normal sequence is: the hard irq handler schedules the 'napi' for that ring.
> At most 64 packets are processed (RPS cross-schedules them).
> Then the 'napi' for ring em2-rx-0 is scheduled.
> It is this napi that is being reported complete.
> Then the em2-rx-2 napi is called and finds 1000+ items on the status ring.
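>
> (As a reminder of the mechanism behind those numbers, the generic shape of
> a NAPI poll handler -- simplified, not tg3's actual code, with made-up
> helper names: consume at most 'budget' (64 in the trace) packets, and only
> call napi_complete_done() and re-enable the irq when the ring ran dry;
> otherwise the core keeps the napi scheduled and polls it again.)
>
>     /* Simplified NAPI poll -- generic pattern, hypothetical helpers. */
>     static int example_poll(struct napi_struct *napi, int budget)
>     {
>             struct example_ring *ring =
>                     container_of(napi, struct example_ring, napi);
>             int work_done = 0;
>
>             while (work_done < budget && example_ring_has_work(ring)) {
>                     example_rx_one(ring);  /* pass packet up; RPS may cross-schedule */
>                     work_done++;
>             }
>
>             /* Less than a full budget: tell the core we are done and
>              * re-enable the device interrupt.  A full budget means the
>              * core will poll us again without a new interrupt. */
>             if (work_done < budget && napi_complete_done(napi, work_done))
>                     example_enable_irq(ring);
>
>             return work_done;
>     }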
>
> During the 4ms gap em2-rx-4 is being interrupted and the napi is
> processing receive packets - but the shared napi (909def03) isn't run.
> (Earlier in the trace it gets scheduled by both ISRs.)
>
> pid 175 is ksoftirqd/32 and pid 819 is kworker/32:2.
> pid 2275 is part of our application.
> It is possible that it looped in userspace for a few ms.
>
> But the napi_schedule() at 6.594704 should have resulted in the
> 909def03 napi being called back immediately.
>
> I've got a lot more ftrace - but it is large and mostly boring.
>
>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
