Message-ID: <37732c0b-a1f5-5e1d-d34e-16ef07fab597@redhat.com>
Date: Mon, 29 Nov 2021 14:59:53 +0100
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Daniel Borkmann <daniel@...earbox.net>,
Jakub Kicinski <kuba@...nel.org>,
Toke Høiland-Jørgensen <toke@...hat.com>
Cc: brouer@...hat.com, Alexander Lobakin <alexandr.lobakin@...el.com>,
"David S. Miller" <davem@...emloft.net>,
Jesse Brandeburg <jesse.brandeburg@...el.com>,
Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Jonathan Corbet <corbet@....net>,
Shay Agroskin <shayagr@...zon.com>,
Arthur Kiyanovski <akiyano@...zon.com>,
David Arinzon <darinzon@...zon.com>,
Noam Dagan <ndagan@...zon.com>,
Saeed Bishara <saeedb@...zon.com>,
Ioana Ciornei <ioana.ciornei@....com>,
Claudiu Manoil <claudiu.manoil@....com>,
Tony Nguyen <anthony.l.nguyen@...el.com>,
Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
Marcin Wojtas <mw@...ihalf.com>,
Russell King <linux@...linux.org.uk>,
Saeed Mahameed <saeedm@...dia.com>,
Leon Romanovsky <leon@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Edward Cree <ecree.xilinx@...il.com>,
Martin Habets <habetsm.xilinx@...il.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
Jason Wang <jasowang@...hat.com>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
KP Singh <kpsingh@...nel.org>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Yajun Deng <yajun.deng@...ux.dev>,
Sergey Ryazanov <ryazanov.s.a@...il.com>,
David Ahern <dsahern@...nel.org>,
Andrei Vagin <avagin@...il.com>,
Johannes Berg <johannes.berg@...el.com>,
Vladimir Oltean <vladimir.oltean@....com>,
Cong Wang <cong.wang@...edance.com>, netdev@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-rdma@...r.kernel.org, bpf@...r.kernel.org,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic
per-channel statistics
On 27/11/2021 00.01, Daniel Borkmann wrote:
> On 11/26/21 11:27 PM, Daniel Borkmann wrote:
>> On 11/26/21 7:06 PM, Jakub Kicinski wrote:
> [...]
>>> The information required by the admin is higher level. As you say the
>>> primary concern there is "how many packets did XDP eat".
>>
>> Agree. Above said, for XDP_DROP I would see one use case where you compare
>> different drivers or bond vs no bond as we did in the past in [0] when
>> testing against a packet generator (although I don't see bond driver covered
>> in this series here yet where it aggregates the XDP stats from all bond
>> slave devs).
>>
>> On a higher-level wrt "how many packets did XDP eat", it would make sense
>> to have the stats for successful XDP_{TX,REDIRECT} given these are out
>> of reach from a BPF prog PoV - we can only count there how many times we
>> returned with XDP_TX but not whether the pkt /successfully made it/.
>>
Exactly.
>> In terms of error cases, could we just standardize all drivers on the
>> behavior of e.g. mlx5e_xdp_handle(), meaning, a failure from XDP_{TX,REDIRECT}
>> will hit the trace_xdp_exception() and then fallthrough to bump a drop
>> counter (same as we bump in XDP_DROP then). So the drop counter will account
>> for program drops but also driver-related drops.
>>
Hmm... I don't agree here. IMHO the BPF-program's *choice* to drop (via
XDP_DROP) should NOT share the counter with the driver-related drops.
The driver-related drops must be accounted separately.

For the record, I think mlx5e_xdp_handle() does the wrong thing by
accounting everything as XDP_DROP (rq->stats->xdp_drop++). The current
mlx5 driver stats are actually highly problematic. Please don't model
stats behavior after this driver.

E.g. if the BPF-prog makes the *choice* XDP_TX or XDP_REDIRECT or XDP_DROP,
then the packet is invisible to "ifconfig" stats. It is as if the driver
never received these packets (which is wrong IMHO). (The stats are only
available via ethtool -S.)
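
To make this concrete, below is a rough sketch (not code from this series
or from any specific driver; the example_* names and the stats fields are
made up for illustration) of the accounting split I'm arguing for: the
BPF-prog's own XDP_DROP choice, driver-side TX/REDIRECT errors, and the
normal rx_packets/rx_bytes counters are all kept separate:

  #include <linux/netdevice.h>
  #include <linux/filter.h>
  #include <net/xdp.h>
  #include <trace/events/xdp.h>

  struct example_rq_stats {
	u64 rx_packets, rx_bytes;           /* fed into ndo_get_stats64  */
	u64 xdp_drop;                       /* BPF-prog chose XDP_DROP   */
	u64 xdp_tx, xdp_tx_err;             /* successful vs failed TX   */
	u64 xdp_redirect, xdp_redirect_err; /* successful vs failed      */
  };

  /* Hypothetical helper standing in for the driver's XDP-TX xmit path */
  static int example_xmit_xdp_frame(struct net_device *dev,
				    struct xdp_buff *xdp);

  static u32 example_run_xdp(struct net_device *dev, struct bpf_prog *prog,
			     struct xdp_buff *xdp,
			     struct example_rq_stats *stats)
  {
	u32 act = bpf_prog_run_xdp(prog, xdp);

	/* XDP saw the packet, so keep it visible in the normal
	 * rx_packets/rx_bytes counters (ifconfig / ip -s / sar),
	 * not only in ethtool -S.
	 */
	stats->rx_packets++;
	stats->rx_bytes += xdp->data_end - xdp->data;

	switch (act) {
	case XDP_PASS:
		break;
	case XDP_TX:
		if (example_xmit_xdp_frame(dev, xdp) < 0) {
			stats->xdp_tx_err++;       /* driver-side error */
			trace_xdp_exception(dev, prog, act);
		} else {
			stats->xdp_tx++;
		}
		break;
	case XDP_REDIRECT:
		if (xdp_do_redirect(dev, xdp, prog) < 0) {
			stats->xdp_redirect_err++; /* driver-side error */
			trace_xdp_exception(dev, prog, act);
		} else {
			stats->xdp_redirect++;
		}
		break;
	default:
		bpf_warn_invalid_xdp_action(act);
		fallthrough;
	case XDP_ABORTED:
		trace_xdp_exception(dev, prog, act);
		fallthrough;
	case XDP_DROP:
		stats->xdp_drop++; /* the BPF-prog's own choice to drop */
		break;
	}
	/* (recycling/freeing of the frame on drop is omitted here) */
	return act;
  }

The key point: an XDP_DROP returned by the program and e.g. a full XDP-TX
queue end up in different counters, while the packet still shows up in the
standard netdev stats.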
>> At some later point the trace_xdp_exception() could be extended with an
>> error code that the driver would propagate (given some of them look quite
>> similar across drivers, fwiw), and then whoever wants to do further
>> processing with them can do so via bpftrace or other tooling.
I do like that trace_xdp_exception() is invoked in mlx5e_xdp_handle(), but
do notice that xdp_do_redirect() also has a tracepoint that can be used
for troubleshooting. (I usually use xdp_monitor for troubleshooting, which
catches both.)

I like the XDP stats handling in mvneta_run_xdp() better.
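
For completeness, here is a minimal BPF-side sketch in the spirit of
xdp_monitor (not a copy of the sample) that counts trace_xdp_exception()
events per XDP action. The ctx layout is assumed to follow the
xdp:xdp_exception tracepoint format, so double-check it against
/sys/kernel/debug/tracing/events/xdp/xdp_exception/format:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct xdp_exception_ctx {
	__u64 __pad;     /* first 8 bytes are not accessible from BPF */
	int prog_id;     /* offset:8;  size:4; signed:1; */
	__u32 act;       /* offset:12; size:4; signed:0; */
	int ifindex;     /* offset:16; size:4; signed:1; */
  };

  #define XDP_ACT_MAX (XDP_REDIRECT + 1)

  struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, XDP_ACT_MAX);
	__type(key, __u32);
	__type(value, __u64);
  } exception_cnt SEC(".maps");

  SEC("tracepoint/xdp/xdp_exception")
  int count_xdp_exception(struct xdp_exception_ctx *ctx)
  {
	__u32 key = ctx->act;
	__u64 *cnt;

	if (key >= XDP_ACT_MAX)
		key = 0; /* fold unknown actions into slot 0 */

	cnt = bpf_map_lookup_elem(&exception_cnt, &key);
	if (cnt)
		(*cnt)++;

	return 0;
  }

  char _license[] SEC("license") = "GPL";

A userspace loader would then dump the per-CPU counters periodically, much
like xdp_monitor does.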
> Just thinking out loud, one straight forward example we could start out
> with that is also related to Paolo's series [1] ...
>
> enum xdp_error {
>         XDP_UNKNOWN,
>         XDP_ACTION_INVALID,
>         XDP_ACTION_UNSUPPORTED,
> };
>
> ... and then bpf_warn_invalid_xdp_action() returns one of the latter two
> which we pass to trace_xdp_exception(). Later there could be XDP_DRIVER_*
> cases e.g. propagated from XDP_TX error exceptions.
>
> [...]
>         default:
>                 err = bpf_warn_invalid_xdp_action(act);
>                 fallthrough;
>         case XDP_ABORTED:
> xdp_abort:
>                 trace_xdp_exception(rq->netdev, prog, act, err);
>                 fallthrough;
>         case XDP_DROP:
>                 lrstats->xdp_drop++;
>                 break;
>         }
> [...]
>
> [1]
> https://lore.kernel.org/netdev/cover.1637924200.git.pabeni@redhat.com/
>
>> So overall wrt this series: from the lrstats we'd be /dropping/ the pass,
>> tx_errors, redirect_errors, invalid, aborted counters. And we'd be
>> /keeping/ bytes & packets counters that XDP sees, (driver-)successful tx &
>> redirect counters as well as drop counter. Also, XDP bytes & packets
>> counters should not be counted twice wrt ethtool stats.
>>
>> [0]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e2ee5c7e7c35d195e2aa0692a7241d47a433d1e
>>
Concrete example with mlx5:

For most hardware (other than mlx5) I experience that XDP_TX creates a
push-back on the NIC RX-handling speed. Thus, the XDP_TX stats recorded
by the BPF-prog are usually correct.

With mlx5 hardware (tested on a ConnectX-5 Ex MT28800) the RX
packets-per-sec (pps) stats can easily be higher than the number of
frames actually transmitted via XDP_TX.
$ sudo ./xdp_rxq_info --dev mlx5p1 --action XDP_TX
[...]
Running XDP on dev:mlx5p1 (ifindex:10) action:XDP_TX options:swapmac
XDP stats CPU pps issue-pps
XDP-RX CPU 1 13,922,430 0
XDP-RX CPU total 13,922,430
RXQ stats RXQ:CPU pps issue-pps
rx_queue_index 1:1 13,922,430 0
rx_queue_index 1:sum 13,922,430
The real xmit speed (from ethtool_stats.pl [1] below) is:

 9,391,314 pps <= rx1_xdp_tx_xmit /sec

The remaining packets are dropped because the XDP-TX queue is full, and
these drops are double accounted as both:

 4,552,033 <= rx_xdp_drop /sec
 4,552,033 <= rx_xdp_tx_full /sec

(9,391,314 xmit + 4,552,033 dropped roughly matches the ~13.9M pps the
BPF-prog saw on RX.)
Show adapter(s) (mlx5p1) statistics (ONLY that changed!)
Ethtool(mlx5p1 ) stat:     217865 (        217,865) <= ch1_poll /sec
Ethtool(mlx5p1 ) stat:     217864 (        217,864) <= ch_poll /sec
Ethtool(mlx5p1 ) stat:   13943371 (     13,943,371) <= rx1_cache_reuse /sec
Ethtool(mlx5p1 ) stat:    4552033 (      4,552,033) <= rx1_xdp_drop /sec
Ethtool(mlx5p1 ) stat:     146740 (        146,740) <= rx1_xdp_tx_cqes /sec
Ethtool(mlx5p1 ) stat:    4552033 (      4,552,033) <= rx1_xdp_tx_full /sec
Ethtool(mlx5p1 ) stat:    9391314 (      9,391,314) <= rx1_xdp_tx_inlnw /sec
Ethtool(mlx5p1 ) stat:     880436 (        880,436) <= rx1_xdp_tx_mpwqe /sec
Ethtool(mlx5p1 ) stat:     997833 (        997,833) <= rx1_xdp_tx_nops /sec
Ethtool(mlx5p1 ) stat:    9391314 (      9,391,314) <= rx1_xdp_tx_xmit /sec
Ethtool(mlx5p1 ) stat:   45095173 (     45,095,173) <= rx_64_bytes_phy /sec
Ethtool(mlx5p1 ) stat: 2886090490 (  2,886,090,490) <= rx_bytes_phy /sec
Ethtool(mlx5p1 ) stat:   13943293 (     13,943,293) <= rx_cache_reuse /sec
Ethtool(mlx5p1 ) stat:   31151957 (     31,151,957) <= rx_out_of_buffer /sec
Ethtool(mlx5p1 ) stat:   45095158 (     45,095,158) <= rx_packets_phy /sec
Ethtool(mlx5p1 ) stat: 2886072350 (  2,886,072,350) <= rx_prio0_bytes /sec
Ethtool(mlx5p1 ) stat:   45094878 (     45,094,878) <= rx_prio0_packets /sec
Ethtool(mlx5p1 ) stat: 2705707938 (  2,705,707,938) <= rx_vport_unicast_bytes /sec
Ethtool(mlx5p1 ) stat:   45095129 (     45,095,129) <= rx_vport_unicast_packets /sec
Ethtool(mlx5p1 ) stat:    4552033 (      4,552,033) <= rx_xdp_drop /sec
Ethtool(mlx5p1 ) stat:     146739 (        146,739) <= rx_xdp_tx_cqe /sec
Ethtool(mlx5p1 ) stat:    4552033 (      4,552,033) <= rx_xdp_tx_full /sec
Ethtool(mlx5p1 ) stat:    9391319 (      9,391,319) <= rx_xdp_tx_inlnw /sec
Ethtool(mlx5p1 ) stat:     880436 (        880,436) <= rx_xdp_tx_mpwqe /sec
Ethtool(mlx5p1 ) stat:     997831 (        997,831) <= rx_xdp_tx_nops /sec
Ethtool(mlx5p1 ) stat:    9391319 (      9,391,319) <= rx_xdp_tx_xmit /sec
Ethtool(mlx5p1 ) stat:  601044221 (    601,044,221) <= tx_bytes_phy /sec
Ethtool(mlx5p1 ) stat:    9391316 (      9,391,316) <= tx_packets_phy /sec
Ethtool(mlx5p1 ) stat:  601040871 (    601,040,871) <= tx_prio0_bytes /sec
Ethtool(mlx5p1 ) stat:    9391264 (      9,391,264) <= tx_prio0_packets /sec
Ethtool(mlx5p1 ) stat:  563478483 (    563,478,483) <= tx_vport_unicast_bytes /sec
Ethtool(mlx5p1 ) stat:    9391316 (      9,391,316) <= tx_vport_unicast_packets /sec

[1] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
The net_device stats say the NIC is processing zero packets (a sketch of
how per-ring XDP counters could feed these stats follows after the sar
output below):
$ sar -n DEV 2 1000
[...]
Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
[...]
Average:       mlx5p1      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00
Average:       mlx5p2      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00