Message-ID: <636c1a4b9010eab5d461c13c7544a1d9e9f9ff3f.camel@mellanox.com>
Date: Thu, 22 Nov 2018 00:21:37 +0000
From: Saeed Mahameed <saeedm@...lanox.com>
To: "pstaszewski@...are.pl" <pstaszewski@...are.pl>,
"toke@...e.dk" <toke@...e.dk>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"dsahern@...il.com" <dsahern@...il.com>
CC: "davem@...emloft.net" <davem@...emloft.net>,
"jasowang@...hat.com" <jasowang@...hat.com>,
"brouer@...hat.com" <brouer@...hat.com>,
"mst@...hat.com" <mst@...hat.com>
Subject: Re: consistency for statistics with XDP mode
On Wed, 2018-11-21 at 22:29 +0100, Paweł Staszewski wrote:
> On 21.11.2018 at 22:14, Toke Høiland-Jørgensen wrote:
> > David Ahern <dsahern@...il.com> writes:
> >
> > > Paweł ran some more XDP tests yesterday and from it found a couple of
> > > issues. One is a panic in the mlx5 driver unloading the bpf program
> > > (mlx5e_xdp_xmit); he will send a separate email for that problem.
> > Same as this one, I guess?
> >
> > https://marc.info/?l=linux-netdev&m=153855905619717&w=2
>
> Yes same as this one.
>
> When there is no traffic (for example with the xdp_fwd program loaded),
> or not much traffic, like 1k frames per second of ICMP, I can
> load/unload without crashing the kernel.
>
> But when I push tests with pktgen at more than 50k pps of UDP,
> unbinding the xdp_fwd program makes the kernel panic :)
>
>
Yeah, this is not precisely an mlx5 issue. This is one of the issues we
discussed at LPC, and I think we all agreed that the XDP redirect
infrastructure must allow different drivers to sync when they are
changing configurations or disabling XDP tx for a moment, so the fix
must be in the XDP redirect infrastructure.

Here is the issue description and a temp fix that I provided to Toke:
https://marc.info/?l=linux-netdev&m=154023109526642&w=2
patch:
https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=topic/xdp-redirect-fix&id=a3652d03cc35fd3ad62744986c8ccaca74c9f20c
>
>
> > > The problem I wanted to discuss here is statistics for XDP context. The
> > > short of it is that we need consistency in the counters across NIC
> > > drivers and virtual devices. Right now stats are specific to a driver
> > > with no clear accounting for the packets and bytes handled in XDP.
> > >
> > > For example virtio has some stats as device private data extracted via
> > > ethtool:
> > > $ ethtool -S eth2 | grep xdp
> > > ...
> > > rx_queue_3_xdp_packets: 5291
> > > rx_queue_3_xdp_tx: 0
> > > rx_queue_3_xdp_redirects: 5163
> > > rx_queue_3_xdp_drops: 0
> > > ...
> > > tx_queue_3_xdp_tx: 5163
> > > tx_queue_3_xdp_tx_drops: 0
> > >
> > > And the standard counters appear to track bytes and packets for Rx, but
> > > not Tx if the packet is forwarded in XDP.
> > >
> > > Similarly, mlx5 has some counters (thanks to Jesper and Toke for helping
> > > out here):
> > >
> > > $ ethtool -S mlx5p1 | grep xdp
> > > rx_xdp_drop: 86468350180
> > > rx_xdp_redirect: 18860584
> > > rx_xdp_tx_xmit: 0
> > > rx_xdp_tx_full: 0
> > > rx_xdp_tx_err: 0
> > > rx_xdp_tx_cqe: 0
> > > tx_xdp_xmit: 0
> > > tx_xdp_full: 0
> > > tx_xdp_err: 0
> > > tx_xdp_cqes: 0
> > > ...
> > > rx3_xdp_drop: 86468350180
> > > rx3_xdp_redirect: 18860556
> > > rx3_xdp_tx_xmit: 0
> > > rx3_xdp_tx_full: 0
> > > rx3_xdp_tx_err: 0
> > > rx3_xdp_tx_cqes: 0
> > > ...
> > > tx0_xdp_xmit: 0
> > > tx0_xdp_full: 0
> > > tx0_xdp_err: 0
> > > tx0_xdp_cqes: 0
> > > ...
> > >
> > > And no accounting in standard stats for packets handled in XDP.
> > >
> > > And then if I understand Jesper's data correctly, the i40e driver does
> > > not have device specific data:
> > >
> > > $ ethtool -S i40e1 | grep xdp
> > > [NOTHING]
> > >
> > >
> > > But rather bumps the standard counters:
> > >
> > > sudo ./xdp_rxq_info --dev i40e1 --action XDP_DROP
> > >
> > > Running XDP on dev:i40e1 (ifindex:3) action:XDP_DROP options:no_touch
> > > XDP stats CPU pps issue-pps
> > > XDP-RX CPU 1 36,156,872 0
> > > XDP-RX CPU total 36,156,872
> > >
> > > RXQ stats RXQ:CPU pps issue-pps
> > > rx_queue_index 1:1 36,156,878 0
> > > rx_queue_index 1:sum 36,156,878
> > >
> > >
> > > $ ethtool_stats.pl --dev i40e1
> > >
> > > Show adapter(s) (i40e1) statistics (ONLY that changed!)
> > > Ethtool(i40e1 ) stat: 2711292859 ( 2,711,292,859) <= port.rx_bytes /sec
> > > Ethtool(i40e1 ) stat:    6274204 (     6,274,204) <= port.rx_dropped /sec
> > > Ethtool(i40e1 ) stat:   42363867 (    42,363,867) <= port.rx_size_64 /sec
> > > Ethtool(i40e1 ) stat:   42363950 (    42,363,950) <= port.rx_unicast /sec
> > > Ethtool(i40e1 ) stat: 2165051990 ( 2,165,051,990) <= rx-1.bytes /sec
> > > Ethtool(i40e1 ) stat:   36084200 (    36,084,200) <= rx-1.packets /sec
> > > Ethtool(i40e1 ) stat:       5385 (         5,385) <= rx_dropped /sec
> > > Ethtool(i40e1 ) stat:   36089727 (    36,089,727) <= rx_unicast /sec
> > >
> > >
> > > We really need consistency in the counters and at a minimum, users
> > > should be able to track packet and byte counters for both Rx and Tx
> > > including XDP.
> > >
> > > It seems to me the Rx and Tx packet, byte and dropped counters returned
> > > for the standard device stats (/proc/net/dev, ip -s li show, ...) should
> > > include all packets managed by the driver regardless of whether they are
> > > forwarded / dropped in XDP or go up the Linux stack. This also aligns
> > > with mlxsw and the stats it shows which are packets handled by
> > > the hardware.
> > >
> > > From there the private stats can include XDP specifics as desired --
> > > like the drops and redirects -- but those should be add-ons, and even
> > > here some consistency makes life easier for users.
> > >
> > > The same standards should also be applied to virtual devices built on
> > > top of the ports -- e.g., vlans. I have an API now that allows bumping
> > > stats for vlan devices.
> > >
> > > Keeping the basic xdp packets in the standard counters allows Paweł, for
> > > example, to continue to monitor /proc/net/dev.
> > >
> > > Can we get agreement on this? And from there, get updates to the mlx5
> > > and virtio drivers?
> > I'd say it sounds reasonable to include XDP in the normal traffic
> > counters, but having the detailed XDP-specific counters is quite
> > useful
> > as well... So can't we do both (for all drivers)?
> >
What are you thinking?
Reporting XDP_DROP in the interface dropped counter,
XDP_TX/REDIRECT in the TX counter, and
XDP_ABORTED in the err/drop counter?

How about having a special XDP command in .ndo_bpf that would query
the standardized XDP stats?
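
To make the question concrete, here is a rough userspace sketch of the
mapping I have in mind. It is not driver code and not an existing kernel
API: the sketch_* names, the stats struct and the choice of rx_errors for
XDP_ABORTED are all assumptions for illustration. Only the verdict values
follow enum xdp_action from linux/bpf.h. Every frame is counted on Rx,
the verdict then bumps the matching standard counter, and a per-verdict
array stands in for the standardized XDP stats a query could return.

```c
#include <stdint.h>

/* Local copy of the verdicts; values follow enum xdp_action (linux/bpf.h). */
enum sketch_xdp_action {
	SKETCH_XDP_ABORTED = 0,
	SKETCH_XDP_DROP,
	SKETCH_XDP_PASS,
	SKETCH_XDP_TX,
	SKETCH_XDP_REDIRECT,
	SKETCH_XDP_ACTION_MAX,
};

/* Hypothetical per-device stats: standard counters plus per-verdict ones. */
struct sketch_dev_stats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t rx_dropped;
	uint64_t rx_errors;
	uint64_t tx_packets;
	uint64_t tx_bytes;
	uint64_t xdp[SKETCH_XDP_ACTION_MAX]; /* XDP-specific add-on counters */
};

/*
 * Account one frame of 'len' bytes received on rx_dev with verdict 'act'.
 * tx_dev is the egress device: rx_dev itself for XDP_TX, the redirect
 * target for XDP_REDIRECT. It is not touched for other verdicts.
 */
static void sketch_xdp_account(struct sketch_dev_stats *rx_dev,
			       struct sketch_dev_stats *tx_dev,
			       enum sketch_xdp_action act, uint32_t len)
{
	if ((unsigned int)act >= SKETCH_XDP_ACTION_MAX)
		return; /* unknown verdict: ignored in this sketch */

	/* Every frame the driver handled shows up in the standard Rx counters. */
	rx_dev->rx_packets++;
	rx_dev->rx_bytes += len;
	rx_dev->xdp[act]++;

	switch (act) {
	case SKETCH_XDP_DROP:
		rx_dev->rx_dropped++;
		break;
	case SKETCH_XDP_ABORTED:
		rx_dev->rx_errors++; /* assumption: err rather than drop */
		break;
	case SKETCH_XDP_TX:
	case SKETCH_XDP_REDIRECT:
		tx_dev->tx_packets++;
		tx_dev->tx_bytes += len;
		break;
	default: /* SKETCH_XDP_PASS: the stack takes over from here */
		break;
	}
}
```

With this, a tool that only reads the standard counters still sees the
XDP traffic, and the xdp[] array keeps the per-verdict detail.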
> > -Toke
> >