lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 24 Nov 2021 16:34:08 +0000
From:   Lorenz Bauer <lmb@...udflare.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Alexander Lobakin <alexandr.lobakin@...el.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Jonathan Corbet <corbet@....net>,
        Shay Agroskin <shayagr@...zon.com>,
        Arthur Kiyanovski <akiyano@...zon.com>,
        David Arinzon <darinzon@...zon.com>,
        Noam Dagan <ndagan@...zon.com>,
        Saeed Bishara <saeedb@...zon.com>,
        Ioana Ciornei <ioana.ciornei@....com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        Tony Nguyen <anthony.l.nguyen@...el.com>,
        Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
        Marcin Wojtas <mw@...ihalf.com>,
        Russell King <linux@...linux.org.uk>,
        Saeed Mahameed <saeedm@...dia.com>,
        Leon Romanovsky <leon@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        John Fastabend <john.fastabend@...il.com>,
        Edward Cree <ecree.xilinx@...il.com>,
        Martin Habets <habetsm.xilinx@...il.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        Yajun Deng <yajun.deng@...ux.dev>,
        Sergey Ryazanov <ryazanov.s.a@...il.com>,
        David Ahern <dsahern@...nel.org>,
        Andrei Vagin <avagin@...il.com>,
        Johannes Berg <johannes.berg@...el.com>,
        Vladimir Oltean <vladimir.oltean@....com>,
        Cong Wang <cong.wang@...edance.com>,
        Networking <netdev@...r.kernel.org>, linux-doc@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>, linux-rdma@...r.kernel.org,
        bpf <bpf@...r.kernel.org>,
        virtualization@...ts.linux-foundation.org
Subject: Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic
 per-channel statistics

Daniel asked me to share my opinion, as Cloudflare has an XDP load
balancer as well.

On Wed, 24 Nov 2021 at 00:53, Daniel Borkmann <daniel@...earbox.net> wrote:

> I'm just taking our XDP L4LB in Cilium as an example: there we already count errors and
> export them via per-cpu map that eventually lead to XDP_DROP cases including the /reason/
> which caused the XDP_DROP (e.g. Prometheus can then scrape these insights from all the
> nodes in the cluster). Given the different action codes are very often application specific,
> there's not much debugging that you can do when /only/ looking at `ip link xdpstats` to
> gather insight on *why* some of these actions were triggered (e.g. fib lookup failure, etc).

Agreed. For our purpose we often want to know whether a specific
program has been invoked. Per-channel or per device stats don't help
us much since we have a chain of programs (not using libxdp though).
My colleague Arthur has written xdpcap [1], which gives per-action,
per-program counters. This way we can correlate an action with a
packet and a program.

> If really of interest, then maybe libxdp could have such per-action counters as opt-in in
> its call chain..

We could also make it part of BPF_ENABLE_STATS, it's kind of coarse
grained though.

> In the case of ice_run_xdp() today, we already bump total_rx_bytes/total_rx_pkts under
> XDP and update ice_update_rx_ring_stats(). I do see the case for XDP_TX and XDP_REDIRECT
> where we run into driver-specific errors that are /outside of the reach/ of the BPF prog.
> For example, we've been running into errors from XDP_TX in ice_xmit_xdp_ring() in the
> past during testing, and were able to pinpoint the location as xdp_ring->tx_stats.tx_busy
> was increasing. These things are useful and would make sense to standardize for XDP context.

I'd like to see more tracepoints like trace_xdp_exception, personally.
We can use things like bpftrace for exploration and ebpf_exporter [2]
to generate alerts much more easily than something wired into
iproute2.

Best
Lorenz

1: https://github.com/cloudflare/xdpcap
2: https://github.com/cloudflare/ebpf_exporter

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ