Message-ID: <878rx79o3l.fsf@toke.dk>
Date:   Mon, 29 Nov 2021 12:51:26 +0100
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     Daniel Borkmann <daniel@...earbox.net>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     Alexander Lobakin <alexandr.lobakin@...el.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Jonathan Corbet <corbet@....net>,
        Shay Agroskin <shayagr@...zon.com>,
        Arthur Kiyanovski <akiyano@...zon.com>,
        David Arinzon <darinzon@...zon.com>,
        Noam Dagan <ndagan@...zon.com>,
        Saeed Bishara <saeedb@...zon.com>,
        Ioana Ciornei <ioana.ciornei@....com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        Tony Nguyen <anthony.l.nguyen@...el.com>,
        Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
        Marcin Wojtas <mw@...ihalf.com>,
        Russell King <linux@...linux.org.uk>,
        Saeed Mahameed <saeedm@...dia.com>,
        Leon Romanovsky <leon@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        Edward Cree <ecree.xilinx@...il.com>,
        Martin Habets <habetsm.xilinx@...il.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        Yajun Deng <yajun.deng@...ux.dev>,
        Sergey Ryazanov <ryazanov.s.a@...il.com>,
        David Ahern <dsahern@...nel.org>,
        Andrei Vagin <avagin@...il.com>,
        Johannes Berg <johannes.berg@...el.com>,
        Vladimir Oltean <vladimir.oltean@....com>,
        Cong Wang <cong.wang@...edance.com>, netdev@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-rdma@...r.kernel.org, bpf@...r.kernel.org,
        virtualization@...ts.linux-foundation.org
Subject: Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic
 per-channel statistics

Daniel Borkmann <daniel@...earbox.net> writes:

> On 11/26/21 7:06 PM, Jakub Kicinski wrote:
>> On Fri, 26 Nov 2021 13:30:16 +0100 Toke Høiland-Jørgensen wrote:
>>>>> TBH I wasn't following this thread too closely since I saw Daniel
>>>>> nacked it already. I do prefer rtnl xstats, I'd just report them
>>>>> in -s if they are non-zero. But it doesn't sound like we have
>>>>> agreement on whether they should exist or not.
>>>>
>>>> Right, just -s is fine, if we drop the per-channel approach.
>>>
>>> I agree that adding them to -s is fine (and that resolves my "no one
>>> will find them" complaint as well). If it crowds the output we could
>>> also default to only outputting a subset, and have the more detailed
>>> statistics hidden behind a verbose switch (or even just in the JSON
>>> output)?
>>>
>>>>> Can we think of an approach which would make Cloudflare and Cilium
>>>>> happy? It feels like we're trying to make the slightly hypothetical
>>>>> admin happy while ignoring the objections of very real users.
>>>>
>>>> The initial idea was only to make the drivers uniform. But in general
>>>> you are right, 10 drivers having something doesn't mean it's
>>>> something good.
>>>
>>> I don't think it's accurate to call the admin use case "hypothetical".
>>> We're expending a significant effort explaining to people that XDP can
>>> "eat" your packets, and not having any standard statistics makes this
>>> way harder. We should absolutely cater to our "early adopters", but if
>>> we want XDP to see wider adoption, making it "less weird" is critical!
>> 
>> Fair. In all honesty I said that hoping to push for a more flexible
>> approach hidden entirely in BPF, and not involving driver changes.
>> Assuming the XDP program has more fine grained stats we should be able
>> to extract those instead of double-counting. Hence my vague "let's work
>> with apps" comment.
>> 
>> For example, to a person familiar with the workload it'd be useful to
>> know if the program returned XDP_DROP because of configured policy or
>> failure to parse a packet. I don't think that sort of distinction is
>> achievable at the level of standard stats.
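
Right - a program can already keep that kind of per-reason accounting
itself and expose it through a map; a minimal sketch of what that could
look like (map and reason names are made up purely for illustration):

  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <bpf/bpf_helpers.h>

  enum { DROP_POLICY, DROP_PARSE_ERR, DROP_MAX };

  struct {
          __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
          __uint(max_entries, DROP_MAX);
          __type(key, __u32);
          __type(value, __u64);
  } drop_reasons SEC(".maps");

  /* bump the per-CPU counter for 'reason' and return the drop verdict */
  static __always_inline int drop(__u32 reason)
  {
          __u64 *cnt = bpf_map_lookup_elem(&drop_reasons, &reason);

          if (cnt)
                  (*cnt)++;
          return XDP_DROP;
  }

  SEC("xdp")
  int xdp_filter(struct xdp_md *ctx)
  {
          void *data = (void *)(long)ctx->data;
          void *data_end = (void *)(long)ctx->data_end;

          if (data + sizeof(struct ethhdr) > data_end)
                  return drop(DROP_PARSE_ERR);

          /* ... policy lookup would go here, dropping with DROP_POLICY ... */

          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";

Reading that back out is just a map dump from userspace (e.g. via
bpftool), so the fine-grained part doesn't need to live in standard
stats at all.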
>
> Agree on the additional context. How often have you looked at tc clsact
> /dropped/ stats specifically when you debug a more complex BPF program
> there?
>
>    # tc -s qdisc show clsact dev foo
>    qdisc clsact ffff: parent ffff:fff1
>     Sent 6800 bytes 120 pkt (dropped 0, overlimits 0 requeues 0)
>     backlog 0b 0p requeues 0
>
> Similarly, XDP_PASS counters may be of limited use as well for the same
> reason (and I think we might not even have a tc counter equivalent for it).
>
>> The information required by the admin is higher level. As you say the
>> primary concern there is "how many packets did XDP eat".
>
> Agree. That said, for XDP_DROP I would see one use case where you compare
> different drivers or bond vs no bond as we did in the past in [0] when
> testing against a packet generator (although I don't see the bond driver
> covered in this series here yet, where it would aggregate the XDP stats
> from all bond slave devs).
>
> On a higher level wrt "how many packets did XDP eat", it would make sense
> to have the stats for successful XDP_{TX,REDIRECT} given these are out
> of reach from a BPF prog PoV - we can only count there how many times we
> returned with XDP_TX but not whether the pkt /successfully made it/.
>
> In terms of error cases, could we just standardize all drivers on the
> behavior of e.g. mlx5e_xdp_handle()? Meaning, a failure from
> XDP_{TX,REDIRECT} will hit the trace_xdp_exception() and then fall through
> to bump a drop counter (the same one we bump for XDP_DROP). The drop
> counter would then account for program drops as well as driver-related
> drops.
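
For reference, that pattern would look roughly like the sketch below.
It's loosely modelled on mlx5e_xdp_handle(), but drv_rq,
drv_xmit_xdp_buff() and the stats fields are made-up names, not taken
from any particular driver:

  static bool drv_xdp_handle(struct drv_rq *rq, struct bpf_prog *prog,
                             struct xdp_buff *xdp)
  {
          u32 act = bpf_prog_run_xdp(prog, xdp);

          switch (act) {
          case XDP_PASS:
                  return false;   /* hand the frame on to the stack */
          case XDP_TX:
                  if (unlikely(!drv_xmit_xdp_buff(rq, xdp)))
                          goto xdp_abort; /* TX failure -> exception + drop */
                  rq->stats->xdp_tx++;
                  return true;
          case XDP_REDIRECT:
                  if (unlikely(xdp_do_redirect(rq->netdev, xdp, prog)))
                          goto xdp_abort;
                  rq->stats->xdp_redirect++;
                  return true;
          default:
                  bpf_warn_invalid_xdp_action(act);
                  fallthrough;
          case XDP_ABORTED:
  xdp_abort:
                  trace_xdp_exception(rq->netdev, prog, act);
                  fallthrough;
          case XDP_DROP:
                  /* program drops and driver-side failures share a counter */
                  rq->stats->xdp_drop++;
                  return true;
          }
  }

That way the one drop counter covers both program verdicts and
driver-side TX/redirect failures, which seems like the right
granularity for standard stats.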
>
> At some later point the trace_xdp_exception() could be extended with an error
> code that the driver would propagate (given some of them look quite similar
> across drivers, fwiw), and then whoever wants to do further processing with
> them can do so via bpftrace or other tooling.
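
And the existing tracepoint is already straightforward to hook from BPF;
a minimal sketch (illustrative only, assumes a BTF-derived vmlinux.h)
that counts exceptions per ifindex and verdict - an error code added to
the tracepoint later would simply become another key field:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  struct exception_key {
          int ifindex;
          __u32 act;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_HASH);
          __uint(max_entries, 1024);
          __type(key, struct exception_key);
          __type(value, __u64);
  } xdp_exceptions SEC(".maps");

  SEC("tracepoint/xdp/xdp_exception")
  int count_xdp_exception(struct trace_event_raw_xdp_exception *ctx)
  {
          struct exception_key key = {
                  .ifindex = ctx->ifindex,
                  .act     = ctx->act,
          };
          __u64 one = 1, *cnt;

          cnt = bpf_map_lookup_elem(&xdp_exceptions, &key);
          if (cnt)
                  __sync_fetch_and_add(cnt, 1);
          else
                  bpf_map_update_elem(&xdp_exceptions, &key, &one,
                                      0 /* BPF_ANY */);
          return 0;
  }

  char _license[] SEC("license") = "GPL";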
>
> So overall wrt this series: from the lrstats we'd be /dropping/ the pass,
> tx_errors, redirect_errors, invalid and aborted counters. And we'd be
> /keeping/ the bytes & packets counters that XDP sees, the (driver-)successful
> tx & redirect counters, as well as the drop counter. Also, XDP bytes &
> packets counters should not be counted twice wrt ethtool stats.

This sounds reasonable to me, and I also like the error code to
tracepoint idea :)

-Toke
