netdev - Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths

Message-ID: <87r26vvbz8.fsf@toke.dk>
Date:   Fri, 12 Jul 2019 11:27:55 +0200
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     Neil Horman <nhorman@...driver.com>,
        Ido Schimmel <idosch@...sch.org>
Cc:     David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
        jiri@...lanox.com, mlxsw@...lanox.com, dsahern@...il.com,
        roopa@...ulusnetworks.com, nikolay@...ulusnetworks.com,
        andy@...yhouse.net, pablo@...filter.org,
        jakub.kicinski@...ronome.com, pieter.jansenvanvuuren@...ronome.com,
        andrew@...n.ch, f.fainelli@...il.com, vivien.didelot@...il.com,
        idosch@...lanox.com
Subject: Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths

Neil Horman <nhorman@...driver.com> writes:

> On Thu, Jul 11, 2019 at 03:39:09PM +0300, Ido Schimmel wrote:
>> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
>> > From: Ido Schimmel <idosch@...sch.org>
>> > Date: Sun,  7 Jul 2019 10:58:17 +0300
>> > 
>> > > Users have several ways to debug the kernel and understand why a packet
>> > > was dropped. For example, using "drop monitor" and "perf". Both
>> > > utilities trace kfree_skb(), which is the function called when a packet
>> > > is freed as part of a failure. The information provided by these tools
>> > > is invaluable when trying to understand the cause of a packet loss.
>> > > 
>> > > In recent years, large portions of the kernel data path were offloaded
>> > > to capable devices. Today, it is possible to perform L2 and L3
>> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
>> > > Different TC classifiers and actions are also offloaded to capable
>> > > devices, at both ingress and egress.
>> > > 
>> > > However, when the data path is offloaded it is not possible to achieve
>> > > the same level of introspection, as tools such as "perf" and "drop
>> > > monitor" become irrelevant.
>> > > 
>> > > This patchset aims to solve this by allowing users to monitor packets
>> > > that the underlying device decided to drop along with relevant metadata
>> > > such as the drop reason and ingress port.
>> > 
>> > We are now going to have 5 or so ways to capture packets passing through
>> > the system, this is nonsense.
>> > 
>> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
>> > devlink thing.
>> > 
>> > This is insanity, too many ways to do the same thing and therefore the
>> > worst possible user experience.
>> > 
>> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
>> > XDP perf events, and these taps there too.
>> > 
>> > I mean really, think about it from the average user's perspective.  To
>> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
>> > listen on devlink but configure a special tap thing beforehand and then
>> > if someone is using XDP I gotta setup another perf event buffer capture
>> > thing too.
>> 
>> Dave,
>> 
>> Before I start working on v2, I would like to get your feedback on the
>> high level plan. Also adding Neil who is the maintainer of drop_monitor
>> (and counterpart DropWatch tool [1]).
>> 
>> IIUC, the problem you point out is that users need to use different
>> tools to monitor packet drops based on where these drops occur
>> (SW/HW/XDP).
>> 
>> Therefore, my plan is to extend the existing drop_monitor netlink
>> channel to also cover HW drops. I will add a new message type and a new
>> multicast group for HW drops and encode in the message what is currently
>> encoded in the devlink events.
>> 
> A few things here:
> IIRC we don't announce individual hardware drops, drivers record them in
> internal structures, and they are retrieved on demand via ethtool calls, so you
> will either need to include some polling (probably not a very performant idea),
> or some sort of flagging mechanism to indicate that on the next message sent to
> user space you should go retrieve hw stats from a given interface.  I certainly
> wouldn't mind seeing this happen, but it's more work than just adding a new
> netlink message.
>
> Also, regarding XDP drops, we won't see them if the xdp program is offloaded to
> hardware (you'll need your hw drop gathering mechanism for that), but for xdp
> programs run on the cpu, dropwatch should already catch those.  I.e. if the xdp
> program returns a DROP result for a packet being processed, the OS will call
> kfree_skb on its behalf, and dropwatch will catch that.

There is no skb by the time an XDP program runs, so this is not true. As
I mentioned upthread, there's a tracepoint that will get called if an
error occurs (or the program returns XDP_ABORTED), but in most cases,
XDP_DROP just means that the packet silently disappears...
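
To make the distinction concrete, here is a rough user-space model of which
drops are observable where. It is only a sketch and not kernel code: the
names (verdict, drop_outcome, classify) are made up for illustration, the
verdict values merely mirror enum xdp_action from <linux/bpf.h>, and it
assumes the XDP program runs in the driver before any skb is allocated.

```c
#include <stdbool.h>

/* Hypothetical verdict codes, mirroring enum xdp_action in <linux/bpf.h>. */
enum verdict { V_ABORTED = 0, V_DROP, V_PASS, V_TX, V_REDIRECT };

/* Hypothetical summary of how a packet's fate can be observed. */
struct drop_outcome {
    bool dropped;        /* the packet did not continue up the stack */
    bool via_kfree_skb;  /* visible to kfree_skb-based tools (drop monitor) */
    bool via_tracepoint; /* visible through the XDP exception tracepoint */
};

/*
 * Model of the behaviour described above: no skb exists yet, so
 * kfree_skb is never involved; only errors and XDP_ABORTED reach the
 * tracepoint; a plain XDP_DROP leaves no trace at all.
 */
static struct drop_outcome classify(enum verdict v, bool error)
{
    struct drop_outcome out = { false, false, false };

    if (error || v == V_ABORTED) {
        out.dropped = true;
        out.via_tracepoint = true;
    } else if (v == V_DROP) {
        out.dropped = true; /* silent: neither channel sees it */
    }
    return out;
}
```

In this model classify(V_DROP, false) reports a drop with both visibility
flags false, which is exactly the case a kfree_skb-based monitor misses.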

-Toke