netdev - Re: [iovisor-dev] Best userspace programming API for XDP features query to kernel?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180409132635.7d5e1bda@redhat.com>
Date:   Mon, 9 Apr 2018 13:26:35 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Eric Leblond <eric@...it.org>, Victor Julien <victor@...iniac.net>,
        Peter Manev <petermanev@...il.com>,
        oisf-devel@...ts.openinfosecfoundation.org,
        Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Alexei Starovoitov <ast@...com>,
        "iovisor-dev@...ts.iovisor.org" <iovisor-dev@...ts.iovisor.org>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Daniel Borkmann <borkmann@...earbox.net>,
        netdev@...r.kernel.org, davem@...emloft.net,
        Jiri Benc <jbenc@...hat.com>, brouer@...hat.com
Subject: Re: [iovisor-dev] Best userspace programming API for XDP features
 query to kernel?

On Fri, 6 Apr 2018 12:36:18 +0200
Daniel Borkmann <daniel@...earbox.net> wrote:

> On 04/05/2018 10:51 PM, Jesper Dangaard Brouer wrote:
> > On Thu, 5 Apr 2018 12:37:19 +0200
> > Daniel Borkmann <daniel@...earbox.net> wrote:
> >   
> >> On 04/04/2018 02:28 PM, Jesper Dangaard Brouer via iovisor-dev wrote:  
> >>> Hi Suricata people,
> >>>
> >>> When Eric Leblond (and I helped) integrated XDP in Suricata, we ran
> >>> into the issue, that at Suricata load/start time, we cannot determine
> >>> if the chosen XDP config options, like xdp-cpu-redirect[1], is valid on
> >>> this HW (e.g require driver XDP_REDIRECT support and bpf cpumap).
> >>>
> >>> We would have liked a way to report that suricata.yaml config was
> >>> invalid for this hardware/setup.  Now, it just loads, and packets gets
> >>> silently dropped by XDP (well a WARN_ONCE and catchable via tracepoints).
> >>>
> >>> My question to suricata developers: (Q1) Do you already have code that
> >>> query the kernel or drivers for features?
> >>>
> >>> At the IOvisor call (2 weeks ago), we discussed two options of exposing
> >>> XDP features avail in a given driver.
> >>>
> >>> Option#1: Extend existing ethtool -k/-K "offload and other features"
> >>> with some XDP features, that userspace can query. (Do you already query
> >>> offloads, regarding Q1)
> >>>
> >>> Option#2: Invent a new 'ip link set xdp' netlink msg with a query option.    
> >>
> >> I don't really mind if you go via ethtool, as long as we handle this
> >> generically from there and e.g. call the dev's ndo_bpf handler such that
> >> we keep all the information in one place. This can be a new ndo_bpf command
> >> e.g. XDP_QUERY_FEATURES or such.  
> > 
> > Just to be clear: notice as Victor points out[2], they are programmable
> > going though the IOCTL (SIOCETHTOOL) and not using cmdline tools.  
> 
> Sure, that was perfectly clear. (But at the same time if you extend the
> ioctl, it's obvious to also add support to actual ethtool cmdline tool.)
> 
> > [2] https://github.com/OISF/suricata/blob/master/src/util-ioctl.c#L326
> > 
> > If you want everything to go through the drivers ndo_bpf call anyway
> > (which userspace API is netlink based) then at what point to you  
> 
> Not really, that's the front end. ndo_bpf itself is a plain netdev op
> and has no real tie to netlink.
> 
> > want drivers to call their own ndo_bpf, when activated though their
> > ethtool_ops ? (Sorry, but I don't follow the flow you are proposing)
> > 
> > Notice, I'm not directly against using the drivers ndo_bpf call.  I can
> > see it does provide kernel more flexibility than the ethtool IOCTL.  
> 
> What I was saying is that even if you go via ethtool ioctl api, where
> you end up in dev_ethtool() and have some new ETHTOOL_* query command,
> then instead of adding a new ethtool_ops callback, we can and should
> reuse ndo_bpf from there.

Okay, you want to catch this in dev_ethtool(), that connected the dots
for me, thanks.  BUT it does feel a little fake and confusing to do
this, as the driver developers seeing the info via ethtool, would
expect they need to implement an ethtool_ops callback.

My original idea behind going through the ethtool APIs, were that every
driver is already using this, and it's familiar to the driver
developers.  And there are existing userspace tools that sysadm's
already know, that we can leverage.

Thinking about this over the weekend.  I have changed my mind a bit,
based on your feedback. I do buy into your idea of forcing things
through the drivers ndo_bpf call.  I'm no-longer convinced this should
go through ethtool, but as (you desc) we can, if we find that this is
the easiest ways to export and leverage an existing userspace tool.



> [...]
> > Here, I want to discuss how drivers expose/tell userspace that they
> > support a given feature: Specifically a bit for: XDP_REDIRECT action
> > support.
> >   
> >> Same for meta data,  
> > 
> > Well, not really.  It would be a "nice-to-have", but not strictly
> > needed as a feature bit.  XDP meta-data is controlled via a helper.
> > And the BPF-prog can detect/see runtime, that the helper bpf_xdp_adjust_meta
> > returns -ENOTSUPP (and need to check the ret value anyhow).  Thus,
> > there is that not much gained by exposing this to be detected setup
> > time, as all drivers should eventually support this, and we can detect
> > it runtime.
> > 
> > The missing XDP_REDIRECT action features bit it different, as the
> > BPF-prog cannot detect runtime that this is an unsupported action.
> > Plus, setup time we cannot query the driver for supported XDP actions.  
> 
> Ok, so with the example of meta data, you're arguing that it's okay
> to load a native XDP program onto a driver, and run actual traffic on
> the NIC in order probe for the availability of the feature when you're
> saying that it "can detect/see [at] runtime". I totally agree with you
> that all drivers should eventually support this (same with XDP_REDIRECT),
> but today there are even differences in drivers on bpf_xdp_adjust_meta()/
> bpf_xdp_adjust_head() with regards to how much headroom they have available,
> etc (e.g. some of them have none), so right now you can either go and
> read the code or do a runtime test with running actual traffic through
> the NIC to check whether your BPF prog is supported or not. Theoretically,
> you can do the same runtime test with XDP_REDIRECT (taking the warn in
> bpf_warn_invalid_xdp_action() aside for a moment), but you do have the
> trace_xdp_exception() tracepoint to figure it out, yes, it's a painful
> hassle, but overall, it's not that different as you were trying to argue
> here. For /both/ cases it would be nice to know at setup time whether
> this would be supported or not. Hence, such query is not just limited to
> XDP_REDIRECT alone. Eventually once such interface is agreed upon,
> undoubtedly the list of feature bits will grow is what I'm trying to say;
> only arguing on the XDP_REDIRECT here would be short term.
> 
> [...]
> The underlying problem is effectively the decoupling of program
> verification that doesn't have/know the context of where it is being
> attached to in this case. 

Yes, exactly, that the underlying problem. (plus the first XDP prog
gets verified and driver attached, and subsequent added bpf tail calls,
cannot be "rejected" (at "driver attachment time") as its too late).

> Thinking out loud for a sec on couple of other options aside
> from feature bits, what about i) providing the target ifindex to the
> verifier for XDP programs, such that at verification time you have the
> full context similar to nfp offload case today, 

I do like that we could provide the target ifindex earlier.  But
userspace still need some way to express that it e.g. need the
XDP_REDIRECT features (as verifier cannot reliability detect the action
return codes used, as discussed before, due to bpf tail calls and maps
used as return values).

> or ii) populating some
> XDP specific auxillary data to the BPF program at verification time such
> that the driver can check at program attach time whether the requested
> features are possible and if not it will reject and respond with netlink
> extack message to the user (as we do in various XDP attach cases already
> through XDP_SETUP_PROG{,_HW}).

I like proposal ii) better.  But how do I specify/input that I need
e.g. the XDP_REDIRECT feature, such that is it avail to "the BPF
program at verification time"?


My proposal iii), what about at XDP attachment time, create a new
netlink attribute that describe XDP action codes the program
needs/wants. If the info is provided, the ndo_bpf call check and
reject, and respond with netlink extack message.
  If I want to query for avail action codes, then I can use the same
netlink attribute format, and kernel will return it "populated" with
the info.


It is very useful that the program gets rejected at attachment time (to
avoid loading an XDP prog that silently drops packets). BUT I also
want a query option/functionality (reuse netlink attr format).

Specifically for Suricata the same "bypass" features can be implemented
at different levels (XDP, XDP-generic or clsbpf), and the XDP program
contains multiple features.  Thus, depending on what NIC driver
supports, we want to load different XDP and/or clsbpf/TC BPF-programs.
Thus, still support the same user requested feature/functionality, even
if XDP_REDIRECT is not avail, just slightly slower.


> This would, for example, avoid the need for feature bits, and do actual
> rejection of the program while retaining flexibility (and avoiding to
> expose bits that over time hopefully will deprecate anyway due to all
> XDP aware drivers implementing them). For both cases i) and ii), it
> would mean we make the verifier a bit smarter with regards to keeping
> track of driver related (XDP) requirements. Program return code checking
> is already present in the verifier's check_return_code() and we could
> extend it for XDP as well, for example. Seems cleaner and more extensible
> than feature bits, imho.

I thought the verifier's check_return_code() couldn't see/verify if the
XDP return action code is provided as a map lookup result(?). How is
that handled?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer