Message-ID: <CAHrNZNjfNU-XPHk+XHoXQ78PZeFNA3dZAg1kfFexgcH522j=jQ@mail.gmail.com>
Date: Mon, 14 Jun 2021 13:41:15 -0700
From: Tanner Love <tannerlove.kernel@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Jason Wang <jasowang@...hat.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Network Development <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Eric Dumazet <edumazet@...gle.com>,
Petar Penkov <ppenkov@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
"Michael S . Tsirkin" <mst@...hat.com>,
Tanner Love <tannerlove@...gle.com>
Subject: Re: [PATCH net-next v4 2/3] virtio_net: add optional flow dissection
in virtio_net_hdr_to_skb
On Fri, Jun 11, 2021 at 7:13 AM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> On Thu, Jun 10, 2021 at 11:40 PM Jason Wang <jasowang@...hat.com> wrote:
> >
> >
> > > On 2021/6/11 10:45 AM, Willem de Bruijn wrote:
> > > On Thu, Jun 10, 2021 at 10:11 PM Jason Wang <jasowang@...hat.com> wrote:
> > >>
> > >>> On 2021/6/10 10:04 PM, Willem de Bruijn wrote:
> > >>> On Thu, Jun 10, 2021 at 1:25 AM Jason Wang <jasowang@...hat.com> wrote:
> > >>>> On 2021/6/10 12:19 PM, Alexei Starovoitov wrote:
> > >>>>> On Wed, Jun 9, 2021 at 9:13 PM Jason Wang <jasowang@...hat.com> wrote:
> > >>>>>> So I wonder why not simply use helpers to access the vnet header,
> > >>>>>> like how tcp-bpf accesses the tcp header?
> > >>>>> Short answer - speed.
> > >>>>> tcp-bpf accesses all uapi and non-uapi structs directly.
> > >>>>>
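> > >>>>> For example, a sock_ops program reads context fields directly and
> > >>>>> the verifier rewrites those loads into accesses of the underlying
> > >>>>> kernel structs, with no per-field helper call. A minimal sketch,
> > >>>>> assuming the usual libbpf setup:
> > >>>>>
> > >>>>> #include <linux/bpf.h>
> > >>>>> #include <bpf/bpf_helpers.h>
> > >>>>>
> > >>>>> SEC("sockops")
> > >>>>> int sockops_example(struct bpf_sock_ops *skops)
> > >>>>> {
> > >>>>>         /* direct context access; the verifier translates this
> > >>>>>          * load at verification time, no helper call involved */
> > >>>>>         if (skops->op == BPF_SOCK_OPS_TCP_CONNECT_CB)
> > >>>>>                 bpf_printk("snd_cwnd=%u\n", skops->snd_cwnd);
> > >>>>>         return 1;
> > >>>>> }
> > >>>>>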
> > >>>> Ok, this makes sense. But instead of coupling device-specific things
> > >>>> like the vnet header into the general flow_keys context, it would be
> > >>>> better to introduce a vnet header context which contains:
> > >>>>
> > >>>> 1) vnet header
> > >>>> 2) flow keys
> > >>>> 3) other contexts like endian and virtio-net features
> > >>>>
> > >>>> That way we preserve the performance and decouple the virtio-net
> > >>>> specifics from general structures like flow_keys or __sk_buff.
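> > >>>>
> > >>>> As a rough sketch of what I mean (names are placeholders, not a
> > >>>> proposed ABI):
> > >>>>
> > >>>> struct bpf_vnet_hdr_ctx {
> > >>>>         struct virtio_net_hdr_mrg_rxbuf *hdr;  /* 1) vnet header */
> > >>>>         struct bpf_flow_keys *flow_keys;       /* 2) flow keys */
> > >>>>         __u64 features;      /* 3) negotiated virtio-net features */
> > >>>>         __u8 little_endian;  /* endianness of the header fields */
> > >>>> };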
> > >>> You are advocating for a separate BPF program that takes a vnet hdr
> > >>> and flow_keys as context and is run separately after flow dissection?
> > >>
> > >> Yes.
> > >>
> > >>
> > >>> I don't understand the benefit of splitting the program in two in this manner.
> > >>
> > >> It decouples device-specific attributes from general structures like
> > >> flow keys. We have xen-netfront, netvsc and a lot of drivers for
> > >> emulated devices. We cannot add all of their metadata to the flow keys
> > >> context.
> > > What are device-specific attributes here? What kind of metadata?
> >
> >
> > Isn't virtio_net_hdr virtio-net-specific metadata?
>
> I don't think it is, ever since it was also adopted by tun, tap,
> pf_packet and uml. Basically, all callers of virtio_net_hdr_to_skb.
>
> >
> >
> > >
> > > The only metadata that can be passed with tuntap, pf_packet et al is
> > > virtio_net_hdr.
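> > >
> > > For example, with packet sockets that metadata is enabled via the
> > > PACKET_VNET_HDR socket option, after which every packet passed in is
> > > prefixed by the header (sketch):
> > >
> > >         int one = 1;
> > >
> > >         /* each subsequent packet written on fd is prefaced by a
> > >          * struct virtio_net_hdr that the kernel must validate */
> > >         setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &one, sizeof(one));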
> > >
> > > I don't quite understand where xen-netfront et al come in.
> >
> >
> > The problem is what kind of issue you want to solve. If you want to
> > solve a virtio-specific issue, why do you need to do that in the
> > general flow dissector?
Suppose we determine that it would indeed also be good to add
support for xen-netfront and netvsc validation in this way. Is your
suggestion that these would all need to be added in the same patch
series? Could we not implement just virtio-net first, and add the
others subsequently, if we can demonstrate a feasible plan for
doing so? Thanks
>
> >
> > If you want to solve the general issue of packet metadata validation
> > from untrusted sources, you need to validate not only virtio-net but
> > all the others. Netfront is another dodgy source, and there are a lot
> > of implicit untrusted sources in virtualization environments.
>
> Ah understood.
>
> Yes, the goal is to protect the kernel from untrusted packets coming
> from userspace in general. There are only a handful such injection
> APIs.
>
> I have not had to deal with netfront as much as the virtio_net_hdr
> users. I'll take a look at that and netvsc. I cannot recall seeing as
> many syzkaller reports about those, but that may not necessarily imply
> that they are more robust -- it could just be that syzkaller has no
> easy way to exercise them, like with packet sockets.
>
> >
> > >
> > >> That's why I suggested using something more
> > >> generic like XDP from the start. Yes, GSO was disabled by
> > >> virtio-net with XDP, but that is not something that cannot be fixed. If
> > >> GSO and s/g support cannot be done in a short time,
> > > An alternative interface does not address that we already have this
> > > interface and it is already causing problems.
> >
> >
> > What problems do you mean here?
>
> The long list of syzkaller bugs that required fixes either directly in
> virtio_net_hdr_to_skb or deeper in the stack, e.g.,
>
> 924a9bc362a5 net: check if protocol extracted by
> virtio_net_hdr_set_proto is correct
> 6dd912f82680 net: check untrusted gso_size at kernel entry
> 9274124f023b net: stricter validation of untrusted gso packets
> d2aa125d6290 net: Don't set transport offset to invalid value
> d5be7f632bad net: validate untrusted gso packets without csum offload
> 9d2f67e43b73 net/packet: fix packet drop as of virtio gso
>
> 7c68d1a6b4db net: qdisc_pkt_len_init() should be more robust
> cb9f1b783850 ip: validate header length on virtual device xmit
> 418e897e0716 gso: validate gso_type on ipip style tunnels
> 121d57af308d gso: validate gso_type in GSO handlers
>
> This is not necessarily an exhaustive list.
>
> And others not directly due to gso/csum metadata, but malformed
> packets from userspace nonetheless, such as
>
> 76c0ddd8c3a6 ip6_tunnel: be careful when accessing the inner header
>
> > >
> > >> then a virtio-net
> > >> specific BPF program still looks much better than coupling virtio-net
> > >> metadata into flow keys or other general data structures.
> > >>
> > >>
> > >>> Your previous comment mentions two vnet_hdr definitions that can get
> > >>> out of sync. Do you mean v1 of this patch, that adds the individual
> > >>> fields to bpf_flow_dissector?
> > >>
> > >> No, I meant this part of the patch:
> > >>
> > >>
> > >> +static int check_virtio_net_hdr_access(struct bpf_verifier_env *env,
> > >> +                                       int off, int size)
> > >> +{
> > >> +        if (size < 0 || off < 0 ||
> > >> +            (u64)off + size > sizeof(struct virtio_net_hdr)) {
> > >> +                verbose(env, "invalid access to virtio_net_hdr off=%d size=%d\n",
> > >> +                        off, size);
> > >> +                return -EACCES;
> > >> +        }
> > >> +        return 0;
> > >> +}
> > >> +
> > >>
> > >>
> > >> It prevents the program from accessing e.g. num_buffers.
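> > >>
> > >> For reference, num_buffers lives outside struct virtio_net_hdr, in the
> > >> mergeable-rxbuf variant (uapi/linux/virtio_net.h):
> > >>
> > >> struct virtio_net_hdr {
> > >>         __u8 flags;
> > >>         __u8 gso_type;
> > >>         __virtio16 hdr_len;
> > >>         __virtio16 gso_size;
> > >>         __virtio16 csum_start;
> > >>         __virtio16 csum_offset;
> > >> };
> > >>
> > >> struct virtio_net_hdr_mrg_rxbuf {
> > >>         struct virtio_net_hdr hdr;
> > >>         __virtio16 num_buffers;
> > >> };
> > >>
> > >> so the sizeof() bound above cuts off access one field too early for
> > >> mergeable buffers.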
> > > I see, thanks. See my response to your following point.
> > >
> > >>
> > >>> That is no longer the case: the latest
> > >>> version directly accesses the real struct. As Alexei points out, doing
> > >>> this does not set virtio_net_hdr in stone in the ABI. That is a valid
> > >>> worry, but this patch series will not restrict how that struct may
> > >>> develop over time. A version field allows a BPF program to parse the
> > >>> different variants of the struct -- in the same manner as other
> > >>> protocol headers.
> > >>
> > >> The format of the virtio-net header depends on the virtio features, so
> > >> is there any reason for another version field? The correct way is to
> > >> provide the features in the context; in that case you don't even need
> > >> the endian hint.
> > > That might work. It clearly works for virtio. Not sure how to apply it
> > > to pf_packet or tuntap callers of virtio_net_hdr_to_skb.
> >
> >
> > This requires more thought, but it also applies to the version. For
> > tuntap, the features could be deduced from 1) TUNSETVNETHDRSZ and 2)
> > TUNSETOFFLOAD.
> >
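> > A rough sketch of the kind of deduction I mean (the exact mapping
> > would need more care; only the flag names are real):
> >
> > static u64 tun_state_to_virtio_features(unsigned int offloads,
> >                                         int vnet_hdr_sz)
> > {
> >         u64 features = 0;
> >
> >         /* derive virtio-net-style feature bits from tun state set
> >          * via TUNSETOFFLOAD and TUNSETVNETHDRSZ */
> >         if (offloads & TUN_F_CSUM)
> >                 features |= 1ULL << VIRTIO_NET_F_CSUM;
> >         if (offloads & TUN_F_TSO4)
> >                 features |= 1ULL << VIRTIO_NET_F_HOST_TSO4;
> >         if (offloads & TUN_F_TSO6)
> >                 features |= 1ULL << VIRTIO_NET_F_HOST_TSO6;
> >         if (vnet_hdr_sz >= sizeof(struct virtio_net_hdr_mrg_rxbuf))
> >                 features |= 1ULL << VIRTIO_NET_F_MRG_RXBUF;
> >
> >         return features;
> > }
> >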
> > Note that theoretically the features could be provided by userspace,
> > but the version cannot be (unless it is made part of the uAPI, at which
> > point it becomes a duplication of the features).
> >
> > Actually, this is a strong hint that the context for packet and tuntap
> > is different from virtio-net (though they share the same (or a partial)
> > vnet header structure). E.g. tuntap is unaware of mergeable buffers; it
> > just leaves room for vhost-net or qemu to fill in those fields.
>
> True. We need to pass the length of the struct along with any
> version/typing information (which is currently only that endianness
> boolean). Also to address the bounds checking issue you raised above.
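>
> i.e., bound accesses by the actual header length carried in the
> context rather than a hardcoded sizeof() -- a sketch, assuming a
> vnet_hdr_len field is added to the context:
>
>         if (size < 0 || off < 0 || (u64)off + size > vnet_hdr_len)
>                 return -EACCES;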
>
> >
> > >
> > >>> If you prefer, we can add that field from the start.
> > >>> I don't see a benefit to an extra layer of indirection in the form of
> > >>> helper functions.
> > >>>
> > >>> I do see downsides to splitting the program. The goal is to ensure
> > >>> consistency between vnet_hdr and packet payload. A program split
> > >>> limits us to checking vnet_hdr against what the flow_keys struct has
> > >>> extracted. That is a great reduction from full packet access.
> > >>
> > >> Full packet access can still be done in the bpf flow dissector.
> > >>
> > >>
> > >>> For
> > >>> instance, does the packet contain IP options? No idea.
> > >>
> > >> I don't understand this point. You can figure that out in the flow
> > >> dissector, and you can extend the flow keys to carry this information
> > >> if necessary.
> > > This I disagree with. flow_keys are a distillation/simplification of
> > > the packet contents. It is unlikely to capture every feature of every
> > > packet.
> >
> >
> > It depends on what kind of checks you want to do. When introducing a new
> > API like this, we need to make sure all the fields can be validated
> > instead of limiting it to some specific fields. Not all the fields are
> > related to the flow; that is another reason why validating them in the
> > flow dissector is not a good choice.
> >
> > Vnet header fields that are related to the flow should be a subset of
> > the current flow keys; otherwise it is a hint that the flow keys need to
> > be extended. For vnet header fields that are not related to the flow,
> > validating them in the flow dissector requires a lot of other context
> > (the features, and probably the virtqueue size for num_buffers). And if
> > you want to validate things that are totally unrelated to the vnet
> > header (e.g. IP options), that can be done in the flow dissector right
> > now, or via XDP.
>
> But correlating this data with the metadata is needlessly more complex
> with two programs vs one. The issue is not whether flow keys can capture
> all vnet_hdr fields. It is that they won't capture all relevant packet
> contents.
>
> >
> > > We end up having to extend it for every new case we're
> > > interested in. That is ugly and a lot of busywork. And for what
> > > purpose? The virtio_net_hdr field prefaces the protocol headers in the
> > > same buffer in something like tpacket. Processing the metadata
> > > together with the data is straightforward. I don't see what isolation
> > > or layering that breaks.
> >
> >
> > Well, if you want to process metadata together with the data, isn't XDP
> > a much better place to do that?
>
> XDP or bpf flow dissector: they are both BPF programs. The defining
> point is what context to pass along.
>
> > >
> > >> And if you want more capability, XDP, which is designed for early
> > >> packet filtering, is the right way to go; it has even more
> > >> functionality than a simple bpf flow dissector.
> > >>
> > >>> If stable ABI is not a concern and there are no different struct
> > >>> definitions that can go out of sync, does that address your main
> > >>> concerns?
> > >>
> > >> I think not. Assuming we provide sufficient context (e.g. the virtio
> > >> features), problems remain: 1) coupling virtio-net with flow_keys
> > > A flow dissection program is allowed to read both contents of struct
> > > virtio_net_hdr and packet contents. virtio_net_hdr is not made part of
> > > struct bpf_flow_keys.
> >
> >
> > It doesn't matter whether or not it's a pointer. And actually you had
> > vhdr_is_little_endian:
> >
> > @@ -6017,6 +6017,8 @@ struct bpf_flow_keys {
> >          };
> >          __u32 flags;
> >          __be32 flow_label;
> > +        __bpf_md_ptr(const struct virtio_net_hdr *, vhdr);
> > +        __u8 vhdr_is_little_endian;
> >  };
> >
> >
> > And if we want to add the support to validate other untrusted sources,
> > do we want:
> >
> > struct bpf_flow_keys {
> >         /* general fields */
> >         virtio_net_context;
> >         xen_netfront_context;
> >         netvsc_context;
> >         ...
> > };
> >
> > ?
>
> No, I think just a pointer to the metadata context and a
> type/versioning field that tells the program how to interpret it. A
> strict program can drop everything that it does not understand. It
> does not have to be complete.
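>
> Along these lines (field names are placeholders, not a final ABI):
>
> struct bpf_flow_keys {
>         ...
>         __bpf_md_ptr(const void *, metadata);
>         __u32 metadata_type;  /* which source: virtio-net, netfront, ... */
>         __u32 metadata_len;   /* length of *metadata, for bounds checks */
> };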
>
> > > The pointer there is just a way to give access
> > > to multiple data sources through the single bpf program context.
> > >
> > >> 2) can't work for XDP.
> > > This future (AF_?)XDP-based alternative to
> > > pf_packet/tuntap/virtio does not exist yet, so it's hard to fully
> > > prepare for it. But any replacement interface will face the same
> > > issue: when specifying offloads like GSO/csum, that metadata may not
> > > agree with actual packet contents. We have to have a way to validate
> > > that. I could imagine that this XDP program attached to the AF_XDP
> > > interface would do the validation itself?
> >
> >
> > If XDP can do the validation itself, is there any reason to duplicate
> > the work in the flow dissector?
> >
> >
> > > Is that what you mean?
> >
> >
> > Kind of; it's not specific to AF_XDP, assuming XDP supports GSO/sg. With
> > XDP_REDIRECT/XDP_TX, you still need to validate the vnet header.
>
> The purpose is validation of data coming from untrusted userspace into
> the kernel.
>
> XDP_REDIRECT and XDP_TX forward data within the kernel, so are out of scope.
>
> In the case where they are used along with some ABI through which data
> can be inserted into the kernel, such as AF_XDP, it is relevant. And I
> agree that then the XDP program can do the validation directly.
>
> That just does not address validation of legacy interfaces. Those
> interfaces generally require CAP_NET_RAW to set up, but they are often
> used after setup to accept untrusted contents, so I don't think they
> should be considered "root, so anything goes".
>
> > Thanks
>
> Thanks for the discussion. As said, we'll start by looking at the
> other similar netfront interfaces.
>
> >
> > >
> >