[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0Uf6YrDtvEfL02-P7A3Q_V32MWZ-tV7B=xtkY0ZzxEo9yg@mail.gmail.com>
Date: Fri, 3 Sep 2021 09:46:51 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Netdev <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Ido Schimmel <idosch@...sch.org>,
chouhan.shreyansh630@...il.com
Subject: Re: [PATCH net] ip_gre: validate csum_start only if CHECKSUM_PARTIAL
On Thu, Sep 2, 2021 at 7:41 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> On Thu, Sep 2, 2021 at 10:01 PM Alexander Duyck
> <alexander.duyck@...il.com> wrote:
> >
> > On Thu, Sep 2, 2021 at 2:45 PM Willem de Bruijn
> > <willemdebruijn.kernel@...il.com> wrote:
> > >
> > > On Thu, Sep 2, 2021 at 5:17 PM Alexander Duyck
> > > <alexander.duyck@...il.com> wrote:
> > > >
> > > > On Thu, Sep 2, 2021 at 1:30 PM Willem de Bruijn
> > > > <willemdebruijn.kernel@...il.com> wrote:
> > > > >
> > > > > On Thu, Sep 2, 2021 at 4:25 PM Alexander Duyck
> > > > > <alexander.duyck@...il.com> wrote:
> > > > > >
> > > > > > On Thu, Sep 2, 2021 at 12:38 PM Willem de Bruijn
> > > > > > <willemdebruijn.kernel@...il.com> wrote:
> > > > > > >
> > > > > > > From: Willem de Bruijn <willemb@...gle.com>
> > > > > > >
> > > > > > > Only test integrity of csum_start if checksum offload is configured.
> > > > > > >
> > > > > > > With checksum offload and GRE tunnel checksum, gre_build_header will
> > > > > > > cheaply build the GRE checksum using local checksum offload. This
> > > > > > > depends on inner packet csum offload, and thus that csum_start points
> > > > > > > behind GRE. But validate this condition only with checksum offload.
> > > > > > >
> > > > > > > Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
> > > > > > > Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
> > > > > > > Signed-off-by: Willem de Bruijn <willemb@...gle.com>
> > > > > > > ---
> > > > > > > net/ipv4/ip_gre.c | 5 ++++-
> > > > > > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > > > > > > index 177d26d8fb9c..09311992a617 100644
> > > > > > > --- a/net/ipv4/ip_gre.c
> > > > > > > +++ b/net/ipv4/ip_gre.c
> > > > > > > @@ -473,8 +473,11 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
> > > > > > >
> > > > > > > static int gre_handle_offloads(struct sk_buff *skb, bool csum)
> > > > > > > {
> > > > > > > - if (csum && skb_checksum_start(skb) < skb->data)
> > > > > > > + /* Local checksum offload requires csum offload of the inner packet */
> > > > > > > + if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
> > > > > > > + skb_checksum_start(skb) < skb->data)
> > > > > > > return -EINVAL;
> > > > > > > +
> > > > > > > return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
> > > > > > > }
> > > > >
> > > > > Thanks for taking a look.
> > > > >
> > > > > > So a few minor nits.
> > > > > >
> > > > > > First I think we need this for both v4 and v6 since it looks like this
> > > > > > code is reproduced for net/ipv6/ip6_gre.c.
> > > > >
> > > > > I sent a separate patch for v6. Perhaps should have made it a series
> > > > > to make this more clear.
> > > >
> > > > Yeah, that was part of the reason I assumed the ipv6 patch was overlooked.
> > >
> > > I was in two minds only because a series should come with a cover
> > > letter and thus one extra email added to the firehose. But this makes
> > > clear the value. Will just do that in the future.
> > >
> > > > > > Second I don't know if we even need to bother including the "csum"
> > > > > > portion of the lookup since that technically is just telling us if the
> > > > > > GRE tunnel is requesting a checksum or not and I am not sure that
> > > > > > applies to the fact that the inner L4 header is going to be what is
> > > > > > requesting the checksum offload most likely.
> > > > >
> > > > > This test introduced in the original patch specifically protects the
> > > > > GRE tunnel checksum calculation using lco_csum. The regular inner
> > > > > packet path likely is already robust (as similar bug reports would be
> > > > > more likely for that more common case).
> > > >
> > > > I was just thinking in terms of shaving off some extra comparisons. I
> > > > suppose it depends on if this is being inlined or not. If it is being
> > > > inlined there are at least 2 cases where the if statement would be
> > > > dropped since csum is explicitly false. My thought was that by just
> > > > jumping straight to the skb->ip_summed check we can drop the lookup
> > > > for csum since it would be implied by the fact that skb->ip_summed is
> > > > being checked. It would create a broader check, but at the same time
> > > > it would reduce the number of comparisons in the call.
> > >
> > > Most GRE tunnels don't have checksums and csum is likely in a register,
> > > as function argument, so it likely is the cheaper test?
> > >
> > > More functional argument: if !csum, the GRE tunnel does not care about
> > > the integrity of csum_start. So I think that it should not read it at all.
> >
> > The problem as I see it is that it is just passing the problem on.
> > Adding the check to the GRE drivers only really fixes one spot that
> > exposes the issue. In this case it was triggered because the lco_csum
> > was causing issues. What happens when one of these packets makes it
> > down to hardware and it has to deal with the malformed csum_start? I
> > suspect we end up potentially causing issues where Tx metadata could
> > be malformed resulting in a dropped packet at the best case, and
> > kernel panics at the worst.
> >
> > > > > > Also maybe this should be triggering a WARN_ON_ONCE if we hit this as
> > > > > > the path triggering this should be fixed rather than us silently
> > > > > > dropping frames. We should be figuring out what cases are resulting in
> > > > > > us getting CHECKSUM_PARTIAL without skb_checksum_start being set.
> > > > >
> > > > > We already know that bad packets can enter the kernel and trigger
> > > > > this, using packet sockets and virtio_net_hdr. Unfortunately, this
> > > > > *is* the fix.
> > > >
> > > > It sounds almost like we need a CHECKSUM_DODGY to go along with the
> > > > SKB_GSO_DODGY in order to resolve these kinds of issues.
> > > >
> > > > So essentially we have a source that we know can give us bad packets.
> > > > We really should be performing some sort of validation on these much
> > > > earlier in order to clean them up so that we don't have to add this
> > > > sort of exception handling code all over the Tx path.
> > >
> > > Agreed with the concern. I've been arguing for validation at kernel
> > > entry of virtio_net_hdr. As an optional feature, if nothing else:
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20210616203448.995314-3-tannerlove.kernel@gmail.com/
> > >
> > > But unless we accept the cost of full parsing to identify the
> > > transport headers, we cannot predict at that stage whether the field
> > > is bogus, let alone whether it might trigger a bug later on. We do
> > > basic validation: csum_start is verified to be within the skb linear,
> > > so not totally out of bounds.
> > >
> > > That the offset is not just bogus, but causes a bug appears to be a
> > > rare exception peculiar to the GRE tunnel. Only it pulls the outer
> > > header (in ipgre_xmit), applies lco_csum and can so trigger negative
> > > overflow, as far as I could tell. That's why we decided to add the
> > > limited check local to that code.
> > >
> > > I'm not sure how we would use CHECKSUM_DODGY in practice.
> >
> > The issue is drivers with NETIF_F_HW_CSUM would be expecting a
> > reasonable csum_start and csum_offset. If the hardware is only
> > advertising that and we are expecting it to offload the checksum we
> > should probably be doing some sort of validation on user derived
> > inputs to make sure that they aren't totally ridiculous as is the case
> > here where the original issue was that the csum_start wasn't even in
> > the packet data.
> >
> > Would it maybe make sense to look at reverting the earlier fixes and
> > instead updating skb_partial_csum_set so that we cannot write a
> > csum_start value that is less than the start of skb->data? That way we
> > are addressing this at the source instead of allowing the garbage data
> > to propagate further down the stack and having to drop it at the
> > driver level which is going to have us playing whack a mole trying to
> > fix it where it pops up. It seems like we are already validating the
> > upper bounds for these values in that function so why not validate the
> > lower bounds as well?
>
> skb_partial_csum_set verifies that csum_start is within bounds
> at the time it is called. The offset passed from userspace is
> an unsigned int added to skb_headroom(skb), so >= skb->data.
>
> The issue is with ipgre_xmit calling skb_pull. Only then does it
> become out of bounds. From what I saw, pulling headers like
> that is very rare in the transmit path. Indeed, I did not see it in
> any other tunnels. We could instrument each of these directly,
> but at non-trivial cost.
So there are some positives and negatives to addressing this in
ipgre_xmit. Specifically if we address it before the pull we could do
some other quick checks to verify the offset. If the offset and start
are both in the region to be pulled we could just drop the offload,
whereas if the offset is stored somewhere in the unstripped data we
could then drop the packet and count it as a drop without having to
modify the frame via the skb_pull.
> The negative underflow and kernel crash is very specific to
> lco_csum, which calculates the offset between csum_offset
> and transport_offset. Standard checksum offset doesn't.
>
> One alternative fix, then, would be to add a negative overflow
> check in lco_csum itself.
> That only has three callers and unfortunately by the time we hit
> that for GRE in gre_build_header, there no longer is a path to
> cleanly dropping the packet and returning an error.
Right. We could find the problem there but it doesn't solve the issue.
The problem is the expectation that the offset exists in the area
after the checksum for the tunnel header, not before.
> I don't find this bad input whack-a-mole elegant either. But I
> thought it was the least bad option..
The part that has been confusing me is how the virtio_net_hdr could
have been pointing as the IP or GRE headers since they shouldn't have
been present on the frame provided by the userspace. I think I am
starting to see now.
So in the case of af_packet it looks like it is calling
dev_hard_header which calls header_ops->create before doing the
virtio_net_hdr_to_skb call. Do I have that right? Maybe for those
cases we need to look at adding an unsigned int argument to
virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
dev->hard_header_len in the cases where we have something like
af_packet that is transmitting over an ipgre tunnel. The general idea
is to prevent these virtio_net_hdr_to_skb calls from pointing the
csum_start into headers that userspace was not responsible for
populating.
Powered by blists - more mailing lists