lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bc3ec02d-4d4e-477a-b8a5-5245425326c6@kadam.mountain>
Date:   Wed, 26 Jul 2023 18:01:00 +0300
From:   Dan Carpenter <dan.carpenter@...aro.org>
To:     Yan Zhai <yan@...udflare.com>
Cc:     bpf@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Mykola Lysenko <mykolal@...com>, Shuah Khan <shuah@...nel.org>,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        linux-kselftest@...r.kernel.org, kernel-team@...udflare.com,
        Jordan Griege <jgriege@...udflare.com>,
        Markus Elfring <Markus.Elfring@....de>,
        Jakub Sitnicki <jakub@...udflare.com>
Subject: Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values

On Wed, Jul 26, 2023 at 07:14:56AM -0700, Yan Zhai wrote:
> On Wed, Jul 26, 2023 at 04:39:08PM +0300, Dan Carpenter wrote:
> > I'm not positive I understand the code in ip_finish_output2().  I think
> > instead of looking for LWTUNNEL_XMIT_DONE it should instead look for
> > != LWTUNNEL_XMIT_CONTINUE.  It's unfortunate that NET_XMIT_DROP and
> > LWTUNNEL_XMIT_CONTINUE are the both 0x1.  Why don't we just change that
> > instead?
> > 
> I considered about changing lwt side logic. But it would bring larger
> impact since there are multiple types of encaps on this hook, not just
> bpf redirect. Changing bpf return values is a minimum change on the
> other hand. In addition, returning value of NET_RX_DROP and
> NET_XMIT_CN are the same, so if we don't do something in bpf redirect,
> there is no way to distinguish them later: the former is considered as
> an error, while "CN" is considered as non-error.

Uh, NET_RX/XMIT_DROP values are 1.  NET_XMIT_CN is 2.

I'm not an expert but I think what happens is that we treat NET_XMIT_CN
as success so that it takes a while for the resend to happen.
Eventually the TCP layer will detect it as a dropped packet.

> 
> > Also there seems to be a leak in lwtunnel_xmit().  Should that return
> > LWTUNNEL_XMIT_CONTINUE or should it call kfree_skb() before returning?
> > 
> > Something like the following?
> > 
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 11652e464f5d..375790b672bc 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -112,6 +112,9 @@ void netdev_sw_irq_coalesce_default_on(struct net_device *dev);
> >  #define NET_XMIT_CN		0x02	/* congestion notification	*/
> >  #define NET_XMIT_MASK		0x0f	/* qdisc flags in net/sch_generic.h */
> >  
> > +#define LWTUNNEL_XMIT_DONE NET_XMIT_SUCCESS
> > +#define LWTUNNEL_XMIT_CONTINUE 0x3
> > +
> >  /* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It
> >   * indicates that the device will soon be dropping packets, or already drops
> >   * some packets of the same priority; prompting us to send less aggressively. */
> > diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
> > index 6f15e6fa154e..8ab032ee04d0 100644
> > --- a/include/net/lwtunnel.h
> > +++ b/include/net/lwtunnel.h
> > @@ -16,12 +16,6 @@
> >  #define LWTUNNEL_STATE_INPUT_REDIRECT	BIT(1)
> >  #define LWTUNNEL_STATE_XMIT_REDIRECT	BIT(2)
> >  
> > -enum {
> > -	LWTUNNEL_XMIT_DONE,
> > -	LWTUNNEL_XMIT_CONTINUE,
> > -};
> > -
> > -
> >  struct lwtunnel_state {
> >  	__u16		type;
> >  	__u16		flags;
> > diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
> > index 711cd3b4347a..732415d1287d 100644
> > --- a/net/core/lwtunnel.c
> > +++ b/net/core/lwtunnel.c
> > @@ -371,7 +371,7 @@ int lwtunnel_xmit(struct sk_buff *skb)
> >  
> >  	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
> >  	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
> > -		return 0;
> > +		return LWTUNNEL_XMIT_CONTINUE;
> 
> You are correct this path would leak skb. Return continue (or drop)
> would avoid the leak. Personally I'd prefer drop instead to signal the
> error setup. Since this is a separate issue, do you want to send a
> separate patch on this? Or I am happy to do it if you prefer.
> 

I don't know which makes sense so I'll leave that up to you.

> >  
> >  	ret = -EOPNOTSUPP;
> >  	rcu_read_lock();
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 6e70839257f7..4be50a211b14 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -216,7 +216,7 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
> >  	if (lwtunnel_xmit_redirect(dst->lwtstate)) {
> >  		int res = lwtunnel_xmit(skb);
> >  
> > -		if (res < 0 || res == LWTUNNEL_XMIT_DONE)
> > +		if (res != LWTUNNEL_XMIT_CONTINUE)
> >  			return res;
> 
> Unfortunately we cannot return res directly here when res > 0. This is
> the final reason why I didn't patch here. Return values here can be
> propagated back to sendmsg syscall, so returning a positive value
> would break the syscall convention.

The neigh_output() function is going to return NET_XMIT_DROP so this
already happens.  Is that not what we want to happen?

I guess my concern is that eventually people will eventually new
introduce bugs.  Fixing incorrect error codes is something that I do
several times per week.  :P

regards,
dan carpenter


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ