[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-id: <171375748540.7600.5672163982570379489@noble.neil.brown.name>
Date: Mon, 22 Apr 2024 13:44:45 +1000
From: "NeilBrown" <neilb@...e.de>
To: "Lex Siegel" <usiegl00@...il.com>
Cc: "Chuck Lever" <chuck.lever@...cle.com>, "Jeff Layton" <jlayton@...nel.org>,
 "Olga Kornievskaia" <kolga@...app.com>, "Dai Ngo" <Dai.Ngo@...cle.com>,
 "Tom Talpey" <tom@...pey.com>,
 "Trond Myklebust" <trond.myklebust@...merspace.com>,
 "Anna Schumaker" <anna@...nel.org>, "David S. Miller" <davem@...emloft.net>,
 "Eric Dumazet" <edumazet@...gle.com>, "Jakub Kicinski" <kuba@...nel.org>,
 "Paolo Abeni" <pabeni@...hat.com>, linux-nfs@...r.kernel.org,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH] xprtsock: Fix a loop in xs_tcp_setup_socket()
On Mon, 22 Apr 2024, Lex Siegel wrote:
> > Better still would be for kernel_connect() to return a more normal error
> > code - not EPERM.  If that cannot be achieved, then I think it would be
> > best for the sunrpc code to map EPERM to something else at the place
> > where kernel_connect() is called - catch it early.
> 
> The question is whether a permission error, EPERM, should cause a retry or
> return. Currently xs_tcp_setup_socket() is retrying. For the retry to clear,
> the connect call will have to not return a permission error to halt the retry
> attempts.
> 
> This is a default behavior because EPERM is not an explicit case of the switch
> statement. Because bpf appropriately uses EPERM to show that the kernel_connect
> was not permitted, it highlights the return handling for this case is missing.
> It is unlikely that retry was ever the intended result.
> 
> Upstream, the bpf that caused this is at:
> https://github.com/cilium/cilium/blob/v1.15/bpf/bpf_sock.c#L336
> 
> This cilium bpf code has two return statuses, EPERM and ENXIO, that fall
> through to the default case of retrying. Here, cilium expects both of these
> statuses to indicate the connect failed. A retry is not the intended result.
> 
> Handling this case without a retry aligns this code with the udp behavior. This
> precedence for passing EPERM back up the stack was set in 3dedbb5ca10ef.
> 
> I will amend my patch to include an explicit case for ENXIO as well, as this is
> also in cilium's bpf and will cause the same bug to occur.
> 
I think it should be up to cilium to report an errno that the kernel
understands, not up to the kernel to understand whatever errno cilium
chooses to return.
I don't think EPERM or ENXIO are appropriate errors for network
problems.
EHOSTUNREACH or ECONNREFUSED would make much more sense.
NeilBrown
> 
> On Mon, Apr 22, 2024 at 8:22 AM NeilBrown <neilb@...e.de> wrote:
> >
> > On Sat, 20 Apr 2024, Lex Siegel wrote:
> > > When using a bpf on kernel_connect(), the call can return -EPERM.
> > > This causes xs_tcp_setup_socket() to loop forever, filling up the
> > > syslog and causing the kernel to freeze up.
> > >
> > > Signed-off-by: Lex Siegel <usiegl00@...il.com>
> > > ---
> > >  net/sunrpc/xprtsock.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > index bb9b747d58a1..47b254806a08 100644
> > > --- a/net/sunrpc/xprtsock.c
> > > +++ b/net/sunrpc/xprtsock.c
> > > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> > >               /* Happens, for instance, if the user specified a link
> > >                * local IPv6 address without a scope-id.
> > >                */
> > > +     case -EPERM:
> > > +             /* Happens, for instance, if a bpf is preventing the connect */
> >
> > This will propagate -EPERM up into other layers which might not be ready
> > to handle it.
> > It might be safer to map EPERM to an error we would be more likely to
> > expect  from the network system - such as ECONNREFUSED or ENETDOWN.
> >
> > Better still would be for kernel_connect() to return a more normal error
> > code - not EPERM.  If that cannot be achieved, then I think it would be
> > best for the sunrpc code to map EPERM to something else at the place
> > where kernel_connect() is called - catch it early.
> >
> > NeilBrown
> >
> >
> > >       case -ECONNREFUSED:
> > >       case -ECONNRESET:
> > >       case -ENETDOWN:
> > > --
> > > 2.39.3
> > >
> > >
> >
> 
Powered by blists - more mailing lists
 
