[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b1b95a71-a4e8-288c-7731-811ad548d641@blackhole.kfki.hu>
Date: Fri, 22 Mar 2024 21:16:08 +0100 (CET)
From: Jozsef Kadlecsik <kadlec@...ckhole.kfki.hu>
To: Jason Xing <kerneljasonxing@...il.com>
cc: Pablo Neira Ayuso <pablo@...filter.org>, edumazet@...gle.com,
Florian Westphal <fw@...len.de>, kuba@...nel.org, pabeni@...hat.com,
David Miller <davem@...emloft.net>, netfilter-devel@...r.kernel.org,
coreteam@...filter.org, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH nf-next v2] netfilter: conntrack: avoid sending RST to
reply out-of-window skb
On Fri, 22 Mar 2024, Jason Xing wrote:
> On Fri, Mar 22, 2024 at 6:50 PM Jozsef Kadlecsik
> <kadlec@...ckhole.kfki.hu> wrote:
> >
> > Hi,
> >
> > On Thu, 21 Mar 2024, Pablo Neira Ayuso wrote:
> >
> > > On Mon, Mar 11, 2024 at 03:05:50PM +0800, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@...cent.com>
> > > >
> > > > Supposing we set DNAT policy converting a_port to b_port on the
> > > > server at the beginning, the socket is set up by using 4-tuple:
> > > >
> > > > client_ip:client_port <--> server_ip:b_port
> > > >
> > > > Then, some strange skbs from client or gateway, say, out-of-window
> > > > skbs are eventually sent to the server_ip:a_port (not b_port)
> > > > in TCP layer due to netfilter clearing skb->_nfct value in
> > > > nf_conntrack_in() function. Why? Because the tcp_in_window()
> > > > considers the incoming skb as an invalid skb by returning
> > > > NFCT_TCP_INVALID.
> > > >
> > > > At last, the TCP layer process the out-of-window
> > > > skb (client_ip,client_port,server_ip,a_port) and try to look up
> > > > such an socket in tcp_v4_rcv(), as we can see, it will fail for sure
> > > > because the port is a_port not our expected b_port and then send
> > > > back an RST to the client.
> > > >
> > > > The detailed call graphs go like this:
> > > > 1)
> > > > nf_conntrack_in()
> > > > -> nf_conntrack_handle_packet()
> > > > -> nf_conntrack_tcp_packet()
> > > > -> tcp_in_window() // tests if the skb is out-of-window
> > > > -> return -NF_ACCEPT;
> > > > -> skb->_nfct = 0; // if the above line returns a negative value
> > > > 2)
> > > > tcp_v4_rcv()
> > > > -> __inet_lookup_skb() // fails, then jump to no_tcp_socket
> > > > -> tcp_v4_send_reset()
> > > >
> > > > The moment the client receives the RST, it will drop. So the RST
> > > > skb doesn't hurt the client (maybe hurt some gateway which cancels
> > > > the session when filtering the RST without validating
> > > > the sequence because of performance reason). Well, it doesn't
> > > > matter. However, we can see many strange RST in flight.
> > > >
> > > > The key reason why I wrote this patch is that I don't think
> > > > the behaviour is expected because the RFC 793 defines this
> > > > case:
> > > >
> > > > "If the connection is in a synchronized state (ESTABLISHED,
> > > > FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT),
> > > > any unacceptable segment (out of window sequence number or
> > > > unacceptible acknowledgment number) must elicit only an empty
> > > > acknowledgment segment containing the current send-sequence number
> > > > and an acknowledgment..."
> > > >
> > > > I think, even we have set DNAT policy, it would be better if the
> > > > whole process/behaviour adheres to the original TCP behaviour as
> > > > default.
> > > >
> > > > Suggested-by: Florian Westphal <fw@...len.de>
> > > > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > > > ---
> > > > v2
> > > > Link: https://lore.kernel.org/netdev/20240307090732.56708-1-kerneljasonxing@gmail.com/
> > > > 1. add one more test about NAT and then drop the skb (Florian)
> > > > ---
> > > > net/netfilter/nf_conntrack_proto_tcp.c | 15 +++++++++++++--
> > > > 1 file changed, 13 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > index ae493599a3ef..19ddac526ea0 100644
> > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c
> > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > @@ -1256,10 +1256,21 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
> > > > case NFCT_TCP_IGNORE:
> > > > spin_unlock_bh(&ct->lock);
> > > > return NF_ACCEPT;
> > > > - case NFCT_TCP_INVALID:
> > > > + case NFCT_TCP_INVALID: {
> > > > + int verdict = -NF_ACCEPT;
> > > > +
> > > > + if (ct->status & IPS_NAT_MASK)
> > > > + /* If DNAT is enabled and netfilter receives
> > > > + * out-of-window skbs, we should drop it directly,
> > >
> > > Yes, if _be_liberal toggle is disabled this can happen.
> > >
> > > > + * or else skb would miss NAT transformation and
> > > > + * trigger corresponding RST sending to the flow
> > > > + * in TCP layer, which is not supposed to happen.
> > > > + */
> > > > + verdict = NF_DROP;
> > >
> > > One comment for the SNAT case.
> > >
> > > nf_conntrack_in() calls this function from the prerouting hook. For
> > > the very first packet, IPS_NAT_MASK might not be yet fully set on
> > > (masquerade/snat happens in postrouting), then still one packet can be
> > > leaked without NAT mangling in the SNAT case.
> > >
> > > Rulesets should really need to set default policy to drop in NAT
> > > chains to address this.
> > >
> > > And after this update, user has no chance anymore to bump counters at
> > > the end of the policy, to debug issues.
> > >
> > > We have relied on the rule that "conntrack should not drop packets"
> > > since the very beginning, instead signal rulesets that something is
> > > invalid, so user decides what to do.
> > >
> > > I'm ambivalent about this, Jozsef?
> >
> > [I'm putting on my sysadmin hat.]
> >
> > My personal opinion is that silently dropping packets does not make
> > sysadmin's life easier at all. On the contrary, it makes hunting down
> > problems harder and more challenging: you have got no indication
> > whatsoever why the given packets were dropped.
> >
> > The proper solution to the problem is to (log and) drop INVALID packets.
> > That is neither a knob nor a workaround: conntrack cannot handle the
> > packets and should only signal it to the rule stack.
> >
> > Actually, the few cases where conntrack itself drops (directly causes it)
> > packets should be eliminated and not more added.
> >
> > Do not blind sysadmins by silently dropping packets.
>
> Thanks for the comment.
>
> Though I'm not totally convinced, I can live with it because it seems
> there are no other good ways to solve it perfectly, meanwhile
> diminishing my confusion (like resorting to more complex
> configurations).
>
> >
> > Jason, the RST packets which triggered you to write your patch are not
> > cause but effect. The cause is the INVALID packets.
>
> You could say that.
>
> I spent a lot of time tracing down to this area and finally found the
> out-of-window causing the problem, so I ignorantly think other admins
> also may not know about this :(
>
> Anyway, thanks to both of you for so much patience and help :)
I understand and appreciate your efforts. But please consider the case
when one have to diagnose a failing connection and conntrack drops
packets. What should be suspected? Firewall rules? One can enable TRACE
and check which rules are hit - but because conntrack drops packet,
nothing is shown there. Enable and check conntrack events? Because the
packets are INVALID, checking the events does not help either. Only when
one runs tcpdump and compares it with the TRACE/NFLOG/LOG entries can one
spot that some packets "disappeared".
Compare the whole thing with the case when packets are not dropped
silently but can be logged via checking the INVALID flag. One can directly
tell that conntrack could not handle the packets and can see all packet
parameters.
Best regards,
Jozsef
> > > > nf_tcp_handle_invalid(ct, dir, index, skb, state);
> > > > spin_unlock_bh(&ct->lock);
> > > > - return -NF_ACCEPT;
> > > > + return verdict;
> > > > + }
> > > > case NFCT_TCP_ACCEPT:
> > > > break;
> > > > }
> > > > --
> > > > 2.37.3
> > > >
> > >
> >
> > --
> > E-mail : kadlec@...ckhole.kfki.hu, kadlecsik.jozsef@...ner.hu
> > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
> > Address : Wigner Research Centre for Physics
> > H-1525 Budapest 114, POB. 49, Hungary
>
--
E-mail : kadlec@...ckhole.kfki.hu, kadlecsik.jozsef@...ner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
H-1525 Budapest 114, POB. 49, Hungary
Powered by blists - more mailing lists