netdev - Re: [PATCH net-next] netfilter: conntrack: avoid sending RST to reply out-of-window skb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoDo3=uLFjxnnKj6LnE+DBHb8UybXK1_HQHTtZVr-tBYUw@mail.gmail.com>
Date: Fri, 8 Mar 2024 16:59:38 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Florian Westphal <fw@...len.de>
Cc: edumazet@...gle.com, pablo@...filter.org, kadlec@...filter.org, 
	kuba@...nel.org, pabeni@...hat.com, davem@...emloft.net, 
	netfilter-devel@...r.kernel.org, coreteam@...filter.org, 
	netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next] netfilter: conntrack: avoid sending RST to reply
 out-of-window skb

On Thu, Mar 7, 2024 at 11:11 PM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Thu, Mar 7, 2024 at 10:10 PM Florian Westphal <fw@...len.de> wrote:
> >
> > Jason Xing <kerneljasonxing@...il.com> wrote:
> > > On Thu, Mar 7, 2024 at 8:00 PM Florian Westphal <fw@...len.de> wrote:
> > > >
> > > > Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > > > This change disables most of the tcp_in_window() test, this will
> > > > > > pretend everything is fine even though tcp_in_window says otherwise.
> > > > >
> > > > > Thanks for the information. It does make sense.
> > > > >
> > > > > What I've done is quite similar to nf_conntrack_tcp_be_liberal sysctl
> > > > > knob which you also pointed out. It also pretends to ignore those
> > > > > out-of-window skbs.
> > > > >
> > > > > >
> > > > > > You could:
> > > > > >  - drop invalid tcp packets in input hook
> > > > >
> > > > > How about changing the return value only as below? Only two cases will
> > > > > be handled:
> > > > >
> > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > index ae493599a3ef..c88ce4cd041e 100644
> > > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > @@ -1259,7 +1259,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
> > > > >         case NFCT_TCP_INVALID:
> > > > >                 nf_tcp_handle_invalid(ct, dir, index, skb, state);
> > > > >                 spin_unlock_bh(&ct->lock);
> > > > > -               return -NF_ACCEPT;
> > > > > +               return -NF_DROP;
> > > >
> > > > Lets not do this.  conntrack should never drop packets and defer to ruleset
> > > > whereever possible.
> > >
> > > Hmm, sorry, it is against my understanding.
> > >
> > > If we cannot return -NF_DROP, why have we already added some 'return
> > > NF_DROP' in the nf_conntrack_handle_packet() function? And why does
> > > this test statement exist?
> >
> > Sure we can drop.  But we should only do it if there is no better
> > alternative.
> >
> > > nf_conntrack_in()
> > >   -> nf_conntrack_handle_packet()
> > >   -> if (ret <= 0) {
> > >          if (ret == -NF_DROP) NF_CT_STAT_INC_ATOMIC(state->net, drop);
> >
> > AFAICS this only happens when we receive syn for an existing conntrack
> > that is being removed already so we'd expect next syn to create a new
>
> Sorry, I've double-checked this part and found out there is no chance
> to return '-NF_DROP' for nf_conntrack_handle_packet(). It might return
> 'NF_DROP' (see link [1]) instead. The if-else statements seem like
> dead code.

My bad. My mind went blank last night, maybe it's too late. Well, It's
not dead code because '-NF_DROP' which is still zero makes me
misunderstand :(

I'll post a patch to change 'if (ret == -NF_DROP)' to 'if (ret ==
NF_DROP)' just for easy reading.

Thanks,
Jason

>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/netfilter/nf_conntrack_proto_tcp.c#:~:text=%2DNF_REPEAT%3B-,return%20NF_DROP%3B,-%7D%0A%09%09fallthrough%3B
>
> > connection.  Feel free to send patches that replace drop with -accept
> > where possible/where it makes sense, but I don't think the
> > TCP_CONNTRACK_SYN_SENT one can reasonably be avoided.
>
> Oh, are you suggesting replacing NF_DROP with -NF_ACCEPT in
> nf_conntrack_dccp_packet()?
>
> There are three points where nf_conntrack_handle_packet() returns NF_DROP:
> 1) one (syn_sent case) exists in nf_conntrack_tcp_packet(). As you
> said, it's not necessary to change.
> 2) another two exist in nf_conntrack_dccp_packet() which should be the
> same as nf_conntrack_tcp_packet() handles.
>
> The patch goes like this:
> diff --git a/net/netfilter/nf_conntrack_proto_dccp.c
> b/net/netfilter/nf_conntrack_proto_dccp.c
> index e2db1f4ec2df..ebc4f733bb2e 100644
> --- a/net/netfilter/nf_conntrack_proto_dccp.c
> +++ b/net/netfilter/nf_conntrack_proto_dccp.c
> @@ -525,7 +525,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct,
> struct sk_buff *skb,
>
>         dh = skb_header_pointer(skb, dataoff, sizeof(*dh), &_dh.dh);
>         if (!dh)
> -               return NF_DROP;
> +               return -NF_ACCEPT;
>
>         if (dccp_error(dh, skb, dataoff, state))
>                 return -NF_ACCEPT;
> @@ -533,7 +533,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct,
> struct sk_buff *skb,
>         /* pull again, including possible 48 bit sequences and subtype header */
>         dh = dccp_header_pointer(skb, dataoff, dh, &_dh);
>         if (!dh)
> -               return NF_DROP;
> +               return -NF_ACCEPT;
>
>         type = dh->dccph_type;
>         if (!nf_ct_is_confirmed(ct) && !dccp_new(ct, skb, dh, state))
>
> >
> > > My only purpose is not to let the TCP layer sending strange RST to the
> > > right flow.
> >
> > AFAIU tcp layer is correct, no?  Out of the blue packet to some listener
> > socket?
>
> Allow me to finish the full sentence: my only purpose is not to let
> the TCP layer send strange RST to the _established_ socket due to
> receiving strange out-of-window skbs.
>
> >
> > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl
> > > knob seems odd to me though it can workaround :S
> >
> > I don't see a better alternative, other than -p tcp -m conntrack
> > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see
> > such packets.
> >
> > > I would like to prevent sending such an RST as default behaviour.
> >
> > I don't see a way to make this work out of the box, without possible
> > unwanted side effects.
> >
> > MAYBE we could drop IFF we check that the conntrack entry candidate
> > that fails sequence validation has NAT translation applied to it, and
> > thus the '-NF_ACCEPT' packet won't be translated.
> >
> > Not even compile tested:
> >
> > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> > --- a/net/netfilter/nf_conntrack_proto_tcp.c
> > +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
> >         case NFCT_TCP_IGNORE:
> >                 spin_unlock_bh(&ct->lock);
> >                 return NF_ACCEPT;
> > -       case NFCT_TCP_INVALID:
> > +       case NFCT_TCP_INVALID: {
> > +               verdict = -NF_ACCEPT;
> > +               if (ct->status & IPS_NAT_MASK)
> > +                       res = NF_DROP; /* skb would miss nat transformation */
>
> Above line, I guess, should be 'verdict = NF_DROP'? Then this skb
> would be dropped in nf_hook_slow() eventually and would not be passed
> to the TCP layer.
>
> >                 nf_tcp_handle_invalid(ct, dir, index, skb, state);
> >                 spin_unlock_bh(&ct->lock);
> > -               return -NF_ACCEPT;
> > +               return verdict;
> > +       }
> >         case NFCT_TCP_ACCEPT:
> >                 break;
> >         }
>
> Great! I think your draft patch makes sense really, which takes NAT
> into consideration.
>
> >
> > But I don't really see the advantage compared to doing drop decision in
> > iptables/nftables ruleset.
>
> From our views, especially to kernel developers, you're right: we
> could easily turn on that knob or add a drop policy to prevent it
> happening. Actually I did this in production to prevent such a case.
> It surely works.
>
> But from the views of normal users and those who do not understand how
> it works in the kernel, it looks strange: people may ask why we get
> some unknown RSTs in flight?
>
> >
> > I also have a hunch that someone will eventually complain about this
> > change in behavior.
>
> Well, I still think the patch you suggested is proper and don't know
> why people could complain about it.
>
> Thanks for your patience :)
>
> Thanks,
> Jason