netdev - Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout

Message-ID: <20220516122300.6gwrlmun4w3ynz7s@SvensMacbookPro.hq.voleatech.com>
Date:   Mon, 16 May 2022 14:23:00 +0200
From:   Sven Auhagen <sven.auhagen@...eatech.de>
To:     Pablo Neira Ayuso <pablo@...filter.org>
Cc:     Oz Shlomo <ozsh@...dia.com>, Felix Fietkau <nbd@....name>,
        netdev@...r.kernel.org, netfilter-devel@...r.kernel.org,
        Florian Westphal <fw@...len.de>, Paul Blakey <paulb@...dia.com>
Subject: Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow
 timeout

On Mon, May 16, 2022 at 02:13:03PM +0200, Pablo Neira Ayuso wrote:
> On Mon, May 16, 2022 at 12:56:41PM +0200, Pablo Neira Ayuso wrote:
> > On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
> > > Connections leaving the established state (due to RST / FIN TCP packets)
> > > set the flow table teardown flag. The packet path continues to set lower
> > > timeout value as per the new TCP state but the offload flag remains set.
> > >
> > > Hence, the conntrack garbage collector may race to undo the timeout
> > > adjustment of the packet path, leaving the conntrack entry in place with
> > > the internal offload timeout (one day).
> > >
> > > Avoid ct gc timeout overwrite by flagging torn down flowtable
> > > connections.
> > >
> > > On the nftables side we only need to allow established TCP connections to
> > > create a flow offload entry. Since we cannot guarantee that
> > > flow_offload_teardown is called by a TCP FIN packet we also need to make
> > > sure that flow_offload_fixup_ct is also called in flow_offload_del
> > > and only fixes up established TCP connections.
> > [...]
> > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > index 0164e5f522e8..324fdb62c08b 100644
> > > --- a/net/netfilter/nf_conntrack_core.c
> > > +++ b/net/netfilter/nf_conntrack_core.c
> > > @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
> > >  			tmp = nf_ct_tuplehash_to_ctrack(h);
> > >  
> > >  			if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > > -				nf_ct_offload_timeout(tmp);
> > 
> > Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
> > path that triggers the race, ie. nf_ct_is_expired()
> > 
> > The flowtable ct fixup races with conntrack gc collector.
> > 
> > Clearing IPS_OFFLOAD might result in offloading the entry again for
> > the closing packets.
> > 
> > Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
> > in a TCP state that represent closure?
> > 
> >   		if (unlikely(!tcph || tcph->fin || tcph->rst))
> >   			goto out;
> > 
> > this is already the intention in the existing code.
> 
> I'm attaching an incomplete sketch patch. My goal is to avoid the
> extra IPS_ bit.

You might create a race with ct gc: it can remove the ct while it is in
close or end of close, before flow offload teardown has run, so flow
offload teardown might access memory that was already freed.
It is not a very likely scenario, but nevertheless it can happen now,
since IPS_OFFLOAD_BIT is no longer set and the state might simply time out.

If someone sets a very small TCP CLOSE timeout, it becomes more likely.

So Oz and I were debating three possible cases/problems:

1. ct gc sets the timeout even though the state is CLOSE/FIN, because
IPS_OFFLOAD is still set while the flow is in teardown.
2. ct gc removes the ct because IPS_OFFLOAD is not set and
the CLOSE timeout is reached before the flow offload del runs.
3. The tcp ct is always set to ESTABLISHED with a very long timeout
in flow offload teardown/delete, even though the state is already
CLOSED.
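
To make cases 1 and 3 concrete, here is a rough user-space sketch. It is
not kernel code: all model_* names and the timeout values are made up
for illustration, and it only mimics the behaviour described above
(gc refreshing the timeout while the offload bit is set, and the fixup
touching only established connections). Case 2 is a lifetime problem
and is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; the real timeouts are configurable. */
#define MODEL_OFFLOAD_TIMEOUT     (24L * 60 * 60) /* internal offload timeout, one day */
#define MODEL_ESTABLISHED_TIMEOUT (5L * 24 * 60 * 60)
#define MODEL_CLOSE_TIMEOUT       10L

enum model_tcp_state { MODEL_TCP_ESTABLISHED, MODEL_TCP_CLOSE };

struct model_ct {
	bool offload;               /* stands in for IPS_OFFLOAD_BIT */
	bool teardown;              /* stands in for the flow teardown flag */
	enum model_tcp_state state;
	long timeout;               /* seconds until the entry expires */
};

/* Packet path sees FIN/RST: mark teardown and lower the timeout. */
void model_packet_close(struct model_ct *ct)
{
	assert(ct != NULL);
	ct->state = MODEL_TCP_CLOSE;
	ct->teardown = true;
	ct->timeout = MODEL_CLOSE_TIMEOUT;
}

/*
 * One gc pass over an entry. With honor_teardown == false this mimics
 * case 1: the offload bit alone decides, so the lowered timeout is
 * overwritten. With honor_teardown == true the entry is left alone
 * once it is flagged as torn down.
 */
void model_gc_refresh(struct model_ct *ct, bool honor_teardown)
{
	assert(ct != NULL);
	if (!ct->offload)
		return;
	if (honor_teardown && ct->teardown)
		return;
	ct->timeout = MODEL_OFFLOAD_TIMEOUT;
}

/* Fixup on teardown/delete: only touch established connections (case 3). */
void model_fixup_ct(struct model_ct *ct)
{
	assert(ct != NULL);
	if (ct->state != MODEL_TCP_ESTABLISHED)
		return;
	ct->timeout = MODEL_ESTABLISHED_TIMEOUT;
}
```

With honor_teardown false, a gc pass after model_packet_close() puts the
timeout back to one day (case 1); with it true, the CLOSE timeout
survives. model_fixup_ct() leaves an already closed entry untouched, so
it is not pushed back to a long ESTABLISHED timeout (case 3).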

Also, as a remark: we cannot assume that the FIN or RST packet hits
flow table teardown, as the packet might get bumped to the slow path in
nftables.

Best
Sven