netdev - Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YoKO0dPJs+VjbTXP@salvia>
Date:   Mon, 16 May 2022 19:50:09 +0200
From:   Pablo Neira Ayuso <pablo@...filter.org>
To:     Sven Auhagen <sven.auhagen@...eatech.de>
Cc:     Oz Shlomo <ozsh@...dia.com>, Felix Fietkau <nbd@....name>,
        netdev@...r.kernel.org, netfilter-devel@...r.kernel.org,
        Florian Westphal <fw@...len.de>, Paul Blakey <paulb@...dia.com>
Subject: Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow
 timeout

On Mon, May 16, 2022 at 03:02:13PM +0200, Sven Auhagen wrote:
> On Mon, May 16, 2022 at 02:43:06PM +0200, Pablo Neira Ayuso wrote:
> > On Mon, May 16, 2022 at 02:23:00PM +0200, Sven Auhagen wrote:
> > > On Mon, May 16, 2022 at 02:13:03PM +0200, Pablo Neira Ayuso wrote:
> > > > On Mon, May 16, 2022 at 12:56:41PM +0200, Pablo Neira Ayuso wrote:
> > > > > On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
[...]
> > > > > [...]
> > > > > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > > > > index 0164e5f522e8..324fdb62c08b 100644
> > > > > > --- a/net/netfilter/nf_conntrack_core.c
> > > > > > +++ b/net/netfilter/nf_conntrack_core.c
> > > > > > @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
> > > > > >  			tmp = nf_ct_tuplehash_to_ctrack(h);
> > > > > >  
> > > > > >  			if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > > > > > -				nf_ct_offload_timeout(tmp);
> > > > > 
> > > > > Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
> > > > > path that triggers the race, ie. nf_ct_is_expired()
> > > > > 
> > > > > The flowtable ct fixup races with conntrack gc collector.
> > > > > 
> > > > > Clearing IPS_OFFLOAD might result in offloading the entry again for
> > > > > the closing packets.
> > > > > 
> > > > > Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
> > > > > in a TCP state that represent closure?
> > > > > 
> > > > >   		if (unlikely(!tcph || tcph->fin || tcph->rst))
> > > > >   			goto out;
> > > > > 
> > > > > this is already the intention in the existing code.
> > > > 
> > > > I'm attaching an incomplete sketch patch. My goal is to avoid the
> > > > extra IPS_ bit.
> > > 
> > > You might create a race with ct gc that will remove the ct
> > > if it is in close or end of close and before flow offload teardown is running
> > > so flow offload teardown might access memory that was freed.
> > 
> > flow object holds a reference to the ct object until it is released,
> > no use-after-free can happen.
> > 
> 
> Also if nf_ct_delete is called before flowtable delete?
> Can you let me know why?

nf_ct_delete() removes the conntrack object from lists and it
decrements the reference counter by one.

flow_offload_free() also calls nf_ct_put(). flow_offload_alloc() bumps
the reference count on the conntrack object before creating the flow.

> > > It is not a very likely scenario but never the less it might happen now
> > > since the IPS_OFFLOAD_BIT is not set and the state might just time out.
> > > 
> > > If someone sets a very small TCP CLOSE timeout it gets more likely.
> > > 
> > > So Oz and myself were debatting about three possible cases/problems:
> > > 
> > > 1. ct gc sets timeout even though the state is in CLOSE/FIN because the
> > > IPS_OFFLOAD is still set but the flow is in teardown
> > > 2. ct gc removes the ct because the IPS_OFFLOAD is not set and
> > > the CLOSE timeout is reached before the flow offload del
> > 
> > OK.
> > 
> > > 3. tcp ct is always set to ESTABLISHED with a very long timeout
> > > in flow offload teardown/delete even though the state is already
> > > CLOSED.
> > >
> > > Also as a remark we can not assume that the FIN or RST packet is hitting
> > > flow table teardown as the packet might get bumped to the slow path in
> > > nftables.
> > 
> > I assume this remark is related to 3.?
> 
> Yes, exactly.
> 
> > if IPS_OFFLOAD is unset, then conntrack would update the state
> > according to this FIN or RST.
> 
> It will move to a different TCP state anyways only the ct state
> will be at IPS_OFFLOAD_BIT and prevent it from beeing garbage collected.
> The timeout will be bumped back up as long as IPS_OFFLOAD_BIT is set
> even though TCP might already be CLOSED.

If teardown fixes the ct state and timeout to established, and IPS_OFFLOAD is
unset, then the packet is passed up in a consistent state.

I made a patch, it is based on yours, it's attached:

- If flow timeout expires or rst/fin is seen, ct state and timeout is
  fixed up (to established state) and IPS_OFFLOAD is unset.

- If rst/fin packet is seen, ct state and timeout is fixed up (to
  established state) and IPS_OFFLOAD is unset. The packet continues
  its travel up to the classic path, so conntrack triggers the
  transition from established to one of the close states.

For the case 1., IPS_OFFLOAD is not set anymore, so conntrack gc
cannot race to reset the ct timeout anymore.

For the case 2., if gc conntrack ever removes the ct entry, then the
IPS_DYING bit is set, which implicitly triggers the teardown state
from the flowtable gc. The flowtable still holds a reference to the
ct object, so no UAF can happen.

For the case 3. the conntrack is set to ESTABLISHED with a long
timeout, yes. This is to deal with the two possible cases:

a) flowtable timeout expired, so conntrack recovers control on the
   flow.
b) tcp rst/fin will take back the packet to slow path. The ct has been
   fixed up to established state so it will trasition to one of the
   close states.

Am I missing anything?

View attachment "fix-pickup.patch" of type "text/x-diff" (3656 bytes)