netdev - Re: [PATCH nf-next] netfilter: flowtable: separate replace, destroy and stats to different workqueues

Message-ID: <YFutK3Mn+h5OWNXe@horizon.localdomain>
Date:   Wed, 24 Mar 2021 18:20:43 -0300
From:   Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
To:     Oz Shlomo <ozsh@...dia.com>
Cc:     Pablo Neira Ayuso <pablo@...filter.org>, netdev@...r.kernel.org,
        netfilter-devel@...r.kernel.org,
        Saeed Mahameed <saeedm@...dia.com>,
        Paul Blakey <paulb@...dia.com>
Subject: Re: [PATCH nf-next] netfilter: flowtable: separate replace, destroy
 and stats to different workqueues

On Wed, Mar 24, 2021 at 01:24:53PM +0200, Oz Shlomo wrote:
> Hi,

Hi,

> 
> On 3/24/2021 3:38 AM, Pablo Neira Ayuso wrote:
> > Hi Marcelo,
> > 
> > On Mon, Mar 22, 2021 at 03:09:51PM -0300, Marcelo Ricardo Leitner wrote:
> > > On Wed, Mar 03, 2021 at 05:11:47PM +0100, Pablo Neira Ayuso wrote:
> > [...]
> > > > Or probably make the cookie unique is sufficient? The cookie refers to
> > > > the memory address but memory can be recycled very quickly. If the
> > > > cookie helps to catch the reorder scenario, then the conntrack id
> > > > could be used instead of the memory address as cookie.
> > > 
> > > Something like this, if I got the idea right, would be even better. If
> > > the entry actually expired before it had a chance of being offloaded,
> > > there is no point in offloading it to then just remove it.
> > 
> > It would be interesting to explore this idea you describe. Maybe a
> > flag can be set on stale objects, or simply remove the stale object
> > from the offload queue. So I guess it should be possible to recover
> > control on the list of pending requests as a batch that is passed
> > through one single queue_work call.
> > 
> 
> Removing stale objects is a good optimization for cases when the rate of
> established connections is greater than the hardware offload insertion rate.
> However, with a single workqueue design, a burst of del commands may postpone connection offload tasks.
> Postponed offloads may cause additional packets to go through software, thus
> creating a chain effect which may diminish the system's connection rate.

Right. I didn't intend to object to multiqueues. I'm sorry if it
sounded that way.

> 
> Marcelo, AFAIU add/del are synchronized by design since the del is triggered by the gc thread.
> A del workqueue item will be instantiated only after a connection is in hardware.

They were synchronized, but after this patch, not anymore AFAICT:

tcf_ct_flow_table_add()
  flow_offload_add()
              if (nf_flowtable_hw_offload(flow_table)) {
                  __set_bit(NF_FLOW_HW, &flow->flags);    [A]
                  nf_flow_offload_add(flow_table, flow);
                           ^--- schedules on _add workqueue

then the gc thread:
nf_flow_offload_gc_step()
          if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct))
                  set_bit(NF_FLOW_TEARDOWN, &flow->flags);

          if (test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
                           ^-- can also be set by tcf_ct_flow_table_lookup()
                               on FINs, by calling flow_offload_teardown()
                  if (test_bit(NF_FLOW_HW, &flow->flags)) {
                                    ^--- this is set in [A], even if the _add is still queued
                          if (!test_bit(NF_FLOW_HW_DYING, &flow->flags))
                                  nf_flow_offload_del(flow_table, flow);

nf_flow_offload_del()
          offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_DESTROY);
          if (!offload)
                  return;

          set_bit(NF_FLOW_HW_DYING, &flow->flags);
          flow_offload_queue_work(offload);

NF_FLOW_HW_DYING only avoids a double _del here.
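
The flag checks traced above can be modelled in a few lines of userspace C. This is only a sketch: every model_* name and FLOW_* constant is a hypothetical stand-in for the kernel flag bits and functions, and the add_done field is an assumption added purely to observe whether the queued add has run. It illustrates that the gc checks consult only the flags, so a del gets queued while the add is still pending, and that the DYING bit merely prevents a second del.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel flag bits. */
enum {
	FLOW_HW       = 1u << 0,	/* models NF_FLOW_HW */
	FLOW_HW_DYING = 1u << 1,	/* models NF_FLOW_HW_DYING */
	FLOW_TEARDOWN = 1u << 2,	/* models NF_FLOW_TEARDOWN */
};

struct model_flow {
	unsigned int flags;
	bool add_done;		/* assumption: true once the queued add ran */
	int dels_queued;	/* how many del work items were queued */
};

/* Models flow_offload_add(): the flag is set, the work is only queued. */
static void model_flow_add(struct model_flow *f)
{
	f->flags |= FLOW_HW;	/* [A] */
	f->add_done = false;	/* the add is still sitting on its queue */
}

/* Models the add worker finally programming the hardware. */
static void model_add_worker(struct model_flow *f)
{
	f->add_done = true;
}

/*
 * Models the checks in nf_flow_offload_gc_step().
 * Returns true if this pass queued a del. Note that add_done is
 * never consulted: only the flags decide.
 */
static bool model_gc_step(struct model_flow *f, bool expired)
{
	if (expired)
		f->flags |= FLOW_TEARDOWN;
	if (!(f->flags & FLOW_TEARDOWN))
		return false;
	if (!(f->flags & FLOW_HW))
		return false;
	if (f->flags & FLOW_HW_DYING)
		return false;	/* a del is already queued */

	f->flags |= FLOW_HW_DYING;
	f->dels_queued++;
	return true;
}
```

Calling model_flow_add() and then model_gc_step() with expired set queues a del while add_done is still false; a second model_gc_step() call queues nothing more.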

Maybe I'm just missing it, but I don't see how removals would only
happen after the entry is actually offloaded. That is, if the add queue
is very long and the datapath sees a FIN, it seems the next gc iteration
could try to remove the entry before it is actually offloaded. I think
this is what Pablo meant in his original reply here too, hence his idea
of having add/del work on the same queue.
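
For illustration only, here is a userspace sketch of what a single ordered queue with stale-entry removal could look like. It is not taken from the patch or from any kernel code: the model_* names, the fixed-size array, and the choice to cancel a pending add when its del arrives are all assumptions made to show the idea discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel API. */
enum model_cmd { MODEL_ADD, MODEL_DEL };

struct model_work {
	enum model_cmd cmd;
	int flow_id;
	bool cancelled;		/* stale: must not reach the hardware */
};

#define MODEL_QUEUE_MAX 16

struct model_queue {
	struct model_work items[MODEL_QUEUE_MAX];
	size_t len;
};

/* Queue an add for flow_id. Returns false if the queue is full. */
static bool model_queue_add(struct model_queue *q, int flow_id)
{
	if (q->len == MODEL_QUEUE_MAX)
		return false;
	q->items[q->len++] = (struct model_work){ MODEL_ADD, flow_id, false };
	return true;
}

/*
 * Queue a del for flow_id. If the add for the same flow is still
 * pending, cancel that add instead: the flow never reaches hardware,
 * so there is nothing to delete. Returns false if the queue is full.
 */
static bool model_queue_del(struct model_queue *q, int flow_id)
{
	for (size_t i = 0; i < q->len; i++) {
		struct model_work *w = &q->items[i];

		if (w->cmd == MODEL_ADD && w->flow_id == flow_id &&
		    !w->cancelled) {
			w->cancelled = true;
			return true;
		}
	}
	if (q->len == MODEL_QUEUE_MAX)
		return false;
	q->items[q->len++] = (struct model_work){ MODEL_DEL, flow_id, false };
	return true;
}

/*
 * Drain the queue in FIFO order, copying every non-cancelled item
 * into out[] (which must hold MODEL_QUEUE_MAX entries). Returns the
 * number of items that would have been sent to the hardware.
 */
static size_t model_queue_drain(struct model_queue *q,
				struct model_work out[MODEL_QUEUE_MAX])
{
	size_t n = 0;

	for (size_t i = 0; i < q->len; i++)
		if (!q->items[i].cancelled)
			out[n++] = q->items[i];
	q->len = 0;
	return n;
}
```

Because there is one FIFO, a del can never overtake the add for the same flow, and a flow that tears down before its add runs costs no hardware operation at all. The trade-off raised earlier in the thread still applies: with a single queue, a burst of dels can delay adds queued behind it.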