netdev - Re: [PATCH nf-next] netfilter: flowtable: separate replace, destroy and stats to different workqueues

Message-ID: <b89d8340-ca1c-1424-bbaa-0e85d37a84bb@nvidia.com>
Date:   Thu, 25 Mar 2021 10:46:12 +0200
From:   Oz Shlomo <ozsh@...dia.com>
To:     Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
CC:     Pablo Neira Ayuso <pablo@...filter.org>, <netdev@...r.kernel.org>,
        <netfilter-devel@...r.kernel.org>,
        Saeed Mahameed <saeedm@...dia.com>,
        "Paul Blakey" <paulb@...dia.com>
Subject: Re: [PATCH nf-next] netfilter: flowtable: separate replace, destroy
 and stats to different workqueues

Hi Marcelo,

On 3/24/2021 11:20 PM, Marcelo Ricardo Leitner wrote:
> On Wed, Mar 24, 2021 at 01:24:53PM +0200, Oz Shlomo wrote:
>> Hi,
> 
> Hi,
> 
>>
>> On 3/24/2021 3:38 AM, Pablo Neira Ayuso wrote:
>>> Hi Marcelo,
>>>
>>> On Mon, Mar 22, 2021 at 03:09:51PM -0300, Marcelo Ricardo Leitner wrote:
>>>> On Wed, Mar 03, 2021 at 05:11:47PM +0100, Pablo Neira Ayuso wrote:
>>> [...]
>>>>> Or probably make the cookie unique is sufficient? The cookie refers to
>>>>> the memory address but memory can be recycled very quickly. If the
>>>>> cookie helps to catch the reorder scenario, then the conntrack id
>>>>> could be used instead of the memory address as cookie.
>>>>
>>>> Something like this, if I got the idea right, would be even better. If
>>>> the entry actually expired before it had a chance of being offloaded,
>>>> there is no point in offloading it to then just remove it.
>>>
>>> It would be interesting to explore this idea you describe. Maybe a
>>> flag can be set on stale objects, or simply remove the stale object
>>> from the offload queue. So I guess it should be possible to recover
>>> control on the list of pending requests as a batch that is passed
>>> through one single queue_work call.
>>>
>>
>> Removing stale objects is a good optimization for cases when the rate of
>> established connections is greater than the hardware offload insertion rate.
>> However, with a single workqueue design, a burst of del commands may postpone connection offload tasks.
>> Postponed offloads may cause additional packets to go through software, thus
>> creating a chain effect which may diminish the system's connection rate.
> 
> Right. I didn't intend to object to multiqueues. I'm sorry if it
> sounded that way.
> 
>>
>> Marcelo, AFAIU add/del are synchronized by design since the del is triggered by the gc thread.
>> A del workqueue item will be instantiated only after a connection is in hardware.
> 
> They were synchronized, but after this patch, not anymore AFAICT:
> 
> tcf_ct_flow_table_add()
>    flow_offload_add()
>                if (nf_flowtable_hw_offload(flow_table)) {
>                    __set_bit(NF_FLOW_HW, &flow->flags);    [A]
>                    nf_flow_offload_add(flow_table, flow);
>                             ^--- schedules on _add workqueue
> 
> then the gc thread:
> nf_flow_offload_gc_step()
>            if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct))
>                    set_bit(NF_FLOW_TEARDOWN, &flow->flags);
> 
>            if (test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
>                          ^-- can also be set by tcf_ct_flow_table_lookup()
>                              on FINs, by calling flow_offload_teardown()
>                    if (test_bit(NF_FLOW_HW, &flow->flags)) {
>                                      ^--- this is set in [A], even if the _add is still queued
>                            if (!test_bit(NF_FLOW_HW_DYING, &flow->flags))
>                                    nf_flow_offload_del(flow_table, flow);
> 
> nf_flow_offload_del()
>            offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_DESTROY);
>            if (!offload)
>                    return;
> 
>            set_bit(NF_FLOW_HW_DYING, &flow->flags);
>            flow_offload_queue_work(offload);
> 
> NF_FLOW_HW_DYING only avoids a double _del here.
> 
> Maybe I'm just missing it, but I'm not seeing how removals would only
> happen after the entry is actually offloaded. As in, if the add queue
> is very long and the datapath sees a FIN, it seems the next gc iteration
> could try to remove the entry before it's actually offloaded. I think
> this is what Pablo meant in his original reply here too, hence his idea
> of having add/del work with the same queue.
> 

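The del work item will not be allocated while the flow's hw offload work is
still pending. In that case nf_flow_offload_work_alloc() returns NULL, so
nf_flow_offload_del() returns before setting NF_FLOW_HW_DYING and a later gc
step can retry.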

nf_flow_offload_work_alloc()
	if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
		return NULL;
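
To make the guard above easier to reason about, here is a minimal userspace
sketch of it in C11. It is only a model, not the kernel code: the model_*
names and MODEL_* bits are hypothetical stand-ins, atomic_fetch_or() stands in
for test_and_set_bit(), and model_work_done() assumes that the work handler
clears the pending bit once it has finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NF_FLOW_HW, NF_FLOW_HW_DYING, NF_FLOW_HW_PENDING. */
enum {
	MODEL_HW         = 1u << 0,
	MODEL_HW_DYING   = 1u << 1,
	MODEL_HW_PENDING = 1u << 2,
};

struct model_flow {
	atomic_uint flags;
};

/* Like test_and_set_bit(): set the bit and report whether it was already set. */
static bool model_test_and_set(struct model_flow *flow, unsigned int bit)
{
	return (atomic_fetch_or(&flow->flags, bit) & bit) != 0;
}

/* Models the guard: refuse to create work while earlier work is pending. */
static bool model_work_alloc(struct model_flow *flow)
{
	if (model_test_and_set(flow, MODEL_HW_PENDING))
		return false;	/* the kernel function returns NULL here */
	return true;
}

/* Assumption: the worker clears the pending bit when it is done. */
static void model_work_done(struct model_flow *flow)
{
	unsigned int old = atomic_fetch_and(&flow->flags,
					    ~(unsigned int)MODEL_HW_PENDING);

	assert(old & MODEL_HW_PENDING);
}

/* Add path: the HW bit is set right away ([A] in the trace), then work is queued. */
static bool model_offload_add(struct model_flow *flow)
{
	atomic_fetch_or(&flow->flags, MODEL_HW);
	return model_work_alloc(flow);
}

/* Del path: the DYING bit is set only if the work item could be allocated. */
static bool model_offload_del(struct model_flow *flow)
{
	if (!model_work_alloc(flow))
		return false;
	atomic_fetch_or(&flow->flags, MODEL_HW_DYING);
	return true;
}
```

In this model, a model_offload_del() issued while the add is still queued
returns false and leaves MODEL_HW_DYING clear, so the caller (the gc step in
the trace) is free to try again once model_work_done() has run.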