[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5db2e2b3-c768-661a-dc05-fa850c8d066a@nvidia.com>
Date: Thu, 27 May 2021 12:14:21 -0400
From: Ariel Levkovich <lariel@...dia.com>
To: Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
Cc: netdev@...r.kernel.org, paulb@...dia.com, jiri@...nulli.us
Subject: Re: [PATCH net] net/sched: act_ct: Fix ct template allocation for
zone 0
On 5/27/21 11:50 AM, Marcelo Ricardo Leitner wrote:
> On Thu, May 27, 2021 at 11:36:18AM -0400, Ariel Levkovich wrote:
>> On 5/26/21 9:49 PM, Marcelo Ricardo Leitner wrote:
>>> On Wed, May 26, 2021 at 08:01:10PM +0300, Ariel Levkovich wrote:
>>>> Fix current behavior of skipping template allocation in case the
>>>> ct action is in zone 0.
>>>>
>>>> Skipping the allocation may cause the datapath ct code to ignore the
>>>> entire ct action with all its attributes (commit, nat) in case the ct
>>>> action in zone 0 was preceded by a ct clear action.
>>>>
>>>> The ct clear action sets the ct_state to untracked and resets the
>>>> skb->_nfct pointer. Under these conditions and without an allocated
>>>> ct template, the skb->_nfct pointer will remain NULL which will
>>>> cause the tc ct action handler to exit without handling commit and nat
>>>> actions, if such exist.
>>>>
>>>> For example, the following rule in OVS dp:
>>>> recirc_id(0x2),ct_state(+new-est-rel-rpl+trk),ct_label(0/0x1), \
>>>> in_port(eth0),actions:ct_clear,ct(commit,nat(src=10.11.0.12)), \
>>>> recirc(0x37a)
>>>>
>>>> Will result in act_ct skipping the commit and nat actions in zone 0.
>>>>
>>>> The change removes the skipping of template allocation for zone 0 and
>>>> treats it the same as any other zone.
>>>>
>>>> Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
>>>> Signed-off-by: Ariel Levkovich <lariel@...dia.com>
>>>> ---
>>>> net/sched/act_ct.c | 3 ---
>>> Hah! I guess I had looked only at netfilter code regarding
>>> NF_CT_DEFAULT_ZONE_ID.
>>>
>>>> 1 file changed, 3 deletions(-)
>>>>
>>>> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>>>> index ec7a1c438df9..dfdfb677e6a9 100644
>>>> --- a/net/sched/act_ct.c
>>>> +++ b/net/sched/act_ct.c
>>>> @@ -1202,9 +1202,6 @@ static int tcf_ct_fill_params(struct net *net,
>>>> sizeof(p->zone));
>>>> }
>>>> - if (p->zone == NF_CT_DEFAULT_ZONE_ID)
>>>> - return 0;
>>>> -
>>> This patch makes act_ct behave like ovs kernel datapath, but I'm not
>>> sure ovs kernel is doing the right thing. :-) (jump to last paragraph
>>> for my suggestion, might ease the reading)
>>>
>>> As you described:
>>> "The ct clear action sets the ct_state to untracked and resets the
>>> skb->_nfct pointer." I think the problem lies on the first part, on
>>> setting it to untracked.
>>>
>>> That was introduced in ovs kernel on commit b8226962b1c4
>>> ("openvswitch: add ct_clear action") and AFAICT the idea there was to
>>> "reset it to original state" [A].
>>>
>>> Then ovs userspace has commit 0cdfddddb664 ("datapath: add ct_clear
>>> action") as well, a mirror of the one above. There, it is noted:
>>> " - if IP_CT_UNTRACKED is not available use 0 as other nf_ct_set()
>>> calls do. Since we're setting ct to NULL this is okay."
>>>
>>> Thing is, IP_CT_ESTABLISHED is the first member of enum
>>> ip_conntrack_info and evalutes 0, while IP_CT_UNTRACKED is actually:
>>> include/uapi/linux/netfilter/nf_conntrack_common.h:
>>> /* only for userspace compatibility */
>>> #ifndef __KERNEL__
>>> IP_CT_NEW_REPLY = IP_CT_NUMBER,
>>> #else
>>> IP_CT_UNTRACKED = 7,
>>> #endif
>>>
>>> In the commits above, none of them mention that the packet should be
>>> set to Untracked. That's a different thing than "undoing CT"..
>>> That setting untrack here is the equivalent of:
>>> # iptables -A ... -j CT --notrack
>>>
>>> Then, when it finally reaches nf_conntrack_in:
>>> tmpl = nf_ct_get(skb, &ctinfo);
>>> vvvv--- NULL if from act_ct and zone 0, !NULL if from ovs
>>> if (tmpl || ctinfo == IP_CT_UNTRACKED) {
>>> ^^-- always true after ct_clear
>>> /* Previously seen (loopback or untracked)? Ignore. */
>>> if ((tmpl && !nf_ct_is_template(tmpl)) ||
>>> ctinfo == IP_CT_UNTRACKED)
>>> ^^--- always true..
>>> return NF_ACCEPT;
>>> ^^ returns her
>>> skb->_nfct = 0;
>>> }
>>>
>>> If ct_clear (act_ct and ovs) instead set it 0 (which, well, it's odd
>>> but maps to IP_CT_ESTABLISHED), it wouldn't match on the conditions
>>> above, and would do CT normally.
>>>
>>> With all that, what about keeping the check here, as it avoids an
>>> allocation that AFAICT is superfluous when not setting a zone != 0 and
>>> atomics for _get/_put, and changing ct_clear instead (act_ct and ovs),
>>> to NOT set the packet as IP_CT_UNTRACKED?
>>>
>>> Thanks,
>>> Marcelo
>> I understand your point. But if we go in this path, that means going into
>> zone 0,
>>
>> skb will not be associated with zone 0 unless there was a ct_clear action
>> prior to that.
>>
>> skb->_nfct will carry the pointer from previous zone. I see several
>> scenarios where this will
>>
>> be problematic.
> I don't follow why. When the skb is created, skb->_nfct is "entirely NULL",
> right? Done by the memset() on __alloc_skb().
>
> Then,
>
> @@ -950,7 +950,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
> ct = nf_ct_get(skb, &ctinfo);
> if (ct) {
> nf_conntrack_put(&ct->ct_general);
> - nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
> + nf_ct_set(skb, NULL, 0);
> }
>
> goto out_clear;
>
> would restore it to that state and not have a pointer to the previous
> zone in skb->_nfct.
>
> Thanks,
> Marcelo
I meant if there's no ct_clear action.
Assume we already when through zone X in some previous action.
In such case skb->_nfct has that zone's id.
Now, if we go to zone=0, we skip this entirely, since p->tmpl is NULL :
/* Associate skb with specified zone. */
if (tmpl) {
nf_conntrack_put(skb_nfct(skb));
nf_conntrack_get(&tmpl->ct_general);
nf_ct_set(skb, tmpl, IP_CT_NEW);
}
And then in nf_conntrack_in it continues with the previous zone:
err = nf_conntrack_in(skb, &state)
calling -> ret = resolve_normal_ct(tmpl, skb, dataoff,
protonum, state);
calling -> zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
I encountered it by accident while running one of our test which was buggy
but due to the zone=0 bug the bug in the test was hidden and test passed.
Once I enabled my fix to alloc templated for zone=0 it was exposed.
The test doesn't use ct_clear between zones.
Ariel
>
>>
>> Thanks,
>>
>> Ariel
>>
>>>> nf_ct_zone_init(&zone, p->zone, NF_CT_DEFAULT_ZONE_DIR, 0);
>>>> tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL);
>>>> if (!tmpl) {
>>>> --
>>>> 2.25.2
>>>>
Powered by blists - more mailing lists