[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <d0ae6e4b-5392-704a-5c87-2939bc2d9c44@nvidia.com>
Date: Sun, 2 Oct 2022 12:30:18 +0300
From: Paul Blakey <paulb@...dia.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: Daniel Borkmann <daniel@...earbox.net>,
Vlad Buslov <vladbu@...dia.com>, Oz Shlomo <ozsh@...dia.com>,
Roi Dayan <roid@...dia.com>, netdev@...r.kernel.org,
Saeed Mahameed <saeedm@...dia.com>,
Eric Dumazet <edumazet@...gle.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net v2 1/2] net: Fix return value of qdisc ingress
handling on success
On 01/10/2022 23:19, Cong Wang wrote:
> On Wed, Sep 28, 2022 at 10:55:49AM +0300, Paul Blakey wrote:
>>
>>
>> On 25/09/2022 21:00, Cong Wang wrote:
>>> On Sun, Sep 25, 2022 at 11:14:21AM +0300, Paul Blakey wrote:
>>>> Currently qdisc ingress handling (sch_handle_ingress()) doesn't
>>>> set a return value and it is left to the old return value of
>>>> the caller (__netif_receive_skb_core()) which is RX drop, so if
>>>> the packet is consumed, caller will stop and return this value
>>>> as if the packet was dropped.
>>>>
>>>> This causes a problem in the kernel tcp stack when having a
>>>> egress tc rule forwarding to a ingress tc rule.
>>>> The tcp stack sending packets on the device having the egress rule
>>>> will see the packets as not successfully transmitted (although they
>>>> actually were), will not advance it's internal state of sent data,
>>>> and packets returning on such tcp stream will be dropped by the tcp
>>>> stack with reason ack-of-unsent-data. See reproduction in [0] below.
>>>>
>>>
>>> Hm, but how is this return value propagated to egress? I checked
>>> tcf_mirred_act() code, but don't see how it is even used there.
>>>
>>> 318 err = tcf_mirred_forward(want_ingress, skb2);
>>> 319 if (err) {
>>> 320 out:
>>> 321 tcf_action_inc_overlimit_qstats(&m->common);
>>> 322 if (tcf_mirred_is_act_redirect(m_eaction))
>>> 323 retval = TC_ACT_SHOT;
>>> 324 }
>>> 325 __this_cpu_dec(mirred_rec_level);
>>> 326
>>> 327 return retval;
>>>
>>>
>>> What am I missing?
>>
>> for the ingress acting act_mirred it will return TC_ACT_CONSUMED above
>> the code you mentioned (since redirect=1, use_reinsert=1. Although
>> TC_ACT_STOLEN which is the retval set for this action, will also act the
>> same)
>>
>>
>> It is propagated as such (TX stack starting from tcp):
>>
>
> Sorry for my misunderstanding.
>
> I meant to say those TC_ACT_* return value, not NET_RX_*, but I worried
> too much here, as mirred lets user specify the return value
Yes TC_ACT_* start at the action mirred case, and end in
tcf_handle_ingress/egresss switch cases, which then should be converted
to NET_RX and NET_XMIT if done.
>
> BTW, it seems you at least miss the drop case, which is NET_RX_DROP for
> TC_ACT_SHOT at least? Possibly other code paths in sch_handle_ingress()
> too.
I'll add the SHOT for v3 as the packet was handled in this case, but I
should only update ret where the packet/skb was handled, which is where
we also return NULL, as otherwise rx pipeline should continue and will
update ret once handled (say in running the rx_handler).
For example, if there are not tc filters (tcf_classify returns
TC_ACT_UNSPEC) I should not update *ret, and it will continue to the rx
handler, and if there isn't any, it would return the default ret RX_DROP
value.
>
> Thanks.
Powered by blists - more mailing lists