netdev - Re: [PATCH net v2 1/2] net: Fix return value of qdisc ingress handling on success

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5a9d0e6a-2916-e791-a123-c8a957a3e3e5@nvidia.com>
Date:   Wed, 28 Sep 2022 10:55:49 +0300
From:   Paul Blakey <paulb@...dia.com>
To:     Cong Wang <xiyou.wangcong@...il.com>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Vlad Buslov <vladbu@...dia.com>, Oz Shlomo <ozsh@...dia.com>,
        Roi Dayan <roid@...dia.com>, netdev@...r.kernel.org,
        Saeed Mahameed <saeedm@...dia.com>,
        Eric Dumazet <edumazet@...gle.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net v2 1/2] net: Fix return value of qdisc ingress
 handling on success



On 25/09/2022 21:00, Cong Wang wrote:
> On Sun, Sep 25, 2022 at 11:14:21AM +0300, Paul Blakey wrote:
>> Currently qdisc ingress handling (sch_handle_ingress()) doesn't
>> set a return value and it is left to the old return value of
>> the caller (__netif_receive_skb_core()) which is RX drop, so if
>> the packet is consumed, caller will stop and return this value
>> as if the packet was dropped.
>>
>> This causes a problem in the kernel tcp stack when having a
>> egress tc rule forwarding to a ingress tc rule.
>> The tcp stack sending packets on the device having the egress rule
>> will see the packets as not successfully transmitted (although they
>> actually were), will not advance it's internal state of sent data,
>> and packets returning on such tcp stream will be dropped by the tcp
>> stack with reason ack-of-unsent-data. See reproduction in [0] below.
>>
> 
> Hm, but how is this return value propagated to egress? I checked
> tcf_mirred_act() code, but don't see how it is even used there.
> 
> 318         err = tcf_mirred_forward(want_ingress, skb2);
> 319         if (err) {
> 320 out:
> 321                 tcf_action_inc_overlimit_qstats(&m->common);
> 322                 if (tcf_mirred_is_act_redirect(m_eaction))
> 323                         retval = TC_ACT_SHOT;
> 324         }
> 325         __this_cpu_dec(mirred_rec_level);
> 326
> 327         return retval;
> 
> 
> What am I missing?

for the ingress acting act_mirred it will return TC_ACT_CONSUMED above
the code you mentioned (since redirect=1, use_reinsert=1. Although 
TC_ACT_STOLEN which is the retval set for this action, will also act the 
same)


It is propagated as such (TX stack starting from tcp):

__tcp_transmit_skb
ip_queue_xmit
__ip_queue_xmit
ip_local_out
dst_output
ip_output
ip_finish_output
ip_finish_output2
neigh_output
neigh_hh_output
dev_queue_xmit
sch_handle_egress
   tcf_classify
   tcf_mirred_act
   tcf_mirred_forward
   netif_receive_skb
     # here we moved to ingress processing
     netif_receive_skb_internal
     __netif_receive_skb
     __netif_receive_skb_core
       sch_handle_ingress
         tcf_classify
         tcf_mirred_act
           tcf_mirred_forward
           tcf_dev_queue_xmit
           dev_queue_xmit
           # sends packet ...
         return TC_ACT_CONSUMED
       return NULL, and leaves *ret untouched
     return NET_RX_DROP
	...


> 
> Also, the offending commit is very old and this configuration is not
> uncommon at all, how could we even not notice this for such a long time?

I blamed the commit that left the ret value unset for the first time, 
but normally netif_receive_skb return value is ignored (as mentioned in 
the comment above it), this time it isn't ignored because it was chained 
to egress processing of the tcp transmit path. And this specific case 
where I chain tc ingress to tc egress came much later (with the commit
I blamed in v1 of this patch).

> 
> Thanks.