Message-ID: <4b5f9ff7-12ab-9402-60c1-8a9ee852700d@gmail.com>
Date: Mon, 9 Mar 2020 19:38:12 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Mahesh Bandewar (महेश बंडेवार) <maheshb@...gle.com>,
Eric Dumazet <eric.dumazet@...il.com>
Cc: David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
Mahesh Bandewar <mahesh@...dewar.net>,
syzbot <syzkaller@...glegroups.com>
Subject: Re: [PATCH net] ipvlan: add cond_resched_rcu() while processing
 multicast backlog
On 3/9/20 7:21 PM, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Mon, Mar 9, 2020 at 6:07 PM Eric Dumazet <eric.dumazet@...il.com> wrote:
>>
>>
>>
>> On 3/9/20 3:57 PM, Mahesh Bandewar wrote:
>>> If a substantial number of slaves are created, as simulated by
>>> Syzbot, the backlog processing could take much longer and result
>>> in the issue found in the Syzbot report.
>>>
>>
>> ...
>>
>>>
>>> Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue")
>>> Signed-off-by: Mahesh Bandewar <maheshb@...gle.com>
>>> Reported-by: syzbot <syzkaller@...glegroups.com>
>>> ---
>>> drivers/net/ipvlan/ipvlan_core.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
>>> index 53dac397db37..5759e91dec71 100644
>>> --- a/drivers/net/ipvlan/ipvlan_core.c
>>> +++ b/drivers/net/ipvlan/ipvlan_core.c
>>> @@ -277,6 +277,7 @@ void ipvlan_process_multicast(struct work_struct *work)
>>> }
>>> ipvlan_count_rx(ipvlan, len, ret == NET_RX_SUCCESS, true);
>>> local_bh_enable();
>>> + cond_resched_rcu();
>>
>> This does not work: if you release rcu_read_lock() here,
>> then the surrounding loop cannot be continued without risking a use-after-free.
>>
> .. but cond_resched_rcu() is nothing but
> rcu_read_unlock(); cond_resched(); rcu_read_lock();
>
> isn't that sufficient?
It is buggy.
Think about iterating a list under spinlock protection,
then, in the middle of the loop, releasing the spinlock and re-acquiring it.
The cursor in the loop might point to freed memory.
The same applies to RCU.
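
A minimal sketch of the same hazard with a plain spinlock (hypothetical
foo_list / foo_lock, not the actual ipvlan code):

	struct foo *cur;

	spin_lock(&foo_lock);
	list_for_each_entry(cur, &foo_list, node) {
		/* Dropping the lock mid-iteration, which is effectively
		 * what cond_resched_rcu() does for the RCU read-side lock.
		 */
		spin_unlock(&foo_lock);
		cond_resched();
		spin_lock(&foo_lock);
		/* 'cur' may have been freed and reused while the lock was
		 * not held, so advancing the iterator is a use-after-free.
		 */
	}
	spin_unlock(&foo_lock);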
>
>> rcu_read_lock();
>> list_for_each_entry_rcu(ipvlan, &port->ipvlans, pnode) {
>> ...
>> cond_resched_rcu();
>> // after this point bad things can happen
>> }
>>
>>
>> You probably should do this instead:
>>
>> diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
>> index 30cd0c4f0be0b4d1dea2c0a4d68d0e33d1931ebc..57617ff5565fb87035c13dcf1de9fa5431d04e10 100644
>> --- a/drivers/net/ipvlan/ipvlan_core.c
>> +++ b/drivers/net/ipvlan/ipvlan_core.c
>> @@ -293,6 +293,7 @@ void ipvlan_process_multicast(struct work_struct *work)
>> }
>> if (dev)
>> dev_put(dev);
>> + cond_resched();
>> }
>
> The reason this may not work is that the inner loop iterates over the
> slaves for a single packet; if there are 1k slaves, then skb_clone() will
> be called 1k times before cond_resched() runs, and the problem may not
> even get mitigated.
The problem that syzbot found is that queuing IPVLAN_QBACKLOG_LIMIT (1000) packets on the backlog
could force the ipvlan_process_multicast() worker to process 1000 packets.
Multiply this by the number of slaves, say 1000 -> 1,000,000 skb clones.
After the patch, we divide the time taken in one invocation by 1000,
which should be good enough.
You do not need to reschedule after _each_ clone.
Think about netdev_max_backlog, which is set to 1000: we believe it is fine
to process 1000 packets per round.
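
For reference, a simplified sketch of the loop structure implied by the diff
above (paraphrased, not the exact ipvlan_process_multicast() source): the
reschedule point sits at the end of the outer per-packet loop, after the
RCU-protected walk over the slaves has finished, so no list cursor is exposed
while the task sleeps.

	while ((skb = __skb_dequeue(&list)) != NULL) {
		rcu_read_lock();
		list_for_each_entry_rcu(ipvlan, &port->ipvlans, pnode) {
			/* One clone per slave: up to ~1000 skb_clone()
			 * calls for a single dequeued packet.
			 */
			nskb = skb_clone(skb, GFP_ATOMIC);
			...
		}
		rcu_read_unlock();
		...
		if (dev)
			dev_put(dev);
		/* Reschedule once per packet, outside the RCU read-side
		 * critical section.
		 */
		cond_resched();
	}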