netdev - Re: Very slow remove interface from kernel

Message-Id: <32CBE6C0-DAA7-4470-96FC-628FE69BDD14@gmail.com>
Date: Wed, 10 May 2023 16:15:07 +0300
From: Martin Zaharinov <micron10@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Ido Schimmel <idosch@...sch.org>,
 netdev <netdev@...r.kernel.org>
Subject: Re: Very slow remove interface from kernel

OK, I will try setting CONFIG_HZ to 1000 and run the tests again.


Thanks Eric

> On 10 May 2023, at 12:40, Eric Dumazet <edumazet@...gle.com> wrote:
> 
> On Wed, May 10, 2023 at 8:06 AM Martin Zaharinov <micron10@...il.com> wrote:
>> 
>> I think the problem is in this part of the code in net/core/dev.c:
> 
> What makes you think this?
> 
> msleep() is not called a single time on my test bed.
> 
> # perf probe -a msleep
> # cat bench.sh
> modprobe dummy 2>/dev/null
> ip link set dev dummy0 up 2>/dev/null
> for i in $(seq 2 4094); do ip link add link dummy0 name vlan$i type vlan id $i; done
> for i in $(seq 2 4094); do ip link set dev vlan$i up; done
> time for i in $(seq 2 4094); do ip link del link dummy0 name vlan$i type vlan id $i; done
> 
> #  perf record -e probe:msleep -a -g ./bench.sh
> 
> real 0m59.877s
> user 0m0.588s
> sys 0m7.023s
> [ perf record: Woken up 6 times to write data ]
> [ perf record: Captured and wrote 8.561 MB perf.data ]
> # perf script
> #   << empty, nothing >>
> 
> 
> 
> 
>> #define WAIT_REFS_MIN_MSECS 1
>> #define WAIT_REFS_MAX_MSECS 250
>> /**
>> * netdev_wait_allrefs_any - wait until all references are gone.
>> * @list: list of net_devices to wait on
>> *
>> * This is called when unregistering network devices.
>> *
>> * Any protocol or device that holds a reference should register
>> * for netdevice notification, and cleanup and put back the
>> * reference if they receive an UNREGISTER event.
>> * We can get stuck here if buggy protocols don't correctly
>> * call dev_put.
>> */
>> static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
>> {
>>        unsigned long rebroadcast_time, warning_time;
>>        struct net_device *dev;
>>        int wait = 0;
>> 
>>        rebroadcast_time = warning_time = jiffies;
>> 
>>        list_for_each_entry(dev, list, todo_list)
>>                if (netdev_refcnt_read(dev) == 1)
>>                        return dev;
>> 
>>        while (true) {
>>                if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
>>                        rtnl_lock();
>> 
>>                        /* Rebroadcast unregister notification */
>>                        list_for_each_entry(dev, list, todo_list)
>>                                call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>> 
>>                        __rtnl_unlock();
>>                        rcu_barrier();
>>                        rtnl_lock();
>> 
>>                        list_for_each_entry(dev, list, todo_list)
>>                                if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
>>                                             &dev->state)) {
>>                                        /* We must not have linkwatch events
>>                                         * pending on unregister. If this
>>                                         * happens, we simply run the queue
>>                                         * unscheduled, resulting in a noop
>>                                         * for this device.
>>                                         */
>>                                        linkwatch_run_queue();
>>                                        break;
>>                                }
>> 
>>                        __rtnl_unlock();
>> 
>>                        rebroadcast_time = jiffies;
>>                }
>> 
>>                if (!wait) {
>>                        rcu_barrier();
>>                        wait = WAIT_REFS_MIN_MSECS;
>>                } else {
>>                        msleep(wait);
>>                        wait = min(wait << 1, WAIT_REFS_MAX_MSECS);
>>                }
>> 
>>                list_for_each_entry(dev, list, todo_list)
>>                        if (netdev_refcnt_read(dev) == 1)
>>                                return dev;
>> 
>>                if (time_after(jiffies, warning_time +
>>                               READ_ONCE(netdev_unregister_timeout_secs) * HZ)) {
>>                        list_for_each_entry(dev, list, todo_list) {
>>                                pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
>>                                         dev->name, netdev_refcnt_read(dev));
>>                                ref_tracker_dir_print(&dev->refcnt_tracker, 10);
>>                        }
>> 
>>                        warning_time = jiffies;
>>                }
>>        }
>> }
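
As a rough illustration of the backoff in the quoted loop, the sketch below prints the msleep() durations that `wait = min(wait << 1, WAIT_REFS_MAX_MSECS)` would request, starting from WAIT_REFS_MIN_MSECS. The function name `backoff_schedule` is made up for this example, and the values are the requested durations only; the time actually slept depends on timer granularity and is not modeled here.

```shell
#!/bin/sh
# Constants copied from the quoted net/core/dev.c excerpt.
WAIT_REFS_MIN_MSECS=1
WAIT_REFS_MAX_MSECS=250

# backoff_schedule N: print the first N requested msleep() durations.
# (Hypothetical helper, not part of the kernel or of any tool.)
backoff_schedule() {
    wait_ms=$WAIT_REFS_MIN_MSECS
    total_ms=0
    n=1
    while [ "$n" -le "$1" ]; do
        total_ms=$((total_ms + wait_ms))
        echo "msleep #$n: $wait_ms ms (cumulative $total_ms ms)"
        # Mirrors: wait = min(wait << 1, WAIT_REFS_MAX_MSECS)
        wait_ms=$((wait_ms * 2))
        if [ "$wait_ms" -gt "$WAIT_REFS_MAX_MSECS" ]; then
            wait_ms=$WAIT_REFS_MAX_MSECS
        fi
        n=$((n + 1))
    done
}

backoff_schedule 10
```

The requested sleeps double from 1 ms up to 128 ms and then stay at the 250 ms cap, so this path only becomes expensive when a reference is held for a long time. That is consistent with Eric's observation that msleep() is never reached on his test bed.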
>> 
>> 
>> 
>> m.
>> 
>> 
>>> On 9 May 2023, at 23:08, Ido Schimmel <idosch@...sch.org> wrote:
>>> 
>>> On Tue, May 09, 2023 at 09:50:18PM +0300, Martin Zaharinov wrote:
>>>> I tried on kernel 6.3.1:
>>>> 
>>>> 
>>>> time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
>>>> 
>>>> real 4m51.633s  (here I stopped with Ctrl+C and reran; the second part finished after 3 min)
>>>> user 0m7.479s
>>>> sys 0m0.367s
>>> 
>>> You are off-CPU most of the time, the question is what is blocking. I'm
>>> getting the following results with net-next:
>>> 
>>> # time -p for i in $(seq 2 4094); do ip link del dev eth0.$i; done
>>> real 177.09
>>> user 3.85
>>> sys 31.26
>>> 
>>> When using a batch file to perform the deletion:
>>> 
>>> # time -p ip -b vlan_del.batch
>>> real 35.25
>>> user 0.02
>>> sys 3.61
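
The contents of vlan_del.batch are not shown in the thread. A minimal sketch of how such a file might be generated, assuming the same eth0.$i device names as in the loop above and one `link del` command per line (a batch file for `ip -b` holds ip commands without the leading `ip`):

```shell
#!/bin/sh
# Assumed names: parent device eth0 and VLAN devices eth0.2 .. eth0.4094,
# matching the per-command loop quoted above.
parent=eth0
batch=vlan_del.batch

# One command per line, without the leading "ip".
for i in $(seq 2 4094); do
    echo "link del dev ${parent}.$i"
done > "$batch"

# Sanity check: number of commands written.
wc -l < "$batch"

# To run the deletions (as root):
#   time -p ip -b "$batch"
```

With a batch file, a single `ip` process sends all the requests, which avoids starting a new process for each of the 4093 deletions.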
>>> 
>>> And to check where we are blocked most of the time while using the batch
>>> file:
>>> 
>>> # ../bcc/libbpf-tools/offcputime -p `pgrep -nx ip`
>>> [...]
>>>   __schedule
>>>   schedule
>>>   schedule_timeout
>>>   wait_for_completion
>>>   rcu_barrier
>>>   netdev_run_todo
>>>   rtnetlink_rcv_msg
>>>   netlink_rcv_skb
>>>   netlink_unicast
>>>   netlink_sendmsg
>>>   ____sys_sendmsg
>>>   ___sys_sendmsg
>>>   __sys_sendmsg
>>>   do_syscall_64
>>>   entry_SYSCALL_64_after_hwframe
>>>   -                ip (3660)
>>>       25089479
>>> [...]
>>> 
>>> We are blocked for around 70% of the time on the rcu_barrier() in
>>> netdev_run_todo().
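
The "around 70%" figure can be reproduced from the numbers above, assuming the offcputime total (25089479) is in microseconds, which is that tool's reporting unit, and that the traced run took about as long as the timed batch run (35.25 s real):

```shell
#!/bin/sh
# Values copied from the output quoted above.
offcpu_us=25089479   # off-CPU time in the rcu_barrier() stack, assumed microseconds
real_s=35.25         # wall-clock time of the batch run, in seconds

# Share of wall-clock time spent blocked in that stack.
awk -v us="$offcpu_us" -v real="$real_s" \
    'BEGIN { printf "%.1f%%\n", us / 1e6 / real * 100 }'
```

This is only a back-of-the-envelope check; the trace and the timed run may not have been the same invocation.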
>>> 
>>> Note that one big difference between my setup and yours is that in my
>>> case eth0 is a dummy device and in your case it's probably a physical
>>> device that actually implements netdev_ops::ndo_vlan_rx_kill_vid(). If
>>> so, it's possible that a non-negligible amount of time is spent talking
>>> to hardware/firmware to delete the 4K VIDs from the device's VLAN
>>> filter.
>>> 
>>>> 
>>>> 
>>>> The config is very clean; I removed a large part of the CONFIG options.
>>>> 
>>>> Are there options to debug what is happening?
>>>> 
>>>> m
>>