Message-Id: <D1743DF0-79B9-44C4-900C-22159B65CE59@gmail.com>
Date: Thu, 25 May 2023 10:50:44 +0300
From: Martin Zaharinov <micron10@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Ido Schimmel <idosch@...sch.org>, netdev <netdev@...r.kernel.org>
Subject: Re: Very slow remove interface from kernel

Hi Eric,

After switching to HZ 1666, the time to remove 4093 VLANs dropped to 30 seconds.

Do you think this will cause a problem?

Best regards,
martin

> On 10 May 2023, at 12:40, Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, May 10, 2023 at 8:06 AM Martin Zaharinov <micron10@...il.com> wrote:
>>
>> I think problem is in this part of code in net/core/dev.c
>
> What makes you think this ?
>
> msleep() is not called a single time on my test bed.
>
> # perf probe -a msleep
> # cat bench.sh
> modprobe dummy 2>/dev/null
> ip link set dev dummy0 up 2>/dev/null
> for i in $(seq 2 4094); do ip link add link dummy0 name vlan$i type vlan id $i; done
> for i in $(seq 2 4094); do ip link set dev vlan$i up; done
> time for i in $(seq 2 4094); do ip link del link dummy0 name vlan$i type vlan id $i; done
>
> # perf record -e probe:msleep -a -g ./bench.sh
>
> real 0m59.877s
> user 0m0.588s
> sys 0m7.023s
> [ perf record: Woken up 6 times to write data ]
> [ perf record: Captured and wrote 8.561 MB perf.data ]
> # perf script
> # << empty, nothing >>
>
>
>> #define WAIT_REFS_MIN_MSECS 1
>> #define WAIT_REFS_MAX_MSECS 250
>> /**
>>  * netdev_wait_allrefs_any - wait until all references are gone.
>>  * @list: list of net_devices to wait on
>>  *
>>  * This is called when unregistering network devices.
>>  *
>>  * Any protocol or device that holds a reference should register
>>  * for netdevice notification, and cleanup and put back the
>>  * reference if they receive an UNREGISTER event.
>>  * We can get stuck here if buggy protocols don't correctly
>>  * call dev_put.
>>  */
>> static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
>> {
>> 	unsigned long rebroadcast_time, warning_time;
>> 	struct net_device *dev;
>> 	int wait = 0;
>>
>> 	rebroadcast_time = warning_time = jiffies;
>>
>> 	list_for_each_entry(dev, list, todo_list)
>> 		if (netdev_refcnt_read(dev) == 1)
>> 			return dev;
>>
>> 	while (true) {
>> 		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
>> 			rtnl_lock();
>>
>> 			/* Rebroadcast unregister notification */
>> 			list_for_each_entry(dev, list, todo_list)
>> 				call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>>
>> 			__rtnl_unlock();
>> 			rcu_barrier();
>> 			rtnl_lock();
>>
>> 			list_for_each_entry(dev, list, todo_list)
>> 				if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
>> 					     &dev->state)) {
>> 					/* We must not have linkwatch events
>> 					 * pending on unregister. If this
>> 					 * happens, we simply run the queue
>> 					 * unscheduled, resulting in a noop
>> 					 * for this device.
>> 					 */
>> 					linkwatch_run_queue();
>> 					break;
>> 				}
>>
>> 			__rtnl_unlock();
>>
>> 			rebroadcast_time = jiffies;
>> 		}
>>
>> 		if (!wait) {
>> 			rcu_barrier();
>> 			wait = WAIT_REFS_MIN_MSECS;
>> 		} else {
>> 			msleep(wait);
>> 			wait = min(wait << 1, WAIT_REFS_MAX_MSECS);
>> 		}
>>
>> 		list_for_each_entry(dev, list, todo_list)
>> 			if (netdev_refcnt_read(dev) == 1)
>> 				return dev;
>>
>> 		if (time_after(jiffies, warning_time +
>> 			       READ_ONCE(netdev_unregister_timeout_secs) * HZ)) {
>> 			list_for_each_entry(dev, list, todo_list) {
>> 				pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
>> 					 dev->name, netdev_refcnt_read(dev));
>> 				ref_tracker_dir_print(&dev->refcnt_tracker, 10);
>> 			}
>>
>> 			warning_time = jiffies;
>> 		}
>> 	}
>> }
>>
>>
>> m.
>>
>>
>>> On 9 May 2023, at 23:08, Ido Schimmel <idosch@...sch.org> wrote:
>>>
>>> On Tue, May 09, 2023 at 09:50:18PM +0300, Martin Zaharinov wrote:
>>>> i try on kernel 6.3.1
>>>>
>>>>
>>>> time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
>>>>
>>>> real 4m51.633s -- here i stop with Ctrl + C - and rerun and second part finish after 3 min
>>>> user 0m7.479s
>>>> sys 0m0.367s
>>>
>>> You are off-CPU most of the time, the question is what is blocking. I'm
>>> getting the following results with net-next:
>>>
>>> # time -p for i in $(seq 2 4094); do ip link del dev eth0.$i; done
>>> real 177.09
>>> user 3.85
>>> sys 31.26
>>>
>>> When using a batch file to perform the deletion:
>>>
>>> # time -p ip -b vlan_del.batch
>>> real 35.25
>>> user 0.02
>>> sys 3.61
>>>
>>> And to check where we are blocked most of the time while using the batch
>>> file:
>>>
>>> # ../bcc/libbpf-tools/offcputime -p `pgrep -nx ip`
>>> [...]
>>>     __schedule
>>>     schedule
>>>     schedule_timeout
>>>     wait_for_completion
>>>     rcu_barrier
>>>     netdev_run_todo
>>>     rtnetlink_rcv_msg
>>>     netlink_rcv_skb
>>>     netlink_unicast
>>>     netlink_sendmsg
>>>     ____sys_sendmsg
>>>     ___sys_sendmsg
>>>     __sys_sendmsg
>>>     do_syscall_64
>>>     entry_SYSCALL_64_after_hwframe
>>>     -                ip (3660)
>>>         25089479
>>> [...]
>>>
>>> We are blocked for around 70% of the time on the rcu_barrier() in
>>> netdev_run_todo().
>>>
>>> Note that one big difference between my setup and yours is that in my
>>> case eth0 is a dummy device and in your case it's probably a physical
>>> device that actually implements netdev_ops::ndo_vlan_rx_kill_vid(). If
>>> so, it's possible that a non-negligible amount of time is spent talking
>>> to hardware/firmware to delete the 4K VIDs from the device's VLAN
>>> filter.
>>>
>>>>
>>>>
>>>> Config is very clean i remove big part of CONFIG options .
>>>>
>>>> is there options to debug what is happen.
>>>>
>>>> m
>>
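
The thread times "ip -b vlan_del.batch" but does not show how the batch file was built. A minimal sketch of one way to generate it follows. It assumes the VLAN interfaces are named <parent>.<id>, as in the eth0.$i loop above, and that each batch line is an ip command without the leading "ip". PARENT and BATCH are placeholder names chosen for illustration, not taken from the thread.

```shell
#!/bin/sh
# Sketch only: PARENT and BATCH are hypothetical placeholders.
PARENT="${PARENT:-eth0}"
BATCH="${BATCH:-vlan_del.batch}"

# Start from an empty file, then write one "link del" command per VLAN ID.
# The 2..4094 range matches the loops quoted in the thread.
: > "$BATCH"
for i in $(seq 2 4094); do
    printf 'link del dev %s.%s\n' "$PARENT" "$i" >> "$BATCH"
done

# To apply all deletions from a single ip process (needs root):
#   time -p ip -b "$BATCH"
```

If the interfaces are named vlan$i instead, as in the bench.sh quoted above, the printf format would need to change accordingly. The script only writes the file; nothing is deleted until ip -b is run on it.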