Date: Wed, 10 May 2023 11:40:39 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Martin Zaharinov <micron10@...il.com>
Cc: Ido Schimmel <idosch@...sch.org>, netdev <netdev@...r.kernel.org>
Subject: Re: Very slow remove interface from kernel

On Wed, May 10, 2023 at 8:06 AM Martin Zaharinov <micron10@...il.com> wrote:
>
> I think the problem is in this part of the code in net/core/dev.c

What makes you think this?

msleep() is not called even once on my test bed.

# perf probe -a msleep
# cat bench.sh
modprobe dummy 2>/dev/null
ip link set dev dummy0 up 2>/dev/null
for i in $(seq 2 4094); do ip link add link dummy0 name vlan$i type vlan id $i; done
for i in $(seq 2 4094); do ip link set dev vlan$i up; done
time for i in $(seq 2 4094); do ip link del link dummy0 name vlan$i type vlan id $i; done

#  perf record -e probe:msleep -a -g ./bench.sh

real 0m59.877s
user 0m0.588s
sys 0m7.023s
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 8.561 MB perf.data ]
# perf script
#   << empty, nothing >>




> #define WAIT_REFS_MIN_MSECS 1
> #define WAIT_REFS_MAX_MSECS 250
> /**
>  * netdev_wait_allrefs_any - wait until all references are gone.
>  * @list: list of net_devices to wait on
>  *
>  * This is called when unregistering network devices.
>  *
>  * Any protocol or device that holds a reference should register
>  * for netdevice notification, and cleanup and put back the
>  * reference if they receive an UNREGISTER event.
>  * We can get stuck here if buggy protocols don't correctly
>  * call dev_put.
>  */
> static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
> {
>         unsigned long rebroadcast_time, warning_time;
>         struct net_device *dev;
>         int wait = 0;
>
>         rebroadcast_time = warning_time = jiffies;
>
>         list_for_each_entry(dev, list, todo_list)
>                 if (netdev_refcnt_read(dev) == 1)
>                         return dev;
>
>         while (true) {
>                 if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
>                         rtnl_lock();
>
>                         /* Rebroadcast unregister notification */
>                         list_for_each_entry(dev, list, todo_list)
>                                 call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>
>                         __rtnl_unlock();
>                         rcu_barrier();
>                         rtnl_lock();
>
>                         list_for_each_entry(dev, list, todo_list)
>                                 if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
>                                              &dev->state)) {
>                                         /* We must not have linkwatch events
>                                          * pending on unregister. If this
>                                          * happens, we simply run the queue
>                                          * unscheduled, resulting in a noop
>                                          * for this device.
>                                          */
>                                         linkwatch_run_queue();
>                                         break;
>                                 }
>
>                         __rtnl_unlock();
>
>                         rebroadcast_time = jiffies;
>                 }
>
>                 if (!wait) {
>                         rcu_barrier();
>                         wait = WAIT_REFS_MIN_MSECS;
>                 } else {
>                         msleep(wait);
>                         wait = min(wait << 1, WAIT_REFS_MAX_MSECS);
>                 }
>
>                 list_for_each_entry(dev, list, todo_list)
>                         if (netdev_refcnt_read(dev) == 1)
>                                 return dev;
>
>                 if (time_after(jiffies, warning_time +
>                                READ_ONCE(netdev_unregister_timeout_secs) * HZ)) {
>                         list_for_each_entry(dev, list, todo_list) {
>                                 pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
>                                          dev->name, netdev_refcnt_read(dev));
>                                 ref_tracker_dir_print(&dev->refcnt_tracker, 10);
>                         }
>
>                         warning_time = jiffies;
>                 }
>         }
> }
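
For reference, the sleep schedule of the loop quoted above can be sketched in shell. This is only an illustration under stated assumptions, not kernel code: backoff_schedule is a hypothetical helper, and it models just the "if (!wait) ... else ..." branch, ignoring the refcount checks and the rebroadcast. It shows that msleep() is first reached on the second pass of the while loop, so a probe on msleep stays silent when every device is released by the initial check or right after the first rcu_barrier().

```shell
#!/bin/sh
# Constants copied from the quoted code.
WAIT_REFS_MIN_MSECS=1
WAIT_REFS_MAX_MSECS=250

# Hypothetical helper: print what each of the first N passes of the
# while (true) loop would do, assuming the refcount never drops to 1.
backoff_schedule() {
    passes=$1
    wait_ms=0
    n=0
    while [ "$n" -lt "$passes" ]; do
        if [ "$wait_ms" -eq 0 ]; then
            # First pass: no sleep, only rcu_barrier().
            echo "rcu_barrier"
            wait_ms=$WAIT_REFS_MIN_MSECS
        else
            # Later passes: sleep, then double the delay up to the cap.
            echo "msleep $wait_ms"
            wait_ms=$((wait_ms * 2))
            if [ "$wait_ms" -gt "$WAIT_REFS_MAX_MSECS" ]; then
                wait_ms=$WAIT_REFS_MAX_MSECS
            fi
        fi
        n=$((n + 1))
    done
}

backoff_schedule 11
```

With 11 passes this prints rcu_barrier once, then msleep 1, 2, 4, 8, 16, 32, 64, 128, and finally msleep 250 twice.
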
>
>
>
> m.
>
>
> > On 9 May 2023, at 23:08, Ido Schimmel <idosch@...sch.org> wrote:
> >
> > On Tue, May 09, 2023 at 09:50:18PM +0300, Martin Zaharinov wrote:
> >> I tried on kernel 6.3.1
> >>
> >>
> >> time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
> >>
> >> real 4m51.633s  (I stopped it here with Ctrl+C and reran; the second part finished after 3 min)
> >> user 0m7.479s
> >> sys 0m0.367s
> >
> > You are off-CPU most of the time; the question is what is blocking. I'm
> > getting the following results with net-next:
> >
> > # time -p for i in $(seq 2 4094); do ip link del dev eth0.$i; done
> > real 177.09
> > user 3.85
> > sys 31.26
> >
> > When using a batch file to perform the deletion:
> >
> > # time -p ip -b vlan_del.batch
> > real 35.25
> > user 0.02
> > sys 3.61
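
The contents of vlan_del.batch are not shown above. A minimal sketch of how such a file could be generated, assuming the eth0.$i device naming from the timing loop above; gen_vlan_del_batch is a hypothetical helper. Each line of an ip batch file is an ip command without the leading "ip".

```shell
#!/bin/sh
# Hypothetical helper: emit one "link del" batch line per VLAN device.
# Arguments: parent device name, first VLAN id, last VLAN id.
gen_vlan_del_batch() {
    parent=$1
    first=$2
    last=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        # Assumes devices are named <parent>.<vid>, e.g. eth0.2
        echo "link del dev $parent.$i"
        i=$((i + 1))
    done
}

# Write the batch file for VLAN ids 2..4094 on eth0.
gen_vlan_del_batch eth0 2 4094 > vlan_del.batch
```

The resulting file can then be passed to "ip -b vlan_del.batch" as root, so that all deletions go through a single ip process instead of 4093 separate invocations.
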
> >
> > And to check where we are blocked most of the time while using the batch
> > file:
> >
> > # ../bcc/libbpf-tools/offcputime -p `pgrep -nx ip`
> > [...]
> >    __schedule
> >    schedule
> >    schedule_timeout
> >    wait_for_completion
> >    rcu_barrier
> >    netdev_run_todo
> >    rtnetlink_rcv_msg
> >    netlink_rcv_skb
> >    netlink_unicast
> >    netlink_sendmsg
> >    ____sys_sendmsg
> >    ___sys_sendmsg
> >    __sys_sendmsg
> >    do_syscall_64
> >    entry_SYSCALL_64_after_hwframe
> >    -                ip (3660)
> >        25089479
> > [...]
> >
> > We are blocked for around 70% of the time on the rcu_barrier() in
> > netdev_run_todo().
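
The 70% figure can be roughly reproduced from the numbers above, assuming the offcputime total (25089479) is in microseconds and that the traced run took about as long as the 35.25 s timed run (the two may have been separate runs). offcpu_percent is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: share of wall-clock time spent off-CPU in one stack.
# Arguments: off-CPU time in microseconds, wall-clock time in seconds.
offcpu_percent() {
    LC_ALL=C awk -v us="$1" -v real="$2" \
        'BEGIN { printf "%.1f\n", us / 1000000 / real * 100 }'
}

# 25089479 us in the rcu_barrier() stack vs. 35.25 s real time.
offcpu_percent 25089479 35.25
```

This prints 71.2, i.e. about 25.1 s of the 35.25 s were spent waiting in that one stack.
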
> >
> > Note that one big difference between my setup and yours is that in my
> > case eth0 is a dummy device and in your case it's probably a physical
> > device that actually implements netdev_ops::ndo_vlan_rx_kill_vid(). If
> > so, it's possible that a non-negligible amount of time is spent talking
> > to hardware/firmware to delete the 4K VIDs from the device's VLAN
> > filter.
> >
> >>
> >>
> >> The config is very clean; I removed a large part of the CONFIG options.
> >>
> >> Are there options to debug what is happening?
> >>
> >> m
>
