Message-ID: <CANn89i+=gQ8501d-rSf_wM_DDUgYj+uJJQPpCFev5CgaSsKrQg@mail.gmail.com>
Date: Wed, 10 May 2023 11:22:10 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Martin Zaharinov <micron10@...il.com>
Cc: Ido Schimmel <idosch@...sch.org>, netdev <netdev@...r.kernel.org>
Subject: Re: Very slow remove interface from kernel
On Wed, May 10, 2023 at 11:17 AM Martin Zaharinov <micron10@...il.com> wrote:
>
> Hi all
>
> one more update
>
> I tested directly on Proxmox with kernel 6.2.6:
>
> modprobe dummy numdummies=1
> ip link set dev dummy0 up
> for i in $(seq 2 1999); do ip link add link dummy0 name vlan$i type vlan id $i; done
> for i in $(seq 2 1999); do ip link set dev vlan$i up; done
> time for i in $(seq 2 1999); do ip link del link dummy0 name vlan$i type vlan id $i; done
>
> real 1m6.308s
> user 0m4.451s
> sys 0m1.589s
>
>
> This kernel is configured with CONFIG_HZ=250. As you can see, I added 1998 VLANs; with 4094 VLANs, removal takes up to 4-5 minutes.
>
> In my test kernel I set CONFIG_HZ to 1000, but I don't think this is fine for every server.
We use CONFIG_HZ=1000 on server builds.
Other values cause suboptimal behavior, for instance in the TCP stack.
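For anyone comparing builds, here is a minimal sketch for reading CONFIG_HZ from a kernel config file and converting it to a tick length. The helper names `hz_from_config` and `tick_ms` are made up for illustration, and the `/boot/config-$(uname -r)` path is an assumption: many distributions install the config there, but not all do.

```shell
#!/bin/sh
# Hypothetical helper: print the CONFIG_HZ value found in a kernel
# config file, or nothing if the option is not set there.
hz_from_config() {
    sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p' "$1"
}

# Hypothetical helper: length of one timer tick in milliseconds for a
# given HZ. Integer division, so HZ=300 is reported as 3 ms.
tick_ms() {
    echo $((1000 / $1))
}

# Assumed location of the running kernel's config.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    hz=$(hz_from_config "$cfg")
    if [ -n "$hz" ]; then
        echo "CONFIG_HZ=$hz (one tick = $(tick_ms "$hz") ms)"
    else
        echo "CONFIG_HZ not found in $cfg"
    fi
else
    echo "no readable kernel config at $cfg"
fi
```

With CONFIG_HZ=250 a tick is 4 ms; with CONFIG_HZ=1000 it is 1 ms.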
>
>
> > On 9 May 2023, at 23:08, Ido Schimmel <idosch@...sch.org> wrote:
> >
> > On Tue, May 09, 2023 at 09:50:18PM +0300, Martin Zaharinov wrote:
> >> I tried on kernel 6.3.1:
> >>
> >>
> >> time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
> >>
> >> real 4m51.633s (here I stopped with Ctrl+C and reran; the second part finished after 3 min)
> >> user 0m7.479s
> >> sys 0m0.367s
> >
> > You are off-CPU most of the time, the question is what is blocking. I'm
> > getting the following results with net-next:
> >
> > # time -p for i in $(seq 2 4094); do ip link del dev eth0.$i; done
> > real 177.09
> > user 3.85
> > sys 31.26
> >
> > When using a batch file to perform the deletion:
> >
> > # time -p ip -b vlan_del.batch
> > real 35.25
> > user 0.02
> > sys 3.61
> >
> > And to check where we are blocked most of the time while using the batch
> > file:
> >
> > # ../bcc/libbpf-tools/offcputime -p `pgrep -nx ip`
> > [...]
> > __schedule
> > schedule
> > schedule_timeout
> > wait_for_completion
> > rcu_barrier
> > netdev_run_todo
> > rtnetlink_rcv_msg
> > netlink_rcv_skb
> > netlink_unicast
> > netlink_sendmsg
> > ____sys_sendmsg
> > ___sys_sendmsg
> > __sys_sendmsg
> > do_syscall_64
> > entry_SYSCALL_64_after_hwframe
> > - ip (3660)
> > 25089479
> > [...]
> >
> > We are blocked for around 70% of the time on the rcu_barrier() in
> > netdev_run_todo().
> >
> > Note that one big difference between my setup and yours is that in my
> > case eth0 is a dummy device and in your case it's probably a physical
> > device that actually implements netdev_ops::ndo_vlan_rx_kill_vid(). If
> > so, it's possible that a non-negligible amount of time is spent talking
> > to hardware/firmware to delete the 4K VIDs from the device's VLAN
> > filter.
> >
> >>
> >>
> >> The config is very clean; I removed a large part of the CONFIG options.
> >>
> >> Are there any options to debug what is happening?
> >>
> >> m
>
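The batch-file run quoted above (35.25 s versus 177.09 s for the shell loop) does not show the contents of vlan_del.batch. A minimal sketch of how such a file might be generated follows. It assumes one `link del dev NAME` command per line and the vlan$i naming from the reproduction steps; the function name `gen_vlan_del_batch` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one "ip" batch command per VLAN interface
# for IDs first..last. Batch lines omit the leading "ip".
# The vlan$i naming is taken from the reproduction steps; adjust it if
# the interfaces are named differently (e.g. eth0.$i).
gen_vlan_del_batch() {
    first=$1
    last=$2
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "link del dev vlan$i"
        i=$((i + 1))
    done
}

# Small demonstration; prints:
#   link del dev vlan2
#   link del dev vlan3
#   link del dev vlan4
gen_vlan_del_batch 2 4
```

To use it for the 1998-VLAN case, redirect the output to a file with `gen_vlan_del_batch 2 1999 > vlan_del.batch` and then run `ip -b vlan_del.batch` as root. This only removes the per-command process startup cost; the rcu_barrier() wait in netdev_run_todo() shown in the trace still applies.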