Message-ID: <41107812-01ba-169e-2f18-69cecec94d8d@linux.alibaba.com>
Date: Thu, 6 Aug 2020 17:09:00 +0800
From: Wen Yang <wenyang@...ux.alibaba.com>
To: David Miller <davem@...emloft.net>
Cc: kuba@...nel.org, xlpang@...ux.alibaba.com,
caspar@...ux.alibaba.com, andrew@...n.ch, edumazet@...gle.com,
jiri@...lanox.com, leon@...nel.org, jwi@...ux.ibm.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net: core: explicitly call linkwatch_fire_event to speed
up the startup of network services
On 2020/8/5 6:58 AM, David Miller wrote:
> From: Wen Yang <wenyang@...ux.alibaba.com>
> Date: Sat, 1 Aug 2020 16:58:45 +0800
>
>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>> index 75431ca..6b9d44b 100644
>> --- a/net/core/link_watch.c
>> +++ b/net/core/link_watch.c
>> @@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
>>  	if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
>>  		return true;
>>
>> +	if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
>> +		return true;
>> +
>>  	return netif_carrier_ok(dev) && qdisc_tx_changing(dev);
>>  }
>>
>
> You're bypassing explicitly the logic here:
>
> 	/*
> 	 * Limit the number of linkwatch events to one
> 	 * per second so that a runaway driver does not
> 	 * cause a storm of messages on the netlink
> 	 * socket.  This limit does not apply to up events
> 	 * while the device qdisc is down.
> 	 */
> 	if (!urgent_only)
> 		linkwatch_nextevent = jiffies + HZ;
> 	/* Limit wrap-around effect on delay. */
> 	else if (time_after(linkwatch_nextevent, jiffies + HZ))
> 		linkwatch_nextevent = jiffies;
>
> Something about this isn't right. We need to analyze what you are seeing,
> what device you are using, and what systemd is doing to figure out what
> the right place for the fix.
>
> Thank you.
>
Thank you very much for your comments.
We are using virtio_net, and the environment is a microVM similar to
Firecracker.
Let me briefly explain.
net_device->operstate is assigned through linkwatch_event, and the call
stack is as follows:
process_one_work
-> linkwatch_event
-> __linkwatch_run_queue
-> linkwatch_do_dev
-> rfc2863_policy
-> default_operstate
During machine startup, net_device->operstate goes through the
following two state changes:
STEP A: virtnet_probe detects the network card and calls
linkwatch_fire_event.
Since linkwatch_nextevent is initialized to 0, linkwatch_work runs
without delay. Because net_device->state is 6 (the __LINK_STATE_PRESENT
and __LINK_STATE_NOCARRIER bits are set), net_device->operstate changes
from IF_OPER_UNKNOWN to IF_OPER_DOWN:
eth0 operstate:0 (IF_OPER_UNKNOWN) -> operstate:2 (IF_OPER_DOWN)
virtnet_probe then calls netif_carrier_on, which updates
net_device->state from '__LINK_STATE_PRESENT | __LINK_STATE_NOCARRIER'
to __LINK_STATE_PRESENT:
eth0 state: 6 (__LINK_STATE_PRESENT | __LINK_STATE_NOCARRIER) -> 2
(__LINK_STATE_PRESENT)
STEP B: One second later (because linkwatch_nextevent = jiffies + HZ),
linkwatch_work runs again.
Since net_device->state is now __LINK_STATE_PRESENT,
net_device->operstate changes from IF_OPER_DOWN to IF_OPER_UP:
eth0 operstate:2 (IF_OPER_DOWN) -> operstate:6 (IF_OPER_UP)
The above state change can be completed within 2 seconds.
Generally, the machine loads the initramfs first and does some
initialization there, which takes some time; it then does switch_root
to the system disk and continues the initialization, which also takes
some time; finally it starts the systemd-networkd service, which brings
the link up, etc.
In this way, linkwatch_work has enough time to run twice, and
net_device->operstate is already IF_OPER_UP,
so bringing the link up completes quickly, as the following log shows:
Aug 06 16:35:55.966121 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]:
eth0: bringing link up
...
Aug 06 16:35:55.990461 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]:
eth0: flags change: +UP +LOWER_UP +RUNNING
But we are now using a MicroVM, which must start extremely fast, so it
bypasses the initramfs and boots the trimmed system directly from disk.
systemd-networkd starts less than 1 second after boot. STEP B has not
run yet at that point, so there is a wait of several hundred
milliseconds here, as follows:
Jul 20 22:00:47.432552 systemd-networkd[210]: eth0: bringing link up
...
Jul 20 22:00:47.446108 systemd-networkd[210]: eth0: flags change: +UP
+LOWER_UP
...
Jul 20 22:00:47.781463 systemd-networkd[210]: eth0: flags change: +RUNNING
Note: DHCP clients depend on the IFF_RUNNING status; see:
https://www.kernel.org/doc/Documentation/networking/operstates.txt
A routing daemon or dhcp client just needs to care for IFF_RUNNING or
waiting for operstate to go IF_OPER_UP/IF_OPER_UNKNOWN before
considering the interface / querying a DHCP address.
Finally, STEP B above only updates operstate based on state already
known on the net_device (operstate/state), with no hardware interaction
involved, so waiting 1 second there is not very reasonable.
By adding:
+	if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
+		return true;
+
We hope to improve the linkwatch_urgent_event function a bit.
Hope to get more of your advice and guidance.
Best wishes,
Wen