[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <21bd2d87-c784-06dd-6e1e-25569ebef040@linux.alibaba.com>
Date: Wed, 16 Sep 2020 11:07:47 +0800
From: Wen Yang <wenyang@...ux.alibaba.com>
To: David Miller <davem@...emloft.net>
Cc: kuba@...nel.org, andrew@...n.ch, edumazet@...gle.com,
jiri@...lanox.com, leon@...nel.org, jwi@...ux.ibm.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net: core: explicitly call linkwatch_fire_event to speed
up the startup of network services
on 2020/8/6 PM5:09, Wen Yang wrote:
>
>
> 在 2020/8/5 上午6:58, David Miller 写道:
>> From: Wen Yang <wenyang@...ux.alibaba.com>
>> Date: Sat, 1 Aug 2020 16:58:45 +0800
>>
>>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>>> index 75431ca..6b9d44b 100644
>>> --- a/net/core/link_watch.c
>>> +++ b/net/core/link_watch.c
>>> @@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct
>>> net_device *dev)
>>> if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
>>> return true;
>>> + if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
>>> + return true;
>>> +
>>> return netif_carrier_ok(dev) && qdisc_tx_changing(dev);
>>> }
>>
>> You're bypassing explicitly the logic here:
>>
>> /*
>> * Limit the number of linkwatch events to one
>> * per second so that a runaway driver does not
>> * cause a storm of messages on the netlink
>> * socket. This limit does not apply to up events
>> * while the device qdisc is down.
>> */
>> if (!urgent_only)
>> linkwatch_nextevent = jiffies + HZ;
>> /* Limit wrap-around effect on delay. */
>> else if (time_after(linkwatch_nextevent, jiffies + HZ))
>> linkwatch_nextevent = jiffies;
>>
>> Something about this isn't right. We need to analyze what you are
>> seeing,
>> what device you are using, and what systemd is doing to figure out what
>> the right place for the fix.
>>
>> Thank you.
>>
>
> Thank you very much for your comments.
> We are using virtio_net and the environment is a microvm similar to
> firecracker.
>
> Let's briefly explain.
> net_device->operstate is assigned through linkwatch_event, and the
> call stack is as follows:
> process_one_work
> -> linkwatch_event
> -> __linkwatch_run_queue
> -> linkwatch_do_dev
> -> rfc2863_policy
> -> default_operstate
>
> During the machine startup process, net_device->operstate has the
> following two-step state changes:
>
> STEP A: virtnet_probe detects the network card and triggers the
> execution of linkwatch_fire_event.
> Since linkwatch_nextevent is initialized to 0, linkwatch_work will run.
> And since net_device->state is 6 (__LINK_STATE_PRESENT |
> __LINK_STATE_NOCARRIER), net_device->operstate will be changed from
> IF_OPER_UNKNOWN to IF_OPER_DOWN:
> eth0 operstate:0 (IF_OPER_UNKNOWN) -> operstate:2 (IF_OPER_DOWN)
>
> virtnet_probe then executes netif_carrier_on to update
> net_device->state, it will be changed from ‘__LINK_STATE_PRESENT |
> __LINK_STATE_NOCARRIER’ to __LINK_STATE_PRESENT:
> eth0 state: 6 (__LINK_STATE_PRESENT | __LINK_STATE_NOCARRIER) -> 2
> (__LINK_STATE_PRESENT)
>
> STEP B: One second later (because linkwatch_nextevent = jiffies + HZ),
> linkwatch_work is executed again.
> At this time, since net_device->state is __LINK_STATE_PRESENT, so the
> net_device->operstate will be changed from IF_OPER_DOWN to IF_OPER_UP:
> eth0 operstate:2 (IF_OPER_DOWN) -> operstate:6 (IF_OPER_UP)
>
>
> The above state change can be completed within 2 seconds.
> Generally, the machine will load the initramfs first, and do some
> initialization in the initramfs, which takes some time; then
> switch_root to the system disk and continue the initialization, which
> will also take some time, and finally start the systemd-networkd
> service, bringing link, etc.,
> In this way, the linkwatch_work work queue has enough time to run
> twice, and the state of net_device->operstate is already IF_OPER_UP,
> So bringing link up quickly returns the following information:
> Aug 06 16:35:55.966121 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]:
> eth0: bringing link up
> ...
> Aug 06 16:35:55.990461 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]:
> eth0: flags change: +UP +LOWER_UP +RUNNING
>
> But we are now using MicroVM, which requires extreme speed to start,
> bypassing the initramfs and directly booting the trimmed system on the
> disk.
> systemd-networkd starts in less than 1 second after booting. the STEP
> B has not been run yet, so it will wait for several hundred
> milliseconds here, as follows:
> Jul 20 22:00:47.432552 systemd-networkd[210]: eth0: bringing link up
> ...
> Jul 20 22:00:47.446108 systemd-networkd[210]: eth0: flags change: +UP
> +LOWER_UP
> ...
> Jul 20 22:00:47.781463 systemd-networkd[210]: eth0: flags change:
> +RUNNING
>
>
> Note: dhcp pays attention to IFF_RUNNING status, we may refer to:
> https://www.kernel.org/doc/Documentation/networking/operstates.txt
>
> A routing daemon or dhcp client just needs to care for IFF_RUNNING or
> waiting for operstate to go IF_OPER_UP/IF_OPER_UNKNOWN before
> considering the interface / querying a DHCP address.
>
> Finally, the STEP B above only updates the value of operstate based on
> the known state (operstate/state) on the net_device, without any
> hardware interaction involved, so it is not very reasonable to wait
> for 1 second there.
>
> By adding:
> + if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
> + return true;
> +
> We hope to improve the linkwatch_urgent_event function a bit.
>
> Hope to get more of your advice and guidance.
>
> Best wishes,
> Wen
hi, this issue is worth continuing discussion.
In the microVM scenario, it is valuable to increase the startup speed by
a few hundred milliseconds.
Best wishes,
Wen
Powered by blists - more mailing lists