[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20210705004955.il4hicxlpm3ujhsm@skbuf>
Date: Mon, 5 Jul 2021 03:49:55 +0300
From: Vladimir Oltean <olteanv@...il.com>
To: Danielle Ratson <danieller@...dia.com>
Cc: Vladimir Oltean <vladimir.oltean@....com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>,
"shuah@...nel.org" <shuah@...nel.org>,
Ido Schimmel <idosch@...dia.com>,
Nikolay Aleksandrov <nikolay@...dia.com>,
"gnault@...hat.com" <gnault@...hat.com>,
"baowen.zheng@...igine.com" <baowen.zheng@...igine.com>,
Amit Cohen <amcohen@...dia.com>
Subject: Re: [PATCH net-next] selftests: net: Change the indication for iface
is up
Hi Danielle,
On Thu, Jul 01, 2021 at 08:50:42AM +0000, Danielle Ratson wrote:
> > -----Original Message-----
> > From: Vladimir Oltean <vladimir.oltean@....com>
> > Sent: Friday, June 25, 2021 12:51 AM
> > To: Danielle Ratson <danieller@...dia.com>
> > Cc: netdev@...r.kernel.org; davem@...emloft.net; kuba@...nel.org; shuah@...nel.org; Ido Schimmel <idosch@...dia.com>;
> > Nikolay Aleksandrov <nikolay@...dia.com>; gnault@...hat.com; baowen.zheng@...igine.com; Amit Cohen
> > <amcohen@...dia.com>
> > Subject: Re: [PATCH net-next] selftests: net: Change the indication for iface is up
> >
> > Hi Danielle,
> >
> > On Thu, Jun 24, 2021 at 06:15:15PM +0300, Danielle Ratson wrote:
> > > Currently, the indication that an iface is up, is the mark 'state UP'
> > > in the iface info. That situation can be achieved also when the
> > > carrier is not ready, and therefore after the state was found as 'up',
> > > it can be still changed to 'down'.
> > >
> > > For example, the below presents a part of a test output with one of
> > > the ifaces involved detailed info and timestamps:
> > >
> > > In setup_wait()
> > > 45: swp13: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel master
> > > vswp13 state UP mode DEFAULT group default qlen 1000
> > > link/ether 7c:fe:90:fc:7d:f1 brd ff:ff:ff:ff:ff:ff promiscuity 0
> > > minmtu 0 maxmtu 65535
> > > vrf_slave table 1 addrgenmode eui64 numtxqueues 1 numrxqueues 1
> > > gso_max_size 65536 gso_max_segs 65535 portname p13 switchid
> > > 7cfe90fc7dc0
> > > 17:58:27:242634417
> > >
> > > In dst_mac_match_test()
> >
> > What is dst_mac_match_test()?
> >
> > > 45: swp13: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel
> > > master vswp13 state DOWN mode DEFAULT group default qlen 1000
> > > link/ether 7c:fe:90:fc:7d:f1 brd ff:ff:ff:ff:ff:ff promiscuity 0
> > > minmtu 0 maxmtu 65535
> > > vrf_slave table 1 addrgenmode eui64 numtxqueues 1 numrxqueues 1
> > > gso_max_size 65536 gso_max_segs 65535 portname p13 switchid
> > > 7cfe90fc7dc0
> > > 17:58:32:254535834
> > > TEST: dst_mac match (skip_hw) [FAIL]
> > > Did not match on correct filter
> > >
> > > In src_mac_match_test()
> >
> > What is src_mac_match_test()?
> >
> > > 45: swp13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
> > > master vswp13 state UP mode DEFAULT group default qlen 1000
> > > link/ether 7c:fe:90:fc:7d:f1 brd ff:ff:ff:ff:ff:ff promiscuity 0
> > > minmtu 0 maxmtu 65535
> > > vrf_slave table 1 addrgenmode eui64 numtxqueues 1 numrxqueues 1
> > > gso_max_size 65536 gso_max_segs 65535 portname p13 switchid
> > > 7cfe90fc7dc0
> > > 17:58:34:446367468
> >
> > Can you please really show the output of 'ip link show dev swp13 up'?
> > The format you are showing is not that and it is really confusing.
> >
> > > TEST: src_mac match (skip_hw) [ OK ]
> > >
> > > In dst_ip_match_test()
> > > 45: swp13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
> > > master vswp13 state UP mode DEFAULT group default qlen 1000
> > > link/ether 7c:fe:90:fc:7d:f1 brd ff:ff:ff:ff:ff:ff promiscuity 0
> > > minmtu 0 maxmtu 65535
> > > vrf_slave table 1 addrgenmode eui64 numtxqueues 1 numrxqueues 1
> > > gso_max_size 65536 gso_max_segs 65535 portname p13 switchid
> > > 7cfe90fc7dc0
> > > 17:58:35:633518622
> > >
> > > In the example, after the setup_prepare() phase, the iface state was
> > > indeed 'UP' so the setup_wait() phase pass successfully. But since
> > > 'LOWER_UP' flag was not set yet, the next print, which was right
> > > before the first test case has started, the status turned into 'DOWN'.
> >
> > Why?
> >
> > > While, UP is an indicator that the interface has been enabled and
> > > running, LOWER_UP is a physical layer link flag. It indicates that an
> > > Ethernet cable was plugged in and that the device is connected to the network.
> > >
> > > Therefore, the existence of the 'LOWER_UP' flag is necessary for
> > > indicating that the port is up before testing communication.
> >
> > Documentation/networking/operstates.rst says:
> >
> > IF_OPER_UP (6):
> > Interface is operational up and can be used.
> >
> > Additionally, RFC2863 says:
> >
> > ifOperStatus OBJECT-TYPE
> > SYNTAX INTEGER {
> > up(1), -- ready to pass packets
> >
> > You have not proven why the UP operstate is not sufficient and
> > additional checks must be made for link flags. Also you have not
> > explained how this fixes your problem.
> >
> > > Change the indication for iface is up to include the existence of
> > > 'LOWER_UP'.
> > >
>
> Hi Vladimir,
>
> After a consultation with Ido Schimmel, we came up with the commit
> message below:
>
> According to Documentation/networking/operstates.rst, the
> administrative and operational state of an interface can be determined
> via both the 'ifi_flags' field in the ancillary header of a
> RTM_NEWLINK message and the 'IFLA_OPERSTATE' attribute in the message.
>
> When a driver signals loss of carrier via netif_carrier_off(), the
> change is reflected immediately in the 'ifi_flags' field, but not in
> the 'IFLA_OPERSTATE' attribute. This is because changes in
> 'IFLA_OPERSTATE' are performed in a link watch delayed work. From the
> document:
>
> "Whenever the driver CHANGES one of these flags, a workqueue event is
> scheduled to translate the flag combination to IFLA_OPERSTATE as
> follows"
>
> This means that it is possible for user space that is constantly
> querying the kernel for interface state to see the output below when
> the following commands are run. Their purpose is to simulate tearing
> down of a selftest and start of a new one.
>
> Commands:
>
> # ip link set dev veth1 down
> # ip link set dev veth0 down
> # ip link set dev veth0 up
> # ip link set dev veth1 up
>
> Output:
>
> 11: veth0@...h1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff
> 11: veth0@...h1: <BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff
> 11: veth0@...h1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
> link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff
> 11: veth0@...h1: <BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff
> 11: veth0@...h1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff
>
> In the third line, the operational state is set by rtnl_fill_ifinfo()
> to 'IF_OPER_DOWN' because the interface is administratively down, but
> 'dev->operstate' is still 'IF_OPER_UP' as the delayed work has yet to
> run. That is why the interface's operational state is reported as
> 'IF_OPER_UP' in the next line after it was put administratively up
> again.
>
> The output in the fourth line would make user space believe that the
> interface is fully operational if only the 'IFLA_OPERSTATE' attribute
> is considered. This is false as the interface does not have a carrier
> ('LOWER_UP' is not set) at this stage.
>
> Solve this by determining if an interface is operational solely based
> on the presence of the 'IFF_UP' and 'IFF_LOWER_UP' bits in the
> 'ifi_flags' field.
>
> Hope it answers all your questions and we will send the new commit for net-next.
Thanks, this is a better explanation.
Do you mean that the test below is supposed to fail? Because I ran it
for a few tens of minutes and it didn't.
#!/bin/bash
do_cleanup() {
ip link del veth0
}
trap do_cleanup EXIT
check() {
local test_num=$1
local output=$(ip link show dev veth0 up)
# Fail if the line doesn't contain both 'state UP' and 'LOWER_UP'
if ! echo "${output}" | grep 'state UP' | grep -q 'LOWER_UP'; then
echo "test $test_num failed: ${output}"
exit 1
fi
}
test_num=0
ip link add veth0 type veth peer name veth1
while :; do
ip link set dev veth1 down
ip link set dev veth0 down
ip link set veth0 up
ip link set veth1 up
check $test_num
test_num=$((${test_num} + 1))
done
Nonetheless, it seems somewhat plausible that it might happen.
I'm sure you have considered adding something along the lines of a call
to linkwatch_run_queue() in rtnl_fill_ifinfo(), to make sure that the
operstate is actual at the time the netlink message is being processed
by the kernel. After all, it seems common sense to not hand out data to
user space that is already stale by the time you collect it. Why is the
naive [ kernel side ] fix not possible, and instead every user space
program should know that IFLA_OPERSTATE comes from a work item and might
be stale data?
Powered by blists - more mailing lists