Date:   Mon, 25 Feb 2019 16:58:07 -0800
From:   si-wei liu <si-wei.liu@...cle.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     "Samudrala, Sridhar" <sridhar.samudrala@...el.com>,
        Siwei Liu <loseweigh@...il.com>, Jiri Pirko <jiri@...nulli.us>,
        Stephen Hemminger <stephen@...workplumber.org>,
        David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>,
        virtualization@...ts.linux-foundation.org,
        virtio-dev <virtio-dev@...ts.oasis-open.org>,
        "Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
        Alexander Duyck <alexander.h.duyck@...el.com>,
        Jakub Kicinski <kubakici@...pl>,
        Jason Wang <jasowang@...hat.com>, liran.alon@...cle.com
Subject: Re: [virtio-dev] Re: net_failover slave udev renaming (was Re: [RFC
 PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use
 the bypass framework)



On 2/22/2019 7:14 AM, Michael S. Tsirkin wrote:
> On Thu, Feb 21, 2019 at 11:55:11PM -0800, si-wei liu wrote:
>>
>> On 2/21/2019 11:00 PM, Samudrala, Sridhar wrote:
>>>
>>> On 2/21/2019 7:33 PM, si-wei liu wrote:
>>>>
>>>> On 2/21/2019 5:39 PM, Michael S. Tsirkin wrote:
>>>>> On Thu, Feb 21, 2019 at 05:14:44PM -0800, Siwei Liu wrote:
>>>>>> Sorry for replying to this ancient thread. There is a remaining
>>>>>> issue that I don't think the initial net_failover patch addressed
>>>>>> cleanly, see:
>>>>>>
>>>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1815268
>>>>>>
>>>>>> The renaming of 'eth0' to 'ens4' fails because udev userspace was
>>>>>> not written for this kind of automatic in-kernel enslavement.
>>>>>> With a bond or team, the slave typically gets renamed *before* the
>>>>>> virtual device is created. That is something udev can control (as
>>>>>> long as no other part of the kernel opens the netdev early), and
>>>>>> other userspace components, e.g. initramfs and init scripts, can
>>>>>> coordinate around it. The in-kernel auto-enslavement of net_failover
>>>>>> breaks this userspace convention and offers no solution for users who
>>>>>> care about consistent naming of the slave netdevs specifically.
>>>>>>
>>>>>> This issue was specifically called out when IFF_HIDDEN and the
>>>>>> 1-netdev model were proposed, but no one has offered a solution to
>>>>>> it since. Please share your thoughts on how to proceed and solve this
>>>>>> userspace issue if netdev does not welcome a 1-netdev model.
>>>>> Above says:
>>>>>
>>>>>      there's no motivation in the systemd/udevd community at
>>>>>      this point to refactor the rename logic and make it work well with
>>>>>      3-netdev.
>>>>>
>>>>> What would the fix be? Skip slave devices?
>>>>>
>>>> Users gain nothing from simply skipping slave devices: the name stays
>>>> unchanged and unpredictable (e.g. eth0, or eth1 on the next reboot),
>>>> while the other interfaces conform to the naming scheme (ens3 and
>>>> such). There's no way to fix this in userspace alone: when the
>>>> failover is created, the kernel opens the enslaved netdev before
>>>> userspace is made aware of it, and there's no negotiation protocol
>>>> for the kernel to know when userspace has done the initial renaming
>>>> of the interface. I would expect the netdev list to at least provide
>>>> general direction on how this can be solved...
>
> I was just wondering what you meant when you said
> "refactor the rename logic and make it work well with 3-netdev" -
> was there a proposal udev rejected?
No. I never believed this particular issue could be fixed in userspace
alone. Someone had previously said it could be, but I have never seen any
work or relevant discussion happen in the various userspace communities
(e.g. dracut, initramfs-tools, systemd, udev, and NetworkManager).
IMHO the issue is rooted in the kernel, so it makes more sense to start
from netdev: work out and decide on a solution, see what can be done in
the kernel to fix it, and only then engage the userspace communities on
feasibility...

> Anyway, can we write a time diagram for what happens in which order that
> leads to failure?  That would help look for triggers that we can tie
> into, or add new ones.
>

See attached diagram.
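As a rough companion to the ordering question (a toy model, not a reproduction of the attached diagram), the sketch below replays the two orderings. It assumes that the kernel refuses to rename an interface that is up, returning EBUSY, as dev_change_name() did in kernels of this period, and that, as described earlier in the thread, net_failover opens the slave before userspace learns about it. The class, function, and master names are hypothetical and exist only for illustration.

```python
import errno
from dataclasses import dataclass
from typing import List, Optional, Tuple

IFF_UP = 0x1  # same bit value as IFF_UP in <linux/if.h>


@dataclass
class NetDev:
    name: str
    flags: int = 0
    master: Optional[str] = None


def change_name(dev: NetDev, new_name: str) -> int:
    """Toy stand-in for the kernel rename check: an UP netdev cannot be renamed."""
    if dev.flags & IFF_UP:
        return -errno.EBUSY
    dev.name = new_name
    return 0


def simulate(kernel_enslaves_first: bool) -> Tuple[NetDev, List[str]]:
    """Replay one ordering of the events; return the VF and an event log."""
    log: List[str] = []
    vf = NetDev("eth0")  # kernel-assigned name for the hot-plugged VF
    log.append("kernel: register eth0")

    def failover_enslave() -> None:
        vf.master = "failover-master"  # hypothetical master name
        vf.flags |= IFF_UP             # net_failover opens the slave
        log.append("net_failover: enslave and open eth0")

    def udev_rename() -> None:
        ret = change_name(vf, "ens4")  # "ens4" as in the Launchpad report
        result = "ok" if ret == 0 else errno.errorcode[-ret]
        log.append(f"udev: rename eth0 -> ens4: {result}")

    steps = [failover_enslave, udev_rename]
    if not kernel_enslaves_first:
        steps.reverse()  # bond/team convention: rename before enslaving
    for step in steps:
        step()
    return vf, log


if __name__ == "__main__":
    for order in (True, False):
        vf, log = simulate(order)
        print("\n".join(log), f"=> final name {vf.name}\n", sep="\n")
```

With the kernel enslaving first, the rename is refused and the slave keeps the unpredictable eth0 name; with the bond/team ordering, udev renames first and the device ends up as ens4.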

>
>
>
>
>>> Is there an issue if slave device names are not predictable? The user/admin scripts are expected
>>> to only work with the master failover device.
>> Where does this expectation come from?
>>
>> Admin users may have ethtool or tc configurations that need to deal with
>> predictable interface names. Third-party apps built around a specific
>> interface name can't be modified to chase dynamic names.
>>
>> Specifically, we have pre-canned images that use ethtool to fine-tune VF
>> offload settings after boot for specific workloads. Those images won't
>> work well if the name keeps changing after just a couple of rounds of
>> live migration.
> It should be possible to specify the ethtool configuration on the
> master and have it automatically propagated to the slave.
>
> BTW this is something we should look at IMHO.
I was giving a few examples to show that the expectation, and the
assumption, that user/admin scripts only deal with the master failover
device is incorrect. This has never been properly addressed, although I
did try to emphasize it from the very beginning.

Basically, what you said about propagating the ethtool configuration down
to the slave is the key goal of the 1-netdev model. However, what I am
seeking now is any alternative that can also fix the specific udev
rename problem, before concluding that 1-netdev is the only solution.
A 1-netdev scheme would generally take time to implement, so in the
meantime I'm trying to find a way to fix this particular naming problem
under 3-netdev.
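Under 3-netdev it can help to see which netdevs are in the state that defeats udev: enslaved to an upper device and already up. A minimal sketch, assuming the standard sysfs layout where an enslaved netdev has a `master` symlink under /sys/class/net/<ifname>/ and the `flags` file holds the interface flags as a hex string. It only reports; it does not rename or reconfigure anything, and the helper names are hypothetical.

```python
import os
from typing import List, Tuple

IFF_UP = 0x1  # same bit value as IFF_UP in <linux/if.h>


def parse_flags(text: str) -> int:
    """sysfs 'flags' is a hex string such as '0x1003'."""
    return int(text.strip(), 16)


def is_up(flags: int) -> bool:
    return bool(flags & IFF_UP)


def enslaved_and_up(sysfs_root: str = "/sys/class/net") -> List[Tuple[str, str]]:
    """Return (slave, master) pairs for netdevs that have a master and are up."""
    found: List[Tuple[str, str]] = []
    if not os.path.isdir(sysfs_root):
        return found
    for ifname in sorted(os.listdir(sysfs_root)):
        base = os.path.join(sysfs_root, ifname)
        # An enslaved netdev has a 'master' symlink pointing at its upper device.
        master_link = os.path.join(base, "master")
        if not os.path.islink(master_link):
            continue
        try:
            with open(os.path.join(base, "flags")) as f:
                flags = parse_flags(f.read())
        except (OSError, ValueError):
            continue
        if is_up(flags):
            master = os.path.basename(os.readlink(master_link))
            found.append((ifname, master))
    return found


if __name__ == "__main__":
    for ifname, master in enslaved_and_up():
        print(f"{ifname}: enslaved to {master} and already up")
```

A check like this only confirms the symptom after the fact; it cannot tell udev when the kernel has finished enslaving, which is the missing negotiation step described above.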

>
>>> Moreover, you were suggesting hiding the lower slave devices anyway. There was some discussion
>>> about moving them to a hidden network namespace so that they are not visible from the default namespace.
>>> I looked into this some time back, but did not find the right kernel API to create a network namespace within
>>> the kernel. If there were one, we could use this mechanism to simulate a 1-netdev model.
>> Yes, that's one possible implementation (IMHO the key is to make the
>> 1-netdev model look as much like a real NIC as possible, while a hidden
>> netns is just the vehicle). However, I recall resistance in that
>> discussion: even the concept of hiding itself is taboo for Linux netdev.
>> I would like to gather potential alternatives before concluding too soon
>> that 1-netdev is the only solution.
>>
>> Thanks,
>> -Siwei
> Your scripts would not work at all then, right?
At this point we don't claim images with such usage are SR-IOV live
migratable. We won't flag them as live migratable until this ethtool
config issue is fully addressed and a transparent live migration
solution eventually emerges upstream.


Thanks,
-Siwei
>
>
>>>> -Siwei
>>>>
>>>>


Attachment: "net_failover_rename_race.txt" (text/plain, 3587 bytes)
