Date:	Wed, 8 Aug 2012 08:23:15 +0300
From:	Or Gerlitz <or.gerlitz@...il.com>
To:	"Michael S. Tsirkin" <mst@...hat.com>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Or Gerlitz <ogerlitz@...lanox.com>, davem@...emloft.net,
	roland@...nel.org, netdev@...r.kernel.org, ali@...lanox.com,
	sean.hefty@...el.com, Erez Shitrit <erezsh@...lanox.co.il>,
	Doug Ledford <dledford@...hat.com>
Subject: Re: [PATCH V2 09/12] net/eipoib: Add main driver functionality

On Sun, Aug 5, 2012 at 9:50 PM, Michael S. Tsirkin <mst@...hat.com> wrote:

[...]
> So it seems that a sane solution would involve an extra level of
> indirection, with guest addresses being translated to host IB addresses.
> As long as you do this, maybe using an ethernet frame format makes sense.
[...]

Yep, that's among the points we're trying to make; the way you've put
it makes it clearer.

> So far the things that make sense. Here are some that don't, to me:

> - Is a pdf presentation all you have in terms of documentation?
>   We are talking communication protocols here - I would expect a
>   proper spec, and some effort to standardize, otherwise where's the
>   guarantee it won't change in an incompatible way?

To be precise, the solution uses 100% IPoIB wire-protocol, so we don't
see a need for any spec change / standardization effort. This might go
back to the first point you've brought up: improve the documentation,
will do that. The pdf you looked at was presented at a conference.

>   Other things that I would expect to be addressed in such a spec is
>   interaction with other IPoIB features, such as connected
>   mode, checksum offloading etc, and IB features such as multipath etc.

For the eipoib interface, it doesn't really matter whether the underlying
ipoib clones used by it (we call them VIFs) use connected or datagram
mode. What does matter is the MTU and offload features supported by
these VIFs, for which the eipoib interface will have the min among all
these VIFs. Since for a given eipoib nic all its VIFs must originate
from the same IPoIB PIF (e.g. ib0), it's an easy admin job to make sure
they all have the same mtu / features needed for that eipoib nic,
e.g. by using the same mode (connected or datagram) for all of them.
Hope this is clear.
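
To illustrate the "min among all these VIFs" rule, here is a rough
sketch, not the driver code. The struct, the FEAT_* bits and the
function name are made up for the example, and reading "min" of the
offload features as a bitwise AND (a feature is kept only if every VIF
has it) is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VIF summary; not the driver's real structures. */
struct vif_caps {
	unsigned int mtu;   /* MTU of the IPoIB child interface (VIF) */
	uint32_t features;  /* bitmask of offload features it supports */
};

/* Illustrative feature bits only, not the kernel's NETIF_F_* values. */
#define FEAT_CSUM (1u << 0)
#define FEAT_TSO  (1u << 1)
#define FEAT_SG   (1u << 2)

/*
 * Capabilities the eipoib interface could expose: the smallest MTU
 * among its VIFs and only the features that all of them support.
 */
struct vif_caps eipoib_effective_caps(const struct vif_caps *vifs, size_t n)
{
	struct vif_caps out;
	size_t i;

	assert(vifs != NULL && n > 0);

	out = vifs[0];
	for (i = 1; i < n; i++) {
		if (vifs[i].mtu < out.mtu)
			out.mtu = vifs[i].mtu;
		out.features &= vifs[i].features;
	}
	return out;
}
```

With one VIF at MTU 65520 and another at MTU 2044, the result is 2044,
which is why keeping all VIFs of an eipoib nic in the same mode is the
simple way to avoid losing MTU or offloads.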


> - The way you encode LID/QPN in the MAC seems questionable. IIRC there's
>   more to IB addressing than just the LID.  Since everyone on the subnet
>   need access to this translation, I think it makes sense to store it in
>   the SM. I think this would also obviate some IPv4 specific hacks in kernel.

The idea behind the encoding was uniqueness: LID/QPN is unique per IB
HCA end-node. I'm not sure I understood the comment re the IPv4 hacks.
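
To make the uniqueness argument concrete: a 16-bit LID plus a 24-bit
QPN is 40 bits, so the pair fits into a 48-bit MAC with one byte to
spare. The byte layout below is purely hypothetical, chosen for
illustration (it is not necessarily the layout the patch uses), and the
function names are made up:

```c
#include <assert.h>
#include <stdint.h>

#define ETH_ALEN 6

/*
 * Hypothetical layout, for illustration only:
 *   byte 0     0x02 (locally administered, unicast)
 *   bytes 1-2  LID, big endian
 *   bytes 3-5  QPN, big endian (QPNs are 24 bits wide)
 */
void lid_qpn_to_mac(uint16_t lid, uint32_t qpn, uint8_t mac[ETH_ALEN])
{
	assert(qpn <= 0xFFFFFFu);

	mac[0] = 0x02;
	mac[1] = (uint8_t)(lid >> 8);
	mac[2] = (uint8_t)(lid & 0xFF);
	mac[3] = (uint8_t)((qpn >> 16) & 0xFF);
	mac[4] = (uint8_t)((qpn >> 8) & 0xFF);
	mac[5] = (uint8_t)(qpn & 0xFF);
}

/* Inverse mapping: recover the LID and QPN from such a MAC. */
void mac_to_lid_qpn(const uint8_t mac[ETH_ALEN], uint16_t *lid, uint32_t *qpn)
{
	*lid = (uint16_t)(((uint16_t)mac[1] << 8) | mac[2]);
	*qpn = ((uint32_t)mac[3] << 16) | ((uint32_t)mac[4] << 8) | mac[5];
}
```

Since the mapping is reversible, two end-nodes get the same MAC only if
they have the same LID and QPN.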

> - IGMP/MAC snooping in a driver is just too hairy.

mmm, any rough idea/direction how to do that otherwise?

>   As you point out, bridge currently needs the uplink in promisc mode.
>   I don't think a driver should work around that limitation.
>   For some setups, it might be interesting to remove the
>   promisc mode requirement, failing that,
>   I think you could use macvtap passthrough.

That's in the plans. The current code doesn't assume that the eipoib
has a bridge on top. For VM networking it works with bridge + tap and
bridge + macvtap, but it would easily work with passthrough once we
allow creating multiple eipoib interfaces on the same ipoib PIF (e.g.
today for the ib0 PIF we create eipoib eth0, and then two VIFs ib0.1
and ib0.2 that are enslaved by eth0, but next we will create eth1 and
eth2 which will use ib0.1 and ib0.2 respectively).

> - Currently migration works without host kernel help, would be
>   preferable to keep it that way.

OK