Message-ID: <20120805185031.GA18640@redhat.com>
Date:	Sun, 5 Aug 2012 21:50:31 +0300
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	Or Gerlitz <ogerlitz@...lanox.com>, davem@...emloft.net,
	roland@...nel.org, netdev@...r.kernel.org, ali@...lanox.com,
	sean.hefty@...el.com, Erez Shitrit <erezsh@...lanox.co.il>
Subject: Re: [PATCH V2 09/12] net/eipoib: Add main driver functionality

On Thu, Aug 02, 2012 at 10:15:23AM -0700, Eric W. Biederman wrote:
> Or Gerlitz <ogerlitz@...lanox.com> writes:
> 
> > From: Erez Shitrit <erezsh@...lanox.co.il>
> >
> > The eipoib driver provides a standard Ethernet netdevice over
> > the InfiniBand IPoIB interface.
> >
> > Some services can run only on top of Ethernet L2 interfaces, and cannot be
> > bound to an IPoIB interface. With this new driver, these services can run
> > seamlessly.
> 
> Do I read this code correctly that what you are doing is not tunneling
> ethernet over IB but instead you are removing an ethernet header and
> replacing it with an IB header?
> 
> Do I also read this code correctly if you can't find your destination
> mac address in your "neighbor table" you do a normal IPoIB arp
> for the infiniband GUID?
> 
> Do I read this right that if presented with a non-IPv4 or ARP packet
> this code will do something undefined and unpredictable?
> 
> Maybe this makes some sense but just skimming it looks like you
> are trying to force a square peg into a round hole resulting in
> some weird code and some very weird maintainability issues.
> 
> I am honestly surprised at this approach.  I would think it would be
> faster and simpler to run an IB queue pair directly to the hypervisor or
> possibly even the guest operating system bypassing the kernel and doing
> all of this translation in userspace.
> 
> Eric

I'm on vacation and have not looked at the patches. At Erez's request,
I'm just reacting to the presentation and the discussion.

Bypassing the kernel has its own set of issues, not the
least of which is the need to lock all of guest memory which breaks
overcommit. Running an IB queue pair directly to the hypervisor
will also break live migration.

Another problem with exposing IB to guests is that IB addresses, such as
combinations of LIDs, GIDs and QPNs, to the best of my knowledge do not
support soft hardware address setting, which interferes with live
migration.

So it seems that a sane solution would involve an extra level of
indirection, with guest addresses being translated to host IB addresses.

As long as you do this, maybe using an ethernet frame format makes
sense.
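
To illustrate the kind of indirection I mean, here is a minimal sketch.
It is not taken from the patches: the class names, the fields and the
example addresses are all made up for illustration. The guest keeps a
stable MAC, and only the host-side binding changes when it migrates.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IBAddress:
    """Host-side IB address. Fields are illustrative, not from the patches."""
    lid: int  # 16-bit local identifier assigned by the subnet manager
    qpn: int  # 24-bit queue pair number


class GuestAddressMap:
    """Hypothetical table: stable guest MAC -> host IB address currently backing it."""

    def __init__(self) -> None:
        self._table: Dict[str, IBAddress] = {}

    def bind(self, guest_mac: str, host_addr: IBAddress) -> None:
        # Called when a guest starts on a host, or lands on one after migration.
        self._table[guest_mac.lower()] = host_addr

    def resolve(self, guest_mac: str) -> Optional[IBAddress]:
        # Returns None for an unknown guest; the caller has to look it up elsewhere.
        return self._table.get(guest_mac.lower())


amap = GuestAddressMap()
amap.bind("52:54:00:12:34:56", IBAddress(lid=0x0011, qpn=0x000ABC))
before = amap.resolve("52:54:00:12:34:56")

# Live migration: the guest keeps its MAC, only the host-side binding changes.
amap.bind("52:54:00:12:34:56", IBAddress(lid=0x0042, qpn=0x000DEF))
after = amap.resolve("52:54:00:12:34:56")
```

Where such a table lives and who updates it is exactly the kind of
question a spec would need to answer.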

So much for the things that make sense. Here are some that don't, to me:

- Is a PDF presentation all you have in terms of documentation?
  We are talking about communication protocols here - I would expect a
  proper spec, and some effort to standardize; otherwise where's the
  guarantee it won't change in an incompatible way?
  Other things that I would expect such a spec to address are
  interactions with other IPoIB features, such as connected mode and
  checksum offloading, and with IB features such as multipath.

- The way you encode LID/QPN in the MAC seems questionable. IIRC there's
  more to IB addressing than just the LID.  Since everyone on the subnet
  needs access to this translation, I think it makes sense to store it in
  the SM. I think this would also obviate some IPv4 specific hacks
  in the kernel.

- IGMP/MAC snooping in a driver is just too hairy.
  As you point out, the bridge currently needs the uplink in promisc mode.
  I don't think a driver should work around that limitation.
  For some setups, it might be interesting to remove the
  promisc mode requirement; failing that,
  I think you could use macvtap passthrough.

- Currently migration works without host kernel help; it would be
  preferable to keep it that way.
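
On the LID/QPN-in-MAC point above, some rough bit arithmetic shows the
concern. The layout below is a hypothetical one I made up, not
necessarily what the patches do: a 16-bit LID and a 24-bit QPN fit into
a 48-bit MAC with one byte to spare, but a 128-bit GID cannot fit, so
anything beyond LID/QPN has to come from somewhere else.

```python
MAC_BITS, LID_BITS, QPN_BITS, GID_BITS = 48, 16, 24, 128


def pack_mac(lid: int, qpn: int) -> str:
    """Hypothetical layout: one prefix byte, then the LID, then the QPN."""
    if not 0 <= lid < (1 << LID_BITS):
        raise ValueError("LID must fit in 16 bits")
    if not 0 <= qpn < (1 << QPN_BITS):
        raise ValueError("QPN must fit in 24 bits")
    prefix = 0x02  # locally administered, unicast
    value = (prefix << 40) | (lid << QPN_BITS) | qpn
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))


def unpack_mac(mac: str):
    """Recover (lid, qpn) from a MAC built by pack_mac."""
    value = int(mac.replace(":", ""), 16)
    lid = (value >> QPN_BITS) & ((1 << LID_BITS) - 1)
    qpn = value & ((1 << QPN_BITS) - 1)
    return lid, qpn


mac = pack_mac(0x0011, 0x000ABC)
spare_bits = MAC_BITS - LID_BITS - QPN_BITS  # room left for anything else
gid_fits = GID_BITS <= MAC_BITS              # False: a GID is wider than a MAC
```

With only 8 spare bits, whatever else a peer needs in order to reach the
destination has to be looked up, which is why a subnet-wide store such
as the SM looks like the natural place for the translation.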


Hope this helps,
MST

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
