Message-ID: <1427387258.21101.124.camel@redhat.com>
Date: Thu, 26 Mar 2015 12:27:38 -0400
From: Doug Ledford <dledford@...hat.com>
To: Michael Wang <yun.wang@...fitbricks.com>
Cc: linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-nfs@...r.kernel.org, netdev@...r.kernel.org,
Roland Dreier <roland@...nel.org>,
Sean Hefty <sean.hefty@...el.com>,
Hal Rosenstock <hal.rosenstock@...il.com>,
Ira Weiny <ira.weiny@...el.com>,
Trond Myklebust <trond.myklebust@...marydata.com>,
"J. Bruce Fields" <bfields@...ldses.org>,
"David S. Miller" <davem@...emloft.net>,
Moni Shoua <monis@...lanox.com>,
Or Gerlitz <ogerlitz@...lanox.com>,
Tatyana Nikolova <Tatyana.E.Nikolova@...el.com>,
Steve Wise <swise@...ngridcomputing.com>,
Yan Burman <yanb@...lanox.com>,
Jack Morgenstein <jackm@....mellanox.co.il>,
Bart Van Assche <bvanassche@....org>,
Yann Droneaud <ydroneaud@...eya.com>,
Colin Ian King <colin.king@...onical.com>,
Jiri Kosina <jkosina@...e.cz>,
Matan Barak <matanb@...lanox.com>,
Majd Dibbiny <majd@...lanox.com>,
Dan Carpenter <dan.carpenter@...cle.com>,
Mel Gorman <mgorman@...e.de>,
Alex Estrin <alex.estrin@...el.com>,
Eric Dumazet <edumazet@...gle.com>,
Erez Shitrit <erezsh@...lanox.com>,
Sagi Grimberg <sagig@...lanox.com>,
Haggai Eran <haggaie@...lanox.com>,
Shachar Raindel <raindel@...lanox.com>,
Mike Marciniszyn <mike.marciniszyn@...el.com>,
Tom Tucker <tom@....us>, Chuck Lever <chuck.lever@...cle.com>
Subject: Re: [PATCH 0/2 RESEND] IB/Verbs: Use helpers to refine the checking
on transport and link layer
On Thu, 2015-03-26 at 17:04 +0100, Michael Wang wrote:
> Hi, Doug
>
> Thanks for the excellent comments :-)
>
> On 03/26/2015 03:09 PM, Doug Ledford wrote:
> > On Wed, 2015-03-25 at 16:09 +0100, Michael Wang wrote:
> >> [snip]
> >>
> > [snip]
> >
> > So, I would suggest that we fix things up thusly:
> >
> > enum transport {
> > TRANSPORT_IB=1,
> > TRANSPORT_IWARP=2,
> > TRANSPORT_ROCE=4,
> > TRANSPORT_OPA=8,
> > TRANSPORT_USNIC=16,
> > };
> >
> > #define HAS_SA(ibdev) ((ibdev)->transport & (TRANSPORT_IB|TRANSPORT_OPA))
> > #define HAS_JUMBO_SA(ibdev) ((ibdev)->transport & TRANSPORT_OPA)
> >
> > or possibly
> >
> > static bool ib_dev_has_sa(struct ibv_device *ibdev)
> > {
> > return ibdev->transport & (TRANSPORT_IB | TRANSPORT_OPA);
> > }
>
> The idea sounds interesting, and here my silly questions come :-P
>
> So are you suggesting that we add a new bitmask 'transport' to 'struct ib_device'
> in the kernel, and set it up at the very beginning?
>
> A few more questions:
> 1. When to set it up? (Maybe inside ib_register_device(), before doing the client->add() callback?)

I don't think "we" can set it up here. The drivers have to set it up.
After all, the mlx4 driver will have to decide for itself what the port
transport is and tell us; we can't tell it.

> 2. How to set it up? (Still infer it from the transport and link layer, as we currently do?)

Find each point in each driver where the link layer and transport
fields are set today, and replace that with setting the new transport
bitmask instead.

> 3. If a device has ports with different link layer types (please correct
> me if this can never happen), then a single bitmask may not be enough to
> represent the transport of all the ports? (Maybe create a bitmask per port?)

Correct, a bitmask per port. And we can remove the existing transport
and link layer elements of the struct and replace them with just the new
transport. Then, whenever we need to copy a struct to user space, we
have a helper that looks something like this:

/*
 * Translate the internal per-port transport into the legacy
 * (transport, link_layer) pair that user space expects.
 * Assumes each port has exactly one transport bit set.
 */
static inline void ib_set_user_transport(struct ib_device *ibdev,
					 struct user_ibv_device *uibdev,
					 u8 port)
{
	switch (ibdev->port[port]->transport) {
	case TRANSPORT_IB:
	case TRANSPORT_OPA:
		uibdev->port[port]->link_layer = INFINIBAND;
		uibdev->port[port]->transport = INFINIBAND;
		break;
	case TRANSPORT_IWARP:
		uibdev->port[port]->link_layer = INFINIBAND;
		uibdev->port[port]->transport = IWARP;
		break;
	case TRANSPORT_ROCE:
		uibdev->port[port]->link_layer = ETHERNET;
		uibdev->port[port]->transport = INFINIBAND;
		break;
	case TRANSPORT_USNIC:
		uibdev->port[port]->link_layer = ETHERNET;
		uibdev->port[port]->transport = <whatever USNIC uses today>;
		break;
	default:
		pr_err("unknown transport type %x\n",
		       ibdev->port[port]->transport);
	}
}

That preserves the user space ABI and all user programs keep working,
while we update to an internal representation that makes more sense for
how things have evolved.
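
To make the translation concrete, here is a minimal user-space sketch of the same idea that compiles on its own. It is not kernel code: the `legacy_*` enums, `struct legacy_port_info`, and `to_legacy()` are hypothetical stand-ins for the real ABI types, and the mapping simply mirrors the helper above. USNIC is deliberately left unmapped, because its legacy transport value is not spelled out here, so it exercises the error path.

```c
#include <stdio.h>

/* One distinct bit per transport, so sets of transports can be OR'ed. */
enum transport {
	TRANSPORT_IB    = 1 << 0,
	TRANSPORT_IWARP = 1 << 1,
	TRANSPORT_ROCE  = 1 << 2,
	TRANSPORT_OPA   = 1 << 3,
	TRANSPORT_USNIC = 1 << 4,
};

/* Hypothetical stand-ins for the two fields user space sees today. */
enum legacy_transport { LEGACY_TRANSPORT_IB, LEGACY_TRANSPORT_IWARP };
enum legacy_link_layer { LEGACY_LINK_INFINIBAND, LEGACY_LINK_ETHERNET };

struct legacy_port_info {
	enum legacy_transport transport;
	enum legacy_link_layer link_layer;
};

/*
 * Convert one internal transport value into the legacy pair.
 * Returns 0 on success, -1 if the value has no known mapping
 * (including USNIC, which this sketch leaves unmapped on purpose).
 */
static int to_legacy(unsigned int transport, struct legacy_port_info *out)
{
	switch (transport) {
	case TRANSPORT_IB:
	case TRANSPORT_OPA:
		out->link_layer = LEGACY_LINK_INFINIBAND;
		out->transport = LEGACY_TRANSPORT_IB;
		return 0;
	case TRANSPORT_IWARP:
		out->link_layer = LEGACY_LINK_INFINIBAND;
		out->transport = LEGACY_TRANSPORT_IWARP;
		return 0;
	case TRANSPORT_ROCE:
		out->link_layer = LEGACY_LINK_ETHERNET;
		out->transport = LEGACY_TRANSPORT_IB;
		return 0;
	default:
		fprintf(stderr, "unknown transport type %x\n", transport);
		return -1;
	}
}
```

A caller would run `to_legacy()` once per port while filling in the struct that is copied out to user space, and fail the query if it returns -1.
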
> Regards,
> Michael Wang
>
> >
> > If we do this, then the only thing we have to fix up to preserve ABI
> > with user space is to make sure that any time we export an ibv_device
> > struct and any time we import the same, we convert from our new internal
> > representation to the old representation that user space expects. And
> > we also need to make a few changes in the sysfs code to display the
> > properties as things expect. But, that would allow us to fix up what I
> > see as a problem right now, which is that we hide the information we
> > need to know what sort of device we are working on in two different
> > fields: the transport and the link layer. Instead, just use one field
> > with enough variants that we can store all of the relevant information
> > we need in that one field. This has the benefit that any comparisons
> > that happen in hot paths will now always be a single bitwise comparison
> > and will no longer need to hit two separate variables for two separate
> > compares.
> >
> >
> >
>
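
As a rough illustration of the single-compare point in the quoted paragraph, the sketch below keeps one transport value per port and answers "does this port have an SA?" with one bitwise AND. The struct names, `MAX_PORTS`, and the `port_has_*()` helpers are hypothetical, chosen only for this example. Note that it relies on every transport being a distinct power of two: USNIC is 16 here, because a value of 10 (binary 1010) would overlap the OPA (8) and iWARP (2) bits and make a USNIC port look like it has an SA.

```c
#include <assert.h>
#include <stdbool.h>

enum transport {
	TRANSPORT_IB    = 1,
	TRANSPORT_IWARP = 2,
	TRANSPORT_ROCE  = 4,
	TRANSPORT_OPA   = 8,
	TRANSPORT_USNIC = 16,
};

/* Compile-time check that the last flag is a single bit (C11). */
static_assert((TRANSPORT_USNIC & (TRANSPORT_USNIC - 1)) == 0,
	      "transport flags must be powers of two");

#define MAX_PORTS 2	/* hypothetical limit for this sketch */

struct port_info {
	unsigned int transport;	/* exactly one TRANSPORT_* bit */
};

struct device_info {
	struct port_info port[MAX_PORTS];
};

/* True if the port talks to a subnet administrator (IB or OPA). */
static bool port_has_sa(const struct device_info *dev, unsigned int port)
{
	return (dev->port[port].transport &
		(TRANSPORT_IB | TRANSPORT_OPA)) != 0;
}

/* True only for OPA ports, mirroring HAS_JUMBO_SA above. */
static bool port_has_jumbo_sa(const struct device_info *dev, unsigned int port)
{
	return (dev->port[port].transport & TRANSPORT_OPA) != 0;
}
```

With a mixed device such as an IB port next to a RoCE port, `port_has_sa()` is true for the first and false for the second, and neither check has to look at a separate link-layer field.
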
--
Doug Ledford <dledford@...hat.com>
GPG KeyID: 0E572FDD