Message-ID: <43F901BD926A4E43B106BF17856F0755018DF59D81@orsmsx508.amr.corp.intel.com>
Date: Wed, 27 Apr 2011 10:39:50 -0700
From: "Rose, Gregory V" <gregory.v.rose@...el.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: Steve Hodgson <shodgson@...arflare.com>,
David Miller <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"bhutchings@...arflare.com" <bhutchings@...arflare.com>
Subject: RE: [RFC PATCH] netlink: Increase netlink dump skb message size
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@...il.com]
> Sent: Wednesday, April 27, 2011 10:30 AM
> To: Rose, Gregory V
> Cc: Steve Hodgson; David Miller; netdev@...r.kernel.org;
> bhutchings@...arflare.com
> Subject: RE: [RFC PATCH] netlink: Increase netlink dump skb message size
>
> On Wednesday, 27 April 2011 at 10:15 -0700, Rose, Gregory V wrote:
> > > -----Original Message-----
> > > From: netdev-owner@...r.kernel.org [mailto:netdev-owner@...r.kernel.org]
> > > On Behalf Of Eric Dumazet
> > > Sent: Wednesday, April 27, 2011 9:30 AM
> > > To: Steve Hodgson
> > > Cc: Rose, Gregory V; David Miller; netdev@...r.kernel.org;
> > > bhutchings@...arflare.com
> > > Subject: Re: [RFC PATCH] netlink: Increase netlink dump skb message size
> > >
> > > On Wednesday, 27 April 2011 at 16:46 +0100, Steve Hodgson wrote:
> > > > On 04/27/2011 04:24 PM, Eric Dumazet wrote:
> > > > > On Tuesday, 26 April 2011 at 09:12 -0700, Rose, Gregory V wrote:
> > > > >
> > > > > >> I'm fine with however you folks want to approach this, just give me some direction.
> > > > >
> > > > > I would just try the following patch:
> > > > >
> > > >
> > > > This allows the sfc driver to use 102 VFs, up from the current limit of 45 VFs.
> > > >
> > > > It's unfortunate that this patch isn't sufficient to allow all 127 VFs
> > > > to be used, but whilst we wait for a new netlink API this is an
> > > > improvement worth having.
> > > >
> > >
> > > netlink recvmsg() supports MSG_PEEK, so the user could get the needed size
> > > of its buffer before calling the real recvmsg().
> > >
> > > Big blobs could be attached as skb fragments (up to 64 Kbytes), but do we
> > > really want this...
> > [Greg Rose]
> >
> > I'm looking into an approach in which we make the get info dump for VFs orthogonal to the set VF info, i.e. like this:
> >
> > To set VF info we would follow the current convention:
> >
> > # ip link set eth(x) vf (n) mac xx:xx:xx:xx:xx:xx
> > # ip link set eth(x) vf (n) vlan (nnnn)
> >
> > To see VF info:
> >
> > # ip link show eth(x) vf (n)
> >
> > would show that VF's mac and vlan and could then be expanded in the future to display more information required for additional features that users are asking for.
> >
> > The IFLA_VF_INFO dump would be moved out of the info dump for the physical function interface and would no longer be nested, which would get rid of the need for huge amounts of buffer for info dumps on VFs. The ip link show command for the PF would need to report the number of VFs currently allocated to the PF so that it could be fed into a script that loops to show each VF's info.
> >
> > I think this approach would fix the problems we're looking at right now.
> >
>
> Hmm, if you look at "ip link ..." you'll see it dumps everything from the
> kernel and does the filtering inside the user command.
Right, but when I look in rtnetlink I see that the routine that calculates the buffer needed for the VF info dump multiplies the number of VFs on the parent device (PF) by the combined size of the various IFLA_VF_INFO items. The more VFs there are, the bigger this gets, especially if you want to add more items to IFLA_VF_INFO. So when the kernel dumps all of this out, it can exceed NLMSG_GOODSIZE (or DUMPSIZE) pretty quickly.
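To make the growth concrete, here is a rough sketch of that calculation in Python. It is not the kernel's routine: the per-item payload sizes in ASSUMED_VF_ITEM_PAYLOADS and the helper names vf_info_size() and max_vfs_within() are assumptions chosen for illustration. Only the attribute header and alignment rule (4-byte header, 4-byte alignment) follow the netlink attribute format.

```python
# Sketch: how the nested VF info grows linearly with the number of VFs.
NLA_HDRLEN = 4   # netlink attribute header: 2-byte length + 2-byte type
NLA_ALIGNTO = 4  # attributes are padded to a 4-byte boundary


def nla_total_size(payload: int) -> int:
    """Size of one attribute: header plus payload, rounded up to 4 bytes."""
    return (NLA_HDRLEN + payload + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


# ASSUMPTION: illustrative payload sizes for the items nested in each
# IFLA_VF_INFO entry; the real sizes depend on the kernel version.
ASSUMED_VF_ITEM_PAYLOADS = {"mac": 36, "vlan": 12, "tx_rate": 8}


def per_vf_size(item_payloads=ASSUMED_VF_ITEM_PAYLOADS) -> int:
    """One nest header per VF plus every item carried for that VF."""
    return nla_total_size(0) + sum(nla_total_size(p) for p in item_payloads.values())


def vf_info_size(num_vfs: int, item_payloads=ASSUMED_VF_ITEM_PAYLOADS) -> int:
    """Outer nest header plus num_vfs copies of the per-VF block."""
    return nla_total_size(0) + num_vfs * per_vf_size(item_payloads)


def max_vfs_within(budget: int, item_payloads=ASSUMED_VF_ITEM_PAYLOADS) -> int:
    """Largest VF count whose VF info alone still fits in `budget` bytes."""
    return max(0, (budget - nla_total_size(0)) // per_vf_size(item_payloads))
```

Adding another entry to the payload table, which stands in for adding more items to IFLA_VF_INFO, lowers max_vfs_within() for the same budget. The real limit is lower still, since the rest of the link attributes share the same skb.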
>
> BTW "ip" uses a 16384-byte buffer, not an 8192-byte one.
I know, and that's why I was confused about which size to use. The ip command uses 16K, but NLMSG_GOODSIZE can be as small as 3712 bytes (depending on page size). Even though the user buffer is 16K, if the size calculated by if_nlmsg_size() in rtnetlink.c is bigger than NLMSG_GOODSIZE, you don't see the info for more than 40 or so VFs. With more VFs than that, nothing gets displayed.
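As a sketch of why the 16K user buffer does not help, the Python below models NLMSG_GOODSIZE as the page size (capped at 8192) minus the skb overhead, and treats a dump message as deliverable only if it fits the smaller of the two limits. The overhead value is an assumption: it is inferred from the 3712-byte figure on the guess that it corresponds to a 4096-byte page, and it varies by architecture and kernel configuration. The function names are hypothetical.

```python
# Sketch: the kernel-side skb budget, not the user buffer, is the tighter limit.
GOODSIZE_PAGE_CAP = 8192  # page size is capped at 8192 when sizing the skb

# ASSUMPTION: skb overhead inferred from 3712 bytes on a 4096-byte page.
ASSUMED_SKB_OVERHEAD = 4096 - 3712


def nlmsg_goodsize(page_size: int, skb_overhead: int = ASSUMED_SKB_OVERHEAD) -> int:
    """Approximate NLMSG_GOODSIZE for a given page size."""
    return min(page_size, GOODSIZE_PAGE_CAP) - skb_overhead


def dump_fits(msg_size: int, page_size: int,
              skb_overhead: int = ASSUMED_SKB_OVERHEAD,
              user_buf: int = 16384) -> bool:
    """True only if the message fits both the kernel skb and the user buffer."""
    return msg_size <= min(nlmsg_goodsize(page_size, skb_overhead), user_buf)
```

With these assumptions, a 5000-byte link message is rejected on a 4096-byte page even though it would fit comfortably in the 16384-byte buffer that ip uses.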
- Greg