[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210116172119.2c68d4c2@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Sat, 16 Jan 2021 17:21:19 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Michal Kubecek <mkubecek@...e.cz>
Cc: netdev@...r.kernel.org, Edwin Peer <edwin.peer@...adcom.com>,
Stephen Hemminger <stephen@...workplumber.org>,
Andrew Gospodarek <andrew.gospodarek@...adcom.com>,
Michael Chan <michael.chan@...adcom.com>
Subject: Re: [PATCH iproute2] iplink: work around rtattr length limits for
IFLA_VFINFO_LIST
On Sat, 16 Jan 2021 22:12:23 +0100 Michal Kubecek wrote:
> On Fri, Jan 15, 2021 at 03:53:25PM -0800, Jakub Kicinski wrote:
> > On Fri, 15 Jan 2021 14:59:50 -0800 Edwin Peer wrote:
> > > The maximum possible length of an RTNL attribute is 64KB, but the
> > > nested VFINFO list exceeds this for more than about 220 VFs (each VF
> > > consumes approximately 300 bytes, depending on alignment and optional
> > > fields). Exceeding the limit causes IFLA_VFINFO_LIST's length to wrap
> > > modulo 16 bits in the kernel's nla_nest_end().
> >
> > Let's add Michal to CC, my faulty memory tells me he was fighting with
> > this in the past.
>
> I've been looking into this some time ago and even tried to open
> a discussion on this topic two or three times but there didn't seem
> sufficient interest.
>
> My idea back then was to use a separate query which would allow getting
> VF information using a dump request (one VF per message); the reply for
> RTM_GETLINK request would either list all VFs as now if possible or only
> as many as fit into a nested attribute and indicate that the information
> is incomplete (or maybe omit the VF information in such case as
> usefulness of the truncated list is questionable).
>
> However, my take from the discussions was that most developers who took
> part rather thought that there is no need for such rtnetlink feature as
> there is a devlink interface which does not suffer from this limit and
> NICs with so many VFs that IFLA_VFINFO_LIST exceeds 65535 bytes can
> provide devlink interface to handle them.
Indeed, that's still my position. AFAICT the options of "fixing" this
interface are rather limited and we don't want to perpetuate the
legacy-ndo-based method of configuring VFs - so reimplementation is
not appealing..
One way of working around the 64k limit we discussed with Edwin was
filtering attributes, effectively doing two dumps each with different
filtering flags, so that each one fits (e.g. dropping stats in one and
MAC addresses in the other).
> In any case, while the idea of handling the malformed messages composed
> by existing kernels makes sense, we should IMHO consider this a kernel
> bug which should be fixed so that kernel does not reply with malformed
> netlink messages (independently of whether this patch is applied to
> iproute2 or not).
I wonder. There is something inherently risky about making
a precedent for user space depending on invalid kernel output.
_If_ we want to fix the kernel, IMO we should only fix the kernel.
Powered by blists - more mailing lists