lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y3Y5phsWzatdnwok@localhost.localdomain>
Date:   Thu, 17 Nov 2022 14:39:50 +0100
From:   Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>
To:     Leon Romanovsky <leon@...nel.org>
Cc:     "Samudrala, Sridhar" <sridhar.samudrala@...el.com>,
        netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org,
        pabeni@...hat.com, edumazet@...gle.com,
        intel-wired-lan@...ts.osuosl.org, jiri@...dia.com,
        anthony.l.nguyen@...el.com, alexandr.lobakin@...el.com,
        wojciech.drewek@...el.com, lukasz.czapnik@...el.com,
        shiraz.saleem@...el.com, jesse.brandeburg@...el.com,
        mustafa.ismail@...el.com, przemyslaw.kitszel@...el.com,
        piotr.raczynski@...el.com, jacob.e.keller@...el.com,
        david.m.ertman@...el.com, leszek.kaliszczuk@...el.com
Subject: Re: [PATCH net-next 00/13] resource management using devlink reload

On Thu, Nov 17, 2022 at 01:45:38PM +0200, Leon Romanovsky wrote:
> On Thu, Nov 17, 2022 at 12:10:21PM +0100, Michal Swiatkowski wrote:
> > On Wed, Nov 16, 2022 at 07:59:43PM +0200, Leon Romanovsky wrote:
> > > On Wed, Nov 16, 2022 at 01:04:36PM +0100, Michal Swiatkowski wrote:
> > > > On Wed, Nov 16, 2022 at 08:04:56AM +0200, Leon Romanovsky wrote:
> > > > > On Tue, Nov 15, 2022 at 07:59:06PM -0600, Samudrala, Sridhar wrote:
> > > > > > On 11/15/2022 11:57 AM, Leon Romanovsky wrote:
> > > > > 
> > > > > <...>
> > > > > 
> > > > > > > > In case of ice driver lspci -vs shows:
> > > > > > > > Capabilities: [70] MSI-X: Enable+ Count=1024 Masked
> > > > > > > > 
> > > > > > > > so all vectors that hw supports (PFs, VFs, misc, etc). Because of that
> > > > > > > > total number of MSI-X in the devlink example from cover letter is 1024.
> > > > > > > > 
> > > > > > > > I see that mellanox shows:
> > > > > > > > Capabilities: [9c] MSI-X: Enable+ Count=64 Masked
> > > > > > > > 
> > > > > > > > I assume that 64 is in this case MSI-X ony for this one PF (it make
> > > > > > > > sense).
> > > > > > > Yes and PF MSI-X count can be changed through FW configuration tool, as
> > > > > > > we need to write new value when the driver is unbound and we need it to
> > > > > > > be persistent. Users are expecting to see "stable" number any time they
> > > > > > > reboot the server. It is not the case for VFs, as they are explicitly
> > > > > > > created after reboots and start "fresh" after every boot.
> > > > > > > 
> > > > > > > So we set large enough but not too large value as a default for PFs.
> > > > > > > If you find sane model of how to change it through kernel, you can count
> > > > > > > on our support.
> > > > > > 
> > > > > > I guess one main difference is that in case of ice, PF driver manager resources
> > > > > > for all its associated functions, not the FW. So the MSI-X count reported for PF
> > > > > > shows the total vectors(PF netdev, VFs, rdma, SFs). VFs talk to PF over a mailbox
> > > > > > to get their MSI-X vector information.
> > > > > 
> > > > > What is the output of lspci for ice VF when the driver is not bound?
> > > > > 
> > > > > Thanks
> > > > > 
> > > > 
> > > > It is the same after creating and after unbonding:
> > > > Capabilities: [70] MSI-X: Enable- Count=17 Masked-
> > > > 
> > > > 17, because 16 for traffic and 1 for mailbox.
> > > 
> > > Interesting, I think that your PF violates PCI spec as it always
> > > uses word "function" and not "device" while talks about MSI-X related
> > > registers.
> > > 
> > > Thanks
> > > 
> > 
> > I made mistake in one comment. 1024 isn't MSI-X amount for device. On
> > ice we have 2048 for the whole device. On two ports card each PF have
> > 1024 MSI-X. Our control register mapping to the internal space looks
> > like that (Assuming two port card; one VF with 5 MSI-X created):
> > INT[PF0].FIRST	0
> > 		1
> > 		2
> > 		
> > 		.
> > 		.
> > 		.
> > 
> > 		1019	INT[VF0].FIRST	__
> > 		1020			  | interrupts used
> > 		1021			  | by VF on PF0
> > 		1022			  |
> > INT[PF0].LAST	1023	INT[VF0].LAST	__|
> > INT[PF1].FIRST	1024
> > 		1025
> > 		1026
> > 
> > 		.
> > 		.
> > 		.
> > 		
> > INT[PF1].LAST	2047
> > 
> > MSI-X entry table size for PF0 is 1024, but entry table for VF is a part
> > of PF0 physical space.
> > 
> > Do You mean that "sharing" the entry between PF and VF is a violation of
> > PCI spec? 
> 
> You should consult with your PCI specification experts. It was my
> spec interpretation, which can be wrong.
> 

Sure, I will

> 
> > Sum of MSI-X count on all function within device shouldn't be
> > grater than 2048? 
> 
> No, it is 2K per-control message/per-function.
> 
> 
> > It is hard to find sth about this in spec. I only read
> > that: "MSI-X supports a maximum table size of 2048 entries". I will
> > continue searching for information about that.
> > 
> > I don't think that from driver perspective we can change the table size
> > located in message control register.
> 
> No, you can't, unless you decide explicitly violate spec.
> 
> > 
> > I assume in mlnx the tool that You mentioned can modify this table size?
> 
> Yes, it is FW configuration tool.
>

Thanks for clarification.
In summary, if this is really violation of PCI spec we for sure will
have to fix that and resource managment implemented in this patchset
will not make sense (as config for PF MSI-X amount will have to happen
in FW).

If it isn't, is it still NACK from You? I mean, if we implement the
devlink resources managment (maybe not generic one, only define in ice)
and sysfs for setting VF MSI-X, will You be ok with that? Or using
devlink to manage MSI-X resources isn't in general good solution?

Our motivation here is to allow user to configure MSI-X allocation for
his own use case. For example giving only 1 MSI-X for ethernet and RDMA
and spending rest on VFs. If I understand correctly, on mlnx card user
can set PF MSI-X to 1 in FW configuration tool and achieve the same.
Currently we don't have other mechanism to change default number of
MSI-X on ethernet.

> Thanks
> 
> > 
> > Thanks
> > 
> > > > 
> > > > Thanks
> > > > > > 
> > > > > > 
> > > > > > 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ