Message-ID: <20250506084919.GA5952@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
Date: Tue, 6 May 2025 01:49:19 -0700
From: Shradha Gupta <shradhagupta@...ux.microsoft.com>
To: Michael Kelley <mhklinux@...look.com>
Cc: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Nipun Gupta <nipun.gupta@....com>,
	Yury Norov <yury.norov@...il.com>, Jason Gunthorpe <jgg@...pe.ca>,
	Jonathan Cameron <Jonathan.Cameron@...ei.com>,
	Anna-Maria Behnsen <anna-maria@...utronix.de>,
	Kevin Tian <kevin.tian@...el.com>, Long Li <longli@...rosoft.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Bjorn Helgaas <bhelgaas@...gle.com>, Rob Herring <robh@...nel.org>,
	Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
	Krzysztof Wilczy?ski <kw@...ux.com>,
	Lorenzo Pieralisi <lpieralisi@...nel.org>,
	Dexuan Cui <decui@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
	Haiyang Zhang <haiyangz@...rosoft.com>,
	"K. Y. Srinivasan" <kys@...rosoft.com>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Konstantin Taranov <kotaranov@...rosoft.com>,
	Simon Horman <horms@...nel.org>, Leon Romanovsky <leon@...nel.org>,
	Maxim Levitsky <mlevitsk@...hat.com>,
	Erni Sri Satya Vennela <ernis@...ux.microsoft.com>,
	Peter Zijlstra <peterz@...radead.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
	Paul Rosswurm <paulros@...rosoft.com>,
	Shradha Gupta <shradhagupta@...rosoft.com>
Subject: Re: [PATCH v2 3/3] net: mana: Allocate MSI-X vectors dynamically as
 required

On Thu, May 01, 2025 at 11:08:09PM -0700, Shradha Gupta wrote:
> On Thu, May 01, 2025 at 03:56:48PM +0000, Michael Kelley wrote:
> > From: Shradha Gupta <shradhagupta@...ux.microsoft.com> Sent: Thursday, May 1, 2025 7:24 AM
> > > 
> > > On Thu, May 01, 2025 at 05:27:49AM +0000, Michael Kelley wrote:
> > > > From: Shradha Gupta <shradhagupta@...ux.microsoft.com> Sent: Friday, April 25, 2025 3:55 AM
> > > > >
> > > > > Currently, the MANA driver allocates MSI-X vectors statically, based on
> > > > > the MANA_MAX_NUM_QUEUES and num_online_cpus() values, and in some cases
> > > > > ends up allocating more vectors than it needs. This is because, at that
> > > > > point, we do not yet have a HW channel and do not know how many IRQs
> > > > > should be allocated.
> > > > >
> > > > > To avoid this, we allocate 1 MSI-X vector during the creation of HWC and
> > > > > after getting the value supported by hardware, dynamically add the
> > > > > remaining MSI-X vectors.
> > > >
> > > > I have a top-level thought about the data structures used to manage a
> > > > dynamic number of MSI-X vectors. The current code allocates a fixed size
> > > > array of struct gdma_irq_context, with one entry in the array for each
> > > > MSI-X vector. To find the entry for a particular msi_index, the code can
> > > > just index into the array, which is nice and simple.
> > > >
> > > > The new code uses a linked list of struct gdma_irq_context entries, with
> > > > one entry in the list for each MSI-X vector.  In the dynamic case, you can
> > > > start with one entry in the list, and then add to the list however many
> > > > additional entries the hardware will support.
> > > >
> > > > But this additional linked list adds significant complexity to the code
> > > > because it must be linearly searched to find the entry for a particular
> > > > msi_index, and there's the messiness of putting entries onto the list
> > > > and taking them off.  A spin lock is required.  Etc., etc.
> > > >
> > > > Here's an intermediate approach that would be simpler. Allocate a fixed
> > > > size array of pointers to struct gdma_irq_context. The fixed size is the
> > > > maximum number of possible MSI-X vectors for the device, which I
> > > > think is MANA_MAX_NUM_QUEUES, or 64 (correct me if I'm wrong
> > > > about that). Allocate a new struct gdma_irq_context when needed,
> > > > but store the address in the array rather than adding it onto a list.
> > > > Code can then directly index into the array to access the entry.
> > > >
> > > > Some entries in the array will be unused (and "wasted") if the device
> > > > uses fewer MSI-X vectors, but each unused entry is only 8 bytes. The
> > > > max space unused is fewer than 512 bytes (assuming 64 entries in
> > > > the array), which is negligible in the grand scheme of things. With the
> > > > simpler code, and not having the additional list entry embedded in
> > > > each struct gdma_irq_context, you'll get some of that space back
> > > > anyway.
> > > >
> > > > Maybe there's a reason for the list that I missed in my initial
> > > > review of the code. But if not, it sure seems like the code could
> > > > be simpler, and having some unused 8 bytes entries in the array
> > > > is worth the tradeoff for the simplicity.
> > > >
> > > > Michael
> > > 
> > > Hey Michael,
> > > 
> > > Thanks for your inputs. We did think of this approach, and in fact that
> > > was how this patch was implemented (fixed-size array) in v1 of our
> > > internal reviews.
> > > 
> > > However, it came up in those reviews that we want to move away
> > > from 64 (MANA_MAX_NUM_QUEUES) as a hard limit for some new
> > > requirements, at least for the dynamic IRQ allocation path. The new
> > > limit, for all hardening purposes, would be num_online_cpus().
> > > 
> > > Using this limit with the fixed-size array approach creates problems,
> > > especially on machines with a high number of vCPUs: it would lead to
> > > quite a bit of memory/resource wastage.
> > > 
> > > Hence, we decided to go ahead with this design.
> > > 
> > > Regards,
> > > Shradha.
> > 
> > One other thought:  Did you look at using an xarray? See
> > https://www.kernel.org/doc/html/latest/core-api/xarray.html.
> > It has most or all of the properties you need to deal with
> > a variable number of entries, while handling all the locking
> > automatically. Entries can be accessed with just a simple
> > index value.
> > 
> > I don't have first-hand experience writing code using xarrays,
> > so I can't be sure that it would simplify things for MANA IRQ
> > allocation, but it seems to be a very appropriate abstraction
> > for this use case.
> > 
> > Michael
> >
> Thanks Michael,
> 
> This does look promising for our use case. I will try it with this patch,
> update the thread, and then send out the next version as required.
> 
> Regards,
> Shradha.

Hi Michael,

Going ahead with the xarray implementation of the irq_contexts structure
for the next version.

Thanks.
