Message-ID: <1b7fba6f-85c3-50d6-a951-78db1ccfd04a@redhat.com>
Date: Tue, 22 Sep 2020 09:34:02 -0400
From: Nitesh Narayan Lal <nitesh@...hat.com>
To: Frederic Weisbecker <frederic@...nel.org>, bhelgaas@...gle.com
Cc: Jesse Brandeburg <jesse.brandeburg@...el.com>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
linux-pci@...r.kernel.org, mtosatti@...hat.com,
sassmann@...hat.com, jeffrey.t.kirsher@...el.com,
jacob.e.keller@...el.com, jlelli@...hat.com, hch@...radead.org,
mike.marciniszyn@...el.com, dennis.dalessandro@...el.com,
thomas.lendacky@....com, jerinj@...vell.com,
mathias.nyman@...el.com, jiri@...dia.com
Subject: Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on
housekeeping CPUs
On 9/22/20 5:54 AM, Frederic Weisbecker wrote:
> On Mon, Sep 21, 2020 at 11:08:20PM -0400, Nitesh Narayan Lal wrote:
>> On 9/21/20 6:58 PM, Frederic Weisbecker wrote:
>>> On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
>>>> Nitesh Narayan Lal wrote:
>>>>
>>>>> In a realtime environment, it is essential to keep unwanted IRQs away
>>>>> from isolated CPUs to prevent latency overheads. Creating MSI-X vectors
>>>>> based only on the number of online CPUs can cause a problem on an RT
>>>>> setup that has several isolated CPUs but only a few housekeeping CPUs.
>>>>> On such setups, an attempt to move the IRQs from the isolated CPUs to
>>>>> the few housekeeping CPUs may fail due to the per-CPU vector limit.
>>>>> This can eventually result in latency spikes caused by the IRQ threads
>>>>> that we fail to move off the isolated CPUs.
>>>>>
>>>>> This patch limits the number of vectors that i40e creates to the number
>>>>> of available housekeeping CPUs by using num_housekeeping_cpus().
>>>>>
>>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@...hat.com>
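
For context, the change boils down to the sketch below. The isolation
flag and the driver-side call site are simplified here, so please read
this as an illustration rather than the literal patch:

    /* The helper from patch 1/3 of this series: count the CPUs that
     * are not isolated, i.e. those still available for housekeeping
     * work such as servicing device IRQs.
     */
    static inline unsigned int num_housekeeping_cpus(void)
    {
            return cpumask_weight(housekeeping_cpumask(HK_FLAG_MANAGED_IRQ));
    }

    /* The i40e side then clamps its MSI-X vector count against the
     * housekeeping CPUs instead of all online CPUs.
     */
    vectors = min_t(int, num_online_cpus(), num_housekeeping_cpus());
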
>>>> The driver changes are straightforward, but this isn't the only driver
>>>> with this issue, right? I'm sure ixgbe and ice both have this problem
>>>> too; you should fix them as well, at a minimum, and probably other
>>>> vendors' drivers:
>>>>
>>>> $ rg -c --stats num_online_cpus drivers/net/ethernet
>>>> ...
>>>> 50 files contained matches
>>> Ouch, I was indeed surprised that these MSI vector allocations were done
>>> at the driver level and not at some $SUBSYSTEM level.
>>>
>>> The logic is already there in the driver, so I wouldn't oppose this
>>> very patch, but would a shared infrastructure make sense for this?
>>> Something that would also handle hotplug operations?
>>>
>>> Does it possibly go even beyond networking drivers?
>> From a generic-solution perspective, I think it makes sense to come up
>> with a shared infrastructure, something that can be consumed by all the
>> drivers and maybe by hotplug operations as well (I will have to further
>> explore the hotplug part).
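
To expand on that a bit: what I have in mind is a single core helper
that drivers call instead of open-coding num_online_cpus(), roughly as
below. The name is invented purely for illustration; nothing like it
exists today:

    /*
     * Hypothetical shared helper: the number of CPUs a driver should
     * size per-CPU resources such as MSI-X vectors against. With no
     * isolated CPUs this degenerates to num_online_cpus(), so drivers
     * converting over would see no change on non-isolated setups.
     */
    unsigned int netdev_num_vector_cpus(void)
    {
            return min_t(unsigned int, num_online_cpus(),
                         num_housekeeping_cpus());
    }
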
> That would be great. I'm completely clueless about those MSI things and
> the actual needs of those drivers. Now it seems to me that if several
> CPUs go offline, or if, as is planned for the future, CPU isolation
> gets enabled/disabled through cpuset, then the vectors may need some
> reorganization.
+1
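
If the hotplug side does turn out to be a real problem, I would expect
the plumbing to reuse the existing dynamic cpuhp callbacks, along the
lines of the made-up sketch below (foo_rebalance_msix() is an invented
driver hook; cpuhp_setup_state() and CPUHP_AP_ONLINE_DYN are the
existing hotplug API):

    /* Re-evaluate the driver's vector budget whenever a CPU comes or
     * goes, using the same callback for both online and offline.
     */
    static int foo_cpu_change(unsigned int cpu)
    {
            foo_rebalance_msix();
            return 0;
    }

    static int foo_register_hotplug(void)
    {
            return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
                                     "foo/msix:online",
                                     foo_cpu_change, foo_cpu_change);
    }
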
>
> But I also don't want to push toward a complicated solution to handle
> CPU hotplug if there is no actual problem to solve there.
Right, I am not particularly sure about the hotplug scenarios either.
> So I'll let you guys judge.
>
>> However, there are RT workloads that are affected by this issue, so
>> does it make sense to go ahead with the per-driver approach for now?
> Yep, that sounds good.
Thank you for confirming.
>
>> A generic solution will require a fair amount of testing and an
>> understanding of different drivers. Having said that, I can definitely
>> start looking in that direction.
> Thanks a lot!
>
--
Nitesh