Message-ID: <CALzJLG-Z2fJibyZ74XU7XCSx-g3=YapV9VyekAYZkS5JwvkwWw@mail.gmail.com>
Date: Wed, 1 Nov 2017 15:41:41 -0700
From: Saeed Mahameed <saeedm@....mellanox.co.il>
To: Jes Sorensen <jsorensen@...com>, Tariq Toukan <tariqt@...lanox.com>
Cc: Sagi Grimberg <sagi@...mberg.me>,
Networking <netdev@...r.kernel.org>,
Leon Romanovsky <leonro@...lanox.com>,
Saeed Mahameed <saeedm@...lanox.com>,
Kernel Team <kernel-team@...com>,
Christoph Hellwig <hch@....de>
Subject: Re: mlx5 broken affinity
On Wed, Nov 1, 2017 at 11:20 AM, Jes Sorensen <jsorensen@...com> wrote:
> On 11/01/2017 01:21 PM, Sagi Grimberg wrote:
>>> Hi,
>>
>> Hi Jes,
>>
>>> The below patch seems to have broken PCI IRQ affinity assignments for
>>> mlx5.
>>
>> I wouldn't call it breaking IRQ affinity assignments. It just makes
>> them automatic.
>
> Well, it explicitly breaks the option for an admin to assign them
> manually, which contradicts what is written in
> Documentation/IRQ-affinity.txt.
>
>>> Prior to this patch I could echo a value to /proc/irq/<x>/smp_affinity
>>> and it would get assigned. With this patch applied, I get -EIO.
>>
>> Adding Christoph,
>>
>> Ideally the affinity assignments would be left alone, but I wasn't
>> aware that they are now immutable. Christoph?
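
(For what it's worth, the vectors really are immutable from userspace
once they are allocated as "managed". A rough paraphrase of the core
check, with a made-up helper name and not the actual kernel source,
which is where the -EIO above comes from:)

#include <linux/irq.h>
#include <linux/irqdesc.h>

/*
 * Hypothetical helper, paraphrasing the logic the procfs write handler
 * relies on in ~v4.13/v4.14 kernels: a vector whose affinity is
 * "managed" by the core cannot be changed from userspace, so writes to
 * /proc/irq/<x>/smp_affinity are rejected with -EIO.
 */
static bool can_user_set_affinity(unsigned int irq)
{
        struct irq_desc *desc = irq_to_desc(irq);

        if (!desc)
                return false;

        /* Managed vectors (PCI_IRQ_AFFINITY allocations) are off limits. */
        return !irqd_affinity_is_managed(&desc->irq_data);
}
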
>>
>>> The actual affinity assignments seem to have changed too, but I assume
>>> this is a result of the generic allocation?
>>
>> The affinity assignments should give you a perfectly linear assignment
>> of the rx/tx rings' completion vectors to CPU cores:
>> first -> 0
>> second -> 1
>> third -> 2
>> ...
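
(A minimal sketch, with made-up names and not actual driver code, of
how one can read back the masks the PCI core picked per completion
vector and confirm that linear spread:)

#include <linux/pci.h>
#include <linux/cpumask.h>

/*
 * Sketch only: "pdev" and "nvec" are whatever the driver used when it
 * allocated its vectors. This prints the first CPU of each managed
 * mask, i.e. the first -> 0, second -> 1, ... mapping described above.
 */
static void dump_vector_affinity(struct pci_dev *pdev, int nvec)
{
        int i;

        for (i = 0; i < nvec; i++) {
                const struct cpumask *mask = pci_irq_get_affinity(pdev, i);

                if (mask)
                        dev_info(&pdev->dev, "vector %d -> CPU %u\n",
                                 i, cpumask_first(mask));
        }
}
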
>>
>> Before, mlx5 spread IRQ affinity starting from the local NUMA node,
>> because it relied on that ordering when constructing the RSS indirection
>> table, which covered only the local NUMA node (as a side effect this
>> hurt applications running on the far node, since their traffic was
>> guaranteed to arrive on rx rings located in the "wrong" node).
>>
>> Now the RSS indirection table is linear across both NUMA nodes, just
>> like the irq affinity settings. Another improvement is the pre/post
>> vectors mechanism, which excludes any non-rx/tx completion vectors from
>> the affinity assignments (the port-events completion vector, for
>> example).
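
(For illustration, a minimal sketch of the pre/post-vector idea; the
names are made up and this is not the mlx5 code. The .pre_vectors
entry keeps the control/async-event vector out of the automatic
spread, so only the rx/tx completion vectors get the linear per-CPU
assignment:)

#include <linux/interrupt.h>
#include <linux/pci.h>

static int alloc_completion_vectors(struct pci_dev *pdev, int nr_rings)
{
        struct irq_affinity affd = {
                .pre_vectors = 1,       /* e.g. the port/async event vector */
        };

        /* 1 reserved vector + one completion vector per rx/tx ring pair. */
        return pci_alloc_irq_vectors_affinity(pdev, 2, 1 + nr_rings,
                                               PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
                                               &affd);
}
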
>
> I am all in favor of making the automatic setup better, but assuming an
> automatic setup is always right seems problematic. There could be
> workloads where you may want to assign affinity explicitly.
>
> Jes
I vaguely remember NAKing Sagi's patch, as we knew it would break
mlx5e netdev affinity assumptions.
Anyway, we already submitted the mlx5e patch that removes that assumption:
https://patchwork.ozlabs.org/patch/809196/ ("net/mlx5e: Distribute RSS
table among all RX rings")
Jes, please confirm you have it.
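
To illustrate what that patch title means, a rough sketch (made-up
names, not the actual mlx5e code) of filling the indirection table
linearly over all RX rings instead of only the local-node ones:

#include <linux/types.h>

#define RSS_INDIR_TABLE_SIZE 128        /* illustrative table size */

static void fill_rss_indirection(u32 *indir, unsigned int num_rx_rings)
{
        unsigned int i;

        /* Round-robin over every RX ring, regardless of NUMA node. */
        for (i = 0; i < RSS_INDIR_TABLE_SIZE; i++)
                indir[i] = i % num_rx_rings;
}
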
And I agree that the user should be able to read
/proc/irq/<x>/smp_affinity and even modify it if required.