Message-ID: <1510240106.2482.11.camel@mellanox.com>
Date: Fri, 10 Nov 2017 00:08:26 +0900
From: Saeed Mahameed <saeedm@...lanox.com>
To: Sagi Grimberg <sagi@...mberg.me>,
Thomas Gleixner <tglx@...utronix.de>
Cc: Jes Sorensen <jsorensen@...com>,
Tariq Toukan <tariqt@...lanox.com>,
Saeed Mahameed <saeedm@....mellanox.co.il>,
Networking <netdev@...r.kernel.org>,
Leon Romanovsky <leonro@...lanox.com>,
Kernel Team <kernel-team@...com>,
Christoph Hellwig <hch@....de>
Subject: Re: mlx5 broken affinity
On Wed, 2017-11-08 at 09:27 +0200, Sagi Grimberg wrote:
> > Depending on the machine and the number of queues this might even result in
> > completely losing the ability to suspend/hibernate because the number of
> > available vectors on CPU0 is not sufficient to accommodate all queue
> > interrupts.
> >
> > > Would it be possible to keep the managed facility until a user overrides
> > > an affinity assignment? This way if the user didn't touch it, we keep
> > > all the perks, and in case the user overrides it, we log the implication
> > > so the user is aware?
> >
> > A lot of things are possible, the question is whether it makes sense. The
> > whole point is to have resources (queues, interrupts etc.) per CPU and have
> > them strictly associated.
>
> Not arguing here.
>
> > Why would you give the user a knob to destroy what you carefully optimized?
>
> Well, looks like someone relies on this knob, the question is if he is
> doing something better for his workload. I don't know, it's really up to
> the user to say.
>
> > Just because we can and just because users love those knobs or is there any
> > real technical reason?
>
> Again, I think Jes or others can provide more information.

Sagi, I believe Jes is not trying to argue about what initial affinity
values you give to the driver. We have a very critical regression that
is affecting live systems today, as well as common tools that already
exist in various distros, such as irqbalance, which relies solely on the
smp_affinity procfs entry (/proc/irq/<N>/smp_affinity). That entry is no
longer writable because of this regression. Please see
https://github.com/Irqbalance/irqbalance/blob/master/activate.c
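
As a rough illustration of what irqbalance-style tools depend on, here is
a minimal C sketch (not taken from irqbalance; the helpers
format_cpu_mask() and set_irq_affinity() are hypothetical) that writes a
hex CPU mask to /proc/irq/<N>/smp_affinity. It assumes at most 64 CPUs
and a caller with the privileges to write to procfs. For a managed
interrupt the kernel is expected to reject the write (typically with
EIO), which is the failure such tools now run into.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: render a mask of CPUs 0..63 in the comma-separated
 * 32-bit hex word format used by /proc/irq/<N>/smp_affinity,
 * e.g. 0x3 -> "3\n", 1ULL << 32 -> "1,00000000\n". */
void format_cpu_mask(uint64_t mask, char *buf, size_t len)
{
    uint32_t hi = (uint32_t)(mask >> 32);
    uint32_t lo = (uint32_t)mask;

    /* Worst case is "ffffffff,ffffffff\n" plus the terminating NUL. */
    assert(len >= 19);
    if (hi)
        snprintf(buf, len, "%" PRIx32 ",%08" PRIx32 "\n", hi, lo);
    else
        snprintf(buf, len, "%" PRIx32 "\n", lo);
}

/* Hypothetical helper: write the mask for one IRQ, as a balancing daemon
 * would. Returns 0 on success or a negative errno value, e.g. -ENOENT for
 * a nonexistent IRQ, -EACCES without privileges, or (typically) -EIO when
 * the kernel refuses to change a managed interrupt's affinity. */
int set_irq_affinity(unsigned int irq, uint64_t mask)
{
    char path[64];
    char buf[32];
    ssize_t n;
    int err = 0;
    int fd;

    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
    format_cpu_mask(mask, buf, sizeof(buf));

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    n = write(fd, buf, strlen(buf));
    if (n < 0)
        err = errno;
    else if ((size_t)n != strlen(buf))
        err = EIO; /* treat a short write as a failure */
    close(fd);
    return -err;
}
```

For example, set_irq_affinity(42, 0x3) would try to restrict IRQ 42 (an
arbitrary number) to CPUs 0 and 1; on an affected mlx5 interrupt it would
return a negative error instead of 0.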

Some users would like to have their network traffic handled on some
cores and free up other cores for other purposes; you just can't take
that away from them.

If reverting the mlx5 patches solves the issue, I am afraid that is the
solution I am going to go with until the regression is fixed.