netdev - Re: mlx5 broken affinity

Message-ID: <alpine.DEB.2.20.1711081822450.1937@nanos>
Date:   Wed, 8 Nov 2017 18:33:34 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Jes Sorensen <jsorensen@...com>
cc:     Sagi Grimberg <sagi@...mberg.me>,
        Tariq Toukan <tariqt@...lanox.com>,
        Saeed Mahameed <saeedm@....mellanox.co.il>,
        Networking <netdev@...r.kernel.org>,
        Leon Romanovsky <leonro@...lanox.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Kernel Team <kernel-team@...com>,
        Christoph Hellwig <hch@....de>
Subject: Re: mlx5 broken affinity

On Wed, 8 Nov 2017, Jes Sorensen wrote:
> On 11/07/2017 10:07 AM, Thomas Gleixner wrote:
> > On Sun, 5 Nov 2017, Sagi Grimberg wrote:
> >> I do agree that the user would lose better cpu online/offline behavior,
> >> but it seems that users want to still have some control over the IRQ
> >> affinity assignments even if they lose this functionality.
> > 
> > Depending on the machine and the number of queues this might even result in
> > completely losing the ability to suspend/hibernate because the number of
> > available vectors on CPU0 is not sufficient to accommodate all queue
> > interrupts.
> 
> Depending on the system, suspend/resume is really a less interesting
> issue for the user. Pretty much any system with a 10/25 Gbps mlx5 NIC in
> it will be in some sort of rack and is unlikely to ever want to
> suspend/resume.

The discussions with Intel about that tell a different story: CPU
online/offline for power management purposes is widely used, debatable as
that practice may be.

That's where the whole idea of managed affinities originated, along with
the goal of avoiding the affinity hint and affinity notifier machinery,
which creates more problems than it solves.

> >> Would it be possible to keep the managed facility until a user overrides
> >> an affinity assignment? This way if the user didn't touch it, we keep
> >> all the perks, and in case the user overrides it, we log the implication
> >> so the user is aware?
> > 
> > A lot of things are possible, the question is whether it makes sense. The
> > whole point is to have resources (queues, interrupts etc.) per CPU and have
> > them strictly associated.
> > 
> > Why would you give the user a knob to destroy what you carefully optimized?
> > 
> > Just because we can, and just because users love those knobs, or is there
> > any real technical reason?
> 
> Because the user sometimes knows better based on statically assigned
> loads, or the user wants consistency across kernels. It's great that the
> system is better at allocating this now, but we also need to allow for a
> user to change it. Like anything on Linux, a user wanting to blow off
> his/her own foot should be allowed to do so.

That's fine, but that's not what the managed affinity facility provides. If
you want to leverage the spread mechanism, but avoid the managed part, then
this is a different story and we need to figure out how to provide that
without breaking the managed side of it.

As I said, it's possible, but I vehemently disagree that this is a bug in
the core code, as has been claimed several times in this thread.

The real issue is that the driver was converted to something which was
expected to behave differently. That's hardly a bug in the core code, at
most it's a documentation problem.

Thanks,

	tglx