Message-ID: <47f42d31-4b74-0273-62c1-0b75fffbf066@efficios.com>
Date:   Thu, 13 Apr 2023 11:37:18 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Aaron Lu <aaron.lu@...el.com>, linux-kernel@...r.kernel.org,
        Olivier Dion <odion@...icios.com>, michael.christie@...cle.com
Subject: Re: [RFC PATCH v4] sched: Fix performance regression introduced by
 mm_cid

On 2023-04-13 11:20, Peter Zijlstra wrote:
> On Thu, Apr 13, 2023 at 09:56:38AM -0400, Mathieu Desnoyers wrote:
> 
>>> Mathieu, WDYT? -- other than that the patch is an obvious hack :-)
>>
>> I hate it with passion :-)
>>
>> It is quite specific to your workload/configuration.
>>
>> If we take for instance a process with a large mm_users count which is
>> eventually affined to a subset of the cpus with cpusets or
>> sched_setaffinity, your patch will prevent compaction of the concurrency ids
>> when it really should not.
> 
> I don't think it will, it will only kick in once the highest cid is
> handed out (I should've used num_online_cpus() instead of nr_cpu_ids),
> and with affinity at play that should never happen.

So in that case, this optimization will only work when affinity is not
set. For example, a hackbench run with a cpuset or sched_setaffinity
mask excluding one core from the set will still be slower.

> 
> Now, the more fancy scheme with:
> 
> 	min(t->nr_cpus_allowed, atomic_read(&t->mm->mm_users))
> 
> that does get to be more complex; and I've yet to find a working version
> that doesn't also need a for_each_cpu() loop on for reclaim :/

Indeed. And with an allowed-cpus approach, we need to carefully consider
what happens when the allowed cpu mask changes from one set to another,
e.g. from allowing cpus 0, 1 to allowing only cpus 2, 3. There will be
task migration, and we need to reclaim the cids from cpus 0, 1, but we
can very well be in a case where the number of mm_users is above the
number of allowed cpus.

> 
> Anyway, I think the hack as presented is safe, but a hack none-the-less.

I don't think it is _unsafe_, but it will only trigger in specific
scenarios, which makes more subtle performance regressions harder to
diagnose in the scenarios this hack does not cover.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
