Date:   Fri, 14 Apr 2023 10:09:55 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Aaron Lu <aaron.lu@...el.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, Olivier Dion <odion@...icios.com>,
        michael.christie@...cle.com
Subject: Re: [RFC PATCH v6] sched: Fix performance regression introduced by
 mm_cid

On 2023-04-14 10:07, Aaron Lu wrote:
> On Thu, Apr 13, 2023 at 06:33:56PM -0400, Mathieu Desnoyers wrote:
>> Introduce per-mm/cpu current concurrency id (mm_cid) to fix a PostgreSQL
>> sysbench regression reported by Aaron Lu.
>>
>> Keep track of the currently allocated mm_cid for each mm/cpu rather than
>> freeing them immediately on context switch. This eliminates most atomic
>> operations when context switching back and forth between threads
>> belonging to different memory spaces in multi-threaded scenarios (many
>> processes, each with many threads). The per-mm/per-cpu mm_cid values are
>> serialized by their respective runqueue locks.
>>
>> Thread migration is handled by introducing a periodically executed
>> task work, similar to the NUMA task work, which delays reclaim of cid
>> values until they have gone unused for a period of time.
>>
>> Keep track of the allocation time for each per-cpu cid, and let the task
>> work clear them when they are observed to be older than
>> SCHED_MM_CID_PERIOD_NS and unused.
>>
>> This fix takes a task-work, delayed-reclaim approach rather than adding
>> hooks to migrate-from and migrate-to, because migration happens to be a
>> hot path for various real-world workloads.
>>
>> Because we want the mm_cid to converge towards the smaller values as
>> migrations happen, the prior optimization done when context switching
>> between threads belonging to the same mm is removed: it could delay the
>> lazy release of the destination runqueue's mm_cid after it has been
>> replaced by a migration. Removing this prior optimization is not an
>> issue performance-wise, because the introduced per-mm/per-cpu mm_cid
>> tracking also covers this more specific case.
> 
> I was wondering: if a thread is migrated to all possible cpus within
> the SCHED_MM_CID_PERIOD_NS window, its mm_cidmask will become full. For
> user space, if the cid can be any value in the full set of cpus, it
> will have to prepare storage for the full set. Then what's the point of
> doing the compaction? Or do I understand it wrong?

Yes, that's a limitation of this approach that I am aware of. I'm 
currently trying to combine the best parts of v5 and v6 to add back a 
low-overhead migration hook that will preserve compactness in those 
migration scenarios.
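
For illustration, below is a minimal user-space C sketch of the
mechanism the changelog describes: a per-mm/per-cpu cid cache that is
left in place across context switches, plus a periodic, time-based
reclaim pass that frees stale entries. All names (struct mm_sketch,
cid_get(), cid_reclaim_stale(), CID_PERIOD_NS, NR_CPUS) are illustrative
stand-ins rather than the kernel's actual identifiers, and the locking
(runqueue locks in the real patch) is omitted:

/* Illustrative sketch only; not the kernel implementation. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define NR_CPUS       8
#define CID_PERIOD_NS 100000000ULL  /* stand-in for SCHED_MM_CID_PERIOD_NS */
#define CID_UNSET     -1

struct cid_slot {
	int      cid;        /* cached concurrency id for this cpu, or CID_UNSET */
	uint64_t last_used;  /* last time a context switch used this slot */
};

struct mm_sketch {
	uint64_t        cid_mask;       /* bitmap of currently allocated cids */
	struct cid_slot pcpu[NR_CPUS];  /* per-cpu cached cids for this mm */
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* Allocate the lowest free cid so the id space stays compact. */
static int cid_alloc(struct mm_sketch *mm)
{
	for (int cid = 0; cid < NR_CPUS; cid++) {
		if (!(mm->cid_mask & (1ULL << cid))) {
			mm->cid_mask |= 1ULL << cid;
			return cid;
		}
	}
	return CID_UNSET;
}

/*
 * Switch-in: reuse the cid cached for this cpu, else allocate one.
 * Nothing is released on switch-out, which is what removes the atomics
 * from the common switching path.
 */
static int cid_get(struct mm_sketch *mm, int cpu)
{
	struct cid_slot *slot = &mm->pcpu[cpu];

	if (slot->cid == CID_UNSET)
		slot->cid = cid_alloc(mm);
	slot->last_used = now_ns();
	return slot->cid;
}

/*
 * Periodic task-work stand-in: release cached cids that have not been
 * used for longer than the period, so idle cpus stop pinning ids.
 */
static void cid_reclaim_stale(struct mm_sketch *mm)
{
	uint64_t now = now_ns();

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		struct cid_slot *slot = &mm->pcpu[cpu];

		if (slot->cid != CID_UNSET &&
		    now - slot->last_used > CID_PERIOD_NS) {
			mm->cid_mask &= ~(1ULL << slot->cid);
			slot->cid = CID_UNSET;
		}
	}
}

int main(void)
{
	struct mm_sketch mm;

	memset(&mm, 0, sizeof(mm));
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		mm.pcpu[cpu].cid = CID_UNSET;

	/* Common case: a thread bouncing between two cpus reuses its cids. */
	printf("cpu0 -> cid %d\n", cid_get(&mm, 0));
	printf("cpu1 -> cid %d\n", cid_get(&mm, 1));
	printf("cpu0 again -> cid %d (cached)\n", cid_get(&mm, 0));

	/* Touching every cpu within the window fills the mask. */
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cid_get(&mm, cpu);
	printf("cid_mask after migrating everywhere: 0x%llx\n",
	       (unsigned long long)mm.cid_mask);

	cid_reclaim_stale(&mm);  /* would shrink the mask once entries age out */
	return 0;
}

Running it shows cpu-local cid reuse in the common case, and also the
scenario raised above: a single thread touching every cpu within the
window fills the mask until the stale entries age out.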

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
