Message-ID: <445b4203-940d-4817-bd45-9da757f22450@efficios.com>
Date: Fri, 6 Dec 2024 09:06:10 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Gabriele Monaco <gmonaco@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>, Mel Gorman <mgorman@...e.de>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc: Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH] sched: Move task_mm_cid_work to mm delayed work
On 2024-12-06 03:53, Gabriele Monaco wrote:
> On Thu, 2024-12-05 at 11:25 -0500, Mathieu Desnoyers wrote:
[...]
>
>>
>>> The behaviour imposed by this patch (at least the intended one) is
>>> to
>>> run the task_mm_cid_work with the configured periodicity (plus
>>> scheduling latency) for each active mm.
>>
>> What you propose looks like a more robust design than running under
>> the tick.
>>
>>> This behaviour seems to me more predictable, but would that even be
>>> required for rseq, or is it just overkill?
>>
>> Your approach looks more robust, so I would be tempted to introduce
>> it as a fix. Is the space/runtime overhead similar between the
>> tick/task work approach vs yours ?
>
> I'm going to fix the implementation and come up with some runtime stats
> to compare the overhead of both methods.
> As for the space overhead, I think I can answer this question already:
> * The current approach uses a callback_head per thread (16 bytes)
> * Mine relies on a delayed work per mm (88 bytes)
>
> Tasks with 5 threads or fewer have a lower memory footprint with the
> current approach.
> I checked quickly on some systems I have access to and I'd say my
> approach introduces some memory overhead on an average system, but
> considering a task_struct can be 7-13 kB and an mm_struct is about 1.4
> kB, the overhead should be acceptable.
ok!
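(For what it's worth, the break-even point is 88 / 16 = 5.5, so the
per-mm delayed work only becomes cheaper space-wise once an mm has 6
or more threads, which matches your figure above.)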
>
>>
>>>
>>> In other words, was the tick chosen out of simplicity or is there
>>> some
>>> property that has to be preserved?
>>
>> Out of simplicity, and "do like what NUMA has done". But I am not
>> particularly attached to it. :-)
>>
>>>
>>> P.S. I run the rseq self tests on both this and the previous patch
>>> (both broken) and saw no failure.
>>
>> That's expected, because the tests do not depend much on the
>> compactness of the mm_cid allocation. The way I validated this
>> in the past is by creating a simple multi-threaded program that
>> periodically prints the current mm_cid from userspace, and
>> sleeps for a few seconds between printing, from many threads on
>> a many-core system.
>>
>> Then see how it reacts when run: are the mm_cid values close to 0, or
>> are there large values of mm_cid allocated without compaction
>> over time ? I have not found a good way to translate this into
>> an automated test though. Ideas are welcome.
>>
>> You can look at the librseq basic_test as a starting point. [1]
>
> Perfect, will try those!
Thinking back on this, you'll want a program that does the following
on a system with N CPUs:
- Phase 1: run one thread per cpu, each pinned to its cpu. Print the
mm_cid from each thread along with the cpu number every second or so.
- Exit all threads except the main thread, and join them from the
main thread.
- Phase 2: the program is now single-threaded. We'd expect the
mm_cid value to converge towards 0 as the periodic task clears
unused CIDs.
So I think phase 2 can serve as an actual automated test: if, after
an order of magnitude more time than the 100ms delay between periodic
scans, we still observe mm_cid > 0, then something is wrong.
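Roughly something like this (untested sketch; I'm assuming a recent
librseq that exposes rseq_register_current_thread() and an
rseq_current_mm_cid() helper to read the mm_cid field, so adjust the
names to whatever basic_test actually provides):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <rseq/rseq.h>

static void *per_cpu_thread(void *arg)
{
        int cpu = (int)(long)arg;
        cpu_set_t cpuset;
        int i;

        /* Pin this thread to its cpu and register it with rseq. */
        CPU_ZERO(&cpuset);
        CPU_SET(cpu, &cpuset);
        pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
        rseq_register_current_thread();

        /* Phase 1: print cpu/mm_cid pairs every second for a while. */
        for (i = 0; i < 5; i++) {
                printf("cpu %d mm_cid %d\n", cpu,
                       (int)rseq_current_mm_cid());
                sleep(1);
        }
        return NULL;
}

int main(void)
{
        int nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
        pthread_t *tids = calloc(nr_cpus, sizeof(*tids));
        int i, mm_cid;

        rseq_register_current_thread();
        for (i = 0; i < nr_cpus; i++)
                pthread_create(&tids[i], NULL, per_cpu_thread,
                               (void *)(long)i);
        for (i = 0; i < nr_cpus; i++)
                pthread_join(tids[i], NULL);

        /*
         * Phase 2: single-threaded again. Wait an order of magnitude
         * longer than the 100ms scan period, then check that our
         * mm_cid has converged to 0.
         */
        sleep(2);
        mm_cid = (int)rseq_current_mm_cid();
        printf("phase 2 mm_cid %d\n", mm_cid);
        return mm_cid == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}

The phase 2 exit status (non-zero if mm_cid did not converge to 0
after the wait) would be the automated check.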
Thoughts ?
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com