Message-ID: <7a0c1db1-103d-d518-ed96-1584a28fbf32@efficios.com>
Date: Thu, 20 Apr 2023 09:54:29 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Aaron Lu <aaron.lu@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, Olivier Dion <odion@...icios.com>,
michael.christie@...cle.com
Subject: Re: [RFC PATCH v9 2/2] sched: Fix performance regression introduced
by mm_cid
On 2023-04-20 09:35, Aaron Lu wrote:
[...]
>>>>
>>>> Then we clearly have another member of mm_struct on the same cache line as
>>>> pcpu_cid which is bouncing all over the place and causing false-sharing. Any
>>>> idea which field(s) are causing this ?
>>>
>>> That's my first reaction too but as I said in an earlier reply:
>>> https://lore.kernel.org/lkml/20230419080606.GA4247@ziqianlu-desk2/
>>> I've tried to place pcpu_cid into a dedicated cacheline, with no
>>> other fields of mm_struct sharing a cacheline with it, but it didn't
>>> help...
>>
>> I see two possible culprits there:
>>
>> 1) The mm_struct pcpu_cid field is suffering from false-sharing. I would be
>> interested to look at your attempt to move it to a separate cache line to
>> try to figure out what is going on.
>
> My mistake: I only made sure that the fields following pcpu_cid did
> not share its cacheline, but forgot to exclude the preceding fields,
> and it turned out to be one (or several) of the preceding fields that
> caused the false sharing. When I did:
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5eab61156f0e..a6f9d815991c 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -606,6 +606,7 @@ struct mm_struct {
> */
> atomic_t mm_count;
> #ifdef CONFIG_SCHED_MM_CID
> + CACHELINE_PADDING(_pad1_);
> /**
> * @pcpu_cid: Per-cpu current cid.
> *
> mm_cid_get() dropped to 0.0x% when running hackbench :-)
Now we are talking! :)
>
> sched_mm_cid_migrate_to() is about 4%, with most cycles spent
> accessing mm->mm_users:
>
> │ dst_cid = READ_ONCE(dst_pcpu_cid->cid);
> 0.03 │ mov 0x8(%r12),%r15d
> │ if (!mm_cid_is_unset(dst_cid) &&
> 0.07 │ cmp $0xffffffff,%r15d
> │ ↓ je 87
> │ arch_atomic_read():
> │ {
> │ /*
> │ * Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
> │ * it's non-inlined function that increases binary size and stack usage.
> │ */
> │ return __READ_ONCE((v)->counter);
> 76.13 │ mov 0x54(%r13),%eax
> │ sched_mm_cid_migrate_to():
> │ cmp %eax,0x410(%rdx)
> 21.71 │ ↓ jle 1d8
> │ atomic_read(&mm->mm_users) >= t->nr_cpus_allowed)
>
> Given this, it must have been mm_users that caused the false sharing
> with pcpu_cid previously. It looks like mm_users is bouncing.
I suspect that the culprit here is mm_count rather than mm_users;
mm_users just happens to share the same cache line as mm_count.
mm_count is incremented/decremented with mmgrab()/mmdrop() on every
context switch.
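
For reference, the two helpers are essentially the following (a
simplified sketch of include/linux/sched/mm.h; the exact bodies vary
across kernel versions):

	/* Pin the mm_struct itself, not the address space. */
	static inline void mmgrab(struct mm_struct *mm)
	{
		atomic_inc(&mm->mm_count);
	}

	/* Drop the pin; the last reference frees the mm_struct. */
	static inline void mmdrop(struct mm_struct *mm)
	{
		if (unlikely(atomic_dec_and_test(&mm->mm_count)))
			__mmdrop(mm);
	}

Each of those atomic read-modify-writes dirties the cache line
holding mm_count, invalidating it in every other CPU's cache.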
This is likely causing other issues as well. For instance, the
membarrier_state field is AFAIR read-mostly; it is used by
membarrier_mm_sync_core_before_usermode() to issue a core sync
before every return to usermode, if needed.
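
From memory, that fast path reads mm->membarrier_state on the way
back to usermode, roughly as follows (a sketch, not the exact
upstream code):

	static inline void
	membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
	{
		if (current->mm != mm)
			return;
		if (likely(!(atomic_read(&mm->membarrier_state) &
			     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
			return;
		sync_core_before_usermode();
	}

A read-mostly load like that atomic_read() turns into a cache miss
whenever mm_count, sitting on the same line, has just been written.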
Other fields, such as the mm_struct pgd pointer, are likely
read-mostly as well.
I suspect mm_count is what should be moved to its own cache line,
to eliminate false sharing with all the other read-mostly fields
of mm_struct.
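
Something along the following lines in mm_types.h is what I have in
mind. This is only a sketch, not an actual patch: the placement and
the "_pad_after_mm_count_" padding name are illustrative.

	struct mm_struct {
		...
		/*
		 * Keep the write-hot reference count on its own
		 * cache line so mmgrab()/mmdrop() traffic does not
		 * false-share with read-mostly neighbours such as
		 * pgd and membarrier_state.
		 */
		atomic_t mm_count ____cacheline_aligned_in_smp;
		CACHELINE_PADDING(_pad_after_mm_count_);
		...
	};
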
Thoughts ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com