Message-ID: <a6340beb-3b4a-2518-9340-ea0fc7583dbe@redhat.com>
Date:   Thu, 30 Sep 2021 12:58:51 -0400
From:   Waiman Long <llong@...hat.com>
To:     Barry Song <21cnbao@...il.com>, alex.kogan@...cle.com
Cc:     arnd@...db.de, bp@...en8.de, daniel.m.jordan@...cle.com,
        dave.dice@...cle.com, guohanjun@...wei.com, hpa@...or.com,
        jglauber@...vell.com, linux-arch@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux@...linux.org.uk, mingo@...hat.com, peterz@...radead.org,
        steven.sistare@...cle.com, tglx@...utronix.de, will.deacon@....com,
        x86@...nel.org
Subject: Re: [PATCH v15 0/6] Add NUMA-awareness to qspinlock

On 9/30/21 5:44 AM, Barry Song wrote:
>> We have done some performance evaluation with the locktorture module
>> as well as with several benchmarks from the will-it-scale repo.
>> The following locktorture results are from an Oracle X5-4 server
>> (four Intel Xeon E7-8895 v3 @ 2.60GHz sockets with 18 hyperthreaded
>> cores each). Each number represents an average (over 25 runs) of the
>> total number of ops (x10^7) reported at the end of each run. The
>> standard deviation is also reported in (), and in general is about 3%
>> from the average. The 'stock' kernel is v5.12.0,
> I assume the X5-4 server has a crossbar topology and its NUMA diameter
> is one hop, and all tests were done on this kind of symmetrical
> topology. Am I right?
>
>      ┌─┐                 ┌─┐
>      │ ├─────────────────┤ │
>      └─┤1               1└┬┘
>        │  1           1   │
>        │    1       1     │
>        │      1   1       │
>        │        1         │
>        │      1   1       │
>        │     1      1     │
>        │   1         1    │
>       ┌┼┐1             1  ├─┐
>       │┼┼─────────────────┤ │
>       └─┘                 └─┘
>
>
> What if the hardware uses a ring topology, or other topologies with
> 2 hops or even 3 hops, such as:
>
>       ┌─┐                 ┌─┐
>       │ ├─────────────────┤ │
>       └─┤                 └┬┘
>         │                  │
>         │                  │
>         │                  │
>         │                  │
>         │                  │
>         │                  │
>         │                  │
>        ┌┤                  ├─┐
>        │┼┬─────────────────┤ │
>        └─┘                 └─┘
>
>
> or:
>
>
>      ┌───┐       ┌───┐      ┌────┐      ┌─────┐
>      │   │       │   │      │    │      │     │
>      │   │       │   │      │    │      │     │
>      ├───┼───────┼───┼──────┼────┼──────┼─────┤
>      │   │       │   │      │    │      │     │
>      └───┘       └───┘      └────┘      └─────┘
>
> Do we need to consider the distances between NUMA nodes in the
> secondary queue? Does it still make sense to treat everyone else
> equally in the secondary queue?
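
As an aside, the distinction being drawn above shows up directly in the
SLIT distance table the firmware exports: on a 1-hop crossbar every remote
distance is the same value, while ring- or mesh-style topologies report
different distances per node pair. A minimal user-space sketch for dumping
that table (illustrative only; it assumes node numbering is contiguous,
which is not guaranteed on every system):

    /* numa_dist.c - dump the NUMA distance (SLIT) matrix from sysfs */
    #include <stdio.h>

    int main(void)
    {
        char path[64], line[256];
        int node;

        for (node = 0; ; node++) {
            FILE *f;

            snprintf(path, sizeof(path),
                     "/sys/devices/system/node/node%d/distance", node);
            f = fopen(path, "r");
            if (!f)
                break;                  /* assume no more nodes */
            if (fgets(line, sizeof(line), f))
                /* one row per node, e.g. "10 21 21 21" on a crossbar */
                printf("node%d: %s", node, line);
            fclose(f);
        }
        return 0;
    }

On layouts like the ring or line sketched above, each row would show two or
three distinct remote distances instead of a single repeated value.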

The purpose of this patch series is to minimize cacheline transfers from
one NUMA node to another. Taking the fine-grained details of the NUMA
topology into account would complicate the code without much performance
benefit, from my point of view. Let's keep it simple first. We can always
improve it later if someone can show a real benefit of doing so.
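
To make the trade-off concrete: the ordering boils down to a same-node-or-not
test, with every remote waiter going to the secondary queue regardless of how
many hops away its node is. Roughly (a simplified, illustrative sketch only,
with made-up names; this is not the actual CNA code):

    struct waiter {
        struct waiter *next;
        int numa_node;
    };

    /*
     * Keep same-node waiters on the main queue and move everyone else,
     * in arrival order, to a single secondary queue. All remote nodes
     * are treated alike; node_distance() is never consulted. The caller
     * passes *secondary == NULL.
     */
    static struct waiter *order_queue(struct waiter *head, int my_node,
                                      struct waiter **secondary)
    {
        struct waiter *local = NULL;
        struct waiter **local_tail = &local, **sec_tail = secondary;

        while (head) {
            struct waiter *next = head->next;

            if (head->numa_node == my_node) {
                *local_tail = head;
                local_tail = &head->next;
            } else {
                *sec_tail = head;
                sec_tail = &head->next;
            }
            head->next = NULL;
            head = next;
        }
        return local;
    }

A distance-aware variant would have to replace that single secondary queue
with some per-distance ordering, which is exactly the kind of extra
complexity I'd rather avoid until there are numbers showing it pays off.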

Cheers,
Longman

