Message-ID: <d4f62008-faa7-2931-5690-f29f9544b81b@intel.com>
Date: Thu, 17 Mar 2022 17:45:41 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Nadav Amit <namit@...are.com>
Cc: kernel test robot <oliver.sang@...el.com>,
Ingo Molnar <mingo@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>,
"lkp@...ts.01.org" <lkp@...ts.01.org>,
"lkp@...el.com" <lkp@...el.com>,
"ying.huang@...el.com" <ying.huang@...el.com>,
"feng.tang@...el.com" <feng.tang@...el.com>,
"zhengjun.xing@...ux.intel.com" <zhengjun.xing@...ux.intel.com>,
"fengwei.yin@...el.com" <fengwei.yin@...el.com>,
Andy Lutomirski <luto@...nel.org>
Subject: Re: [x86/mm/tlb] 6035152d8e: will-it-scale.per_thread_ops -13.2% regression
On 3/17/22 17:20, Nadav Amit wrote:
> I don’t have other data right now. Let me run some measurements later
> tonight. I understand your explanation, but I still do not see how
> much “later” can the lazy check be that it really matters. Just
> strange.
These will-it-scale tests are really brutal. They're usually sitting in
really tight kernel entry/exit loops. Everything is pounding on kernel
locks and bouncing cachelines around like crazy. It might only be a few
thousand cycles between two successive kernel entries.
Things like the call_single_queue cacheline have to be dragged from
other CPUs *and* there are locks that you can spin on. While a thread
is doing all this spinning, it is forcing more and more threads into the
lazy TLB state. The longer you spin, the more threads have entered the
kernel, contended on the mmap_lock and gone idle.
Is it really surprising that a loop that can take hundreds of locks can
take a long time?
	for_each_cpu(cpu, cfd->cpumask) {
		csd_lock(csd);
		...
	}