Message-ID: <40c89da9-999d-63e6-5a6b-9e7001af678f@redhat.com>
Date:   Tue, 4 Feb 2020 10:07:15 -0500
From:   Waiman Long <longman@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>, Will Deacon <will.deacon@....com>,
        linux-kernel@...r.kernel.org, Bart Van Assche <bvanassche@....org>
Subject: Re: [PATCH v5 7/7] locking/lockdep: Add a fast path for chain_hlocks
 allocation

On 2/4/20 7:47 AM, Peter Zijlstra wrote:
> On Mon, Feb 03, 2020 at 11:41:47AM -0500, Waiman Long wrote:
>> When alloc_chain_hlocks() is called, the most likely scenario is
>> to allocate from the primordial chain block which holds the whole
>> chain_hlocks[] array initially. It is the primordial chain block if its
>> size is bigger than MAX_LOCK_DEPTH. As long as the number of entries left
>> after splitting is still bigger than MAX_CHAIN_BUCKETS it will remain
>> in bucket 0. By splitting out a sub-block at the end, we only need to
>> adjust the size without changing any of the existing linkage information.
>> This optimized fast path can reduce the latency of allocation requests.
>>
>> This patch does change the order by which chain_hlocks entries are
>> allocated. The original code allocates entries from the beginning of
>> the array. Now it will be allocated from the end of the array backward.
> Cute; but why do we care? Is there any measurable performance indicator?
>
I used a parallel kernel compilation test to see whether there is a
performance benefit. I did see the compile time drop by a few seconds out
of several minutes of total time on average, so it is only about 1% or so.
I didn't mention it because it is within the margin of error.

One of the goals of this patchset is to make sure that little or no
performance regression is introduced. That was why I was hesitant to
adopt the single allocator approach as suggested. That is also why I added
this patch to try to get some performance back.
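
For reference, the fast path described in the quoted patch description can
be sketched roughly as below. This is a simplified model and not the code
from the patch: struct primordial_block, alloc_from_tail() and the three
constant values are hypothetical names and numbers chosen for illustration,
and the slow path (bucket lookup and relinking) is not modeled at all.

```c
#include <assert.h>

#define MAX_LOCK_DEPTH     48    /* illustrative value */
#define MAX_CHAIN_BUCKETS  16    /* illustrative value */
#define NR_CHAIN_HLOCKS    1024  /* illustrative array size */

/* Hypothetical stand-in for the primordial block: entries [0, size) are free. */
struct primordial_block {
	int size;
};

/*
 * Fast path: carve req entries off the tail of the block.
 * Returns the start index of the carved sub-block, or -1 if the caller
 * would have to fall back to the slow path (not modeled here).
 */
static int alloc_from_tail(struct primordial_block *blk, int req)
{
	assert(req > 0 && req <= MAX_LOCK_DEPTH);

	/* Not (or no longer) the primordial block. */
	if (blk->size <= MAX_LOCK_DEPTH)
		return -1;

	/* The remainder must stay large enough to remain in bucket 0. */
	if (blk->size - req <= MAX_CHAIN_BUCKETS)
		return -1;

	blk->size -= req;	/* only the size changes, no relinking */
	return blk->size;	/* sub-block occupies [size, size + req) */
}
```

In this model, successive requests return decreasing start indices, which
matches the "allocated from the end of the array backward" behavior noted
in the patch description.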

Cheers,
Longman