Message-ID: <40c89da9-999d-63e6-5a6b-9e7001af678f@redhat.com>
Date:   Tue, 4 Feb 2020 10:07:15 -0500
From:   Waiman Long <longman@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>, Will Deacon <will.deacon@....com>,
        linux-kernel@...r.kernel.org, Bart Van Assche <bvanassche@....org>
Subject: Re: [PATCH v5 7/7] locking/lockdep: Add a fast path for chain_hlocks
 allocation

On 2/4/20 7:47 AM, Peter Zijlstra wrote:
> On Mon, Feb 03, 2020 at 11:41:47AM -0500, Waiman Long wrote:
>> When alloc_chain_hlocks() is called, the most likely scenario is
>> to allocate from the primordial chain block which holds the whole
>> chain_hlocks[] array initially. It is the primordial chain block if its
>> size is bigger than MAX_LOCK_DEPTH. As long as the number of entries left
>> after splitting is still bigger than MAX_CHAIN_BUCKETS it will remain
>> in bucket 0. By splitting out a sub-block at the end, we only need to
>> adjust the size without changing any of the existing linkage information.
>> This optimized fast path can reduce the latency of allocation requests.
>>
>> This patch does change the order by which chain_hlocks entries are
>> allocated. The original code allocates entries from the beginning of
>> the array. Now it will be allocated from the end of the array backward.
> Cute; but why do we care? Is there any measurable performance indicator?
>
I used a parallel kernel compilation test to see if there is a
performance benefit. On average, the compile time dropped by a few
seconds out of several minutes of total time, so the gain is only about
1% or so. I didn't mention it because it is within the margin of error.

One of the goals of this patchset is to make sure that little or no
performance regression is introduced. That was why I was hesitant to
adopt the single allocator approach as suggested. That is also why I
added this patch, to try to get some performance back.
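
To make the fast path in the quoted description concrete, here is a
minimal sketch of the idea, not the code from the patch itself. The
struct chain_block type, the alloc_from_primordial() helper, the
NR_CHAIN_HLOCKS size and the MAX_CHAIN_BUCKETS value are assumptions
made up for illustration; only the two size checks and the "carve from
the end" step come from the description above.

```c
/* Simplified model of the fast path; names and sizes are illustrative. */
#define MAX_LOCK_DEPTH     48    /* threshold that identifies the primordial block */
#define MAX_CHAIN_BUCKETS  16    /* assumed value for this sketch */
#define NR_CHAIN_HLOCKS    1024  /* hypothetical size of chain_hlocks[] */

/* Hypothetical descriptor: a free block is a [base, base + size) range. */
struct chain_block {
	int base;
	int size;
};

/* Initially a single block covers the whole array. */
static struct chain_block primordial = { 0, NR_CHAIN_HLOCKS };

/*
 * Try to allocate 'req' entries from the tail of the primordial block.
 * Returns the index of the first allocated entry, or -1 when the fast
 * path does not apply and a slower general path would be needed.
 */
static int alloc_from_primordial(int req)
{
	int left;

	/* Only a block bigger than MAX_LOCK_DEPTH counts as primordial. */
	if (primordial.size <= MAX_LOCK_DEPTH)
		return -1;

	/* The remainder must stay big enough to remain in bucket 0. */
	left = primordial.size - req;
	if (left <= MAX_CHAIN_BUCKETS)
		return -1;

	/* Shrink in place: the base and any list linkage stay untouched. */
	primordial.size = left;
	return primordial.base + left;
}
```

Because the sub-block is split off at the end, the only state that
changes is the size field, which is why entries are now handed out from
the end of the array backward.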

Cheers,
Longman
