linux-kernel - Re: [PATCH 3/3] debugobjects: Use hlist_cut_number() to optimize performance and improve readability


Message-ID: <c7c42fef-187b-a218-f4dd-cc21aa733a90@huawei.com>
Date: Wed, 11 Sep 2024 17:38:38 +0800
From: "Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>
To: Thomas Gleixner <tglx@...utronix.de>, Andrew Morton
	<akpm@...ux-foundation.org>, <linux-kernel@...r.kernel.org>, David Gow
	<davidgow@...gle.com>, <linux-kselftest@...r.kernel.org>,
	<kunit-dev@...glegroups.com>
Subject: Re: [PATCH 3/3] debugobjects: Use hlist_cut_number() to optimize
 performance and improve readability



On 2024/9/11 16:54, Thomas Gleixner wrote:
> On Wed, Sep 11 2024 at 15:44, Leizhen wrote:
>> On 2024/9/10 19:44, Thomas Gleixner wrote:
>>> That minimizes the pool lock contention and the cache footprint. The
>>> global to free pool must have an extra twist to accommodate non-batch
>>> sized drops and to handle the "all slots are full" case, but that's just a
>>> trivial detail.
>>
>> That's great. I really admire you for completing the refactor in such a
>> short time.
> 
> The trick is to look at it from the data model and not from the
> code. You need to sit down and think about which data model is required
> to achieve what you want. So the goal was batching, right?

Yes. When I found a hole in the road, I thought about how to fill it. But
you think more deeply: why is there a hole, and is there a problem with the
foundation? I've benefited a lot from communicating with you these days.

> 
> That made it clear that the global pools need to be stacks of batches
> and never handle single objects because that makes it complex. As a
> consequence the per cpu pool is the one which does single object
> alloc/free and then either gets a full batch from the global pool or
> drops one into it. The rest is just mechanical.
> 
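The data model described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code from the patch: `BATCH_SIZE`, `GLOBAL_SLOTS`, `struct obj`, `struct pcpu_pool`, and all function names are hypothetical, locking is omitted, and the "all slots are full" case is simply left to the per-CPU pool. The global pool only ever pushes or pops whole batches; the per-CPU pool is the only place where single objects are handled.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE   4	/* hypothetical; stands in for ODEBUG_BATCH_SIZE */
#define GLOBAL_SLOTS 8	/* hypothetical number of batch slots */

struct obj {
	struct obj *next;
};

/* Global pool: a stack of full batches, never single objects. */
struct global_pool {
	struct obj *slots[GLOBAL_SLOTS];	/* each slot heads one full batch */
	unsigned int slot_idx;			/* number of occupied slots */
};

/* Per-CPU pool: the only place that handles single objects. */
struct pcpu_pool {
	struct obj *head;
	unsigned int cnt;
};

/* The object count follows from the slot index; no separate counter. */
static unsigned int global_count(const struct global_pool *gp)
{
	return gp->slot_idx * BATCH_SIZE;
}

static struct obj *global_pop(struct global_pool *gp)
{
	return gp->slot_idx ? gp->slots[--gp->slot_idx] : NULL;
}

static int global_push(struct global_pool *gp, struct obj *batch)
{
	if (gp->slot_idx == GLOBAL_SLOTS)
		return 0;	/* all slots are full */
	gp->slots[gp->slot_idx++] = batch;
	return 1;
}

/* Single-object allocation; refills with one full batch when empty. */
static struct obj *pcpu_alloc(struct pcpu_pool *pp, struct global_pool *gp)
{
	struct obj *o;

	if (!pp->head) {
		pp->head = global_pop(gp);
		if (!pp->head)
			return NULL;
		pp->cnt = BATCH_SIZE;
	}
	o = pp->head;
	pp->head = o->next;
	pp->cnt--;
	o->next = NULL;
	return o;
}

/* Single-object free; drops one full batch once two have piled up. */
static void pcpu_free(struct pcpu_pool *pp, struct global_pool *gp,
		      struct obj *o)
{
	struct obj *batch, *tail;
	int i;

	o->next = pp->head;
	pp->head = o;
	pp->cnt++;

	if (pp->cnt < 2 * BATCH_SIZE || gp->slot_idx == GLOBAL_SLOTS)
		return;	/* sketch only: a full global pool is not handled */

	/* Cut the first BATCH_SIZE objects off the local list. */
	batch = tail = pp->head;
	for (i = 1; i < BATCH_SIZE; i++)
		tail = tail->next;
	pp->head = tail->next;
	tail->next = NULL;
	pp->cnt -= BATCH_SIZE;
	global_push(gp, batch);
}
```

Because every occupied slot holds exactly `BATCH_SIZE` objects, the total can be computed from `slot_idx` alone, which is why a separate running total becomes redundant in this model.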
>> But I have a few minor comments.
>> 1. When kmem_cache_zalloc() is called to allocate objs for filling,
>>    if fewer than one batch of objs is allocated, all of them can be
>>    pushed to the local CPU. That is, call pcpu_free() for each one.
> 
> If that's the case then we should actually immediately give them back
> because that's a sign of memory pressure.

Yes, that makes sense, and that's a solution too.
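
The "give them back immediately" option can be sketched as an all-or-nothing batch allocation. This is a userspace illustration under assumptions, not the kernel code: `calloc()`/`free()` stand in for the slab cache, and `BATCH_SIZE`, `alloc_budget`, `live_objs`, `obj_alloc()`, `obj_free()`, and `alloc_full_batch()` are hypothetical names introduced here, with `alloc_budget` existing only to simulate an allocation failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BATCH_SIZE 4	/* hypothetical; stands in for ODEBUG_BATCH_SIZE */

struct obj {
	struct obj *next;
};

/* Hypothetical test hooks: the allocator succeeds while budget remains. */
static int alloc_budget;
static int live_objs;

static struct obj *obj_alloc(void)
{
	struct obj *o;

	if (alloc_budget <= 0)
		return NULL;	/* simulated memory pressure */
	o = calloc(1, sizeof(*o));
	if (o) {
		alloc_budget--;
		live_objs++;
	}
	return o;
}

static void obj_free(struct obj *o)
{
	free(o);
	live_objs--;
}

/*
 * Allocate exactly one full batch. If any allocation fails, release
 * everything obtained so far and return NULL, so that a partial batch
 * never reaches a pool.
 */
static struct obj *alloc_full_batch(void)
{
	struct obj *head = NULL;
	int i;

	for (i = 0; i < BATCH_SIZE; i++) {
		struct obj *o = obj_alloc();

		if (!o) {
			while (head) {
				struct obj *next = head->next;

				obj_free(head);
				head = next;
			}
			return NULL;
		}
		o->next = head;
		head = o;
	}
	return head;
}
```

Compared with pushing the leftovers into the per-CPU pool one by one, this keeps the "global pools only see full batches" rule intact and returns memory right when the allocator signals pressure.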

> 
>> 2. Member tot_cnt of struct global_pool can be deleted. We can get it
>>    simply and quickly through (slot_idx * ODEBUG_BATCH_SIZE). Avoid
>>    redundant maintenance.
> 
> Agreed.
> 
>> 3. debug_objects_pool_min_level also needs to be adjusted accordingly,
>>    i.e. to the number of batches of the min level.
> 
> Sure. There are certainly more problems with that code. As I said, it's
> untested and way too big to be reviewed. I'll split it up into more
> manageable bits and pieces.

Looking forward to it.

> 
> Thanks,
> 
>         tglx
> .
> 

-- 
Regards,
  Zhen Lei