Message-ID: <c7c42fef-187b-a218-f4dd-cc21aa733a90@huawei.com>
Date: Wed, 11 Sep 2024 17:38:38 +0800
From: "Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>
To: Thomas Gleixner <tglx@...utronix.de>, Andrew Morton
<akpm@...ux-foundation.org>, <linux-kernel@...r.kernel.org>, David Gow
<davidgow@...gle.com>, <linux-kselftest@...r.kernel.org>,
<kunit-dev@...glegroups.com>
Subject: Re: [PATCH 3/3] debugobjects: Use hlist_cut_number() to optimize
performance and improve readability
On 2024/9/11 16:54, Thomas Gleixner wrote:
> On Wed, Sep 11 2024 at 15:44, Leizhen wrote:
>> On 2024/9/10 19:44, Thomas Gleixner wrote:
>>> That minimizes the pool lock contention and the cache footprint. The
>>> global to free pool must have an extra twist to accommodate non-batch
>>> sized drops and to handle the "all slots are full" case, but that's
>>> just a trivial detail.
>>
>> That's great. I really admire you for completing the refactor in such a
>> short time.
>
> The trick is to look at it from the data model and not from the
> code. You need to sit down and think about which data model is required
> to achieve what you want. So the goal was batching, right?
Yes. When I found a hole in the road, I thought about how to fill it. But
you think more deeply: why is there a hole, and is there a problem with
the foundation? I've benefited a lot from communicating with you these days.
>
> That made it clear that the global pools need to be stacks of batches
> and never handle single objects because that makes it complex. As a
> consequence the per cpu pool is the one which does single object
> alloc/free and then either gets a full batch from the global pool or
> drops one into it. The rest is just mechanical.
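
A minimal user-space sketch of the data model described above, not the
actual patch: the global pool is a stack of full batches, and the per-CPU
pool is the only place that handles single objects. Only pcpu_free() is a
name taken from this thread; BATCH_SIZE, GLOBAL_SLOTS, pcpu_alloc(), the
struct layouts, the function signatures and the "one batch per CPU"
threshold are assumptions made for illustration. Locking is left out.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE   4	/* assumption: stands in for ODEBUG_BATCH_SIZE */
#define GLOBAL_SLOTS 8	/* assumption: number of batch slots */

struct obj {
	struct obj *next;
};

/* Global pool: a stack of full batches, never single objects. */
struct global_pool {
	struct obj *slots[GLOBAL_SLOTS];
	unsigned int slot_idx;	/* number of occupied slots */
};

/* Per-CPU pool: the only level that allocates/frees single objects. */
struct pcpu_pool {
	struct obj *head;
	unsigned int cnt;
};

/* Take one object; refill with a full batch when the local list is empty. */
struct obj *pcpu_alloc(struct pcpu_pool *pcp, struct global_pool *gp)
{
	struct obj *o;

	if (!pcp->head) {
		if (!gp->slot_idx)
			return NULL;	/* global pool has no batch left */
		pcp->head = gp->slots[--gp->slot_idx];
		pcp->cnt = BATCH_SIZE;
	}
	o = pcp->head;
	pcp->head = o->next;
	pcp->cnt--;
	o->next = NULL;
	return o;
}

/* Return one object; drop a full batch into the global pool first if needed. */
int pcpu_free(struct pcpu_pool *pcp, struct global_pool *gp, struct obj *o)
{
	assert(o != NULL);

	if (pcp->cnt == BATCH_SIZE) {
		if (gp->slot_idx == GLOBAL_SLOTS)
			return -1;	/* all slots full: caller has to deal with it */
		gp->slots[gp->slot_idx++] = pcp->head;
		pcp->head = NULL;
		pcp->cnt = 0;
	}
	o->next = pcp->head;
	pcp->head = o;
	pcp->cnt++;
	return 0;
}
```

With this shape the global pool is only touched once per BATCH_SIZE
single-object operations, which is where the reduced lock contention
would come from.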
>
>> But I have a few minor comments.
>> 1. When kmem_cache_zalloc() is called to allocate objs for filling,
>> if less than one batch of objs is allocated, all of them can be
>> pushed to the local CPU. That is, call pcpu_free() on them one by one.
>
> If that's the case then we should actually immediately give them back
> because that's a sign of memory pressure.
Yes, that makes sense, and that's a solution too.
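
The "give them back immediately" variant could look roughly like the
user-space sketch below. It is only an illustration: alloc_full_batch(),
budget_alloc(), alloc_budget and BATCH_SIZE are hypothetical names, and
calloc()/free() stand in for the kmem_cache allocation and release calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BATCH_SIZE 4	/* assumption: stands in for ODEBUG_BATCH_SIZE */

struct obj {
	struct obj *next;
};

typedef void *(*alloc_fn)(size_t size);

/* Hypothetical allocator that fails once its budget is used up. */
int alloc_budget;

void *budget_alloc(size_t size)
{
	if (alloc_budget <= 0)
		return NULL;	/* simulated memory pressure */
	alloc_budget--;
	return calloc(1, size);
}

/*
 * Try to allocate one full batch. If the allocator fails part way,
 * release everything obtained so far and return NULL, so that only
 * complete batches are ever handed on.
 */
struct obj *alloc_full_batch(alloc_fn alloc)
{
	struct obj *head = NULL;

	assert(alloc != NULL);

	for (int i = 0; i < BATCH_SIZE; i++) {
		struct obj *o = alloc(sizeof(*o));

		if (!o) {
			while (head) {
				struct obj *next = head->next;

				free(head);
				head = next;
			}
			return NULL;
		}
		o->next = head;
		head = o;
	}
	return head;
}
```

The alternative from comment 1 would keep the partial list and feed it to
pcpu_free() object by object instead of releasing it.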
>
>> 2. Member tot_cnt of struct global_pool can be deleted. We can get it
>> simply and quickly through (slot_idx * ODEBUG_BATCH_SIZE). Avoid
>> redundant maintenance.
>
> Agreed.
>
>> 3. debug_objects_pool_min_level also needs to be adjusted accordingly,
>> i.e. expressed as the number of batches of the min level.
>
> Sure. There are certainly more problems with that code. As I said, it's
> untested and way too big to be reviewed. I'll split it up into more
> manageable bits and pieces.
Looking forward to it.
>
> Thanks,
>
> tglx
> .
>
--
Regards,
Zhen Lei