linux-kernel - Re: [PATCH 3/3] debugobjects: Use hlist_cut_number() to optimize performance and improve readability

lists.openwall.net: Open Source and information security mailing list archives


Message-ID: <5333927d-3f21-b7cc-8c57-6e21f1b4a3e5@huawei.com>
Date: Tue, 10 Sep 2024 12:00:57 +0800
From: "Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>
To: Thomas Gleixner <tglx@...utronix.de>, Andrew Morton
	<akpm@...ux-foundation.org>, <linux-kernel@...r.kernel.org>, David Gow
	<davidgow@...gle.com>, <linux-kselftest@...r.kernel.org>,
	<kunit-dev@...glegroups.com>
Subject: Re: [PATCH 3/3] debugobjects: Use hlist_cut_number() to optimize
 performance and improve readability



On 2024/9/10 2:41, Thomas Gleixner wrote:
> On Wed, Sep 04 2024 at 21:41, Zhen Lei wrote:
> 
>> Currently, there are multiple instances where several nodes are extracted
>> from one list and added to another list. Extracting and then splicing them
>> one by one is inefficient and also hard to read.
>> The work can be done well with hlist_cut_number() and hlist_splice_init(),
>> which move the entire sublist at once.
>>
>> When the number of nodes expected to be moved is less than or equal to 0,
>> or the source list is empty, hlist_cut_number() safely returns 0. The
>> splicing is performed only when the return value of hlist_cut_number() is
>> greater than 0.
>>
>> For the two calls to hlist_cut_number() in __free_object(), the result is
>> obviously positive, so the return-value check is omitted.
> 
> Sure but hlist_cut_number() suffers from the same problem as the current
> code. It is a massive cache line chase as you actually have to walk the
> list to figure out where to cut it off.
> 
> All related functions have this problem and all of this code is very
> strict about boundaries. Instead of accurately doing the refill, purge
> etc. we should look into proper batch mode mechanisms. Let me think
> about it.
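The sketch below makes the cost that tglx describes concrete. It is a minimal user-space model of a "cut the first N nodes and splice them" operation on an hlist-style list. It is not the real hlist_cut_number(): the names struct hnode, struct hhead, hhead_add() and cut_first_n() are hypothetical stand-ins, and the real function's signature may differ. As in the patch description, it returns 0 when the count is <= 0 or the source list is empty. Locating the cut point still requires following one next pointer per moved node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel's hlist types. */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

/* Push 'n' onto the front of 'h'. */
static void hhead_add(struct hhead *h, struct hnode *n)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/*
 * Hypothetical helper: move up to 'cnt' nodes from the front of 'src' to
 * the front of 'dst' and return how many were moved. Returns 0 when
 * cnt <= 0 or 'src' is empty. Finding the cut point means following one
 * 'next' pointer per moved node, i.e. O(cnt) dependent loads.
 */
static int cut_first_n(struct hhead *dst, struct hhead *src, int cnt)
{
	struct hnode *first = src->first, *last = NULL, *n = first;
	int moved = 0;

	if (cnt <= 0 || !first)
		return 0;

	while (n && moved < cnt) {	/* the pointer chase */
		last = n;
		n = n->next;
		moved++;
	}
	assert(last != NULL);

	/* Detach [first, last]; 'n' becomes the new front of 'src'. */
	src->first = n;
	if (n)
		n->pprev = &src->first;

	/* Splice [first, last] in front of the existing 'dst' nodes. */
	last->next = dst->first;
	if (dst->first)
		dst->first->pprev = &last->next;
	dst->first = first;
	first->pprev = &dst->first;
	return moved;
}
```

Once [first, last] is known, the splice itself is O(1). The cost lies in finding 'last'.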

It may help to add an array to each free list that records the first node of
each batch. Take 'percpu_pool' as an example:

struct debug_percpu_free {
	struct hlist_head	free_objs;
	int			obj_free;
+	int			batch_idx;
+	struct hlist_node	*batch_first[4]; // ODEBUG_POOL_PERCPU_SIZE / ODEBUG_BATCH_SIZE
};
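Below is a compilable user-space rendering of the proposed structure. Here the array length is derived from the two constants rather than hard-coded as 4. The values 64 and 16 are assumptions, taken to match lib/debugobjects.c at the time of this thread. ODEBUG_NR_BATCHES is a hypothetical name, and the hlist types are minimal stand-ins. The static_assert records that the `0x3` masking in the __free_object() pseudocode only works when the batch count is a power of two.

```c
#include <assert.h>
#include <stddef.h>

#define ODEBUG_POOL_PERCPU_SIZE	64	/* assumed value */
#define ODEBUG_BATCH_SIZE	16	/* assumed value */
/* Hypothetical name for the number of batch slots per free list. */
#define ODEBUG_NR_BATCHES	(ODEBUG_POOL_PERCPU_SIZE / ODEBUG_BATCH_SIZE)

/* Ring indices are masked with (ODEBUG_NR_BATCHES - 1), e.g. 0x3 for 4. */
static_assert((ODEBUG_NR_BATCHES & (ODEBUG_NR_BATCHES - 1)) == 0,
	      "batch count must be a power of two");

/* Minimal user-space stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

struct debug_percpu_free {
	struct hlist_head	free_objs;
	int			obj_free;
	int			batch_idx;	/* ring slot of the oldest (tail-most) batch */
	struct hlist_node	*batch_first[ODEBUG_NR_BATCHES];
};
```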

A newly freed node is added at the head of the list, and batches are cut from
the tail of the list.
  NodeA<-->...<-->NodeB<-->...<-->NodeC<-->NodeD<--> free_objs
    |---one batch---|---one batch---|
                    |               |
        batch_first[0]  batch_first[1]
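The proposal leaves open how the cut itself is performed. With an hlist, one possible O(1) approach works as follows. Each node's pprev points at its predecessor's next field, so writing NULL through batch_first[batch_idx]->pprev terminates the list just before the oldest batch. Splicing the detached batch onto another list without a walk also requires the batch's last node. This sketch therefore assumes a hypothetical `tail` pointer kept alongside the pool. The names add_head() and cut_tail_batch() are hypothetical too, and container_of() is redefined here for user space.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* New nodes go to the head, as in the percpu pool. */
static void add_head(struct hlist_head *h, struct hlist_node *n)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/*
 * Hypothetical: detach [first, tail] from the end of 'src' and splice it in
 * front of 'dst' without walking the list. 'first' would come from
 * batch_first[batch_idx]; 'tail' is an assumed extra pointer to the last
 * node of 'src'. Returns the new last node of 'src' (NULL if now empty).
 */
static struct hlist_node *cut_tail_batch(struct hlist_head *src,
					 struct hlist_node *first,
					 struct hlist_node *tail,
					 struct hlist_head *dst)
{
	struct hlist_node *new_tail = NULL;

	assert(tail->next == NULL);	/* 'tail' must really be the last node */

	/* pprev == &prev->next unless 'first' is the list's first node. */
	if (first->pprev != &src->first)
		new_tail = container_of(first->pprev, struct hlist_node, next);
	*first->pprev = NULL;		/* terminate 'src' just before 'first' */

	/* Splice [first, tail] at the head of 'dst'. */
	tail->next = dst->first;
	if (dst->first)
		dst->first->pprev = &tail->next;
	dst->first = first;
	first->pprev = &dst->first;
	return new_tail;
}
```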

__free_object():
	//add obj into percpu_pool
	obj_free++;
	if (obj_free % ODEBUG_BATCH_SIZE == 0) {
		idx = 0x3 & (batch_idx + (obj_free / ODEBUG_BATCH_SIZE) - 1);
		//update batch_first[idx]
	}

	if (obj_free >= ODEBUG_POOL_PERCPU_SIZE) {
		//move one batch
		//cut the batch starting at batch_first[batch_idx] into obj_to_free (or into obj_pool, if the global pool is below debug_objects_pool_min_level)
		batch_idx = (batch_idx + 1) & 0x3;
		obj_free -= ODEBUG_BATCH_SIZE;
	}
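To sanity-check the index arithmetic above, here is a user-space model that tracks only the counters, with no list. It verifies that, whenever a cut happens, the slot at batch_idx holds the oldest complete batch. The constants are assumed to be 64 and 16. struct pool_model, model_free_one() and NR_BATCHES are hypothetical, and batch_idx is masked when it is incremented.

```c
#include <assert.h>

#define ODEBUG_POOL_PERCPU_SIZE	64	/* assumed value */
#define ODEBUG_BATCH_SIZE	16	/* assumed value */
#define NR_BATCHES	(ODEBUG_POOL_PERCPU_SIZE / ODEBUG_BATCH_SIZE)	/* hypothetical */

struct pool_model {			/* hypothetical: counters only, no list */
	int obj_free;
	int batch_idx;			/* ring slot of the oldest batch */
	int slot_seq[NR_BATCHES];	/* which batch (by sequence number) each slot records */
	int next_seq;			/* number of batches completed so far */
	int cut_count;			/* number of batches cut so far */
};

/* Model one __free_object() into the per-CPU pool. */
static void model_free_one(struct pool_model *p)
{
	p->obj_free++;
	if (p->obj_free % ODEBUG_BATCH_SIZE == 0) {
		/* A batch just became complete: record it in its ring slot. */
		int idx = (NR_BATCHES - 1) &
			  (p->batch_idx + p->obj_free / ODEBUG_BATCH_SIZE - 1);
		p->slot_seq[idx] = p->next_seq++;
	}
	if (p->obj_free >= ODEBUG_POOL_PERCPU_SIZE) {
		/* The batch cut must be the oldest one not yet cut. */
		assert(p->slot_seq[p->batch_idx] == p->cut_count);
		p->cut_count++;
		p->batch_idx = (p->batch_idx + 1) & (NR_BATCHES - 1);
		p->obj_free -= ODEBUG_BATCH_SIZE;
	}
}
```

The model covers only frees. Allocations from the per-CPU pool take nodes from the head and decrement obj_free, so they would also have to keep batch_first[] consistent. This sketch does not address that.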

> 
> Thanks,
> 
>         tglx
> 
> 
> .
> 

-- 
Regards,
  Zhen Lei