lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <15d382af-512f-cb06-d8c2-cd9e16d870f6@bytedance.com>
Date:   Fri, 4 Nov 2022 00:45:19 +0800
From:   wuqiang <wuqiang.matt@...edance.com>
To:     "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc:     davem@...emloft.net, anil.s.keshavamurthy@...el.com,
        naveen.n.rao@...ux.ibm.com, rostedt@...dmis.org,
        peterz@...radead.org, akpm@...ux-foundation.org,
        sander@...nheule.net, ebiggers@...gle.com,
        dan.j.williams@...el.com, jpoimboe@...nel.org,
        linux-kernel@...r.kernel.org, mattwu@....com,
        kernel test robot <lkp@...el.com>
Subject: Re: [PATCH v4] kprobes,lib: kretprobe scalability improvement

On 2022/11/3 10:51, Masami Hiramatsu (Google) wrote:
> Hi wuqiang (or Matt?),
> 
> Thanks for updating the patch! I have some comments below,

Thanks for your time :)

> On Wed,  2 Nov 2022 10:30:12 +0800
> wuqiang <wuqiang.matt@...edance.com> wrote:
> 
...
>> Changes since V3:
>> 1) build warning: unused variable in fprobe_init_rethook
>> Reported-by: kernel test robot <lkp@...el.com>
>>
>> Changes since V2:
>> 1) the percpu-extended version of the freelist replaced by new percpu-
>>     ring-array. freelist has data-contention in freelist_node (refs and
>>     next) even after node is removed from freelist and the node could
>>     be polluted easily (with freelist_node defined in union)
>> 2) routines split to objpool.h and objpool.c according to cold & hot
>>     pathes, and the latter moved to lib, as suggested by Masami
>> 3) test module (test_objpool.ko) added to lib for functional testings
>>
>> Changes since V1:
>> 1) reformat to a single patch as Masami Hiramatsu suggested
>> 2) use __vmalloc_node to replace vmalloc_node for vmalloc
>> 3) a few minor fixes: typo and coding-style issues
> 
> Recording change log is very good. But if it becomes too long,
> you can put a URL of the previous series on LKML instead of
> appending the change logs.
> You can get the URL (permlink) by "lkml.kernel.org/r/<your-message-id>"

Got it.

>>
>> Signed-off-by: wuqiang <wuqiang.matt@...edance.com>
>> ---
>>   include/linux/freelist.h |  129 -----
>>   include/linux/kprobes.h  |    9 +-
>>   include/linux/objpool.h  |  151 ++++++
>>   include/linux/rethook.h  |   15 +-
>>   kernel/kprobes.c         |   95 ++--
>>   kernel/trace/fprobe.c    |   17 +-
>>   kernel/trace/rethook.c   |   80 +--
>>   lib/Kconfig.debug        |   11 +
>>   lib/Makefile             |    4 +-
>>   lib/objpool.c            |  480 ++++++++++++++++++
>>   lib/test_objpool.c       | 1031 ++++++++++++++++++++++++++++++++++++++
>>   11 files changed, 1772 insertions(+), 250 deletions(-)
> 
> Hmm, this does too much things in 1 patch.
> Can you split this in below 5 patches? This also makes clear that who
> needs to review which part.

I was ever considering of splitting, but finally decided to mix them in a big 
one mostly because it's only for kretprobe improvement.

Next version I'll split to smaller patches. Thank you for the advice.

> 
> - Add generic scalable objpool
> - Add objpool test
> - Use objpool in kretprobe
> - Use objpool in fprobe and rethook
> - Remove unused freelist
> 
>>   delete mode 100644 include/linux/freelist.h
>>   create mode 100644 include/linux/objpool.h
>>   create mode 100644 lib/objpool.c
>>   create mode 100644 lib/test_objpool.c
>>
> [...]
>> +
>> +struct objpool_slot {
>> +	uint32_t                os_head;	/* head of ring array */
> 
> If all fields have "os_" prefix, it is meaningless.
> 
>> +	uint32_t                os_tail;	/* tail of ring array */
>> +	uint32_t                os_size;	/* max item slots, pow of 2 */
>> +	uint32_t                os_mask;	/* os_size - 1 */
>> +/*
>> + *	uint32_t                os_ages[];	// ring epoch id
>> + *	void                   *os_ents[];	// objects array
> 
> "entries[]"

I'll refine the comments here to better explain the memory layout.

> 
>> + */
>> +};
>> +
>> +/* caller-specified object initial callback to setup each object, only called once */
>> +typedef int (*objpool_init_node_cb)(void *context, void *obj);
>> +
>> +/* caller-specified cleanup callback for private objects/pool/context */
>> +typedef int (*objpool_release_cb)(void *context, void *ptr, uint32_t flags);
>> +
>> +/* called for object releasing: ptr points to an object */
>> +#define OBJPOOL_FLAG_NODE        (0x00000001)
>> +/* for user pool and context releasing, ptr could be NULL */
>> +#define OBJPOOL_FLAG_POOL        (0x00001000)
>> +/* the object or pool to be released is user-managed */
>> +#define OBJPOOL_FLAG_USER        (0x00008000)
>> +
>> +/*
>> + * objpool_head: object pooling metadata
>> + */
>> +
>> +struct objpool_head {
>> +	uint32_t                oh_objsz;	/* object & element size */
>> +	uint32_t                oh_nobjs;	/* total objs (pre-allocated) */
>> +	uint32_t                oh_nents;	/* max objects per cpuslot */
>> +	uint32_t                oh_ncpus;	/* num of possible cpus */
> 
> If all fields have "oh_" prefix, it is meaningless.
> Also, if there is no reason to be 32bit (like align the structure size
> for cache, or pack the structure for streaming etc.) use appropriate types.
> 
> And please do not align the length of field name unnaturally. e.g.

Kind of obsessive-compulsive symptom, haha :) The struct size of objpool_head 
doesn't matter. The size of objpool_slot does matter, as managed in a single 
cache-line.

> 
> size_t obj_size; /* size_t or unsigned int, I don't care. */
> int nr_objs; /* I think just 'int' is enough because the value should be
>                 checked and limited under INT_MAX */
> int max_entries;
> unsigned int nr_cpus;
> 
> (BTW why we need to limit the nr_cpus here? we have num_possible_cpus())

You are right that nr_cpus is unnecessary. num_possible_cpus() just keeps the 
same even with new hot-plugged cpus.

> 
>> +	uint32_t                oh_in_user:1;	/* user-specified buffer */
>> +	uint32_t                oh_in_slot:1;	/* objs alloced with slots */
>> +	uint32_t                oh_vmalloc:1;	/* alloc from vmalloc zone */
> 
> Please use "bool" or "unsigned long flags" with flag bits.
> 
>> +	gfp_t                   oh_gfp;		/* k/vmalloc gfp flags */
>> +	uint32_t                oh_sz_pool;	/* user pool size in byes */
> 
> size_t pool_size
> 
>> +	void                   *oh_pool;	/* user managed memory pool */
>> +	struct objpool_slot   **oh_slots;	/* array of percpu slots */
>> +	uint32_t               *oh_sz_slots;	/* size in bytes of slots */
> 
> size_t slot_size
> 

Will apply these changes in next version.

>> +	objpool_release_cb      oh_release;	/* resource cleanup callback */
>> +	void                   *oh_context;	/* caller-provided context */
>> +};
> 
> Thank you,
> 

Best regards,

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ