[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <15d382af-512f-cb06-d8c2-cd9e16d870f6@bytedance.com>
Date: Fri, 4 Nov 2022 00:45:19 +0800
From: wuqiang <wuqiang.matt@...edance.com>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: davem@...emloft.net, anil.s.keshavamurthy@...el.com,
naveen.n.rao@...ux.ibm.com, rostedt@...dmis.org,
peterz@...radead.org, akpm@...ux-foundation.org,
sander@...nheule.net, ebiggers@...gle.com,
dan.j.williams@...el.com, jpoimboe@...nel.org,
linux-kernel@...r.kernel.org, mattwu@....com,
kernel test robot <lkp@...el.com>
Subject: Re: [PATCH v4] kprobes,lib: kretprobe scalability improvement
On 2022/11/3 10:51, Masami Hiramatsu (Google) wrote:
> Hi wuqiang (or Matt?),
>
> Thanks for updating the patch! I have some comments below,
Thanks for your time :)
> On Wed, 2 Nov 2022 10:30:12 +0800
> wuqiang <wuqiang.matt@...edance.com> wrote:
>
...
>> Changes since V3:
>> 1) build warning: unused variable in fprobe_init_rethook
>> Reported-by: kernel test robot <lkp@...el.com>
>>
>> Changes since V2:
>> 1) the percpu-extended version of the freelist replaced by new percpu-
>> ring-array. freelist has data-contention in freelist_node (refs and
>> next) even after node is removed from freelist and the node could
>> be polluted easily (with freelist_node defined in union)
>> 2) routines split to objpool.h and objpool.c according to cold & hot
>> pathes, and the latter moved to lib, as suggested by Masami
>> 3) test module (test_objpool.ko) added to lib for functional testings
>>
>> Changes since V1:
>> 1) reformat to a single patch as Masami Hiramatsu suggested
>> 2) use __vmalloc_node to replace vmalloc_node for vmalloc
>> 3) a few minor fixes: typo and coding-style issues
>
> Recording change log is very good. But if it becomes too long,
> you can put a URL of the previous series on LKML instead of
> appending the change logs.
> You can get the URL (permlink) by "lkml.kernel.org/r/<your-message-id>"
Got it.
>>
>> Signed-off-by: wuqiang <wuqiang.matt@...edance.com>
>> ---
>> include/linux/freelist.h | 129 -----
>> include/linux/kprobes.h | 9 +-
>> include/linux/objpool.h | 151 ++++++
>> include/linux/rethook.h | 15 +-
>> kernel/kprobes.c | 95 ++--
>> kernel/trace/fprobe.c | 17 +-
>> kernel/trace/rethook.c | 80 +--
>> lib/Kconfig.debug | 11 +
>> lib/Makefile | 4 +-
>> lib/objpool.c | 480 ++++++++++++++++++
>> lib/test_objpool.c | 1031 ++++++++++++++++++++++++++++++++++++++
>> 11 files changed, 1772 insertions(+), 250 deletions(-)
>
> Hmm, this does too much things in 1 patch.
> Can you split this in below 5 patches? This also makes clear that who
> needs to review which part.
I was ever considering of splitting, but finally decided to mix them in a big
one mostly because it's only for kretprobe improvement.
Next version I'll split to smaller patches. Thank you for the advice.
>
> - Add generic scalable objpool
> - Add objpool test
> - Use objpool in kretprobe
> - Use objpool in fprobe and rethook
> - Remove unused freelist
>
>> delete mode 100644 include/linux/freelist.h
>> create mode 100644 include/linux/objpool.h
>> create mode 100644 lib/objpool.c
>> create mode 100644 lib/test_objpool.c
>>
> [...]
>> +
>> +struct objpool_slot {
>> + uint32_t os_head; /* head of ring array */
>
> If all fields have "os_" prefix, it is meaningless.
>
>> + uint32_t os_tail; /* tail of ring array */
>> + uint32_t os_size; /* max item slots, pow of 2 */
>> + uint32_t os_mask; /* os_size - 1 */
>> +/*
>> + * uint32_t os_ages[]; // ring epoch id
>> + * void *os_ents[]; // objects array
>
> "entries[]"
I'll refine the comments here to better explain the memory layout.
>
>> + */
>> +};
>> +
>> +/* caller-specified object initial callback to setup each object, only called once */
>> +typedef int (*objpool_init_node_cb)(void *context, void *obj);
>> +
>> +/* caller-specified cleanup callback for private objects/pool/context */
>> +typedef int (*objpool_release_cb)(void *context, void *ptr, uint32_t flags);
>> +
>> +/* called for object releasing: ptr points to an object */
>> +#define OBJPOOL_FLAG_NODE (0x00000001)
>> +/* for user pool and context releasing, ptr could be NULL */
>> +#define OBJPOOL_FLAG_POOL (0x00001000)
>> +/* the object or pool to be released is user-managed */
>> +#define OBJPOOL_FLAG_USER (0x00008000)
>> +
>> +/*
>> + * objpool_head: object pooling metadata
>> + */
>> +
>> +struct objpool_head {
>> + uint32_t oh_objsz; /* object & element size */
>> + uint32_t oh_nobjs; /* total objs (pre-allocated) */
>> + uint32_t oh_nents; /* max objects per cpuslot */
>> + uint32_t oh_ncpus; /* num of possible cpus */
>
> If all fields have "oh_" prefix, it is meaningless.
> Also, if there is no reason to be 32bit (like align the structure size
> for cache, or pack the structure for streaming etc.) use appropriate types.
>
> And please do not align the length of field name unnaturally. e.g.
Kind of obsessive-compulsive symptom, haha :) The struct size of objpool_head
doesn't matter. The size of objpool_slot does matter, as managed in a single
cache-line.
>
> size_t obj_size; /* size_t or unsigned int, I don't care. */
> int nr_objs; /* I think just 'int' is enough because the value should be
> checked and limited under INT_MAX */
> int max_entries;
> unsigned int nr_cpus;
>
> (BTW why we need to limit the nr_cpus here? we have num_possible_cpus())
You are right that nr_cpus is unnecessary. num_possible_cpus() just keeps the
same even with new hot-plugged cpus.
>
>> + uint32_t oh_in_user:1; /* user-specified buffer */
>> + uint32_t oh_in_slot:1; /* objs alloced with slots */
>> + uint32_t oh_vmalloc:1; /* alloc from vmalloc zone */
>
> Please use "bool" or "unsigned long flags" with flag bits.
>
>> + gfp_t oh_gfp; /* k/vmalloc gfp flags */
>> + uint32_t oh_sz_pool; /* user pool size in byes */
>
> size_t pool_size
>
>> + void *oh_pool; /* user managed memory pool */
>> + struct objpool_slot **oh_slots; /* array of percpu slots */
>> + uint32_t *oh_sz_slots; /* size in bytes of slots */
>
> size_t slot_size
>
Will apply these changes in next version.
>> + objpool_release_cb oh_release; /* resource cleanup callback */
>> + void *oh_context; /* caller-provided context */
>> +};
>
> Thank you,
>
Best regards,
Powered by blists - more mailing lists