lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20221104102810.b2c10d3334483eb0b3a2ef42@kernel.org>
Date:   Fri, 4 Nov 2022 10:28:10 +0900
From:   Masami Hiramatsu (Google) <mhiramat@...nel.org>
To:     wuqiang <wuqiang.matt@...edance.com>
Cc:     davem@...emloft.net, anil.s.keshavamurthy@...el.com,
        naveen.n.rao@...ux.ibm.com, rostedt@...dmis.org,
        peterz@...radead.org, akpm@...ux-foundation.org,
        sander@...nheule.net, ebiggers@...gle.com,
        dan.j.williams@...el.com, jpoimboe@...nel.org,
        linux-kernel@...r.kernel.org, mattwu@....com,
        kernel test robot <lkp@...el.com>
Subject: Re: [PATCH v4] kprobes,lib: kretprobe scalability improvement

On Fri, 4 Nov 2022 00:45:19 +0800
wuqiang <wuqiang.matt@...edance.com> wrote:

> On 2022/11/3 10:51, Masami Hiramatsu (Google) wrote:
> > Hi wuqiang (or Matt?),
> > 
> > Thanks for updating the patch! I have some comments below,
> 
> Thanks for your time :)
> 
> > On Wed,  2 Nov 2022 10:30:12 +0800
> > wuqiang <wuqiang.matt@...edance.com> wrote:
> > 
> ...
> >> Changes since V3:
> >> 1) build warning: unused variable in fprobe_init_rethook
> >> Reported-by: kernel test robot <lkp@...el.com>
> >>
> >> Changes since V2:
> >> 1) the percpu-extended version of the freelist replaced by new percpu-
> >>     ring-array. freelist has data-contention in freelist_node (refs and
> >>     next) even after node is removed from freelist and the node could
> >>     be polluted easily (with freelist_node defined in union)
> >> 2) routines split to objpool.h and objpool.c according to cold & hot
> >>     pathes, and the latter moved to lib, as suggested by Masami
> >> 3) test module (test_objpool.ko) added to lib for functional testings
> >>
> >> Changes since V1:
> >> 1) reformat to a single patch as Masami Hiramatsu suggested
> >> 2) use __vmalloc_node to replace vmalloc_node for vmalloc
> >> 3) a few minor fixes: typo and coding-style issues
> > 
> > Recording change log is very good. But if it becomes too long,
> > you can put a URL of the previous series on LKML instead of
> > appending the change logs.
> > You can get the URL (permlink) by "lkml.kernel.org/r/<your-message-id>"
> 
> Got it.
> 
> >>
> >> Signed-off-by: wuqiang <wuqiang.matt@...edance.com>
> >> ---
> >>   include/linux/freelist.h |  129 -----
> >>   include/linux/kprobes.h  |    9 +-
> >>   include/linux/objpool.h  |  151 ++++++
> >>   include/linux/rethook.h  |   15 +-
> >>   kernel/kprobes.c         |   95 ++--
> >>   kernel/trace/fprobe.c    |   17 +-
> >>   kernel/trace/rethook.c   |   80 +--
> >>   lib/Kconfig.debug        |   11 +
> >>   lib/Makefile             |    4 +-
> >>   lib/objpool.c            |  480 ++++++++++++++++++
> >>   lib/test_objpool.c       | 1031 ++++++++++++++++++++++++++++++++++++++
> >>   11 files changed, 1772 insertions(+), 250 deletions(-)
> > 
> > Hmm, this does too much things in 1 patch.
> > Can you split this in below 5 patches? This also makes clear that who
> > needs to review which part.
> 
> I was ever considering of splitting, but finally decided to mix them in a big 
> one mostly because it's only for kretprobe improvement.
> 
> Next version I'll split to smaller patches. Thank you for the advice.
> 
> > 
> > - Add generic scalable objpool
> > - Add objpool test
> > - Use objpool in kretprobe
> > - Use objpool in fprobe and rethook
> > - Remove unused freelist
> > 
> >>   delete mode 100644 include/linux/freelist.h
> >>   create mode 100644 include/linux/objpool.h
> >>   create mode 100644 lib/objpool.c
> >>   create mode 100644 lib/test_objpool.c
> >>
> > [...]
> >> +
> >> +struct objpool_slot {
> >> +	uint32_t                os_head;	/* head of ring array */
> > 
> > If all fields have "os_" prefix, it is meaningless.
> > 
> >> +	uint32_t                os_tail;	/* tail of ring array */
> >> +	uint32_t                os_size;	/* max item slots, pow of 2 */
> >> +	uint32_t                os_mask;	/* os_size - 1 */
> >> +/*
> >> + *	uint32_t                os_ages[];	// ring epoch id
> >> + *	void                   *os_ents[];	// objects array
> > 
> > "entries[]"
> 
> I'll refine the comments here to better explain the memory layout.
> 
> > 
> >> + */
> >> +};
> >> +
> >> +/* caller-specified object initial callback to setup each object, only called once */
> >> +typedef int (*objpool_init_node_cb)(void *context, void *obj);
> >> +
> >> +/* caller-specified cleanup callback for private objects/pool/context */
> >> +typedef int (*objpool_release_cb)(void *context, void *ptr, uint32_t flags);
> >> +
> >> +/* called for object releasing: ptr points to an object */
> >> +#define OBJPOOL_FLAG_NODE        (0x00000001)
> >> +/* for user pool and context releasing, ptr could be NULL */
> >> +#define OBJPOOL_FLAG_POOL        (0x00001000)
> >> +/* the object or pool to be released is user-managed */
> >> +#define OBJPOOL_FLAG_USER        (0x00008000)
> >> +
> >> +/*
> >> + * objpool_head: object pooling metadata
> >> + */
> >> +
> >> +struct objpool_head {
> >> +	uint32_t                oh_objsz;	/* object & element size */
> >> +	uint32_t                oh_nobjs;	/* total objs (pre-allocated) */
> >> +	uint32_t                oh_nents;	/* max objects per cpuslot */
> >> +	uint32_t                oh_ncpus;	/* num of possible cpus */
> > 
> > If all fields have "oh_" prefix, it is meaningless.
> > Also, if there is no reason to be 32bit (like align the structure size
> > for cache, or pack the structure for streaming etc.) use appropriate types.
> > 
> > And please do not align the length of field name unnaturally. e.g.
> 
> Kind of obsessive-compulsive symptom, haha :) The struct size of objpool_head 
> doesn't matter. The size of objpool_slot does matter, as managed in a single 
> cache-line.

Yeah, so objpool_slot is good to use uint32_t. You may also need __packed and
__cacheline_aligned for objpool_slot ;)

Thank you!

> 
> > 
> > size_t obj_size; /* size_t or unsigned int, I don't care. */
> > int nr_objs; /* I think just 'int' is enough because the value should be
> >                 checked and limited under INT_MAX */
> > int max_entries;
> > unsigned int nr_cpus;
> > 
> > (BTW why we need to limit the nr_cpus here? we have num_possible_cpus())
> 
> You are right that nr_cpus is unnecessary. num_possible_cpus() just keeps the 
> same even with new hot-plugged cpus.
> 
> > 
> >> +	uint32_t                oh_in_user:1;	/* user-specified buffer */
> >> +	uint32_t                oh_in_slot:1;	/* objs alloced with slots */
> >> +	uint32_t                oh_vmalloc:1;	/* alloc from vmalloc zone */
> > 
> > Please use "bool" or "unsigned long flags" with flag bits.
> > 
> >> +	gfp_t                   oh_gfp;		/* k/vmalloc gfp flags */
> >> +	uint32_t                oh_sz_pool;	/* user pool size in byes */
> > 
> > size_t pool_size
> > 
> >> +	void                   *oh_pool;	/* user managed memory pool */
> >> +	struct objpool_slot   **oh_slots;	/* array of percpu slots */
> >> +	uint32_t               *oh_sz_slots;	/* size in bytes of slots */
> > 
> > size_t slot_size
> > 
> 
> Will apply these changes in next version.
> 
> >> +	objpool_release_cb      oh_release;	/* resource cleanup callback */
> >> +	void                   *oh_context;	/* caller-provided context */
> >> +};
> > 
> > Thank you,
> > 
> 
> Best regards,
> 


-- 
Masami Hiramatsu (Google) <mhiramat@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ