Date: Tue, 14 May 2024 21:11:51 +0000
From: Carlos Llamas <cmllamas@...gle.com>
To: Christophe JAILLET <christophe.jaillet@...adoo.fr>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Arve Hjønnevåg <arve@...roid.com>,
	Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>,
	Joel Fernandes <joel@...lfernandes.org>,
	Christian Brauner <brauner@...nel.org>,
	Suren Baghdasaryan <surenb@...gle.com>,
	linux-kernel@...r.kernel.org, kernel-team@...roid.com,
	Tim Murray <timmurray@...gle.com>,
	Alice Ryhl <aliceryhl@...gle.com>, John Stultz <jstultz@...gle.com>,
	Steven Moreland <smoreland@...gle.com>,
	Nick Chen <chenjia3@...o.com>
Subject: Re: [PATCH v2] binder: use bitmap for faster descriptor lookup

On Tue, May 14, 2024 at 10:35:56PM +0200, Christophe JAILLET wrote:
> On 14/05/2024 at 18:09, Carlos Llamas wrote:
> > When creating new binder references, the driver assigns a descriptor id
> > that is shared with userspace. Regrettably, the driver needs to keep the
> > descriptors small enough to accommodate userspace potentially using them
> > as Vector indexes. Currently, the driver performs a linear search on the
> > rb-tree of references to find the smallest available descriptor id. This
> > approach, however, scales poorly as the number of references grows.
> > 
> > This patch introduces the use of bitmaps to speed up descriptor
> > assignment. This optimization results in notable performance
> > gains, particularly in processes with a large number of references. The
> > following benchmark with 100,000 references showcases the difference in
> > latency between the dbitmap implementation and the legacy approach:
> > 
> >    [  587.145098] get_ref_desc_olocked: 15us (dbitmap on)
> >    [  602.788623] get_ref_desc_olocked: 47343us (dbitmap off)
> > 
> > Note the bitmap size is dynamically adjusted in line with the number of
> > references, ensuring efficient memory usage. In cases where growing the
> > bitmap is not possible, the driver falls back to the slow legacy method.
> > 
> > A previous attempt to solve this issue was proposed in [1]. However,
> > that method involved adding new ioctls, which isn't ideal, and older
> > userspace code would not have benefited from the optimization either.
> > 
> > Link: https://lore.kernel.org/all/20240417191418.1341988-1-cmllamas@google.com/ [1]
> 
> ...
> 
> > +static int get_ref_desc_olocked(struct binder_proc *proc,
> > +				struct binder_node *node,
> > +				u32 *desc)
> > +{
> > +	struct dbitmap *dmap = &proc->dmap;
> > +	unsigned long *new, bit;
> > +	unsigned int nbits;
> > +
> > +	/* 0 is reserved for the context manager */
> > +	if (node == proc->context->binder_context_mgr_node) {
> > +		*desc = 0;
> > +		return 0;
> > +	}
> > +
> > +	if (unlikely(!dbitmap_enabled(dmap))) {
> > +		*desc = slow_desc_lookup_olocked(proc);
> > +		return 0;
> > +	}
> > +
> > +	if (dbitmap_get_first_zero_bit(dmap, &bit) == 0) {
> > +		*desc = bit;
> > +		return 0;
> > +	}
> > +
> > +	/*
> > +	 * The descriptors bitmap is full and needs to be expanded.
> > +	 * The proc->outer_lock is briefly released to allocate the
> > +	 * new bitmap safely.
> > +	 */
> > +	nbits = dbitmap_expand_nbits(dmap);
> > +	binder_proc_unlock(proc);
> > +	new = bitmap_zalloc(nbits, GFP_KERNEL | __GFP_ZERO);
> 
> Hi,
> 
> Nitpick: No need to __GFP_ZERO when using zalloc().

Oops, you are right. I'll drop this for v3.
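
For reference, bitmap_zalloc() already passes __GFP_ZERO down to the
allocator, so the corrected call should simply be:

	new = bitmap_zalloc(nbits, GFP_KERNEL);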

> 
> CJ
> 
> > +	binder_proc_lock(proc);
> > +	dbitmap_expand(dmap, new, nbits);
> > +
> > +	return -EAGAIN;
> > +}
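
The -EAGAIN after the expansion presumably tells the caller to retry the
lookup now that the bitmap has room; a minimal sketch of such a retry
loop (caller-side usage assumed here, not taken from the patch, and
locking elided) would be:

	u32 desc;
	int ret;

	/* retry until the expanded bitmap yields a free descriptor */
	do {
		ret = get_ref_desc_olocked(proc, node, &desc);
	} while (ret == -EAGAIN);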
> 
> ...
> 
> > +#define NBITS_MIN	BITS_PER_TYPE(unsigned long)
> > +
> > +struct dbitmap {
> > +	unsigned int nbits;
> 
> Should it be long (or unsigned long) to better match the bitmap API?

I picked 'unsigned int' precisely because that is what the bitmap API
uses for nbits (mostly). Unfortunately, there seem to be some
inconsistencies between the bitmap.h and find.h APIs on the bit type.
Ultimately, the decision was made on a type that works with both.

For extra fun, check out the types of set_bit() and clear_bit(): those
use 'long' for the bit number.
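
For illustration, the relevant prototypes (roughly as they appear in
recent kernels) mix three different types:

	/* bitmap.h: nbits is an unsigned int */
	unsigned long *bitmap_zalloc(unsigned int nbits, gfp_t flags);

	/* find.h: @size and the return value are unsigned long */
	unsigned long find_first_zero_bit(const unsigned long *addr,
					  unsigned long size);
	unsigned long find_last_bit(const unsigned long *addr,
				    unsigned long size);

	/* bitops: the bit number is a (signed) long */
	void set_bit(long nr, volatile unsigned long *addr);
	void clear_bit(long nr, volatile unsigned long *addr);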

> 
> (not sure if it can overflow in this use case, but at least for consistency)

Bitmaps are allocated with an "unsigned int" nbits via bitmap_zalloc(),
so it can't overflow.

> 
> > +	unsigned long *map;
> > +};
> 
> ...
> 
> > +static inline unsigned int dbitmap_shrink_nbits(struct dbitmap *dmap)
> > +{
> > +	unsigned int bit;
> > +
> > +	if (dmap->nbits <= NBITS_MIN)
> > +		return 0;
> > +
> > +	bit = find_last_bit(dmap->map, dmap->nbits);
> 
> find_last_bit() returns an unsigned long.
> 
> (bitmap_get_first_zero_bit() below uses unsigned long)

Yes. However, both return values are bounded by the @size parameter.
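
For instance, find_first_zero_bit() returns @size when no zero bit
exists and find_last_bit() returns @size when no bit is set, so the
result can never exceed dmap->nbits:

	unsigned long bit;

	bit = find_last_bit(dmap->map, dmap->nbits);
	/* 0 <= bit <= dmap->nbits; bit == dmap->nbits means the map is empty */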

> 
> CJ
> 
> > +	if (unlikely(bit == dmap->nbits))
> > +		return NBITS_MIN;
> > +
> > +	if (unlikely(bit <= (dmap->nbits >> 2)))
> > +		return dmap->nbits >> 1;
> > +
> > +	return 0;
> > +}
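
Reading the shrink heuristic above with concrete numbers (nbits = 256,
chosen only for illustration):

	map empty             -> find_last_bit() == 256  -> shrink to NBITS_MIN
	highest set bit = 60  -> 60 <= (256 >> 2) == 64  -> shrink to 256 >> 1 == 128
	highest set bit = 200 -> no shrink (returns 0)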
> 
> ...
> 
> > +static inline int
> > +dbitmap_get_first_zero_bit(struct dbitmap *dmap, unsigned long *bit)
> > +{
> > +	unsigned long n;
> > +
> > +	n = find_first_zero_bit(dmap->map, dmap->nbits);
> > +	if (unlikely(n == dmap->nbits))
> > +		return -ENOSPC;
> > +
> > +	*bit = n;
> > +	set_bit(n, dmap->map);
> > +
> > +	return 0;
> > +}
> 
> ...
> 
