[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e9c3a7c20908181102q6db970bav737cf26939f84c4c@mail.gmail.com>
Date: Tue, 18 Aug 2009 11:02:26 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Dave Hansen <dave@...ux.vnet.ibm.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH] flex_array: conditionally optimize out divides
On Mon, Aug 17, 2009 at 1:43 PM, Dave Hansen<dave@...ux.vnet.ibm.com> wrote:
>
> There are three flex_array operations that require divides:
> 1. figuring out into which "part" we should access
> 2. figuring out where into that part we fit
> 3. figuring out in how many elements fit into a part
>
> Division can get expensive, and we may incur one or two
> divides for each put() or get() that is performed. If we
> rounded the elements to a power-of-two and stored shifts
> and masks, we could rid ourselves of the divides, but we
> would lose storage space with oddly-sized objects. We
> could code the implementation to handle divides and special-
> case the shifts when they can be used, but that would
> complicate the code.
>
> This is an alternative. We introduce variants of
> flex_array_get() and flex_array_put() since they are the
> most common operations. We append an _es() to their names
> (for Element Size) and get flex_array_get_es() and
> flex_array_put_es(). The allocation and free functions
> remain unoptimized since they're not indended to be hot
> paths.
>
> Passing the element size into each operation, and using it
> like this:
>
> flex_array_get(fa, nr, sizeof(struct my_struct));
>
> lets the compiler turn the divides into shifts if 'my_struct'
> is a power-of-two in size.
>
> It seems that only gcc 4.1 and up are smart enough to figure
> this out, though.
>
> ---
Hi Dave,
Thanks for this. I'll give it a shot hopefully in the next few days.
One comment below...
> +/*
> + * Use the _es() variants when you want the compiler to
> + * be able to optimize the divides like when you have a
> + * power-of-two element_size.
> + */
> +static inline void *flex_array_get_es(struct flex_array *fa,
> + int element_nr, int element_size)
> +{
> + int part_nr = __fa_element_to_part_nr(element_size, element_nr);
> + int index_inside = __fa_index_inside_part(element_size, element_nr);
> +
> + if (element_nr >= fa->total_nr_elements)
> + return NULL;
This if()...
> +
> + return flex_array_get_precalc(fa, part_nr, index_inside);
> +}
> +
> +static inline int flex_array_put_es(struct flex_array *fa, int element_nr,
> + int element_size, void *src, gfp_t flags)
> +{
> + int part_nr = __fa_element_to_part_nr(element_size, element_nr);
> + int index_inside = __fa_index_inside_part(element_size, element_nr);
> +
> + if (element_nr >= fa->total_nr_elements)
> + return -ENOSPC;
...and this one look like good candidates for unlikely() as these
additional branches may be a concern for the fast path.
Thanks,
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists