[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1fb679ad-04b0-c1f6-1675-1921539382bd@bytedance.com>
Date: Fri, 13 Oct 2023 01:36:05 +0800
From: "wuqiang.matt" <wuqiang.matt@...edance.com>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: linux-trace-kernel@...r.kernel.org, davem@...emloft.net,
anil.s.keshavamurthy@...el.com, naveen.n.rao@...ux.ibm.com,
rostedt@...dmis.org, peterz@...radead.org,
akpm@...ux-foundation.org, sander@...nheule.net,
ebiggers@...gle.com, dan.j.williams@...el.com, jpoimboe@...nel.org,
linux-kernel@...r.kernel.org, lkp@...el.com, mattwu@....com
Subject: Re: [PATCH v9 1/5] lib: objpool added: ring-array based lockless MPMC
On 2023/10/12 22:02, Masami Hiramatsu (Google) wrote:
> Hi Wuqiang,
>
> On Mon, 9 Oct 2023 17:23:34 +0800
> wuqiang <wuqiang.matt@...edance.com> wrote:
>
>> Hello Masami,
>>
>> Just got time for the new patch and got that ages[] was removed. ages[] is
>> introduced the way like 2-phase commit to keep consitency and must be kept.
>>
>> Thinking of the following 2 cases that two cpu nodes are operating the same
>> objpool_slot simultaneously:
>>
>> Case 1:
>>
>> NODE 1: NODE 2:
>> push to an empty slot pop will get wrong value
>>
>> try_cmpxchg_acquire(&slot->tail, &tail, next)
>> try_cmpxchg_release(&slot->head, &head, head + 1)
>> return slot->entries[head & slot->mask]
>> WRITE_ONCE(slot->entries[tail & slot->mask], obj)
>
> Today, I rethink the algorithm.
>
> For this case, we can just add a `commit` to the slot for committing the tail
> commit instead of the ages[].
>
> CPU1 CPU2
> push to an empty slot pop from the same slot
>
> commit = tail = slot->tail;
> next = tail + 1;
> try_cmpxchg_acquire(&slot->tail,
> &tail, next);
> while (head != commit) -> NG1
> WRITE_ONCE(slot->entries[tail & slot->mask],
> obj)
> while (head != commit) -> NG2
> WRITE_ONCE(&slot->commit, next);
> while (head != commit) -> OK
> try_cmpxchg_acquire(&slot->head,
> &head, next);
> return slot->entries[head & slot->mask]
>
> So the NG1 and NG2 timing, the pop will fail.
>
> This does not expect the nested "push" case because the reserve-commit block
> will be interrupt disabled. This doesn't support NMI but that is OK.
If 2 push are performed in a row, slot->commit will be 'tail + 2', CPU2 won't
meet the condition "head == commit".
Since slot->commit' is always synced to slot->tail after a successful push,
should pop check "tail != commit" ? In this case when a push is ongoing with
the slot, pop() has to wait for its completion even if there are objects in
the same slot.
> If we need to support such NMI or remove irq-disable, we can do following
> (please remember the push operation only happens on the slot owned by that
> CPU. So this is per-cpu process)
>
> ---
> do {
> commit = tail = slot->tail;
> next = tail + 1;
> } while (!try_cmpxchg_acquire(&slot->tail, &tail, next));
>
> WRITE_ONCE(slot->entries[tail & slot->mask], obj);
>
> // At this point, "commit == slot->tail - 1" in this nest level.
> // Only outmost (non-nexted) case, "commit == slot->commit"
> if (commit != slot->commit)
> return; // do nothing. it must be updated by outmost one.
>
> // catch up commit to tail.
> do {
> commit = READ_ONCE(slot->tail);
> WRITE_ONCE(slot->commit, commit);
> // note that someone can interrupt this loop too.
> } while (commit != READ_ONCE(slot->tail));
> ---
Yes, I got you: push can only happen on local node and the first try has the
right to extend 'commit'. It does resolve nested push attempts, and preemption
must be disabled.
The overflow/ABA issue can still happen if the cpu is being interrupted for
long time.
> For the rethook this may be too much.
>
> Thank you,
>
>>
>>
>> Case 2:
>>
>> NODE 1: NODE 2
>> push to slot w/ 1 obj pop will get wrong value
>>
>> try_cmpxchg_release(&slot->head, &head, head + 1)
>> try_cmpxchg_acquire(&slot->tail, &tail, next)
>> WRITE_ONCE(slot->entries[tail & slot->mask], obj)
>> return slot->entries[head & slot->mask]
The pre-condition should be: CPU 1 tries to push to a full slot, in this case
tail = head + capacity but tail & mask == head & mask
>>
>> Regards,
>> wuqiang
>>
>> On 2023/9/25 17:42, Masami Hiramatsu (Google) wrote:
>>> Hi Wuqiang,
>>>
>>> On Tue, 5 Sep 2023 09:52:51 +0800
>>> "wuqiang.matt" <wuqiang.matt@...edance.com> wrote:
>>>
>>>> The object pool is a scalable implementaion of high performance queue
>>>> for object allocation and reclamation, such as kretprobe instances.
>>>>
>>>> With leveraging percpu ring-array to mitigate the hot spot of memory
>>>> contention, it could deliver near-linear scalability for high parallel
>>>> scenarios. The objpool is best suited for following cases:
>>>> 1) Memory allocation or reclamation are prohibited or too expensive
>>>> 2) Consumers are of different priorities, such as irqs and threads
>>>>
>>>> Limitations:
>>>> 1) Maximum objects (capacity) is determined during pool initializing
>>>> and can't be modified (extended) after objpool creation
>>>> 2) The memory of objects won't be freed until objpool is finalized
>>>> 3) Object allocation (pop) may fail after trying all cpu slots
>>>
>>> I made a simplifying patch on this by (mainly) removing ages array.
>>> I also rename local variable to use more readable names, like slot,
>>> pool, and obj.
>>>
>>> Here the results which I run the test_objpool.ko.
>>>
>>> Original:
>>> [ 50.500235] Summary of testcases:
>>> [ 50.503139] duration: 1027135us hits: 30628293 miss: 0 sync: percpu objpool
>>> [ 50.510416] duration: 1047977us hits: 30453023 miss: 0 sync: percpu objpool from vmalloc
>>> [ 50.517421] duration: 1047098us hits: 31145034 miss: 0 sync & hrtimer: percpu objpool
>>> [ 50.524671] duration: 1053233us hits: 30919640 miss: 0 sync & hrtimer: percpu objpool from vmalloc
>>> [ 50.531382] duration: 1055822us hits: 3407221 miss: 830219 sync overrun: percpu objpool
>>> [ 50.538135] duration: 1055998us hits: 3404624 miss: 854160 sync overrun: percpu objpool from vmalloc
>>> [ 50.546686] duration: 1046104us hits: 19464798 miss: 0 async: percpu objpool
>>> [ 50.552989] duration: 1033054us hits: 18957836 miss: 0 async: percpu objpool from vmalloc
>>> [ 50.560289] duration: 1041778us hits: 33806868 miss: 0 async & hrtimer: percpu objpool
>>> [ 50.567425] duration: 1048901us hits: 34211860 miss: 0 async & hrtimer: percpu objpool from vmalloc
>>>
>>> Simplified:
>>> [ 48.393236] Summary of testcases:
>>> [ 48.395384] duration: 1013002us hits: 29661448 miss: 0 sync: percpu objpool
>>> [ 48.400351] duration: 1057231us hits: 31035578 miss: 0 sync: percpu objpool from vmalloc
>>> [ 48.405660] duration: 1043196us hits: 30546652 miss: 0 sync & hrtimer: percpu objpool
>>> [ 48.411216] duration: 1047128us hits: 30601306 miss: 0 sync & hrtimer: percpu objpool from vmalloc
>>> [ 48.417231] duration: 1051695us hits: 3468287 miss: 892881 sync overrun: percpu objpool
>>> [ 48.422405] duration: 1054003us hits: 3549491 miss: 898141 sync overrun: percpu objpool from vmalloc
>>> [ 48.428425] duration: 1052946us hits: 19005228 miss: 0 async: percpu objpool
>>> [ 48.433597] duration: 1051757us hits: 19670866 miss: 0 async: percpu objpool from vmalloc
>>> [ 48.439280] duration: 1042951us hits: 37135332 miss: 0 async & hrtimer: percpu objpool
>>> [ 48.445085] duration: 1029803us hits: 37093248 miss: 0 async & hrtimer: percpu objpool from vmalloc
>>>
>>> Can you test it too?
>>>
>>> Thanks,
>>>
>>> From f1f442ff653e329839e5452b8b88463a80a12ff3 Mon Sep 17 00:00:00 2001
>>> From: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
>>> Date: Mon, 25 Sep 2023 16:07:12 +0900
>>> Subject: [PATCH] objpool: Simplify objpool by removing ages array
>>>
>>> Simplify the objpool code by removing ages array from per-cpu slot.
>>> It chooses enough big capacity (which is a rounded up power of 2 value
>>> of nr_objects + 1) for the entries array, the tail never catch up to
>>> the head in per-cpu slot. Thus tail == head means the slot is empty.
>>>
>>> This also uses consistent local variable names for pool, slot and obj.
>>>
>>> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>
>>> ---
>>> include/linux/objpool.h | 61 ++++----
>>> lib/objpool.c | 310 ++++++++++++++++------------------------
>>> 2 files changed, 147 insertions(+), 224 deletions(-)
>>>
>>> diff --git a/include/linux/objpool.h b/include/linux/objpool.h
>>> index 33c832216b98..ecd5ecaffcd3 100644
>>> --- a/include/linux/objpool.h
>>> +++ b/include/linux/objpool.h
>>> @@ -38,33 +38,23 @@
>>> * struct objpool_slot - percpu ring array of objpool
>>> * @head: head of the local ring array (to retrieve at)
>>> * @tail: tail of the local ring array (to append at)
>>> - * @bits: log2 of capacity (for bitwise operations)
>>> - * @mask: capacity - 1
>>> + * @mask: capacity of entries - 1
>>> + * @entries: object entries on this slot.
>>> *
>>> * Represents a cpu-local array-based ring buffer, its size is specialized
>>> * during initialization of object pool. The percpu objpool slot is to be
>>> * allocated from local memory for NUMA system, and to be kept compact in
>>> - * continuous memory: ages[] is stored just after the body of objpool_slot,
>>> - * and then entries[]. ages[] describes revision of each item, solely used
>>> - * to avoid ABA; entries[] contains pointers of the actual objects
>>> - *
>>> - * Layout of objpool_slot in memory:
>>> - *
>>> - * 64bit:
>>> - * 4 8 12 16 32 64
>>> - * | head | tail | bits | mask | ages[4] | ents[4]: (8 * 4) | objects
>>> - *
>>> - * 32bit:
>>> - * 4 8 12 16 32 48 64
>>> - * | head | tail | bits | mask | ages[4] | ents[4] | unused | objects
>>> + * continuous memory: CPU assigned number of objects are stored just after
>>> + * the body of objpool_slot.
>>> *
>>> */
>>> struct objpool_slot {
>>> - uint32_t head;
>>> - uint32_t tail;
>>> - uint32_t bits;
>>> - uint32_t mask;
>>> -} __packed;
>>> + uint32_t head;
>>> + uint32_t tail;
>>> + uint32_t mask;
>>> + uint32_t dummyr;
>>> + void * entries[];
>>> +};
>>>
>>> struct objpool_head;
>>>
>>> @@ -82,12 +72,11 @@ typedef int (*objpool_fini_cb)(struct objpool_head *head, void *context);
>>> * @obj_size: object & element size
>>> * @nr_objs: total objs (to be pre-allocated)
>>> * @nr_cpus: nr_cpu_ids
>>> - * @capacity: max objects per cpuslot
>>> + * @capacity: max objects on percpu slot
>>> * @gfp: gfp flags for kmalloc & vmalloc
>>> * @ref: refcount for objpool
>>> * @flags: flags for objpool management
>>> * @cpu_slots: array of percpu slots
>>> - * @slot_sizes: size in bytes of slots
>>> * @release: resource cleanup callback
>>> * @context: caller-provided context
>>> */
>>> @@ -100,7 +89,6 @@ struct objpool_head {
>>> refcount_t ref;
>>> unsigned long flags;
>>> struct objpool_slot **cpu_slots;
>>> - int *slot_sizes;
>>> objpool_fini_cb release;
>>> void *context;
>>> };
>>> @@ -108,9 +96,12 @@ struct objpool_head {
>>> #define OBJPOOL_FROM_VMALLOC (0x800000000) /* objpool allocated from vmalloc area */
>>> #define OBJPOOL_HAVE_OBJECTS (0x400000000) /* objects allocated along with objpool */
>>>
>>> +#define OBJPOOL_NR_OBJECT_MAX (1 << 24)
>>> +#define OBJPOOL_OBJECT_SIZE_MAX (1 << 16)
>>> +
>>> /**
>>> * objpool_init() - initialize objpool and pre-allocated objects
>>> - * @head: the object pool to be initialized, declared by caller
>>> + * @pool: the object pool to be initialized, declared by caller
>>> * @nr_objs: total objects to be pre-allocated by this object pool
>>> * @object_size: size of an object (should be > 0)
>>> * @gfp: flags for memory allocation (via kmalloc or vmalloc)
>>> @@ -128,47 +119,47 @@ struct objpool_head {
>>> * pop (object allocation) or do clearance before each push (object
>>> * reclamation).
>>> */
>>> -int objpool_init(struct objpool_head *head, int nr_objs, int object_size,
>>> +int objpool_init(struct objpool_head *pool, int nr_objs, int object_size,
>>> gfp_t gfp, void *context, objpool_init_obj_cb objinit,
>>> objpool_fini_cb release);
>>>
>>> /**
>>> * objpool_pop() - allocate an object from objpool
>>> - * @head: object pool
>>> + * @pool: object pool
>>> *
>>> * return value: object ptr or NULL if failed
>>> */
>>> -void *objpool_pop(struct objpool_head *head);
>>> +void *objpool_pop(struct objpool_head *pool);
>>>
>>> /**
>>> * objpool_push() - reclaim the object and return back to objpool
>>> * @obj: object ptr to be pushed to objpool
>>> - * @head: object pool
>>> + * @pool: object pool
>>> *
>>> * return: 0 or error code (it fails only when user tries to push
>>> * the same object multiple times or wrong "objects" into objpool)
>>> */
>>> -int objpool_push(void *obj, struct objpool_head *head);
>>> +int objpool_push(void *obj, struct objpool_head *pool);
>>>
>>> /**
>>> * objpool_drop() - discard the object and deref objpool
>>> * @obj: object ptr to be discarded
>>> - * @head: object pool
>>> + * @pool: object pool
>>> *
>>> * return: 0 if objpool was released or error code
>>> */
>>> -int objpool_drop(void *obj, struct objpool_head *head);
>>> +int objpool_drop(void *obj, struct objpool_head *pool);
>>>
>>> /**
>>> * objpool_free() - release objpool forcely (all objects to be freed)
>>> - * @head: object pool to be released
>>> + * @pool: object pool to be released
>>> */
>>> -void objpool_free(struct objpool_head *head);
>>> +void objpool_free(struct objpool_head *pool);
>>>
>>> /**
>>> * objpool_fini() - deref object pool (also releasing unused objects)
>>> - * @head: object pool to be dereferenced
>>> + * @pool: object pool to be dereferenced
>>> */
>>> -void objpool_fini(struct objpool_head *head);
>>> +void objpool_fini(struct objpool_head *pool);
>>>
>>> #endif /* _LINUX_OBJPOOL_H */
>>> diff --git a/lib/objpool.c b/lib/objpool.c
>>> index 22e752371820..f8e8f70d7253 100644
>>> --- a/lib/objpool.c
>>> +++ b/lib/objpool.c
>>> @@ -15,104 +15,55 @@
>>> * Copyright: wuqiang.matt@...edance.com
>>> */
>>>
>>> -#define SLOT_AGES(s) ((uint32_t *)((char *)(s) + sizeof(struct objpool_slot)))
>>> -#define SLOT_ENTS(s) ((void **)((char *)(s) + sizeof(struct objpool_slot) + \
>>> - (sizeof(uint32_t) << (s)->bits)))
>>> -#define SLOT_OBJS(s) ((void *)((char *)(s) + sizeof(struct objpool_slot) + \
>>> - ((sizeof(uint32_t) + sizeof(void *)) << (s)->bits)))
>>> -#define SLOT_CORE(n) cpumask_nth((n) % num_possible_cpus(), cpu_possible_mask)
>>> -
>>> -/* compute the suitable num of objects to be managed per slot */
>>> -static int objpool_nobjs(int size)
>>> -{
>>> - return rounddown_pow_of_two((size - sizeof(struct objpool_slot)) /
>>> - (sizeof(uint32_t) + sizeof(void *)));
>>> -}
>>> -
>>> /* allocate and initialize percpu slots */
>>> static int
>>> -objpool_init_percpu_slots(struct objpool_head *head, int nobjs,
>>> - void *context, objpool_init_obj_cb objinit)
>>> +objpool_init_percpu_slots(struct objpool_head *pool, void *context,
>>> + objpool_init_obj_cb objinit)
>>> {
>>> - int i, j, n, size, objsz, cpu = 0, nents = head->capacity;
>>> -
>>> - /* aligned object size by sizeof(void *) */
>>> - objsz = ALIGN(head->obj_size, sizeof(void *));
>>> - /* shall we allocate objects along with percpu-slot */
>>> - if (objsz)
>>> - head->flags |= OBJPOOL_HAVE_OBJECTS;
>>> -
>>> - /* vmalloc is used in default to allocate percpu-slots */
>>> - if (!(head->gfp & GFP_ATOMIC))
>>> - head->flags |= OBJPOOL_FROM_VMALLOC;
>>> -
>>> - for (i = 0; i < head->nr_cpus; i++) {
>>> - struct objpool_slot *os;
>>> + int i, j, n, size, slot_size, cpu_count = 0;
>>> + struct objpool_slot *slot;
>>>
>>> + for (i = 0; i < pool->nr_cpus; i++) {
>>> /* skip the cpus which could never be present */
>>> if (!cpu_possible(i))
>>> continue;
>>>
>>> /* compute how many objects to be managed by this slot */
>>> - n = nobjs / num_possible_cpus();
>>> - if (cpu < (nobjs % num_possible_cpus()))
>>> + n = pool->nr_objs / num_possible_cpus();
>>> + if (cpu_count < (pool->nr_objs % num_possible_cpus()))
>>> n++;
>>> - size = sizeof(struct objpool_slot) + sizeof(void *) * nents +
>>> - sizeof(uint32_t) * nents + objsz * n;
>>> + cpu_count++;
>>> +
>>> + slot_size = struct_size(slot, entries, pool->capacity);
>>> + size = slot_size + pool->obj_size * n;
>>>
>>> /*
>>> * here we allocate percpu-slot & objects together in a single
>>> - * allocation, taking advantage of warm caches and TLB hits as
>>> - * vmalloc always aligns the request size to pages
>>> + * allocation, taking advantage on NUMA.
>>> */
>>> - if (head->flags & OBJPOOL_FROM_VMALLOC)
>>> - os = __vmalloc_node(size, sizeof(void *), head->gfp,
>>> + if (pool->flags & OBJPOOL_FROM_VMALLOC)
>>> + slot = __vmalloc_node(size, sizeof(void *), pool->gfp,
>>> cpu_to_node(i), __builtin_return_address(0));
>>> else
>>> - os = kmalloc_node(size, head->gfp, cpu_to_node(i));
>>> - if (!os)
>>> + slot = kmalloc_node(size, pool->gfp, cpu_to_node(i));
>>> + if (!slot)
>>> return -ENOMEM;
>>>
>>> /* initialize percpu slot for the i-th slot */
>>> - memset(os, 0, size);
>>> - os->bits = ilog2(head->capacity);
>>> - os->mask = head->capacity - 1;
>>> - head->cpu_slots[i] = os;
>>> - head->slot_sizes[i] = size;
>>> - cpu = cpu + 1;
>>> -
>>> - /*
>>> - * manually set head & tail to avoid possible conflict:
>>> - * We assume that the head item is ready for retrieval
>>> - * iff head is equal to ages[head & mask]. but ages is
>>> - * initialized as 0, so in view of the caller of pop(),
>>> - * the 1st item (0th) is always ready, but the reality
>>> - * could be: push() is stalled before the final update,
>>> - * thus the item being inserted will be lost forever
>>> - */
>>> - os->head = os->tail = head->capacity;
>>> -
>>> - if (!objsz)
>>> - continue;
>>> + memset(slot, 0, size);
>>> + slot->mask = pool->capacity - 1;
>>> + pool->cpu_slots[i] = slot;
>>>
>>> for (j = 0; j < n; j++) {
>>> - uint32_t *ages = SLOT_AGES(os);
>>> - void **ents = SLOT_ENTS(os);
>>> - void *obj = SLOT_OBJS(os) + j * objsz;
>>> - uint32_t ie = os->tail & os->mask;
>>> + void *obj = (void *)slot + slot_size + pool->obj_size * j;
>>>
>>> - /* perform object initialization */
>>> if (objinit) {
>>> int rc = objinit(obj, context);
>>> if (rc)
>>> return rc;
>>> }
>>> -
>>> - /* add obj into the ring array */
>>> - ents[ie] = obj;
>>> - ages[ie] = os->tail;
>>> - os->tail++;
>>> - head->nr_objs++;
>>> + slot->entries[j] = obj;
>>> + slot->tail++;
>>> }
>>> }
>>>
>>> @@ -120,164 +71,135 @@ objpool_init_percpu_slots(struct objpool_head *head, int nobjs,
>>> }
>>>
>>> /* cleanup all percpu slots of the object pool */
>>> -static void objpool_fini_percpu_slots(struct objpool_head *head)
>>> +static void objpool_fini_percpu_slots(struct objpool_head *pool)
>>> {
>>> int i;
>>>
>>> - if (!head->cpu_slots)
>>> + if (!pool->cpu_slots)
>>> return;
>>>
>>> - for (i = 0; i < head->nr_cpus; i++) {
>>> - if (!head->cpu_slots[i])
>>> - continue;
>>> - if (head->flags & OBJPOOL_FROM_VMALLOC)
>>> - vfree(head->cpu_slots[i]);
>>> - else
>>> - kfree(head->cpu_slots[i]);
>>> - }
>>> - kfree(head->cpu_slots);
>>> - head->cpu_slots = NULL;
>>> - head->slot_sizes = NULL;
>>> + for (i = 0; i < pool->nr_cpus; i++)
>>> + kvfree(pool->cpu_slots[i]);
>>> + kfree(pool->cpu_slots);
>>> }
>>>
>>> /* initialize object pool and pre-allocate objects */
>>> -int objpool_init(struct objpool_head *head, int nr_objs, int object_size,
>>> +int objpool_init(struct objpool_head *pool, int nr_objs, int object_size,
>>> gfp_t gfp, void *context, objpool_init_obj_cb objinit,
>>> objpool_fini_cb release)
>>> {
>>> int nents, rc;
>>>
>>> /* check input parameters */
>>> - if (nr_objs <= 0 || object_size <= 0)
>>> + if (nr_objs <= 0 || object_size <= 0 ||
>>> + nr_objs > OBJPOOL_NR_OBJECT_MAX || object_size > OBJPOOL_OBJECT_SIZE_MAX)
>>> + return -EINVAL;
>>> +
>>> + /* Align up to unsigned long size */
>>> + object_size = ALIGN(object_size, sizeof(unsigned long));
>>> +
>>> + /*
>>> + * To avoid filling up the entries array in the per-cpu slot,
>>> + * use the power of 2 which is more than N + 1. Thus, the tail never
>>> + * catch up to the pool, and able to use pool/tail as the sequencial
>>> + * number.
>>> + */
>>> + nents = roundup_pow_of_two(nr_objs + 1);
>>> + if (!nents)
>>> return -EINVAL;
>>>
>>> - /* calculate percpu slot size (rounded to pow of 2) */
>>> - nents = max_t(int, roundup_pow_of_two(nr_objs),
>>> - objpool_nobjs(L1_CACHE_BYTES));
>>> -
>>> - /* initialize objpool head */
>>> - memset(head, 0, sizeof(struct objpool_head));
>>> - head->nr_cpus = nr_cpu_ids;
>>> - head->obj_size = object_size;
>>> - head->capacity = nents;
>>> - head->gfp = gfp & ~__GFP_ZERO;
>>> - head->context = context;
>>> - head->release = release;
>>> -
>>> - /* allocate array for percpu slots */
>>> - head->cpu_slots = kzalloc(head->nr_cpus * sizeof(void *) +
>>> - head->nr_cpus * sizeof(int), head->gfp);
>>> - if (!head->cpu_slots)
>>> + /* initialize objpool pool */
>>> + memset(pool, 0, sizeof(struct objpool_head));
>>> + pool->nr_cpus = nr_cpu_ids;
>>> + pool->obj_size = object_size;
>>> + pool->nr_objs = nr_objs;
>>> + /* just prevent to fullfill the per-cpu ring array */
>>> + pool->capacity = nents;
>>> + pool->gfp = gfp & ~__GFP_ZERO;
>>> + pool->context = context;
>>> + pool->release = release;
>>> + /* vmalloc is used in default to allocate percpu-slots */
>>> + if (!(pool->gfp & GFP_ATOMIC))
>>> + pool->flags |= OBJPOOL_FROM_VMALLOC;
>>> +
>>> + pool->cpu_slots = kzalloc(pool->nr_cpus * sizeof(void *), pool->gfp);
>>> + if (!pool->cpu_slots)
>>> return -ENOMEM;
>>> - head->slot_sizes = (int *)&head->cpu_slots[head->nr_cpus];
>>>
>>> /* initialize per-cpu slots */
>>> - rc = objpool_init_percpu_slots(head, nr_objs, context, objinit);
>>> + rc = objpool_init_percpu_slots(pool, context, objinit);
>>> if (rc)
>>> - objpool_fini_percpu_slots(head);
>>> + objpool_fini_percpu_slots(pool);
>>> else
>>> - refcount_set(&head->ref, nr_objs + 1);
>>> + refcount_set(&pool->ref, nr_objs + 1);
>>>
>>> return rc;
>>> }
>>> EXPORT_SYMBOL_GPL(objpool_init);
>>>
>>> /* adding object to slot, abort if the slot was already full */
>>> -static inline int objpool_try_add_slot(void *obj, struct objpool_slot *os)
>>> +static inline int objpool_try_add_slot(void *obj, struct objpool_head *pool, int cpu)
>>> {
>>> - uint32_t *ages = SLOT_AGES(os);
>>> - void **ents = SLOT_ENTS(os);
>>> - uint32_t head, tail;
>>> + struct objpool_slot *slot = pool->cpu_slots[cpu];
>>> + uint32_t tail, next;
>>>
>>> do {
>>> - /* perform memory loading for both head and tail */
>>> - head = READ_ONCE(os->head);
>>> - tail = READ_ONCE(os->tail);
>>> - /* just abort if slot is full */
>>> - if (tail - head > os->mask)
>>> - return -ENOENT;
>>> - /* try to extend tail by 1 using CAS to avoid races */
>>> - if (try_cmpxchg_acquire(&os->tail, &tail, tail + 1))
>>> - break;
>>> - } while (1);
>>> + uint32_t head = READ_ONCE(slot->head);
>>>
>>> - /* the tail-th of slot is reserved for the given obj */
>>> - WRITE_ONCE(ents[tail & os->mask], obj);
>>> - /* update epoch id to make this object available for pop() */
>>> - smp_store_release(&ages[tail & os->mask], tail);
>>> + tail = READ_ONCE(slot->tail);
>>> + next = tail + 1;
>>> +
>>> + /* This must never happen because capacity >= N + 1 */
>>> + if (WARN_ON_ONCE((next > head && next - head > pool->nr_objs) ||
>>> + (next < head && next > head + pool->nr_objs)))
>>> + return -EINVAL;
>>> +
>>> + } while (!try_cmpxchg_acquire(&slot->tail, &tail, next));
>>> +
>>> + WRITE_ONCE(slot->entries[tail & slot->mask], obj);
>>> return 0;
>>> }
>>>
>>> /* reclaim an object to object pool */
>>> -int objpool_push(void *obj, struct objpool_head *oh)
>>> +int objpool_push(void *obj, struct objpool_head *pool)
>>> {
>>> unsigned long flags;
>>> - int cpu, rc;
>>> + int rc;
>>>
>>> /* disable local irq to avoid preemption & interruption */
>>> raw_local_irq_save(flags);
>>> - cpu = raw_smp_processor_id();
>>> - do {
>>> - rc = objpool_try_add_slot(obj, oh->cpu_slots[cpu]);
>>> - if (!rc)
>>> - break;
>>> - cpu = cpumask_next_wrap(cpu, cpu_possible_mask, -1, 1);
>>> - } while (1);
>>> + rc = objpool_try_add_slot(obj, pool, raw_smp_processor_id());
>>> raw_local_irq_restore(flags);
>>>
>>> return rc;
>>> }
>>> EXPORT_SYMBOL_GPL(objpool_push);
>>>
>>> -/* drop the allocated object, rather reclaim it to objpool */
>>> -int objpool_drop(void *obj, struct objpool_head *head)
>>> -{
>>> - if (!obj || !head)
>>> - return -EINVAL;
>>> -
>>> - if (refcount_dec_and_test(&head->ref)) {
>>> - objpool_free(head);
>>> - return 0;
>>> - }
>>> -
>>> - return -EAGAIN;
>>> -}
>>> -EXPORT_SYMBOL_GPL(objpool_drop);
>>> -
>>> /* try to retrieve object from slot */
>>> -static inline void *objpool_try_get_slot(struct objpool_slot *os)
>>> +static inline void *objpool_try_get_slot(struct objpool_slot *slot)
>>> {
>>> - uint32_t *ages = SLOT_AGES(os);
>>> - void **ents = SLOT_ENTS(os);
>>> /* do memory load of head to local head */
>>> - uint32_t head = smp_load_acquire(&os->head);
>>> + uint32_t head = smp_load_acquire(&slot->head);
>>>
>>> /* loop if slot isn't empty */
>>> - while (head != READ_ONCE(os->tail)) {
>>> - uint32_t id = head & os->mask, prev = head;
>>> + while (head != READ_ONCE(slot->tail)) {
>>>
>>> /* do prefetching of object ents */
>>> - prefetch(&ents[id]);
>>> -
>>> - /* check whether this item was ready for retrieval */
>>> - if (smp_load_acquire(&ages[id]) == head) {
>>> - /* node must have been udpated by push() */
>>> - void *node = READ_ONCE(ents[id]);
>>> - /* commit and move forward head of the slot */
>>> - if (try_cmpxchg_release(&os->head, &head, head + 1))
>>> - return node;
>>> - /* head was already updated by others */
>>> - }
>>> + prefetch(&slot->entries[head & slot->mask]);
>>> +
>>> + /* commit and move forward head of the slot */
>>> + if (try_cmpxchg_release(&slot->head, &head, head + 1))
>>> + /*
>>> + * TBD: check overwrap the tail/head counter and warn
>>> + * if it is broken. But this happens only if this
>>> + * process slows down a lot and another CPU updates
>>> + * the haed/tail just 2^32 + 1 times, and this slot
>>> + * is empty.
>>> + */
>>> + return slot->entries[head & slot->mask];
>>>
>>> /* re-load head from memory and continue trying */
>>> - head = READ_ONCE(os->head);
>>> - /*
>>> - * head stays unchanged, so it's very likely there's an
>>> - * ongoing push() on other cpu nodes but yet not update
>>> - * ages[] to mark it's completion
>>> - */
>>> - if (head == prev)
>>> - break;
>>> + head = READ_ONCE(slot->head);
>>> }
>>>
>>> return NULL;
>>> @@ -307,32 +229,42 @@ void *objpool_pop(struct objpool_head *head)
>>> EXPORT_SYMBOL_GPL(objpool_pop);
>>>
>>> /* release whole objpool forcely */
>>> -void objpool_free(struct objpool_head *head)
>>> +void objpool_free(struct objpool_head *pool)
>>> {
>>> - if (!head->cpu_slots)
>>> + if (!pool->cpu_slots)
>>> return;
>>>
>>> /* release percpu slots */
>>> - objpool_fini_percpu_slots(head);
>>> + objpool_fini_percpu_slots(pool);
>>>
>>> /* call user's cleanup callback if provided */
>>> - if (head->release)
>>> - head->release(head, head->context);
>>> + if (pool->release)
>>> + pool->release(pool, pool->context);
>>> }
>>> EXPORT_SYMBOL_GPL(objpool_free);
>>>
>>> -/* drop unused objects and defref objpool for releasing */
>>> -void objpool_fini(struct objpool_head *head)
>>> +/* drop the allocated object, rather reclaim it to objpool */
>>> +int objpool_drop(void *obj, struct objpool_head *pool)
>>> {
>>> - void *nod;
>>> + if (!obj || !pool)
>>> + return -EINVAL;
>>>
>>> - do {
>>> - /* grab object from objpool and drop it */
>>> - nod = objpool_pop(head);
>>> + if (refcount_dec_and_test(&pool->ref)) {
>>> + objpool_free(pool);
>>> + return 0;
>>> + }
>>> +
>>> + return -EAGAIN;
>>> +}
>>> +EXPORT_SYMBOL_GPL(objpool_drop);
>>> +
>>> +/* drop unused objects and defref objpool for releasing */
>>> +void objpool_fini(struct objpool_head *pool)
>>> +{
>>> + void *obj;
>>>
>>> - /* drop the extra ref of objpool */
>>> - if (refcount_dec_and_test(&head->ref))
>>> - objpool_free(head);
>>> - } while (nod);
>>> + /* grab object from objpool and drop it */
>>> + while ((obj = objpool_pop(pool)))
>>> + objpool_drop(obj, pool);
>>> }
>>> EXPORT_SYMBOL_GPL(objpool_fini);
>>
>
>
Powered by blists - more mailing lists