Message-ID: <711bf7dd-1f57-7cee-54b0-e439a70db967@huawei.com>
Date: Thu, 5 Sep 2024 15:04:11 +0800
From: "Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>
To: Andrew Morton <akpm@...ux-foundation.org>, Thomas Gleixner
<tglx@...utronix.de>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 4/6] debugobjects: Don't start fill if there are
remaining nodes locally
On 2024/9/5 11:45, Leizhen (ThunderTown) wrote:
>
>
> On 2024/9/5 11:11, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2024/9/4 21:39, Zhen Lei wrote:
>>> If the conditions for starting fill are met, it means that all cores that
>>> call fill() later are blocked until the first core completes the fill
>>> operation. But obviously, a core that has free nodes locally does
>>> not need to be blocked (see below for why). This is good in stress
>>> situations.
>>>
>>> 1. In the case of no nesting, a core uses only one node at a time. As long
>>> as there is a local node, there is no need to use the free node in
>>> obj_pool.
>>> 2. When the nesting depth is one, nodes in obj_pool need to be used
>>> only when there is only one local node.
>>> #define ODEBUG_POOL_PERCPU_SIZE 64
>>> #define ODEBUG_BATCH_SIZE 16
>>> Assume that when nested, the probability of percpu_obj_pool having each
>>> number of nodes is the same. The probability of only one node is less
>>> than 1/17=6%. Assuming the probability of nesting is 5%, that's a
>>> pretty high estimate. Then the probability of using obj_pool is
>>> 6% * 5% = 0.3%. In other words, a 333-core environment produces only
>>> one core to compete for obj_pool.
>>> #define ODEBUG_POOL_MIN_LEVEL 256
>>> #define ODEBUG_BATCH_SIZE 16
>>> But we can tolerate "256 / (16 + 1)" = 15 cores competing at the same
>>> time.
>>
>> One detail is omitted. In function debug_objects_mem_init(), an extra batch
>> is reserved for each core.
>> extras = num_possible_cpus() * ODEBUG_BATCH_SIZE;
>> debug_objects_pool_min_level += extras;
>>
>> In addition, the above method of calculating probabilities is wrong. The correct
>> calculation method is as follows:
>> When the number of local nodes is 0, fill is performed. When the number of
>> local nodes is 1 and nested, 16 nodes are moved from obj_pool to percpu_obj_pool.
>> As a result, the obj_pool resource pool keeps decreasing. When this happens
>> continuously (the condition "number of local nodes equals 0" is never met), the
>> resource pool will eventually be exhausted. The error probability is:
>> (1/2)^((256+16^ncpus)/17) * (5% + 5%^2 + ... + 5%^N) * 2/17 < 1e-7 (ncpus=1).
>
> Should be:
> ==> (1/2)^((256+16^ncpus)/17) * 5% * 2/17 < 9e-8 (ncpus=1).
>
>> 1/2 ==> denominator sequence: 0,1; numerator sequence: 1
>> (5% + 5%^2 + ... + 5%^N) < 5% + (5%^2) * 2 = 0.055
>> 17 = ODEBUG_BATCH_SIZE + 1, amount moved from obj_pool when the number of local nodes is 0.
>> 2/17 ==> denominator sequence: 0-16; numerator sequence: 0,1
>> The more cores, the lower the probability of exhaustion.
>>
>> If obj_pool is not filled only when there are more than two local nodes,
>> the probability of exhaustion is:
>> (1/3)^((256+16^ncpus)/17) * (5% + 5%^2 + ... + 5%^N) * 3/17 < 2.3e-10
>
> Should be:
> ==> (1/3)^((256+16^ncpus)/17) * (5%^2) * 3/17 < 1.03e-11 (ncpus=1).
>
>> 1/3 ==> denominator sequence: 0,1,2; numerator sequence: 2
>> 3/17 ==> denominator sequence: 0-16; numerator sequence: 0,1,2
>
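As a sanity check on the figures quoted above, the bounds can be recomputed in user space. This is only a sketch: exhaustion_bound() and its parameters are made-up names, the exponent is read as (256 + 16 * ncpus) / 17 (which matches the quoted 16^ncpus for ncpus=1), and the division is rounded down to whole refills, which can only make the bound larger.

```c
#define ODEBUG_POOL_MIN_LEVEL	256
#define ODEBUG_BATCH_SIZE	16

/*
 * Hypothetical helper, not kernel code. Evaluates
 *   (1/k)^refills * p_nest * k / 17
 * where k is the denominator used in the quoted formulas (2 or 3) and
 * p_nest is the assumed nesting probability term (e.g. 5% or 5%^2).
 */
double exhaustion_bound(int k, int ncpus, double p_nest)
{
	/* One refill takes 16 + 1 nodes out of obj_pool. */
	int per_refill = ODEBUG_BATCH_SIZE + 1;
	/* Whole refills the reserved nodes can absorb (rounded down). */
	int refills = (ODEBUG_POOL_MIN_LEVEL + ODEBUG_BATCH_SIZE * ncpus) / per_refill;
	double bound = p_nest * k / per_refill;

	/* Multiply by (1/k) once per refill. */
	for (int i = 0; i < refills; i++)
		bound /= k;

	return bound;
}
```

For ncpus=1, exhaustion_bound(2, 1, 0.05) stays below the quoted 9e-8 and exhaustion_bound(3, 1, 0.05 * 0.05) stays below the quoted 1.03e-11.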
> Hi, Thomas Gleixner:
> It seems an additional patch like the following is needed to be foolproof.
> I'll prepare it.
I've rethought this problem, and there's another remedy: when the number
of remaining free nodes is less than half of debug_objects_pool_min_level,
still do the fill. That way, "nr_cpus/2 + 256/(16+1)" cores would have to
bypass the check before obj_pool_free is updated, which is almost impossible.
But then again, this is only theoretical and I don't have any data, so maybe
the best solution is to drop this patch and discuss it in the future.
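To make the proposed condition concrete, here is a user-space model of the early-return decision. It is a sketch only: should_skip_fill() and its plain integer parameters are hypothetical stand-ins for obj_pool_free, debug_objects_pool_min_level and percpu_obj_pool.obj_free, and it leaves out READ_ONCE() and the obj_cache check.

```c
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code. Returns true when the fill
 * would be skipped under the remedy described above.
 */
bool should_skip_fill(int pool_free, int min_level, int local_free)
{
	/* Global pool is at or above the minimum level: nothing to do. */
	if (pool_free >= min_level)
		return true;

	/*
	 * Below the minimum level, a core that still has local nodes
	 * skips the fill, but only while the global pool holds at least
	 * half of the minimum level.
	 */
	if (local_free > 0 && pool_free >= min_level / 2)
		return true;

	return false;
}
```

With min_level = 256, a core holding one local node skips the fill at pool_free = 200 but not at pool_free = 100, while a core with no local nodes fills in both cases.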
By the way, I've found a new concern.
static int debug_objects_pool_size = ODEBUG_POOL_SIZE; //1024
static int debug_objects_pool_min_level = ODEBUG_POOL_MIN_LEVEL; //256
extras = num_possible_cpus() * ODEBUG_BATCH_SIZE; //16
debug_objects_pool_size += extras;
debug_objects_pool_min_level += extras;
With this many cores, the free count should oscillate around
debug_objects_pool_min_level more easily. For example, with nr_cpus=128:
debug_objects_pool_min_level = 128 * 16 + 256 = 2304
debug_objects_pool_size - debug_objects_pool_min_level = 768 //fixed
There are many more loafers than workers. As above, it's better to
discuss this in the future.
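The arithmetic above can be reproduced for any core count with a few lines of user-space C. This is a sketch; struct pool_levels and pool_levels_for() are made-up names, and the constants are the ones quoted in this mail.

```c
#define ODEBUG_POOL_SIZE	1024
#define ODEBUG_POOL_MIN_LEVEL	256
#define ODEBUG_BATCH_SIZE	16

/* Hypothetical container for the two adjusted limits. */
struct pool_levels {
	int size;	/* debug_objects_pool_size after the adjustment */
	int min_level;	/* debug_objects_pool_min_level after the adjustment */
};

/* Applies the same "extras" adjustment as quoted above for nr_cpus CPUs. */
struct pool_levels pool_levels_for(int nr_cpus)
{
	int extras = nr_cpus * ODEBUG_BATCH_SIZE;
	struct pool_levels lv = {
		.size = ODEBUG_POOL_SIZE + extras,
		.min_level = ODEBUG_POOL_MIN_LEVEL + extras,
	};

	return lv;
}
```

For nr_cpus=128 this gives min_level = 2304 and size = 3072. Both limits grow by the same extras, so the gap between them stays at 768 for every core count.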
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 58de57090ac6389..2eb246901cf5367 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -135,6 +135,10 @@ static void fill_pool(void)
 	if (likely(READ_ONCE(obj_pool_free) >= debug_objects_pool_min_level))
 		return;
+	if (likely(obj_cache) &&
+	    this_cpu_read(percpu_obj_pool.obj_free) > 0 &&
+	    likely(READ_ONCE(obj_pool_free) >= debug_objects_pool_min_level / 2))
+		return;
 	/*
 	 * Reuse objs from the global tofree list; they will be reinitialized
 	 * when allocating.
>
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index e175cc74f7b7899..d3f8cc7dc1c9291 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -245,6 +245,21 @@ alloc_object(void *addr, struct debug_bucket *b, const struct debug_obj_descr *d
>
>  	raw_spin_lock(&pool_lock);
>  	obj = __alloc_object(&obj_pool);
> +	if (!obj) {
> +		raw_spin_unlock(&pool_lock);
> +		obj = kmem_cache_zalloc(obj_cache, __GFP_HIGH | GFP_NOWAIT);
> +		if (!obj)
> +			return NULL;
> +
> +		raw_spin_lock(&pool_lock);
> +		debug_objects_allocated++;
> +
> +		/*
> +		 * It can be understood that obj is allocated immediately after
> +		 * being added to obj_pool.
> +		 */
> +		obj_pool_used++;
> +	}
>  	if (obj) {
>  		int cnt = 0;
>
>
>
>>
>>> 3. In the case of nesting depth more than one, the probability is lower
>>> and negligible.
>>> Nesting Depth=2: "2/17 * 5% * 5%" = 0.03%
>>> Nesting Depth=3: "3/17 * 5% * 5% * 5%" = 0.002%
>>>
>>> However, to ensure sufficient reliability, the fill of obj_pool is
>>> skipped only when there are at least two local nodes, which reduces the
>>> probability of problems to practically zero.
>>>
>>> Signed-off-by: Zhen Lei <thunder.leizhen@...wei.com>
>>> ---
>>> lib/debugobjects.c | 10 ++++++++++
>>> 1 file changed, 10 insertions(+)
>>>
>>> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
>>> index 7a8ccc94cb037ba..4f64b5d4329c27d 100644
>>> --- a/lib/debugobjects.c
>>> +++ b/lib/debugobjects.c
>>> @@ -131,6 +131,16 @@ static void fill_pool(void)
>>>  	struct debug_obj *obj;
>>>  	unsigned long flags;
>>>
>>> +	/*
>>> +	 * The upper-layer function uses only one node at a time. If there are
>>> +	 * more than two local nodes, it means that even if nesting occurs, it
>>> +	 * doesn't matter. The probability of nesting depth >= 2 is extremely
>>> +	 * low, and the number of global free nodes guarded by
>>> +	 * debug_objects_pool_min_level is adequate.
>>> +	 */
>>> +	if (likely(obj_cache) && this_cpu_read(percpu_obj_pool.obj_free) >= 2)
>>> +		return;
>>> +
>>>  	if (likely(READ_ONCE(obj_pool_free) >= debug_objects_pool_min_level))
>>>  		return;
>>>
>>>
>>
>
--
Regards,
Zhen Lei