Message-ID: <32e0c2ea-6035-4ec9-b99c-e6b686f04cf3@vivo.com>
Date: Tue, 18 Jun 2024 10:50:17 +0800
From: Lei Liu <liulei.rjpt@...o.com>
To: Carlos Llamas <cmllamas@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Arve Hjønnevåg <arve@...roid.com>,
Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>,
Joel Fernandes <joel@...lfernandes.org>,
Christian Brauner <brauner@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, linux-kernel@...r.kernel.org,
opensource.kernel@...o.com
Subject: Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to
mitigate OOM issues
On 2024/6/18 2:43, Carlos Llamas wrote:
> On Mon, Jun 17, 2024 at 12:01:26PM +0800, Lei Liu wrote:
>> On 6/15/2024 at 2:38, Carlos Llamas wrote:
>>> My understanding is that kvcalloc() == kcalloc() if there is enough
>>> contiguous memory no?
>>>
>>> I would expect the performance to be the same at best.
>> 1. The main reason is memory fragmentation: we are unable to allocate
>> contiguous order-3 memory. Additionally, with the GFP_KERNEL
>> allocation flag, the kernel's __alloc_pages_slowpath() performs
>> multiple retry attempts, and if direct reclaim and memory compaction
>> are unsuccessful, an OOM occurs.
>>
>> 2. When fragmentation is severe, we observed that kvmalloc() is faster
>> than kmalloc(), as it avoids the repeated retry attempts needed for an
>> order-3 allocation. In such cases, falling back to order-0 pages can
>> result in higher allocation efficiency.
>>
>> 3. Another crucial point is that in the kernel, allocations above
>> order 3 (PAGE_ALLOC_COSTLY_ORDER) are considered costly and get a
>> reduced number of retry attempts in __alloc_pages_slowpath(), which
>> explains the increased time for order-3 allocations in fragmented
>> scenarios.
>>
>> In summary, under high memory pressure the system is prone to
>> fragmentation. Instead of waiting on an order-3 allocation, it is more
>> efficient to let kvmalloc() choose automatically between order-0 and
>> order-3, reducing wait times under memory pressure. This is also why
>> kvmalloc() can improve throughput.
> Yes, all this makes sense. What I don't understand is the claim that the
> "performance of kvcalloc is better". This is not supposed to be the case.
Based on my current understanding:
1. kvmalloc() may allocate memory faster than kmalloc() under memory
fragmentation, which could improve binder's performance.
2. Memory allocated by kvmalloc() may not be physically contiguous, which
could degrade binder's data read and write speed.
I'm uncertain about the relative impact of these two effects and would be
interested in your perspective.
>>> I'm not so sure about the results and performance improvements that are
>>> claimed here. However, the switch to kvcalloc() itself seems reasonable
>>> to me.
>>>
>>> I'll run these tests myself as the results might have some noise. I'll
>>> get back with the results.
>>>
>>> Thanks,
>>> Carlos Llamas
>> Okay, thank you for the suggestion. I look forward to receiving your
>> test results and continuing our discussion.
>>
> I ran several iterations of the benchmark on a Pixel device and, as
> expected, I didn't see any significant differences. This is a good thing,
> but either we need to understand how you obtained better performance
> from kvcalloc(), or it would be better to drop this claim from the
> commit log.
>
> The following are two individual samples of each form. However, if we
> could average the output and get rid of the noise it seems the numbers
> are pretty much the same.
>
> Sample with kcalloc():
> ------------------------------------------------------------------
> Benchmark                      Time        CPU  Iterations
> ------------------------------------------------------------------
> BM_sendVec_binder/4           19983 ns    9832 ns     60255
> BM_sendVec_binder/8           19766 ns    9690 ns     71699
> BM_sendVec_binder/16          19785 ns    9722 ns     72086
> BM_sendVec_binder/32          20067 ns    9864 ns     71535
> BM_sendVec_binder/64          20077 ns    9941 ns     69141
> BM_sendVec_binder/128         20147 ns    9944 ns     71016
> BM_sendVec_binder/256         20424 ns   10044 ns     69451
> BM_sendVec_binder/512         20518 ns   10064 ns     69179
> BM_sendVec_binder/1024        21073 ns   10319 ns     67599
> BM_sendVec_binder/2048        21482 ns   10502 ns     66767
> BM_sendVec_binder/4096        22308 ns   10809 ns     63841
> BM_sendVec_binder/8192        24022 ns   11649 ns     60795
> BM_sendVec_binder/16384       27172 ns   13426 ns     51940
> BM_sendVec_binder/32768       32853 ns   16345 ns     42211
> BM_sendVec_binder/65536       80177 ns   39787 ns     17557
>
> Sample with kvcalloc():
> ------------------------------------------------------------------
> Benchmark                      Time        CPU  Iterations
> ------------------------------------------------------------------
> BM_sendVec_binder/4           19900 ns    9711 ns     68626
> BM_sendVec_binder/8           19903 ns    9756 ns     71524
> BM_sendVec_binder/16          19601 ns    9541 ns     71069
> BM_sendVec_binder/32          19514 ns    9530 ns     72469
> BM_sendVec_binder/64          20042 ns   10006 ns     69753
> BM_sendVec_binder/128         20142 ns    9965 ns     70392
> BM_sendVec_binder/256         20274 ns    9958 ns     70173
> BM_sendVec_binder/512         20305 ns    9966 ns     70347
> BM_sendVec_binder/1024        20883 ns   10250 ns     67813
> BM_sendVec_binder/2048        21364 ns   10455 ns     67366
> BM_sendVec_binder/4096        22350 ns   10888 ns     65689
> BM_sendVec_binder/8192        24113 ns   11707 ns     58149
> BM_sendVec_binder/16384       27122 ns   13346 ns     52515
> BM_sendVec_binder/32768       32158 ns   15901 ns     44139
> BM_sendVec_binder/65536       87594 ns   43627 ns     16040
>
> To reiterate, the switch to kvcalloc() sounds good to me. Let's just fix
> the commit log and Greg's suggestions too.
>
> Thanks,
> Carlos Llamas
Hmm, this is really good news. From the current test results, kvmalloc()
does not appear to degrade binder performance.
I will retest on our phone to see whether we reach the same conclusion.
If kvmalloc() still proves faster there, we will provide you with the
reproduction method.