linux-kernel - Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <32e0c2ea-6035-4ec9-b99c-e6b686f04cf3@vivo.com>
Date: Tue, 18 Jun 2024 10:50:17 +0800
From: Lei Liu <liulei.rjpt@...o.com>
To: Carlos Llamas <cmllamas@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Arve Hjønnevåg <arve@...roid.com>,
 Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>,
 Joel Fernandes <joel@...lfernandes.org>,
 Christian Brauner <brauner@...nel.org>,
 Suren Baghdasaryan <surenb@...gle.com>, linux-kernel@...r.kernel.org,
 opensource.kernel@...o.com
Subject: Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to
 mitigate OOM issues


On 2024/6/18 2:43, Carlos Llamas wrote:
> On Mon, Jun 17, 2024 at 12:01:26PM +0800, Lei Liu wrote:
>> On 6/15/2024 at 2:38, Carlos Llamas wrote:
>>> My understanding is that kvcalloc() == kcalloc() if there is enough
>>> contiguous memory no?
>>>
>>> I would expect the performance to be the same at best.
>> 1.The main reason is memory fragmentation, where we are unable to
>> allocate contiguous order3 memory. Additionally, using the GFP_KERNEL
>> allocation flag in the kernel's __alloc_pages_slowpath function results
>> in multiple retry attempts, and if direct_reclaim and memory_compact
>> are unsuccessful, OOM occurs.
>>
>> 2.When fragmentation is severe, we observed that kvmalloc is faster
>> than kmalloc, as it eliminates the need for multiple retry attempts to
>> allocate order3. In such cases, falling back to order0 may result in
>> higher allocation efficiency.
>>
>> 3.Another crucial point is that in the kernel, allocations greater than
>> order3 are considered PAGE_ALLOC_COSTLY_ORDER. This leads to a reduced
>> number of retry attempts in __alloc_pages_slowpath, which explains the
>> increased time for order3 allocation in fragmented scenarios.
>>
>> In summary, under high memory pressure scenarios, the system is prone
>> to fragmentation. Instead of waiting for order3 allocation, it is more
>> efficient to allow kvmalloc to automatically select between order0 and
>> order3, reducing wait times in high memory pressure scenarios. This is
>> also the reason why kvmalloc can improve throughput.
> Yes, all this makes sense. What I don't understand is how "performance
> of kvcalloc is better". This is not supposed to be.

Based on my current understanding:
1.kvmalloc may allocate memory faster than kmalloc in cases of memory 
fragmentation, which could potentially improve the performance of binder.
2.Memory allocated by kvmalloc may not be contiguous, which could 
potentially degrade the data read and write speed of binder.

I'm uncertain about the relative impact of the points mentioned above. 
I'm interested in hearing your perspective on this matter.

>>> I'm not so sure about the results and performance improvements that are
>>> claimed here. However, the switch to kvcalloc() itself seems reasonable
>>> to me.
>>>
>>> I'll run these tests myself as the results might have some noise. I'll
>>> get back with the results.
>>>
>>> Thanks,
>>> Carlos Llamas
>> Okay, thank you for the suggestion. I look forward to receiving your
>> test results and continuing our discussion.
>>
> I ran several iterations of the benchmark test on a Pixel device and as
> expected I didn't see any significant differences. This is a good thing,
> but either we need to understand how you obtained a better performance
> from using kvcalloc(), or it would be better to drop this claim from the
> commit log.
>
> The following are two individual samples of each form. However, if we
> could average the output and get rid of the noise it seems the numbers
> are pretty much the same.
>
> Sample with kcalloc():
> ------------------------------------------------------------------
> Benchmark                        Time             CPU   Iterations
> ------------------------------------------------------------------
> BM_sendVec_binder/4          19983 ns         9832 ns        60255
> BM_sendVec_binder/8          19766 ns         9690 ns        71699
> BM_sendVec_binder/16         19785 ns         9722 ns        72086
> BM_sendVec_binder/32         20067 ns         9864 ns        71535
> BM_sendVec_binder/64         20077 ns         9941 ns        69141
> BM_sendVec_binder/128        20147 ns         9944 ns        71016
> BM_sendVec_binder/256        20424 ns        10044 ns        69451
> BM_sendVec_binder/512        20518 ns        10064 ns        69179
> BM_sendVec_binder/1024       21073 ns        10319 ns        67599
> BM_sendVec_binder/2048       21482 ns        10502 ns        66767
> BM_sendVec_binder/4096       22308 ns        10809 ns        63841
> BM_sendVec_binder/8192       24022 ns        11649 ns        60795
> BM_sendVec_binder/16384      27172 ns        13426 ns        51940
> BM_sendVec_binder/32768      32853 ns        16345 ns        42211
> BM_sendVec_binder/65536      80177 ns        39787 ns        17557
>
> Sample with kvalloc():
> ------------------------------------------------------------------
> Benchmark                        Time             CPU   Iterations
> ------------------------------------------------------------------
> BM_sendVec_binder/4          19900 ns         9711 ns        68626
> BM_sendVec_binder/8          19903 ns         9756 ns        71524
> BM_sendVec_binder/16         19601 ns         9541 ns        71069
> BM_sendVec_binder/32         19514 ns         9530 ns        72469
> BM_sendVec_binder/64         20042 ns        10006 ns        69753
> BM_sendVec_binder/128        20142 ns         9965 ns        70392
> BM_sendVec_binder/256        20274 ns         9958 ns        70173
> BM_sendVec_binder/512        20305 ns         9966 ns        70347
> BM_sendVec_binder/1024       20883 ns        10250 ns        67813
> BM_sendVec_binder/2048       21364 ns        10455 ns        67366
> BM_sendVec_binder/4096       22350 ns        10888 ns        65689
> BM_sendVec_binder/8192       24113 ns        11707 ns        58149
> BM_sendVec_binder/16384      27122 ns        13346 ns        52515
> BM_sendVec_binder/32768      32158 ns        15901 ns        44139
> BM_sendVec_binder/65536      87594 ns        43627 ns        16040
>
> To reiterate, the switch to kvcalloc() sounds good to me. Let's just fix
> the commit log and Greg's suggestions too.
>
> Thanks,
> Carlos Llamas

Hmm, this is really good news. From the current test results, it seems 
that kvmalloc does not degrade performance for binder.

I will retest the data on our phone to see if we reach the same 
conclusion. If kvmalloc still proves to be better, we will provide you 
with the reproduction method.