Date: Wed, 19 Jun 2024 16:35:42 +0800
From: Lei Liu <liulei.rjpt@...o.com>
To: Carlos Llamas <cmllamas@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Arve Hjønnevåg <arve@...roid.com>,
 Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>,
 Joel Fernandes <joel@...lfernandes.org>,
 Christian Brauner <brauner@...nel.org>,
 Suren Baghdasaryan <surenb@...gle.com>, linux-kernel@...r.kernel.org,
 opensource.kernel@...o.com
Subject: Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to
 mitigate OOM issues


On 2024/6/18 12:37, Carlos Llamas wrote:
> On Tue, Jun 18, 2024 at 10:50:17AM +0800, Lei Liu wrote:
>> On 2024/6/18 2:43, Carlos Llamas wrote:
>>> On Mon, Jun 17, 2024 at 12:01:26PM +0800, Lei Liu wrote:
>>>> On 6/15/2024 at 2:38, Carlos Llamas wrote:
>>> Yes, all this makes sense. What I don't understand is how "performance
>>> of kvcalloc is better". This is not supposed to be.
>> Based on my current understanding:
>> 1. kvmalloc may allocate memory faster than kmalloc in cases of memory
>> fragmentation, which could potentially improve the performance of binder.
> I think there is a misunderstanding of the allocations performed in this
> benchmark test. Yes, in general when there is heavy memory pressure it
> can be faster to use kvmalloc() and not try too hard to reclaim
> contiguous memory.
>
> In the case of binder though, this is the mmap() allocation. This call
> is part of the "initial setup". In the test, there should only be two
> calls to kvmalloc(), since the benchmark is done across two processes.
> That's it.
>
> So the time it takes to allocate this memory is irrelevant to the
> performance results. Does this make sense?
>
>> 2. Memory allocated by kvmalloc may not be contiguous, which could
>> potentially degrade the data read and write speed of binder.
> This _is_ what is being considered in the benchmark test instead. There
> are repeated accesses to alloc->pages[n]. Your point is then the reason
> why I was expecting "same performance at best".
>
>> Hmm, this is really good news. From the current test results, it seems that
>> kvmalloc does not degrade performance for binder.
> Yeah, not in the "happy" case anyways. I'm not sure what the numbers
> look like under some memory pressure.
>
>> I will retest the data on our phone to see if we reach the same conclusion.
>> If kvmalloc still proves to be better, we will provide you with the
>> reproduction method.
>>
> Ok, thanks. I would suggest you do an "adb shell stop" before running
> these tests. This might help with the noise.
>
> Thanks,
> Carlos Llamas

We used the "adb shell stop" command to retest the data. Now the test
data for kmalloc and kvmalloc are basically consistent. There are a few
instances where kvmalloc may be slightly inferior, but the difference is
not significant, around 3% or less.

adb shell stop / kmalloc / 8+256G
-----------------------------------------------------------------------
Benchmark                 Time     CPU  Iterations    OUTPUT  OUTPUTCPU
-----------------------------------------------------------------------
BM_sendVec_binder4       39126   18550       38894  3.976282    8.38684
BM_sendVec_binder8       38924   18542       37786  7.766108    16.3028
BM_sendVec_binder16      38328   18228       36700  15.32039    32.2141
BM_sendVec_binder32      38154   18215       38240  32.07213    67.1798
BM_sendVec_binder64      39093   18809       36142  59.16885    122.977
BM_sendVec_binder128     40169   19188       36461  116.1843   243.2253
BM_sendVec_binder256     40695   19559       35951  226.1569   470.5484
BM_sendVec_binder512     41446   20211       34259  423.2159   867.8743
BM_sendVec_binder1024    44040   22939       28904  672.0639   1290.278
BM_sendVec_binder2048    47817   25821       26595  1139.063   2109.393
BM_sendVec_binder4096    54749   30905       22742  1701.423   3014.115
BM_sendVec_binder8192    68316   42017       16684  2000.634   3252.858
BM_sendVec_binder16384   95435   64081       10961  1881.752   2802.469
BM_sendVec_binder32768  148232  107504        6510  1439.093   1984.295
BM_sendVec_binder65536  326499  229874        3178  637.8991   906.0329

NORMAL TEST SUM                                     10355.79   17188.15
stressapptest eat 2G SUM                            10088.39   16625.97

adb shell stop / kvmalloc / 8+256G
-----------------------------------------------------------------------
Benchmark                 Time     CPU  Iterations    OUTPUT  OUTPUTCPU
-----------------------------------------------------------------------
BM_sendVec_binder4       39673   18832       36598  3.689965   7.773577
BM_sendVec_binder8       39869   18969       37188  7.462038   15.68369
BM_sendVec_binder16      39774   18896       36627  14.73405   31.01355
BM_sendVec_binder32      40225   19125       36995  29.43045   61.90013
BM_sendVec_binder64      40549   19529       35148  55.47544   115.1862
BM_sendVec_binder128     41580   19892       35384  108.9262   227.6871
BM_sendVec_binder256     41584   20059       34060  209.6806   434.6857
BM_sendVec_binder512     42829   20899       32493  388.4381   796.0389
BM_sendVec_binder1024    45037   23360       29251  665.0759   1282.236
BM_sendVec_binder2048    47853   25761       27091  1159.433   2153.735
BM_sendVec_binder4096    55574   31745       22405  1651.328   2890.877
BM_sendVec_binder8192    70706   43693       16400  1900.105   3074.836
BM_sendVec_binder16384   96161   64362       10793  1838.921   2747.468
BM_sendVec_binder32768  147875  107292        6296  1395.147   1922.858
BM_sendVec_binder65536  330324  232296        3053  605.7126   861.3209

NORMAL TEST SUM                                     10033.56   16623.35
stressapptest eat 2G SUM                             9958.43   16497.55

Can I prepare the V4 version of the patch now? Do I need to modify
anything else in the V4 version, in addition to addressing the following
two points?

1. Shorten the "backtrace" in the commit message.
2. Modify the code indentation to comply with the community's code style
   requirements.
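
For reference, the gap between the kmalloc and kvmalloc SUM rows can be recomputed with a short script. This is only a sketch: the numbers are copied from the SUM rows of the two runs, and the names `SUMS`, `rel_drop` and `compare` are made up for illustration. With these inputs the kvmalloc totals should come out roughly 3.1% (OUTPUT) and 3.3% (OUTPUTCPU) lower in the normal test, and roughly 1.3% and 0.8% lower with stressapptest eating 2G.

```python
# SUM rows copied from the two benchmark runs: (OUTPUT, OUTPUTCPU).
# The dictionary layout and all names here are illustrative assumptions.
SUMS = {
    "normal": {
        "kmalloc": (10355.79, 17188.15),
        "kvmalloc": (10033.56, 16623.35),
    },
    "stressapptest 2G": {
        "kmalloc": (10088.39, 16625.97),
        "kvmalloc": (9958.43, 16497.55),
    },
}


def rel_drop(base, other):
    """Return how much lower `other` is than `base`, in percent of `base`."""
    return (base - other) / base * 100.0


def compare(sums):
    """Map each scenario to (OUTPUT drop %, OUTPUTCPU drop %) for kvmalloc vs kmalloc."""
    result = {}
    for scenario, rows in sums.items():
        k_out, k_cpu = rows["kmalloc"]
        v_out, v_cpu = rows["kvmalloc"]
        result[scenario] = (rel_drop(k_out, v_out), rel_drop(k_cpu, v_cpu))
    return result


if __name__ == "__main__":
    for scenario, (out_pct, cpu_pct) in compare(SUMS).items():
        print(f"{scenario}: OUTPUT -{out_pct:.2f}%, OUTPUTCPU -{cpu_pct:.2f}%")
```

Only the aggregate SUM rows are compared here; the per-size rows could be compared the same way if a finer breakdown is wanted.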

Thanks,
Lei Liu

