linux-kernel - Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to mitigate OOM issues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c46a07f5-f504-4c6f-af54-cfa00f987ce3@vivo.com>
Date: Mon, 17 Jun 2024 12:01:26 +0800
From: Lei Liu <liulei.rjpt@...o.com>
To: Carlos Llamas <cmllamas@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Arve Hjønnevåg <arve@...roid.com>,
 Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>,
 Joel Fernandes <joel@...lfernandes.org>,
 Christian Brauner <brauner@...nel.org>,
 Suren Baghdasaryan <surenb@...gle.com>, linux-kernel@...r.kernel.org,
 opensource.kernel@...o.com
Subject: Re: [PATCH v3] binder_alloc: Replace kcalloc with kvcalloc to
 mitigate OOM issues


On 6/15/2024 at 2:38, Carlos Llamas wrote:
> On Fri, Jun 14, 2024 at 12:09:29PM +0800, Lei Liu wrote:
>> 1.In binder_alloc, there is a frequent need for order3 memory
>> allocation, especially on small-memory mobile devices, which can lead
>> to OOM and cause foreground applications to be killed, resulting in
>> flashbacks.The kernel call stack after the issue occurred is as follows:
>> dumpsys invoked oom-killer:
>> gfp_mask=0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), order=3,
>> oom_score_adj=-950
>> CPU: 6 PID: 31329 Comm: dumpsys Tainted: G        WC O
>> 5.10.168-android12-9-00003-gc873b6b86254-ab10823632 #1
>> Call trace:
>>   dump_backtrace.cfi_jt+0x0/0x8
>>   dump_stack_lvl+0xdc/0x138
>>   dump_header+0x5c/0x2ac
>>   oom_kill_process+0x124/0x304
>>   out_of_memory+0x25c/0x5e0
>>   __alloc_pages_slowpath+0x690/0xf6c
>>   __alloc_pages_nodemask+0x1f4/0x3dc
>>   kmalloc_order+0x54/0x338
>>   kmalloc_order_trace+0x34/0x1bc
>>   __kmalloc+0x5e8/0x9c0
>>   binder_alloc_mmap_handler+0x88/0x1f8
>>   binder_mmap+0x90/0x10c
>>   mmap_region+0x44c/0xc14
>>   do_mmap+0x518/0x680
>>   vm_mmap_pgoff+0x15c/0x378
>>   ksys_mmap_pgoff+0x80/0x108
>>   __arm64_sys_mmap+0x38/0x48
>>   el0_svc_common+0xd4/0x270
>>   el0_svc+0x28/0x98
>>   el0_sync_handler+0x8c/0xf0
>>   el0_sync+0x1b4/0x1c0
>> Mem-Info:
>> active_anon:47096 inactive_anon:57927 isolated_anon:100
>> active_file:43790 inactive_file:44434 isolated_file:0
>> unevictable:14693 dirty:171 writeback:0\x0a slab_reclaimable:21676
>> slab_unreclaimable:81771\x0a mapped:84485 shmem:4275 pagetables:33367
>> bounce:0\x0a free:3772 free_pcp:198 free_cma:11
>> Node 0 active_anon:188384kB inactive_anon:231708kB active_file:175160kB
>> inactive_file:177736kB unevictable:58772kB isolated(anon):400kB
>> isolated(file):0kB mapped:337940kB dirty:684kB writeback:0kB
>> shmem:17100kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB
>> writeback_tmp:0kB kernel_stack:84960kB shadow_call_stack:21340kB
>> Normal free:15088kB min:8192kB low:42616kB high:46164kB
>> reserved_highatomic:4096KB active_anon:187644kB inactive_anon:231608kB
>> active_file:174552kB inactive_file:178012kB unevictable:58772kB
>> writepending:684kB present:3701440kB managed:3550144kB mlocked:58508kB
>> pagetables:133468kB bounce:0kB free_pcp:1048kB local_pcp:12kB
>> free_cma:44kB
>> Normal: 3313*4kB (UMEH) 165*8kB (UMEH) 35*16kB (H) 15*32kB (H) 0*64kB
>> 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15612kB
>> 108356 total pagecache pages
> Think about indenting this stacktrace. IMO, the v1 had a commit log that
> was much easier to follow.
Hmm, okay, your suggestion is good. I will consider updating another 
version later as per your suggestion, and trim the stack.
>> 2.We use kvcalloc to allocate memory, which can reduce system OOM
>> occurrences, as well as decrease the time and probability of failure
>> for order3 memory allocations. Additionally, it can also improve the
>> throughput of binder (as verified by Google's binder_benchmark testing
>> tool).
>>
>> 3.We have conducted multiple tests on an 12GB memory phone, and the
>> performance of kvcalloc is better. Below is a partial excerpt of the
>> test data.
>> throughput = (size * Iterations)/Time
> Huh? Do you have an explanation for this performance improvement?
> Did you test this under memory pressure?
Hmm, in our mobile project, we often encounter OOM and application 
crashes under stress testing.
> My understanding is that kvcalloc() == kcalloc() if there is enough
> contiguous memory no?
>
> I would expect the performance to be the same at best.

1.The main reason is memory fragmentation, where we are unable to 
allocate contiguous order3 memory. Additionally, using the GFP_KERNEL 
allocation flag in the kernel's __alloc_pages_slowpath function results 
in multiple retry attempts, and if direct_reclaim and memory_compact are 
unsuccessful, OOM occurs.

2.When fragmentation is severe, we observed that kvmalloc is faster than 
kmalloc, as it eliminates the need for multiple retry attempts to 
allocate order3. In such cases, falling back to order0 may result in 
higher allocation efficiency.

3.Another crucial point is that in the kernel, allocations greater than 
order3 are considered PAGE_ALLOC_COSTLY_ORDER. This leads to a reduced 
number of retry attempts in __alloc_pages_slowpath, which explains the 
increased time for order3 allocation in fragmented scenarios.

In summary, under high memory pressure scenarios, the system is prone to 
fragmentation. Instead of waiting for order3 allocation, it is more 
efficient to allow kvmalloc to automatically select between order0 and 
order3, reducing wait times in high memory pressure scenarios. This is 
also the reason why kvmalloc can improve throughput.

>> kvcalloc->kvmalloc:
>> Benchmark-kvcalloc	Time	CPU	Iterations	throughput(Gb/s)
>> ----------------------------------------------------------------
>> BM_sendVec_binder-4096	30926 ns	20481 ns	34457	4563.66↑
>> BM_sendVec_binder-8192	42667 ns	30837 ns	22631	4345.11↑
>> BM_sendVec_binder-16384	67586 ns	52381 ns	13318	3228.51↑
>> BM_sendVec_binder-32768	116496 ns	94893 ns	7416	2085.97↑
>> BM_sendVec_binder-65536	265482 ns	209214 ns	3530	871.40↑
>>
>> kcalloc->kmalloc
>> Benchmark-kcalloc	Time	CPU	Iterations	throughput(Gb/s)
>> ----------------------------------------------------------------
>> BM_sendVec_binder-4096	39070 ns	24207 ns	31063	3256.56
>> BM_sendVec_binder-8192	49476 ns	35099 ns	18817	3115.62
>> BM_sendVec_binder-16384	76866 ns	58924 ns	11883	2532.86
>> BM_sendVec_binder-32768	134022 ns	102788 ns	6535	1597.78
>> BM_sendVec_binder-65536	281004 ns	220028 ns	3135	731.14
>>
>> Signed-off-by: Lei Liu <liulei.rjpt@...o.com>
>> ---
>> Changelog:
>> v2->v3:
>> 1.Modify the commit message description as the description for the V2
>>    version is unclear.
> The complete history of the changelog would be better.
>
>> ---
>>   drivers/android/binder_alloc.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
>> index 2e1f261ec5c8..5dcab4a5e341 100644
>> --- a/drivers/android/binder_alloc.c
>> +++ b/drivers/android/binder_alloc.c
>> @@ -836,7 +836,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
>>   
>>   	alloc->buffer = vma->vm_start;
>>   
>> -	alloc->pages = kcalloc(alloc->buffer_size / PAGE_SIZE,
>> +	alloc->pages = kvcalloc(alloc->buffer_size / PAGE_SIZE,
>>   			       sizeof(alloc->pages[0]),
>>   			       GFP_KERNEL);
> I believe Greg had asked for these to be aligned to the parenthesis.
> You can double check by running checkpatch with the -strict flag.
Okay, I'll double check the format of the paths again.
>>   	if (alloc->pages == NULL) {
>> @@ -869,7 +869,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
>>   	return 0;
>>   
>>   err_alloc_buf_struct_failed:
>> -	kfree(alloc->pages);
>> +	kvfree(alloc->pages);
>>   	alloc->pages = NULL;
>>   err_alloc_pages_failed:
>>   	alloc->buffer = 0;
>> @@ -939,7 +939,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
>>   			__free_page(alloc->pages[i].page_ptr);
>>   			page_count++;
>>   		}
>> -		kfree(alloc->pages);
>> +		kvfree(alloc->pages);
>>   	}
>>   	spin_unlock(&alloc->lock);
>>   	if (alloc->mm)
>> -- 
>> 2.34.1
>>
> I'm not so sure about the results and performance improvements that are
> claimed here. However, the switch to kvcalloc() itself seems reasonable
> to me.
>
> I'll run these tests myself as the results might have some noise. I'll
> get back with the results.
>
> Thanks,
> Carlos Llamas

Okay, thank you for the suggestion. I look forward to receiving your 
test results and continuing our discussion.

My testing tool is the binder throughput testing tool provided by 
Google. You can give it a try here:

https://source.android.com/docs/core/tests/vts/performance


Thanks,

Lei liu

>