Message-ID: <20230329031302epcms1p6afc9d9d8e92db6a39c29044606d21afc@epcms1p6>
Date:   Wed, 29 Mar 2023 12:13:02 +0900
From:   Jaewon Kim <jaewon31.kim@...sung.com>
To:     "T.J. Mercier" <tjmercier@...gle.com>
CC:     "jstultz@...gle.com" <jstultz@...gle.com>,
        "sumit.semwal@...aro.org" <sumit.semwal@...aro.org>,
        "daniel.vetter@...ll.ch" <daniel.vetter@...ll.ch>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "hannes@...xchg.org" <hannes@...xchg.org>,
        "mhocko@...nel.org" <mhocko@...nel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jaewon31.kim@...il.com" <jaewon31.kim@...il.com>
Subject: RE: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow
 and list bugs in system heap:

>On Tue, Mar 28, 2023 at 5:58 AM Jaewon Kim <jaewon31.kim@...sung.com> wrote:
>>
>> Normal free:212600kB min:7664kB low:57100kB high:106536kB
>>   reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
>>   active_file:1200kB inactive_file:0kB unevictable:2932kB
>>   writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
>>   pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
>>   free_cma:200844kB
>> Out of memory and no killable processes...
>> Kernel panic - not syncing: System is deadlocked on memory
>>
>> An OoM panic was reported; the only remaining processes were native
>> processes that cannot be killed because their oom_score_adj is
>> OOM_SCORE_ADJ_MIN.
>>
>> After looking into the dump, I found that the dma-buf system heap was
>> trying to allocate a huge size, which seems to be a negative signed value.
>>
>> dma_heap_ioctl_allocate(inline)
>>     |  heap_allocation = 0xFFFFFFC02247BD38 -> (
>>     |    len = 0xFFFFFFFFE7225100,
>>
>> The old ION system heap had a policy that rejected such huge sizes,
>> added by commit c9e8440eca61 ("staging: ion: Fix overflow and list
>> bugs in system heap"). We need this check again: a single allocation
>> should not be larger than half of all memory.
>>
>> Signed-off-by: Jaewon Kim <jaewon31.kim@...sung.com>
>> ---
>>  drivers/dma-buf/heaps/system_heap.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
>> index e8bd10e60998..4c1ef2ecfb0f 100644
>> --- a/drivers/dma-buf/heaps/system_heap.c
>> +++ b/drivers/dma-buf/heaps/system_heap.c
>> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>>         struct page *page, *tmp_page;
>>         int i, ret = -ENOMEM;
>>
>> +       if (len / PAGE_SIZE > totalram_pages() / 2)
>> +               return ERR_PTR(-ENOMEM);
>> +
>
>Instead of a policy like that, would adding __GFP_RETRY_MAYFAIL to the
>system heap's LOW_ORDER_GFP flags also avoid the panic and eventually
>fail the allocation request?

Hello T.J.

Thank you for the suggestion.
Adding __GFP_RETRY_MAYFAIL to LOW_ORDER_GFP seems to work.

page allocation failure: order:0, mode:0x144dc2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_ZERO)
Node 0 active_anon:120kB inactive_anon:43012kB active_file:36kB inactive_file:788kB 

I tested it, and as we expected, the allocation failed once the file cache was very low,
without an OoM panic. However, the phone froze for a few seconds.

We can avoid the OoM panic with either the totalram_pages() / 2 check or __GFP_RETRY_MAYFAIL.

But I think we still need the totalram_pages() / 2 check so that users do not have to suffer
the freeze. Otherwise, we may also end up killing some critical processes or the user's recent apps.

As for __GFP_RETRY_MAYFAIL, I think it will help us avoid the OoM panic. But I'm worried
about low-memory devices that still rely on the OoM killer to free memory, for example in camera scenarios.

So what do you think?

Thank you
Jaewon Kim
