Message-ID: <CADa9hSH9hu1h+Rd5wh7seZ8aON28pjer2HEn=_pk6EqEfJsC7Q@mail.gmail.com>
Date:   Wed, 11 Oct 2017 17:05:32 -0400
From:   Brian McFarland <mcfarlandjb@...il.com>
To:     linux-kernel@...r.kernel.org
Subject: How to fix oom-killer on DRA7x when enabling LPAE?

I'm running a 4.4.45+ kernel based on TI's 6AM.1.3 release, with an
Android MM user space.

    http://git.omapzoom.org/kernel/?p=kernel/omap.git

LPAE was previously disabled on the system because we were only using
2GB of RAM.  We've expanded that to 4GB of RAM, enabled LPAE to take
advantage of it, and now see oom-killer crashes.


It's always a low-order allocation that fails, though the exact source
and gfp_mask vary.

Example oom-killer events / gfp masks:

GLThread 583 invoked oom-killer: gfp_mask=0x24000c4, order=0, oom_score_adj=0
Binder_A invoked oom-killer: gfp_mask=0x26000c0, order=1, oom_score_adj=-705
top invoked oom-killer: gfp_mask=0x24000d0, order=0, oom_score_adj=-1000
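
For reference, the three masks above can be decoded with a short
script. This is only a sketch: the bit values in the table are an
assumption, believed to match include/linux/gfp.h in a 4.4 tree, and
should be checked against the actual source because the layout changes
between releases. GFP_BITS and decode_gfp are hypothetical names used
for illustration, and only the bits needed here are listed.

```python
# ASSUMPTION: bit values believed to match include/linux/gfp.h in a 4.4
# tree. Verify against your own source before relying on the output.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_RECLAIMABLE",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x200000: "__GFP_NOTRACK",
    0x400000: "__GFP_DIRECT_RECLAIM",
    0x2000000: "__GFP_KSWAPD_RECLAIM",
}


def decode_gfp(mask):
    """Return the flag names set in mask, lowest bit first.

    Any bits not present in GFP_BITS are appended as one hex string so
    that nothing is silently dropped.
    """
    names = []
    remaining = mask
    for bit in sorted(GFP_BITS):
        if mask & bit:
            names.append(GFP_BITS[bit])
            remaining &= ~bit
    if remaining:
        names.append(hex(remaining))
    return names


if __name__ == "__main__":
    for mask in (0x24000C4, 0x26000C0, 0x24000D0):
        print(hex(mask), " | ".join(decode_gfp(mask)))
```

Under those assumed values, all three masks contain __GFP_IO, __GFP_FS
and both reclaim bits, and none carries __GFP_MOVABLE or __GFP_HIGHMEM.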

Mem info from one such event:

[  358.267219] Mem-Info:
[  358.270438] active_anon:246392 inactive_anon:22796 isolated_anon:0
[  358.270438]  active_file:55368 inactive_file:74077 isolated_file:0
[  358.270438]  unevictable:0 dirty:5 writeback:0 unstable:0
[  358.270438]  slab_reclaimable:3246 slab_unreclaimable:6453
[  358.270438]  mapped:145719 shmem:23090 pagetables:5828 bounce:0
[  358.270438]  free:492267 free_pcp:182 free_cma:10778
[  358.288491] DMA free:23856kB min:2688kB low:7020kB high:7692kB
active_anon:88860kB inactive_anon:89928kB active_file:60kB
inactive_file:92kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:782336kB managed:627212kB mlocked:0kB
dirty:0kB writeback:0kB mapped:206996kB shmem:89944kB
slab_reclaimable:12984kB slab_unreclaimable:25812kB
kernel_stack:9552kB pagetables:1556kB unstable:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:20812kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
[  358.312859] lowmem_reserve[]: 0 0 3283 3283
[  358.315307] HighMem free:1949088kB min:512kB low:23908kB
high:27536kB active_anon:896808kB inactive_anon:1256kB
active_file:221312kB inactive_file:292232kB unevictable:0kB
isolated(anon):0kB isolated(file):0kB present:3386368kB
managed:3386368kB mlocked:0kB dirty:20kB writeback:0kB mapped:375880kB
shmem:2416kB slab_reclaimable:0kB slab_unreclaimable:0kB
kernel_stack:0kB pagetables:21756kB unstable:0kB bounce:0kB
free_pcp:792kB local_pcp:120kB free_cma:22300kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
[  358.342671] lowmem_reserve[]: 0 0 0 0
[  358.344770] DMA: 935*4kB (MEC) 502*8kB (UMEC) 114*16kB (UMC)
29*32kB (UMC) 8*64kB (C) 1*128kB (C) 0*256kB 0*512kB 0*1024kB 0*2048kB
3*4096kB (C) = 23436kB
[  358.354477] HighMem: 9863*4kB (UMC) 5032*8kB (UMC) 1566*16kB (UMC)
501*32kB (UMC) 233*64kB (UMC) 48*128kB (UMC) 20*256kB (UMC) 9*512kB
(UM) 4*1024kB (MC) 0*2048kB 438*4096kB (MC) = 1949724kB
[  358.366813] 151233 total pagecache pages
[  358.369254] 0 pages in swap cache
[  358.371281] Swap cache stats: add 0, delete 0, find 0/0
[  358.378158] Free swap  = 0kB
[  358.382138] Total swap = 0kB
[  358.383621] 1042176 pages RAM
[  358.394011] 846592 pages HighMem/MovableOnly
[  358.402905] platform dabr_udc.0: SETUP     : ff.ff vffff i0000 l0 DATA_IN
[  358.402989] 38781 pages reserved
[  358.402991] 49152 pages cma reserve
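
The per-order free list for the DMA zone above can be summed by
migrate type with a small parser. This is a sketch, not a tool from the
kernel tree: parse_buddy, total_kb and cma_only_kb are hypothetical
names, and it assumes the letter C in the parenthesised type list marks
CMA pageblocks.

```python
import re

# The "DMA:" buddy line from the log above, rejoined into one string.
DMA_LINE = (
    "935*4kB (MEC) 502*8kB (UMEC) 114*16kB (UMC) 29*32kB (UMC) "
    "8*64kB (C) 1*128kB (C) 0*256kB 0*512kB 0*1024kB 0*2048kB "
    "3*4096kB (C) = 23436kB"
)

# One entry looks like "935*4kB (MEC)"; the type list is absent when
# the count is zero. The trailing "= 23436kB" has no "*" and is skipped.
ENTRY = re.compile(r"(\d+)\*(\d+)kB(?: \(([A-Z]+)\))?")


def parse_buddy(line):
    """Return (count, block_kb, types) tuples for each order in line."""
    return [(int(n), int(kb), t or "") for n, kb, t in ENTRY.findall(line)]


def total_kb(entries):
    """Total free memory in kB across all orders."""
    return sum(n * kb for n, kb, _ in entries)


def cma_only_kb(entries):
    """Free kB in orders whose blocks sit only on the CMA free list.

    ASSUMPTION: 'C' denotes the CMA migrate type.
    """
    return sum(n * kb for n, kb, t in entries if t == "C")


if __name__ == "__main__":
    entries = parse_buddy(DMA_LINE)
    print("total free kB:", total_kb(entries))
    print("CMA-only kB:  ", cma_only_kb(entries))
```

For the DMA line this reproduces the 23436kB total printed by the
kernel, and shows that every free block of 64kB or larger in that zone
is tagged (C) only.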


Things I've tried (that have failed or come up with no clues):

- I've attempted both removing reserved memory carve outs from our
device tree (normally there for other cores on the SoC), and adjusting
vmalloc size to provide more low memory to the system. I'm able to
grant the kernel about 100MB extra low mem, but the problem still
occurs.

- kmemleak does not show any obvious issues (I thought it might since
the extra 100MB gets swallowed up).

- Ran some tests looking at ftrace for kmem_cache_alloc, kmalloc, and
friends on both non-lpae and lpae configurations.  I'm not seeing any
obvious differences between the two.

It seems strange that order-1 or order-0 allocations would fail when
we're reporting >20MB of free low mem.
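
One check the posted numbers allow is how much of that free low memory
is outside CMA. The sketch below assumes that allocations which cannot
be placed in CMA regions (for example, requests without __GFP_MOVABLE)
do not get to count free CMA pages toward the zone watermark. That is
an assumption about the 4.4 allocator to verify in mm/page_alloc.c, not
a diagnosis. The function names are hypothetical; the figures are
copied from the DMA zone line of the Mem-Info dump.

```python
# Figures in kB, copied from the "DMA free:..." line of the log.
DMA_ZONE_KB = {
    "free": 23856,
    "free_cma": 20812,
    "min": 2688,
    "low": 7020,
    "high": 7692,
}


def usable_free_kb(zone, can_use_cma):
    """Free kB visible to an allocation.

    ASSUMPTION: requests that cannot use CMA pageblocks have free_cma
    subtracted before the watermark comparison.
    """
    if can_use_cma:
        return zone["free"]
    return zone["free"] - zone["free_cma"]


def above_watermark(zone, mark, can_use_cma, lowmem_reserve_kb=0):
    """True if usable free memory exceeds the named watermark."""
    return usable_free_kb(zone, can_use_cma) > zone[mark] + lowmem_reserve_kb


if __name__ == "__main__":
    for can_use_cma in (True, False):
        free = usable_free_kb(DMA_ZONE_KB, can_use_cma)
        marks = {m: above_watermark(DMA_ZONE_KB, m, can_use_cma)
                 for m in ("min", "low", "high")}
        print("can_use_cma=%s usable=%dkB %s" % (can_use_cma, free, marks))
```

With these figures, the non-CMA free memory in the DMA zone comes to
3044kB, which is above min (2688kB) but below low (7020kB).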

I'm about out of ideas for debugging this, aside from going through
kernel/mm line by line or trying to understand each of the
CONFIG_ARM_LPAE changes.

Any suggestions would be appreciated.

Regards,
Brian
