Date:   Tue, 12 Jun 2018 21:08:27 -0700
From:   John Stultz <john.stultz@...aro.org>
To:     Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...e.com>, Roman Gushchin <guro@...com>
Cc:     lkml <linux-kernel@...r.kernel.org>
Subject: Re: OOPSes in mem_cgroup_protected

On Tue, Jun 12, 2018 at 6:02 PM, John Stultz <john.stultz@...aro.org> wrote:
> Hey Tejun,
>   With the current linus/master, I'm able to fairly regularly trip
> OOPSes (two examples below) in mem_cgroup_protected(), which seems to
> be new.  I haven't managed to trigger this sort of thing with v4.17.
>
> I've not had much time to dig in or bisect it - I only know that
> enabling most of the memory debugging config options didn't seem to
> trip anything prior to the issue. So I wanted to send you a heads up
> to see if this was already known, or if there was anything you might
> suggest to help chase this down.


So the line where we're crashing seems to be in mem_cgroup_protected():
  parent_emin = READ_ONCE(parent->memory.emin);

where I'm guessing the parent->memory value is null, and emin is at
the 0x120 offset in the structure, which would match the faulting
address of 0x120 in the traces below.

Reverting the following commits seems to avoid the issue.
bf8d5d52ffe8 ("memcg: introduce memory.min")
5f93ad67436b ("mm: treat memory.low value inclusive")
230671533d64 ("mm: memory.low hierarchical behavior")

I'm guessing I'm tripping over some path where the memory value never
gets initialized?

Any ideas or suggestions?

thanks
-john

(usually I'd trim the backtraces below, but keeping them as I added
Roman to the CC list)

> console:/ $ [  170.530896] Unable to handle kernel read from
> unreadable memory at virtual address 0000000000000120
> [  170.540158] Mem abort info:
> [  170.543092]   ESR = 0x96000005
> [  170.546193]   Exception class = DABT (current EL), IL = 32 bits
> [  170.552251]   SET = 0, FnV = 0
> [  170.555444]   EA = 0, S1PTW = 0
> [  170.558698] Data abort info:
> [  170.561624]   ISV = 0, ISS = 0x00000005
> [  170.565572]   CM = 0, WnR = 0
> [  170.568650] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000190bb04e
> [  170.575374] [0000000000000120] pgd=0000000000000000, pud=0000000000000000
> [  170.582297] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [  170.587929] CPU: 7 PID: 663 Comm: kswapd0 Not tainted
> 4.17.0-11699-gb4f23f3 #411
> [  170.595358] Hardware name: HiKey Development Board (DT)
> [  170.600623] pstate: a0400005 (NzCv daif +PAN -UAO)
> [  170.605478] pc : mem_cgroup_protected+0x34/0x120
> [  170.610142] lr : shrink_node+0x120/0x478
> [  170.614093] sp : ffffff8009d23c50
> [  170.617438] x29: ffffff8009d23c50 x28: ffffff8009d23d48
> [  170.622808] x27: ffffffc074ca1000 x26: ffffff8009d23e28
> [  170.628160] x25: ffffff8009d23d88 x24: 0000000000000000
> [  170.633481] x23: 0000000000000000 x22: ffffff8009071f80
> [  170.638802] x21: 0000000000000012 x20: 0000000000000012
> [  170.644124] x19: 0000000000000000 x18: 0000000000000400
> [  170.649444] x17: 0000000000000000 x16: ffffffc074ca2000
> [  170.654765] x15: 0000000000000000 x14: 0000000000000400
> [  170.660087] x13: 00000000000000b1 x12: 0000000000000003
> [  170.665408] x11: 0000000000000020 x10: 0000000000000000
> [  170.670729] x9 : 0000000000000001 x8 : 0000000000000004
> [  170.676050] x7 : ffffffc074d43c00 x6 : 0000000000000000
> [  170.681370] x5 : 0000000000000000 x4 : 0000000000000000
> [  170.686690] x3 : 000000000000dafa x2 : 0000000000000000
> [  170.692010] x1 : ffffffc074ca1000 x0 : ffffffc0386e8000
> [  170.697335] Process kswapd0 (pid: 663, stack limit = 0x00000000e0f0ae51)
> [  170.704039] Call trace:
> [  170.706497]  mem_cgroup_protected+0x34/0x120
> [  170.710775]  balance_pgdat+0x1cc/0x418
> [  170.714529]  kswapd+0x180/0x3b8
> [  170.717674]  kthread+0xf8/0x128
> [  170.720824]  ret_from_fork+0x10/0x18
> [  170.724411] Code: b40007a2 d103e042 eb02001f 540006c0 (f9409046)
> [  170.730542] ---[ end trace 7c961b6d409886f1 ]---
> [  170.839299] Kernel panic - not syncing: Fatal exception
> [  170.844549] SMP: stopping secondary CPUs
> [  170.848488] Kernel Offset: disabled
> [  170.851982] CPU features: 0x24802004
> [  170.855556] Memory Limit: none
> [  170.888494] Rebooting in 5 seconds..
>
>
>
>
> console:/ # [  348.612152] Unable to handle kernel read from
> unreadable memory at virtual address 0000000000000120
> [  348.617384] Unable to handle kernel access to user memory outside
> uaccess routines at virtual address 0000000000000120
> [  348.621360] Mem abort info:
> [  348.632086] Mem abort info:
> [  348.634870]   ESR = 0x96000005
> [  348.634885]   Exception class = DABT (current EL), IL = 32 bits
> [  348.637686]   ESR = 0x96000005
> [  348.640785]   SET = 0, FnV = 0
> [  348.646740]   Exception class = DABT (current EL), IL = 32 bits
> [  348.649799]   EA = 0, S1PTW = 0
> [  348.652892]   SET = 0, FnV = 0
> [  348.652901]   EA = 0, S1PTW = 0
> [  348.652913] Data abort info:
> [  348.658905] Data abort info:
> [  348.662041]   ISV = 0, ISS = 0x00000005
> [  348.662050]   CM = 0, WnR = 0
> [  348.662071] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000697cecc4
> [  348.665129]   ISV = 0, ISS = 0x00000005
> [  348.668298] [0000000000000120] pgd=000000003a915003, pud=000000003a915003
> [  348.671224]   CM = 0, WnR = 0
> [  348.671242] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000c568bd29
> [  348.674193] , pmd=0000000000000000
> [  348.678021] [0000000000000120] pgd=0000000000000000, pud=0000000000000000
> [  348.691540] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [  348.723733] CPU: 5 PID: 3246 Comm: CrRendererMain Not tainted
> 4.17.0-11699-gb4f23f3 #412
> [  348.731857] Hardware name: HiKey Development Board (DT)
> [  348.737121] pstate: a0400005 (NzCv daif +PAN -UAO)
> [  348.741975] pc : mem_cgroup_protected+0x34/0x120
> [  348.746640] lr : shrink_node+0x120/0x478
> [  348.750590] sp : ffffff800ac9b8a0
> [  348.753934] x29: ffffff800ac9b8a0 x28: ffffff800ac9b9d8
> [  348.759304] x27: ffffffc071982480 x26: ffffff800ac9bb30
> [  348.764673] x25: ffffff800ac9ba18 x24: 0000000000000000
> [  348.770038] x23: 0000000000000000 x22: ffffff8009113d00
> [  348.775404] x21: 000000000000000f x20: 000000000000000f
> [  348.780769] x19: 0000000000000000 x18: 0000000000000000
> [  348.786134] x17: 0000000000000000 x16: ffffffc071985a80
> [  348.791500] x15: 0000000000000000 x14: 00000000d5e75c2f
> [  348.796868] x13: 00000000d7237d18 x12: 0000000000000003
> [  348.802233] x11: 0000000000000020 x10: 0000000000000000
> [  348.807598] x9 : 0000000000000001 x8 : 0000000000000004
> [  348.812963] x7 : ffffffc072d58c80 x6 : 0000000000000000
> [  348.818311] x5 : 0000000000000000 x4 : 0000000000000000
> [  348.823626] x3 : 000000000000e1fc x2 : 0000000000000000
> [  348.828941] x1 : ffffffc071982480 x0 : ffffffc038700080
> [  348.834258] Process CrRendererMain (pid: 3246, stack limit =
> 0x00000000b82069c1)
> [  348.841652] Call trace:
> [  348.844100]  mem_cgroup_protected+0x34/0x120
> [  348.848370]  do_try_to_free_pages+0xd0/0x3c0
> [  348.852639]  try_to_free_pages+0xf8/0x120
> [  348.856651]  __alloc_pages_nodemask+0x460/0xb68
> [  348.861181]  do_huge_pmd_anonymous_page+0x328/0x7d8
> [  348.866061]  __handle_mm_fault+0x57c/0xea0
> [  348.870157]  handle_mm_fault+0x128/0x1f8
> [  348.874082]  do_page_fault+0x1d0/0x490
> [  348.877830]  do_translation_fault+0x5c/0x68
> [  348.882012]  do_mem_abort+0x54/0x118
> [  348.885587]  el0_da+0x20/0x24
> [  348.888557] Code: b40007a2 d103e042 eb02001f 540006c0 (f9409046)
> [  348.894651] ---[ end trace 58afd90183767ac2 ]---
> [  348.942150] Kernel panic - not syncing: Fatal exception
> [  348.947448] SMP: stopping secondary CPUs
> [  349.784747] SMP: failed to stop secondary CPUs 2,5
> [  349.789569] Kernel Offset: disabled
> [  349.793089] CPU features: 0x24802004
> [  349.796691] Memory Limit: none
> [  349.909567] Rebooting in 5 seconds..
