Message-ID: <20210520114257.huqhkqsdrhohn3u5@ava.usersys.com>
Date: Thu, 20 May 2021 12:42:57 +0100
From: Aaron Tomlin <atomlin@...hat.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
mhocko@...e.com, willy@...radead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] mm/page_alloc: bail out on fatal signal during
reclaim/compaction retry attempt
On Thu 2021-05-20 12:20 +0200, Vlastimil Babka wrote:
> On 5/20/21 6:34 AM, Andrew Morton wrote:
> >
> > What observed problems motivated this change?
> >
> > What were the observed runtime effects of this change?
>
> Yep those details from the previous thread should be included here.
Fair enough.
During analysis of a kernel crash dump (vmcore), I found that, in the context
of __alloc_pages_slowpath(), the no_progress_loops variable held the value
31,611,688, i.e. well above MAX_RECLAIM_RETRIES, and that a fatal signal was
pending against current.
#6 [ffff00002e78f7c0] do_try_to_free_pages+0xe4 at ffff00001028bd24
#7 [ffff00002e78f840] try_to_free_pages+0xe4 at ffff00001028c0f4
#8 [ffff00002e78f900] __alloc_pages_nodemask+0x500 at ffff0000102cd130
// w0 = *(sp + 148) /* no_progress_loops */
0xffff0000102cd1e0 <__alloc_pages_nodemask+0x5b0>: ldr w0, [sp,#148]
// w0 = w0 + 0x1
0xffff0000102cd1e4 <__alloc_pages_nodemask+0x5b4>: add w0, w0, #0x1
// *(sp + 148) = w0
0xffff0000102cd1e8 <__alloc_pages_nodemask+0x5b8>: str w0, [sp,#148]
// if (w0 > 0x10)
// goto __alloc_pages_nodemask+0x904
0xffff0000102cd1ec <__alloc_pages_nodemask+0x5bc>: cmp w0, #0x10
0xffff0000102cd1f0 <__alloc_pages_nodemask+0x5c0>: b.gt 0xffff0000102cd534
- The stack pointer was 0xffff00002e78f900
crash> p *(int *)(0xffff00002e78f900+148)
$1 = 31611688
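As a cross-check of the arithmetic above, here is a minimal Python sketch. It is not part of the original analysis, and the helper names stack_slot_address and exceeds_retry_limit are hypothetical. The limit of 0x10 is taken from the "cmp w0, #0x10" instruction in the disassembly. The other constants are copied from the crash session.

```python
# Values copied from the crash session and disassembly above.
STACK_POINTER = 0xffff00002e78f900
NO_PROGRESS_LOOPS_OFFSET = 148          # from "ldr w0, [sp,#148]"
MAX_RECLAIM_RETRIES = 0x10              # from "cmp w0, #0x10"
OBSERVED_NO_PROGRESS_LOOPS = 31611688   # crash> $1


def stack_slot_address(sp, offset):
    """Hypothetical helper: address of the stack slot at sp + offset."""
    return sp + offset


def exceeds_retry_limit(loops, limit=MAX_RECLAIM_RETRIES):
    """Mirror the 'b.gt' test: true only when loops is strictly above limit."""
    return loops > limit


slot = stack_slot_address(STACK_POINTER, NO_PROGRESS_LOOPS_OFFSET)
print(hex(slot))
print(exceeds_retry_limit(OBSERVED_NO_PROGRESS_LOOPS))
```

The computed address is the one dereferenced by the crash> p command, and the comparison shows how far the counter had run past the point where the branch should have been taken.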
crash> ps 521171
PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 521171 1 36 ffff8080e2128800 RU 0.0 34789440 18624 special
crash> p &((struct task_struct *)0xffff8080e2128800)->signal.shared_pending
$2 = (struct sigpending *) 0xffff80809a416e40
crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0]
$3 = 0x804100
crash> sig -s 0x804100
SIGKILL SIGTERM SIGXCPU
crash> p ((struct sigpending *)0xffff80809a416e40)->signal.sig[0] & 1U << (9 - 1)
$4 = 0x100
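The mask decoding above can be reproduced outside of crash with a short Python sketch. It assumes the usual Linux layout in which signal n occupies bit n - 1 of sig[0]. The function names pending_signals and is_pending are hypothetical, and the name table covers only the three signals reported by "sig -s".

```python
# Linux signal numbers for the three signals reported by "sig -s" above.
SIGNAL_NAMES = {9: "SIGKILL", 15: "SIGTERM", 24: "SIGXCPU"}


def pending_signals(mask):
    """Return the signal numbers set in mask; signal n maps to bit n - 1."""
    return [bit + 1 for bit in range(mask.bit_length()) if mask & (1 << bit)]


def is_pending(mask, signo):
    """Same test as the final crash> expression: mask & 1U << (signo - 1)."""
    return bool(mask & (1 << (signo - 1)))


mask = 0x804100  # shared_pending.signal.sig[0] from the vmcore
names = [SIGNAL_NAMES.get(n, str(n)) for n in pending_signals(mask)]
print(names)
print(is_pending(mask, 9))
```

Bits 8, 14 and 23 are set in 0x804100, which corresponds to signals 9, 15 and 24, so the SIGKILL test agrees with the $4 = 0x100 result.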
Unfortunately, this incident has not been reproduced to date.
Kind regards,
--
Aaron Tomlin