Message-ID: <1561411839.5154.60.camel@lca.pw>
Date:   Mon, 24 Jun 2019 17:30:39 -0400
From:   Qian Cai <cai@....pw>
To:     Will Deacon <will@...nel.org>
Cc:     Anshuman Khandual <anshuman.khandual@....com>,
        Catalin Marinas <catalin.marinas@....com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        Mike Kravetz <mike.kravetz@...cle.com>
Subject: Re: LTP hugemmap05 test case failure on arm64 with linux-next
 (next-20190613)

So the problem is that ipcget_public() holds the semaphore "ids->rwsem" for what
seems an unnecessarily long time, and sometimes goes to sleep while holding it
because of direct reclaim (at other times, LTP hugemmap05 [1] instead sees
hugetlb_file_setup() return -ENOMEM):

[  788.765739][ T1315] INFO: task hugemmap05:5001 can't die for more than 122 seconds.
[  788.773512][ T1315] hugemmap05      R  running task    25600  5001      1 0x0000000d
[  788.781348][ T1315] Call trace:
[  788.784536][ T1315]  __switch_to+0x2e0/0x37c
[  788.788848][ T1315]  try_to_free_pages+0x614/0x934
[  788.793679][ T1315]  __alloc_pages_nodemask+0xe88/0x1d60
[  788.799030][ T1315]  alloc_fresh_huge_page+0x16c/0x588
[  788.804206][ T1315]  alloc_surplus_huge_page+0x9c/0x278
[  788.809468][ T1315]  hugetlb_acct_memory+0x114/0x5c4
[  788.814469][ T1315]  hugetlb_reserve_pages+0x170/0x2b0
[  788.819662][ T1315]  hugetlb_file_setup+0x26c/0x3a8
[  788.824600][ T1315]  newseg+0x220/0x63c
[  788.828490][ T1315]  ipcget+0x570/0x674
[  788.832377][ T1315]  ksys_shmget+0x90/0xc4
[  788.836525][ T1315]  __arm64_sys_shmget+0x54/0x88
[  788.841282][ T1315]  el0_svc_handler+0x19c/0x26c
[  788.845952][ T1315]  el0_svc+0x8/0xc

and then all other processes wait on the semaphore, causing lock contention:

[  788.849583][ T1315] INFO: task hugemmap05:5027 blocked for more than 122 seconds.
[  788.857119][ T1315]       Tainted: G        W         5.2.0-rc6-next-20190624 #2
[  788.864566][ T1315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  788.873139][ T1315] hugemmap05      D26960  5027   5026 0x00000000
[  788.879395][ T1315] Call trace:
[  788.882576][ T1315]  __switch_to+0x2e0/0x37c
[  788.886901][ T1315]  __schedule+0xb74/0xf0c
[  788.891136][ T1315]  schedule+0x60/0x168
[  788.895097][ T1315]  rwsem_down_write_slowpath+0x5a0/0x8c8
[  788.900653][ T1315]  down_write+0xc0/0xc4
[  788.904715][ T1315]  ipcget+0x74/0x674
[  788.908516][ T1315]  ksys_shmget+0x90/0xc4
[  788.912664][ T1315]  __arm64_sys_shmget+0x54/0x88
[  788.917420][ T1315]  el0_svc_handler+0x19c/0x26c
[  788.922088][ T1315]  el0_svc+0x8/0xc

Ideally, it seems that only ipc_findkey() and newseg() on this path need to hold
the semaphore to protect against concurrent access, so it could just be
converted to a spinlock instead.
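To make the "shorter critical section" idea concrete, here is a minimal userspace sketch in C, not a kernel patch. It assumes a toy id table protected by a C11 atomic_flag spinlock standing in for the kernel's spinlock_t, and every name in it (id_table, segment, get_segment, find_key_locked, table_init) is hypothetical. One deliberate deviation from a literal conversion: newseg() can sleep in direct reclaim, and a task must not sleep while holding a spinlock, so the sketch performs the slow allocation with no lock held and only locks around the lookup and the insertion, re-checking the key before publishing.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 16 /* hypothetical fixed capacity */

/* Hypothetical stand-in for a shm segment; 'backing' models the
 * expensive hugetlb-backed allocation. */
struct segment {
	int key;
	size_t size;
	void *backing;
};

/* Hypothetical id table guarded by a spinlock instead of a semaphore. */
struct id_table {
	atomic_flag lock;
	struct segment *slots[TABLE_SIZE];
};

static void spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* busy-wait, so the critical section must stay short */
}

static void spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

void table_init(struct id_table *t)
{
	memset(t->slots, 0, sizeof(t->slots));
	atomic_flag_clear(&t->lock);
}

/* Caller must hold t->lock; plays the role of ipc_findkey(). */
static struct segment *find_key_locked(struct id_table *t, int key)
{
	for (int i = 0; i < TABLE_SIZE; i++)
		if (t->slots[i] && t->slots[i]->key == key)
			return t->slots[i];
	return NULL;
}

/* Return the segment for 'key', creating it if needed.
 * Returns NULL on allocation failure or when the table is full. */
struct segment *get_segment(struct id_table *t, int key, size_t size)
{
	struct segment *seg, *fresh;

	spin_lock(&t->lock);
	seg = find_key_locked(t, key);
	spin_unlock(&t->lock);
	if (seg)
		return seg;

	/* Slow path (direct reclaim in the kernel case) runs unlocked,
	 * so other callers are not blocked behind it. */
	fresh = malloc(sizeof(*fresh));
	if (!fresh)
		return NULL;
	fresh->key = key;
	fresh->size = size;
	fresh->backing = calloc(1, size);
	if (!fresh->backing) {
		free(fresh);
		return NULL;
	}

	spin_lock(&t->lock);
	seg = find_key_locked(t, key); /* another caller may have won the race */
	if (!seg) {
		for (int i = 0; i < TABLE_SIZE; i++) {
			if (!t->slots[i]) {
				t->slots[i] = fresh;
				seg = fresh;
				break;
			}
		}
	}
	spin_unlock(&t->lock);

	if (seg != fresh) { /* lost the race or table full: drop our copy */
		free(fresh->backing);
		free(fresh);
	}
	return seg;
}
```

The re-check after re-acquiring the lock is what makes dropping it safe: two callers racing on the same key may both allocate, but only one copy is published and the loser frees its own.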

[1] ./hugemmap05 -s -m

https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/hugetlb/hugemmap/hugemmap05.c
