[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <lsq.1556377989.902743036@decadent.org.uk>
Date: Sat, 27 Apr 2019 16:13:09 +0100
From: Ben Hutchings <ben@...adent.org.uk>
To: linux-kernel@...r.kernel.org, stable@...r.kernel.org
CC: akpm@...ux-foundation.org, Denis Kirjanov <kda@...ux-powerpc.org>,
"Vineet Gupta" <vgupta@...opsys.com>
Subject: [PATCH 3.16 051/202] ARC: mm: do_page_fault fixes #1: relinquish
mmap_sem if signal arrives while handle_mm_fault
3.16.66-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Vineet Gupta <vgupta@...opsys.com>
commit 4d447455e73b47c43dd35fcc38ed823d3182a474 upstream.
do_page_fault() forgot to relinquish mmap_sem if a signal came while
handling handle_mm_fault() - due to say a ctl+c or oom etc.
This would later cause a deadlock by acquiring it twice.
This came to light when running libc testsuite tst-tls3-malloc test but
is likely also the cause for prior seen LTP failures. Using lockdep
clearly showed what the issue was.
| # while true; do ./tst-tls3-malloc ; done
| Didn't expect signal from child: got `Segmentation fault'
| ^C
| ============================================
| WARNING: possible recursive locking detected
| 4.17.0+ #25 Not tainted
| --------------------------------------------
| tst-tls3-malloc/510 is trying to acquire lock:
| 606c7728 (&mm->mmap_sem){++++}, at: __might_fault+0x28/0x5c
|
|but task is already holding lock:
|606c7728 (&mm->mmap_sem){++++}, at: do_page_fault+0x9c/0x2a0
|
| other info that might help us debug this:
| Possible unsafe locking scenario:
|
| CPU0
| ----
| lock(&mm->mmap_sem);
| lock(&mm->mmap_sem);
|
| *** DEADLOCK ***
|
------------------------------------------------------------
What the change does is not obvious (note to myself)
prior code was
| do_page_fault
|
| down_read() <-- lock taken
| handle_mm_fault <-- signal pending as this runs
| if fatal_signal_pending
| if VM_FAULT_ERROR
| up_read
| if user_mode
| return <-- lock still held, this was the BUG
New code
| do_page_fault
|
| down_read() <-- lock taken
| handle_mm_fault <-- signal pending as this runs
| if fatal_signal_pending
| if VM_FAULT_RETRY
| return <-- not same case as above, but still OK since
| core mm already relinq lock for FAULT_RETRY
| ...
|
| < Now falls through for bug case above >
|
| up_read() <-- lock relinquished
Signed-off-by: Vineet Gupta <vgupta@...opsys.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@...adent.org.uk>
---
arch/arc/mm/fault.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -131,12 +131,17 @@ good_area:
*/
fault = handle_mm_fault(mm, vma, address, flags);
- /* If Pagefault was interrupted by SIGKILL, exit page fault "early" */
if (unlikely(fatal_signal_pending(current))) {
- if ((fault & VM_FAULT_ERROR) && !(fault & VM_FAULT_RETRY))
- up_read(&mm->mmap_sem);
- if (user_mode(regs))
+
+ /*
+ * if fault retry, mmap_sem already relinquished by core mm
+ * so OK to return to user mode (with signal handled first)
+ */
+ if (fault & VM_FAULT_RETRY) {
+ if (!user_mode(regs))
+ goto no_context;
return;
+ }
}
if (likely(!(fault & VM_FAULT_ERROR))) {
Powered by blists - more mailing lists