Message-ID: <20230821011303.hoeqjbmjaxajh255@f>
Date:   Mon, 21 Aug 2023 03:13:03 +0200
From:   Mateusz Guzik <mjguzik@...il.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] mm: remove unintentional voluntary preemption in
 get_mmap_lock_carefully

On Sun, Aug 20, 2023 at 07:12:16PM +0100, Matthew Wilcox wrote:
> On Sun, Aug 20, 2023 at 12:43:03PM +0200, Mateusz Guzik wrote:
> > Found by checking off-CPU time during kernel build (like so:
> > "offcputime-bpfcc -Ku"), sample backtrace:
> >     finish_task_switch.isra.0
> >     __schedule
> >     __cond_resched
> >     lock_mm_and_find_vma
> >     do_user_addr_fault
> >     exc_page_fault
> >     asm_exc_page_fault
> >     -                sh (4502)
> 
> Now I'm awake, this backtrace really surprises me.  Do we not check
> need_resched on entry?  It seems terribly unlikely that need_resched
> gets set between entry and getting to this point, so I guess we must
> not.
> 
> I suggest the version of the patch which puts might_sleep() before the
> mmap_read_trylock() is the right one to apply.  It's basically what
> we've done forever, except that now we'll be rescheduling without the
> mmap lock held, which just seems like an overall win.
> 

I can't sleep, and your response made me curious: is that really safe
here?

As I wrote in another email, the routine is concerned with the case of the
kernel faulting on something it should not have. In a case like that,
rescheduling to another thread is what concerns me most.
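
For reference, here is roughly what the routine looks like right now
(paraphrased from mm/memory.c, so the details may be slightly off): if the
trylock fails and the fault came from kernel mode, we only go for the
sleeping lock when the faulting instruction has an exception table fixup,
i.e. the fault was expected.

static inline bool get_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs *regs)
{
	/* Even if this succeeds, make it clear we *might* have slept */
	if (likely(mmap_read_trylock(mm))) {
		might_sleep();
		return true;
	}

	/*
	 * Kernel-mode fault: only take the sleeping lock if the faulting
	 * instruction has a fixup entry, otherwise bail out and let the
	 * oops path deal with it.
	 */
	if (regs && !user_mode(regs)) {
		unsigned long ip = instruction_pointer(regs);

		if (!search_exception_tables(ip))
			return false;
	}

	return !mmap_read_lock_killable(mm);
}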

That said, I think I found a winner -- check need_resched() prior to the
trylock.

This adds less work than might_sleep() would (a function call), still
respects the preemption point, dodges the exception table checks in the
common case, and does not switch away if there is anything fishy going
on.
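
To spell out where the preemption point comes from: with
CONFIG_DEBUG_ATOMIC_SLEEP and a voluntary preemption model, might_sleep()
expands to roughly the following (simplified from include/linux/kernel.h;
the exact guards depend on the config). __might_sleep() keeps the debug
check but drops the __cond_resched() call:

# define might_resched() __cond_resched()

# define might_sleep() \
	do { __might_sleep(__FILE__, __LINE__); might_resched(); } while (0)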

Or just do the might_sleep() variant.

I'm really buggering off the subject now.

====

mm: remove unintentional voluntary preemption in get_mmap_lock_carefully

Should the trylock succeed (and thus blocking be avoided), the routine
wants to ensure that blocking would still have been legal. However, the
might_sleep() used for that ends up calling __cond_resched(), injecting a
voluntary preemption point with the freshly acquired lock held.

Use __might_sleep() instead while holding the lock, but check for pending
rescheduling prior to taking it.

Found by checking off-CPU time during kernel build (like so:
"offcputime-bpfcc -Ku"), sample backtrace:
    finish_task_switch.isra.0
    __schedule
    __cond_resched
    lock_mm_and_find_vma
    do_user_addr_fault
    exc_page_fault
    asm_exc_page_fault
    -                sh (4502)
        10

Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
---
 mm/memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 1ec1ef3418bf..6dac9dbb7b59 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5258,8 +5258,8 @@ EXPORT_SYMBOL_GPL(handle_mm_fault);
 static inline bool get_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs *regs)
 {
 	/* Even if this succeeds, make it clear we *might* have slept */
-	if (likely(mmap_read_trylock(mm))) {
-		might_sleep();
+	if (likely(!need_resched() && mmap_read_trylock(mm))) {
+		__might_sleep(__FILE__, __LINE__);
 		return true;
 	}
 
-- 
2.39.2
