Message-ID: <93ae88a4-1dac-77bf-37f6-f8708a6d83b7@intel.com>
Date: Mon, 15 May 2023 14:36:54 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"keescook@...omium.org" <keescook@...omium.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [GIT PULL] x86/shstk for 6.4
On 5/12/23 14:55, Linus Torvalds wrote:
>> There's always a race there because mm->mm_users can always get bumped
>> after the fork()er checks it.
> Ugh. I was assuming that if they don't already have a reference to the
> mm, they can't get one (because the '1' comes from 'current->mm', and
> nobody else has that mm).
>
> But maybe that's just not true. Looking around, we have things like
>
> pages->source_mm = current->mm;
> mmgrab(pages->source_mm);
>
> that creates a ref to the mm with a grab, and then later it gets used.
>
> So maybe the "no races can happen" is limited to *both* mm_users and
> mm_count being 1. What would increment it in that case, when 'current'
> is obviously busy forking?
>
> That "both are one" might still be the common case for fork(). Hmm?
get_task_mm() is the devil here. It goes right from having a
task_struct to bumping ->mm_users, no ->mm_count needed. It also bumps
->mm_users while holding task_lock(), which means we can't do something
simple like take mmap_lock in there to avoid racing with fork().
I did hack something together that seems to work for fork() and
get_task_mm(). Basically, we keep get_task_mm()'s legacy behavior as
the fast path. But it diverts over to a slow path if get_task_mm()
notices that an mm's refcounts and mmap_lock are consistent with a
fork() happening elsewhere.
The slow path releases the task_lock() and acquires mmap_lock so it can
sleep until the (potential) fork() is finished.
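The shape of that split might look roughly like this. This is a userspace
sketch, not the actual patch: the struct, the field names, the global
task_lock, and the "doing_fast_fork" flag are all illustrative stand-ins
(a pthread mutex plays the role of mmap_lock), and the real kernel code
would obviously look different:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures; field names are
 * illustrative, not taken from the real patch. */
struct mm_sim {
	atomic_int mm_users;		/* like mm->mm_users */
	atomic_int mm_count;		/* like mm->mm_count */
	pthread_mutex_t mmap_lock;	/* stands in for mmap_lock */
	atomic_bool doing_fast_fork;	/* hypothetical "fork active" flag */
};

static pthread_mutex_t task_lock = PTHREAD_MUTEX_INITIALIZER;

/* Sketch of a get_task_mm() that diverts to a slow path when a
 * fork() elsewhere may be depending on stable refcounts. */
struct mm_sim *get_task_mm_sim(struct mm_sim *mm)
{
	pthread_mutex_lock(&task_lock);
	if (!atomic_load(&mm->doing_fast_fork)) {
		/* Fast path: legacy behavior, just bump ->mm_users
		 * under task_lock. */
		atomic_fetch_add(&mm->mm_users, 1);
		pthread_mutex_unlock(&task_lock);
		return mm;
	}
	/* Slow path: release task_lock, then sleep on mmap_lock until
	 * the (potential) fork() is finished before taking the ref. */
	pthread_mutex_unlock(&task_lock);
	pthread_mutex_lock(&mm->mmap_lock);
	atomic_fetch_add(&mm->mm_users, 1);
	pthread_mutex_unlock(&mm->mmap_lock);
	return mm;
}
```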
On the other side, the fork() code checks ->mm_users and ->mm_count. It
can now depend on them being stable because it holds mmap_lock and it
diverts the get_task_mm() callers over to the slow path.
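The fork()-side check might be shaped like the sketch below. Again a
userspace stand-in, not the patch itself: the struct, field names, and
the "doing_fast_fork" flag are hypothetical, with a pthread mutex in
place of mmap_lock:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel structures; not the real patch. */
struct mm_sim {
	atomic_int mm_users;
	atomic_int mm_count;
	pthread_mutex_t mmap_lock;
	atomic_bool doing_fast_fork;
};

/* fork()-side sketch: with mmap_lock held and new get_task_mm()
 * callers diverted to wait on it, the refcounts are stable, so the
 * "both are one" test can safely choose a copy path with reduced
 * atomic cost. */
bool fork_can_skip_atomics(struct mm_sim *mm)
{
	bool fast = false;

	pthread_mutex_lock(&mm->mmap_lock);
	/* Divert any get_task_mm() callers that show up from here on. */
	atomic_store(&mm->doing_fast_fork, true);

	if (atomic_load(&mm->mm_users) == 1 &&
	    atomic_load(&mm->mm_count) == 1)
		fast = true;	/* nobody else can take a reference now */

	/* ... the fork() copy work would happen here ... */

	atomic_store(&mm->doing_fast_fork, false);
	pthread_mutex_unlock(&mm->mmap_lock);
	return fast;
}
```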
This works for two important cases:
1. get_task_mm() callers since they now conditionally use mmap_lock
2. mmgrab() -> mmget_not_zero() users that later take the mmap_lock
I'm also fairly sure it misses some cases outside of those two. The
patch is also quite ugly. The "->task_doing_fast_fork" mechanism is
pure hackery, for instance.
This seems to stay on the fast paths pretty well, even with 'top' or
some other /proc poking going on. In the end, this is balancing the
extra cost of the get_task_mm() slow path with reduced atomic cost in
the fork() path. It looks promising so far.
Is this worth pursuing?