linux-kernel - Re: [PATCH v4 29/33] x86/mm: try VMA lock-based page fault handling first

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJuCfpGLozRzxE9KZehgW1dAYpNxe4b+nnjH+ppbeAuFtRNGBQ@mail.gmail.com>
Date:   Fri, 30 Jun 2023 10:40:29 -0700
From:   Suren Baghdasaryan <surenb@...gle.com>
To:     Jiri Slaby <jirislaby@...nel.org>
Cc:     akpm@...ux-foundation.org, michel@...pinasse.org,
        jglisse@...gle.com, mhocko@...e.com, vbabka@...e.cz,
        hannes@...xchg.org, mgorman@...hsingularity.net, dave@...olabs.net,
        willy@...radead.org, liam.howlett@...cle.com, peterz@...radead.org,
        ldufour@...ux.ibm.com, paulmck@...nel.org, mingo@...hat.com,
        will@...nel.org, luto@...nel.org, songliubraving@...com,
        peterx@...hat.com, david@...hat.com, dhowells@...hat.com,
        hughd@...gle.com, bigeasy@...utronix.de, kent.overstreet@...ux.dev,
        punit.agrawal@...edance.com, lstoakes@...il.com,
        peterjung1337@...il.com, rientjes@...gle.com, chriscli@...gle.com,
        axelrasmussen@...gle.com, joelaf@...gle.com, minchan@...gle.com,
        rppt@...nel.org, jannh@...gle.com, shakeelb@...gle.com,
        tatashin@...gle.com, edumazet@...gle.com, gthelen@...gle.com,
        gurua@...gle.com, arjunroy@...gle.com, soheil@...gle.com,
        leewalsh@...gle.com, posk@...gle.com,
        michalechner92@...glemail.com, linux-mm@...ck.org,
        linux-arm-kernel@...ts.infradead.org,
        linuxppc-dev@...ts.ozlabs.org, x86@...nel.org,
        linux-kernel@...r.kernel.org, kernel-team@...roid.com
Subject: Re: [PATCH v4 29/33] x86/mm: try VMA lock-based page fault handling first

On Fri, Jun 30, 2023 at 1:43 AM Jiri Slaby <jirislaby@...nel.org> wrote:
>
> On 30. 06. 23, 10:28, Jiri Slaby wrote:
> >  > 2348
> > clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
> >  > 2350  <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
> >  > 2351  <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
> >  > 2351  <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
> >  > 2354  <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
> >  > 2355  <... clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
> >  > 2370  mmap(NULL, 262144, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> >  > 2370  <... mmap resumed>)               = 0x7fca68249000
> >  > 2372  <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
> >  > 2384  <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
> >  > 2388  <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
> >  > 2392  <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
> >  > 2395  write(2, "runtime: marked free object in s"..., 36 <unfinished
> > ...>
> >
> > I.e. IIUC, all are threads (CLONE_VM) and thread 2370 mapped ANON
> > 0x7fca68249000 - 0x7fca6827ffff and go in thread 2395 thinks for some
> > reason 0x7fca6824bec8 in that region is "bad".

Thanks for the analysis Jiri.
Is it possible from these logs to identify whether 2370 finished the
mmap operation before 2395 tried to access 0x7fca6824bec8? That access
has to happen only after mmap finishes mapping the region.

>
> As I was noticed, this might be as well be a fail of the go's
> inter-thread communication (or alike) too. It might now be only more
> exposed with vma-based locks as we can do more parallelism now.

Yes, with multithreaded processes like these where threads are mapping
and accessing memory areas, per-VMA locks should allow for greater
parallelism. So, if there is a race like the one I asked above, it
might become more pronounced with per-VMA locks.

I'll double check the code, but from Kernel's POV mmap would take the
mmap_lock for write then will lock the VMA lock for write. That should
prevent any page fault handlers from accessing this VMA in parallel
until writers release the locks. Page fault path would try to find the
VMA without any lock and then will try to read-lock that VMA. If it
fails it will fall back to mmap_lock. So, if the writer started first
and obtained the VMA lock, the reader will fall back to mmap_lock and
will block until the writer releases the mmap_lock. If the reader got
VMA read lock first then the writer will block while obtaining the
VMA's write lock. However for your scenario, the reader (page fault)
might be getting here before the writer (mmap) and upon not finding
the VMA it is looking for, it will fail.
Please let me know if you can verify this scenario.
Thanks,
Suren.

>
> There are older hard to reproduce bugs in go with similar symptoms (we
> see this error sometimes now too):
> https://github.com/golang/go/issues/15246
>
> Or this 2016 bug is a red herring. Hard to tell...
>
> >> thanks,
> --
> js
> suse labs
>