Message-ID: <5b8b9f8c-8e9b-42a5-b8b2-9b96903f3ada@redhat.com>
Date:   Mon, 4 Dec 2023 18:27:08 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Ryan Roberts <ryan.roberts@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Ard Biesheuvel <ardb@...nel.org>,
        Marc Zyngier <maz@...nel.org>,
        Oliver Upton <oliver.upton@...ux.dev>,
        James Morse <james.morse@....com>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        Zenghui Yu <yuzenghui@...wei.com>,
        Andrey Ryabinin <ryabinin.a.a@...il.com>,
        Alexander Potapenko <glider@...gle.com>,
        Andrey Konovalov <andreyknvl@...il.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Vincenzo Frascino <vincenzo.frascino@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Matthew Wilcox <willy@...radead.org>,
        Yu Zhao <yuzhao@...gle.com>,
        Mark Rutland <mark.rutland@....com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        John Hubbard <jhubbard@...dia.com>, Zi Yan <ziy@...dia.com>,
        Barry Song <21cnbao@...il.com>,
        Alistair Popple <apopple@...dia.com>,
        Yang Shi <shy828301@...il.com>
Cc:     linux-arm-kernel@...ts.infradead.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 01/15] mm: Batch-copy PTE ranges during fork()

> 
> With rmap batching from [1] -- rebased+changed on top of that -- we could turn
> that into an effective (untested):
> 
>           if (page && folio_test_anon(folio)) {
> +               nr = folio_nr_pages_cont_mapped(folio, page, src_pte, addr, end,
> +                                               pte, enforce_uffd_wp, &nr_dirty,
> +                                               &nr_writable);
>                   /*
>                    * If this page may have been pinned by the parent process,
>                    * copy the page immediately for the child so that we'll always
>                    * guarantee the pinned page won't be randomly replaced in the
>                    * future.
>                    */
> -               folio_get(folio);
> -               if (unlikely(folio_try_dup_anon_rmap_pte(folio, page, src_vma))) {
> +               folio_ref_add(folio, nr);
> +               if (unlikely(folio_try_dup_anon_rmap_ptes(folio, page, nr, src_vma))) {
>                           /* Page may be pinned, we have to copy. */
> -                       folio_put(folio);
> -                       return copy_present_page(dst_vma, src_vma, dst_pte, src_pte,
> -                                                addr, rss, prealloc, page);
> +                       folio_ref_sub(folio, nr);
> +                       ret = copy_present_page(dst_vma, src_vma, dst_pte,
> +                                               src_pte, addr, rss, prealloc,
> +                                               page);
> +                       return ret == 0 ? 1 : ret;
>                   }
> -               rss[MM_ANONPAGES]++;
> +               rss[MM_ANONPAGES] += nr;
>           } else if (page) {
> -               folio_get(folio);
> -               folio_dup_file_rmap_pte(folio, page);
> -               rss[mm_counter_file(page)]++;
> +               nr = folio_nr_pages_cont_mapped(folio, page, src_pte, addr, end,
> +                                               pte, enforce_uffd_wp, &nr_dirty,
> +                                               &nr_writable);
> +               folio_ref_add(folio, nr);
> +               folio_dup_file_rmap_ptes(folio, page, nr);
> +               rss[mm_counter_file(page)] += nr;
>           }
> 
> 
> We'll have to test performance, but it could be that we want to specialize
> more on !folio_test_large(). That code is very performance-sensitive.
> 
> 
> [1] https://lkml.kernel.org/r/20231204142146.91437-1-david@redhat.com

So, on top of [1] without rmap batching but with a slightly modified 
version of yours (that keeps the existing code structure as pointed out 
and, e.g., adjusts the counter updates), running my fork() microbenchmark 
with 1 GiB of memory:

Compared to [1], with all order-0 pages it gets 13--14% _slower_ and 
with all PTE-mapped THP (order-9) it gets ~29--30% _faster_.
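
For a sense of scale, here is a minimal sketch that counts the PTEs and 
folios in a 1 GiB mapping. It assumes a 4 KiB base page size, which is 
not stated above, and the helper names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages (not stated in the benchmark description). */
#define BASE_PAGE_SIZE 4096ULL

/* Number of PTEs needed to map `bytes` of memory with base pages. */
static uint64_t nr_ptes(uint64_t bytes)
{
	return bytes / BASE_PAGE_SIZE;
}

/* Number of folios of the given order needed to cover `bytes` of memory. */
static uint64_t nr_folios(uint64_t bytes, unsigned int order)
{
	assert(order < 64);
	return nr_ptes(bytes) >> order;
}
```

Under that assumption, 1 GiB is 262144 PTEs. The order-0 run walks 
262144 single-page folios, while the order-9 run walks 512 folios of 
512 PTEs each, so any fixed per-folio cost is paid far more often in 
the order-0 case.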

So it looks like we really want a completely separate code path for 
"!folio_test_large()" to keep that case as fast as possible. And 
"likely" we want to use "likely(!folio_test_large())". ;)

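A rough userspace sketch of that split, not the actual kernel change: 
struct mock_folio, mock_folio_test_large() and mock_copy_step() are 
hypothetical stand-ins, the likely()/unlikely() macros are redefined 
here via the GCC/Clang __builtin_expect, and the large-folio path 
simply assumes the whole folio is contiguously mapped from its first 
page:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel's branch-prediction hints. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical mock: only the field this sketch needs. */
struct mock_folio {
	unsigned int order;	/* 0 for a small folio, 9 for an order-9 THP */
};

static bool mock_folio_test_large(const struct mock_folio *folio)
{
	return folio->order > 0;
}

/*
 * Return how many PTEs one copy step would handle, given that at most
 * max_nr PTEs remain in the range being copied.
 */
static unsigned int mock_copy_step(const struct mock_folio *folio,
				   unsigned int max_nr)
{
	unsigned int nr;

	assert(max_nr >= 1);

	/* Fast path: small folio, exactly one PTE, no batch detection. */
	if (likely(!mock_folio_test_large(folio)))
		return 1;

	/*
	 * Large-folio path. Simplifying assumption: every page of the folio
	 * is mapped contiguously, so the batch is only bounded by max_nr.
	 */
	nr = 1U << folio->order;
	return nr < max_nr ? nr : max_nr;
}
```

The point of the shape is that the order-0 case returns before any of 
the batching logic runs, so it pays only for one well-predicted branch.
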
Performing rmap batching on top of that code only slightly (another 1% 
or so) improves performance in the PTE-mapped THP (order-9) case right 
now, in contrast to other rmap batching. The reason is that all rmap code 
gets inlined here and we're only doing subpage mapcount updates + PAE 
handling.

-- 
Cheers,

David / dhildenb
