lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230930195434.3507483510ba7961985fbeb2@linux-foundation.org>
Date:   Sat, 30 Sep 2023 19:54:34 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     riel@...riel.com
Cc:     linux-kernel@...r.kernel.org, kernel-team@...a.com,
        linux-mm@...ck.org, muchun.song@...ux.dev, mike.kravetz@...cle.com,
        leit@...a.com, willy@...radead.org
Subject: Re: [PATCH v5 0/3] hugetlbfs: close race between MADV_DONTNEED and
 page fault

On Sat, 30 Sep 2023 20:55:47 -0400 riel@...riel.com wrote:

> v5: somehow a __vma_private_lock(vma) test failed to make it from my tree into the v4 series, fix that
> v4: fix unmap_vmas locking issue pointed out by Mike Kravetz, and resulting lockdep fallout
> v3: fix compile error w/ lockdep and test case errors with patch 3
> v2: fix the locking bug found with the libhugetlbfs tests.
> 
> Malloc libraries, like jemalloc and tcalloc, take decisions on when
> to call madvise independently from the code in the main application.
> 
> This sometimes results in the application page faulting on an address,
> right after the malloc library has shot down the backing memory with
> MADV_DONTNEED.
> 
> Usually this is harmless, because we always have some 4kB pages
> sitting around to satisfy a page fault. However, with hugetlbfs
> systems often allocate only the exact number of huge pages that
> the application wants.
> 
> Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of
> any lock taken on the page fault path, which can open up the following
> race condition:
> 
>        CPU 1                            CPU 2
> 
>        MADV_DONTNEED
>        unmap page
>        shoot down TLB entry
>                                        page fault
>                                        fail to allocate a huge page
>                                        killed with SIGBUS
>        free page
> 
> Fix that race by extending the hugetlb_vma_lock locking scheme to also
> cover private hugetlb mappings (with resv_map), and pulling the locking 
> from __unmap_hugepage_final_range into helper functions called from
> zap_page_range_single. This ensures page faults stay locked out of
> the MADV_DONTNEED VMA until the huge pages have actually been freed.

Didn't we decide that [1/3] and [2/3] should be cc:stable?

> The third patch in the series is more of an RFC. Using the
> invalidate_lock instead of the hugetlb_vma_lock greatly simplifies
> the code, but at the cost of turning a per-VMA lock into a lock
> per backing hugetlbfs file, which could slow things down when
> multiple processes are mapping the same hugetlbfs file.

"could slow things down" is testable-for?

This third one I'd queue up for testing for a 6.7-rc1 merge, so I'll split
the series apart.  Not a problem, but it would be a little better if
things were originally packaged that way.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ