Message-ID: <20190912044002.xp3c7jbpbmq4dbz6@linux-p48b>
Date: Wed, 11 Sep 2019 21:40:02 -0700
From: Davidlohr Bueso <dave@...olabs.net>
To: Matthew Wilcox <willy@...radead.org>
Cc: Mike Kravetz <mike.kravetz@...cle.com>,
Waiman Long <longman@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Will Deacon <will.deacon@....com>,
Alexander Viro <viro@...iv.linux.org.uk>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH 5/5] hugetlbfs: Limit wait time when trying to share huge PMD
On Wed, 11 Sep 2019, Matthew Wilcox wrote:
>On Wed, Sep 11, 2019 at 08:26:52PM -0700, Mike Kravetz wrote:
>> All this got me wondering if we really need to take i_mmap_rwsem in write
>> mode here. We are not changing the tree, only traversing it looking for
>> a suitable vma.
>>
>> Unless I am missing something, the hugetlb code only ever takes the semaphore
>> in write mode; never read. Could this have been the result of changing the
>> tree semaphore to read/write? Instead of analyzing all the code, the easiest
>> and safest thing would have been to take all accesses in write mode.
>
>I was wondering the same thing. It was changed here:
>
>commit 83cde9e8ba95d180eaefefe834958fbf7008cf39
>Author: Davidlohr Bueso <dave@...olabs.net>
>Date: Fri Dec 12 16:54:21 2014 -0800
>
> mm: use new helper functions around the i_mmap_mutex
>
> Convert all open coded mutex_lock/unlock calls to the
> i_mmap_[lock/unlock]_write() helpers.
>
>and a subsequent patch said:
>
> This conversion is straightforward. For now, all users take the write
> lock.
>
>There were subsequent patches which changed a few places
>c8475d144abb1e62958cc5ec281d2a9e161c1946
>1acf2e040721564d579297646862b8ea3dd4511b
>d28eb9c861f41aa2af4cfcc5eeeddff42b13d31e
>874bfcaf79e39135cd31e1cfc9265cf5222d1ec3
>3dec0ba0be6a532cac949e02b853021bf6d57dad
>
>but I don't know why this one wasn't changed.
I cannot recall why huge_pmd_share() was not changed along with the other
callers that don't modify the interval tree. Looking at the function, I agree
that this lock could be taken in shared (read) mode; in fact this lock is much
less involved than its anon_vma counterpart, last I checked (perhaps with the
exception of take_rmap_locks()).
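
Something along the lines of the below is what I'd imagine the conversion to
look like -- a rough sketch from memory, completely untested, with the body
paraphrased rather than copied from mm/hugetlb.c. The point is only that the
tree walk itself doesn't need write mode:

pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
{
	struct vm_area_struct *vma = find_vma(mm, addr);
	struct address_space *mapping = vma->vm_file->f_mapping;
	pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
	struct vm_area_struct *svma;
	unsigned long saddr;
	pte_t *spte = NULL, *pte;
	spinlock_t *ptl;

	if (!vma_shareable(vma, addr))
		return (pte_t *)pmd_alloc(mm, pud, addr);

	/*
	 * We only traverse the tree here looking for a suitable vma; the
	 * page table update below is serialized by huge_pte_lock(), so
	 * read mode ought to be sufficient.
	 */
	i_mmap_lock_read(mapping);		/* was i_mmap_lock_write() */
	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
		if (svma == vma)
			continue;

		saddr = page_table_shareable(svma, vma, addr, idx);
		if (saddr) {
			spte = huge_pte_offset(svma->vm_mm, saddr,
					       vma_mmu_pagesize(svma));
			if (spte) {
				get_page(virt_to_page(spte));
				break;
			}
		}
	}
	if (!spte)
		goto out;

	ptl = huge_pte_lock(hstate_vma(vma), mm, spte);
	if (pud_none(*pud)) {
		pud_populate(mm, pud,
			     (pmd_t *)((unsigned long)spte & PAGE_MASK));
		mm_inc_nr_pmds(mm);
	} else {
		put_page(virt_to_page(spte));
	}
	spin_unlock(ptl);
out:
	pte = (pte_t *)pmd_alloc(mm, pud, addr);
	i_mmap_unlock_read(mapping);		/* was i_mmap_unlock_write() */
	return pte;
}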
>
>(I was also wondering about caching a potentially sharable page table
>in the address_space to avoid having to walk the VMA tree at all if that
>one happened to be sharable).
I also think that the right solution is within the mm, instead of adding a new
api to rwsem plus the extra complexity/overhead to osq _just_ for this case.
We've managed to avoid needing timeout extensions in our locking primitives
thus far, which is a good thing imo.
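
As an aside, if the caching idea were worth exploring, I'd imagine something
very roughly like the below (completely untested; 'i_shared_pmd' is a made-up
field that does not exist in struct address_space today, and invalidating it
when the pmd page stops being shareable is entirely hand-waved):

static pte_t *huge_pmd_share_fastpath(struct mm_struct *mm, unsigned long addr,
				      pud_t *pud, struct address_space *mapping)
{
	spinlock_t *ptl;
	pte_t *spte;

	/* hypothetical cache, protected by i_mmap_rwsem */
	i_mmap_lock_read(mapping);
	spte = (pte_t *)mapping->i_shared_pmd;
	if (spte)
		get_page(virt_to_page(spte));
	i_mmap_unlock_read(mapping);

	if (!spte)
		return NULL;	/* caller falls back to the tree walk */

	ptl = huge_pte_lock(hstate_vma(find_vma(mm, addr)), mm, spte);
	if (pud_none(*pud)) {
		pud_populate(mm, pud,
			     (pmd_t *)((unsigned long)spte & PAGE_MASK));
		mm_inc_nr_pmds(mm);
	} else {
		put_page(virt_to_page(spte));
	}
	spin_unlock(ptl);

	return (pte_t *)pmd_alloc(mm, pud, addr);
}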
Thanks,
Davidlohr