Message-ID: <20160523214942.GA79646@black.fi.intel.com>
Date: Tue, 24 May 2016 00:49:42 +0300
From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To: Rik van Riel <riel@...hat.com>
Cc: "Kirill A. Shutemov" <kirill@...temov.name>, Michal Hocko <mhocko@...nel.org>, Ebru Akagunduz <ebru.akagunduz@...il.com>, linux-mm@...ck.org, hughd@...gle.com, akpm@...ux-foundation.org, n-horiguchi@...jp.nec.com, aarcange@...hat.com, iamjoonsoo.kim@....com, gorcunov@...nvz.org, linux-kernel@...r.kernel.org, mgorman@...e.de, rientjes@...gle.com, vbabka@...e.cz, aneesh.kumar@...ux.vnet.ibm.com, hannes@...xchg.org, boaz@...xistor.com
Subject: Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem

On Mon, May 23, 2016 at 04:13:03PM -0400, Rik van Riel wrote:
> On Mon, 2016-05-23 at 23:02 +0300, Kirill A. Shutemov wrote:
> > On Mon, May 23, 2016 at 03:26:47PM -0400, Rik van Riel wrote:
> > > On Mon, 2016-05-23 at 22:01 +0300, Kirill A. Shutemov wrote:
> > > > On Mon, May 23, 2016 at 02:49:09PM -0400, Rik van Riel wrote:
> > > > > On Mon, 2016-05-23 at 20:42 +0200, Michal Hocko wrote:
> > > > > > On Mon 23-05-16 20:14:11, Ebru Akagunduz wrote:
> > > > > > > Currently khugepaged makes swapin readahead under
> > > > > > > down_write. This patch supplies to make swapin
> > > > > > > readahead under down_read instead of down_write.
> > > > > >
> > > > > > You are still keeping down_write. Can we do without it
> > > > > > altogether? Blocking mmap_sem of a remote process for write
> > > > > > is certainly not nice.
> > > > >
> > > > > Maybe Andrea can explain why khugepaged requires
> > > > > a down_write of mmap_sem?
> > > > >
> > > > > If it were possible to have just down_read that
> > > > > would make the code a lot simpler.
> > > >
> > > > You need a down_write() to retract page table. We need to make
> > > > sure that nobody sees the page table before we can replace it
> > > > with huge pmd.
> > >
> > > Good point.
> > >
> > > I guess the alternative is to have the page_table_lock
> > > taken by a helper function (everywhere) that can return
> > > failure if the page table was changed while the caller
> > > was waiting for the lock.
> >
> > Not page table was changed, but pmd is now pointing to something
> > else.
> >
> > Basically, we would need to nest all pte-ptl's within pmd_lock().
> > That's not good for scalability.
>
> I can see a few alternatives here:
>
> 1) huge pmd collapsing takes both the pmd lock and the pte lock,
>    preventing pte updates from happening simultaneously

That's what we do now and that's not enough. We would need to serialize
against pmd_lock() during the normal page-fault path (and other pte
manipulation), which we don't do now if the pmd points to a page table.

That's a huge hit on scalability.

>
> 2) code that (re-)acquires the pte lock can read a sequence number
>    at the pmd level, check that it did not change after the
>    pte lock has been acquired, and abort if it has - I believe most
>    of the code that re-acquires the pte lock already knows how to
>    abort if somebody else touched the pte while it was looking
>    elsewhere

So, every pmd_lock() (and every other way we take the lock) should bump
the sequence number, and we need to be able to read a stable result
outside pmd_lock(), meaning it should be atomic_t or something similar.
Not exactly free.

And I'm not convinced the hassle is worth the gain.

> That way the (uncommon) thp collapse code should still exclude
> pte level operations, at the cost of potentially teaching a few
> more pte level operations to abort (chances are most already do,
> considering a race with other pte-level manipulations requires that).

-- 
Kirill A. Shutemov