linux-kernel - Re: [PATCH 00/14] Small step toward KSM for file back page.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20201007214532.GA3484657@redhat.com>
Date:   Wed, 7 Oct 2020 17:45:32 -0400
From:   Jerome Glisse <jglisse@...hat.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Tejun Heo <tj@...nel.org>, Jan Kara <jack@...e.cz>,
        Josef Bacik <jbacik@...com>
Subject: Re: [PATCH 00/14] Small step toward KSM for file back page.

On Wed, Oct 07, 2020 at 07:33:16PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 07, 2020 at 01:54:19PM -0400, Jerome Glisse wrote:
> > On Wed, Oct 07, 2020 at 06:05:58PM +0100, Matthew Wilcox wrote:
> > > On Wed, Oct 07, 2020 at 10:48:35AM -0400, Jerome Glisse wrote:
> > > > On Wed, Oct 07, 2020 at 04:20:13AM +0100, Matthew Wilcox wrote:
> > > > > On Tue, Oct 06, 2020 at 09:05:49PM -0400, jglisse@...hat.com wrote:
> > > For other things (NUMA distribution), we can point to something which

[...]

> > > isn't a struct page and can be distiguished from a real struct page by a
> > > bit somewhere (I have ideas for at least three bits in struct page that
> > > could be used for this).  Then use a pointer in that data structure to
> > > point to the real page.  Or do NUMA distribution at the inode level.
> > > Have a way to get from (inode, node) to an address_space which contains
> > > just regular pages.
> > 
> > How do you find all the copies ? KSM maintains a list for a reasons.
> > Same would be needed here because if you want to break the write prot
> > you need to find all the copy first. If you intend to walk page table
> > then how do you synchronize to avoid more copy to spawn while you
> > walk reverse mapping, we could lock the struct page i guess. Also how
> > do you walk device page table which are completely hidden from core mm.
> 
> You have the inode and you iterate over each mapping, looking up the page
> that's in each mapping.  Or you use the i_mmap tree to find the pages.

This would slow down for everyone as we would have to walk all mapping
each time we try to write to page. Also we a have mechanism for page
write back to avoid race between thread trying to write and write back.
We would also need something similar. Without mediating this through
struct page i do not see how to keep this reasonable from performance
point of view.


> > > I don't have time to work on all of these.  If there's one that
> > > particularly interests you, let's dive deep into it and figure out how
> > 
> > I care about KSM, duplicate NUMA copy (not only for CPU but also
> > device) and write protection or exclusive write access. In each case
> > you need a list of all the copy (for KSM of the deduplicated page)
> > Having a special entry in the page cache does not sound like a good
> > option in many code path you would need to re-look the page cache to
> > find out if the page is in special state. If you use a bit flag in
> > struct page how do you get to the callback or to the copy/alias,
> > walk all the page tables ?
> 
> Like I said, something that _looks_ like a struct page.  At least looks
> enough like a struct page that you can pull a pointer out of the page
> cache and check the bit.  But since it's not actually a struct page,
> you can use the rest of the data structure for pointers to things you
> want to track.  Like the real struct page.

What i fear is the added cost because it means we need to do this look-
up everytime to check and we also need proper locking to avoid races.
Adding an ancilliary struct and trying to keep everything synchronize
seems harder to me.

> 
> > I do not see how i am doing violence to struct page :) The basis of
> > my approach is to pass down the mapping. We always have the mapping
> > at the top of the stack (either syscall entry point on a file or
> > through the vma when working on virtual address).
> 
> Yes, you explained all that in Utah.  I wasn't impressed than, and I'm
> not impressed now.

Is this more of a taste thing or is there something specific you do not
like ?

Cheers,
Jérôme