Message-ID: <20201008153028.GA3508856@redhat.com>
Date:   Thu, 8 Oct 2020 11:30:28 -0400
From:   Jerome Glisse <jglisse@...hat.com>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Tejun Heo <tj@...nel.org>, Jan Kara <jack@...e.cz>,
        Josef Bacik <jbacik@...com>
Subject: Re: [PATCH 00/14] Small step toward KSM for file back page.

On Wed, Oct 07, 2020 at 11:09:16PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 07, 2020 at 01:54:19PM -0400, Jerome Glisse wrote:
> > > For other things (NUMA distribution), we can point to something which
> > > isn't a struct page and can be distinguished from a real struct page by a
> > > bit somewhere (I have ideas for at least three bits in struct page that
> > > could be used for this).  Then use a pointer in that data structure to
> > > point to the real page.  Or do NUMA distribution at the inode level.
> > > Have a way to get from (inode, node) to an address_space which contains
> > > just regular pages.
> > 
> > How do you find all the copies? KSM maintains a list for a reason.
> > The same would be needed here, because if you want to break the write
> > protection you need to find all the copies first. If you intend to walk
> > page tables, then how do you synchronize to avoid more copies spawning
> > while you walk the reverse mapping? We could lock the struct page, I
> > guess. Also, how do you walk device page tables, which are completely
> > hidden from core mm?
> 
> So ... why don't you put a PageKsm page in the page cache?  That way you
> can share code with the current KSM implementation.  You'd need
> something like this:

I do just that, but there is no need to change anything in the page
cache, so the code below is not necessary. What you need is a way to
find all the copies. On a write fault (or any write access) you get
the mapping and offset from the fault, use them to look up the fs
specific information, and replace the KSM page at that offset with a
new page that carries that fs specific information. Hence the
filesystem code does not need to know anything; it all happens in
generic common code.

So flow is:

  Same as before:
    1 - write fault (address, vma)
    2 - regular write fault handler -> find page in page cache

  New to common page fault code:
    3 - ksm check in write fault common code (same as ksm today
        for anonymous page fault code path).
    4 - break ksm (address, vma) -> (file offset, mapping)
        4.a - use mapping and file offset to look up the proper
              fs specific information that was saved when the
              page was made ksm.
        4.b - allocate a new page and initialize it with that
              information (and the page content), update the page
              cache and mappings, i.e. all the ptes that were
              pointing to the ksm page for that mapping at that
              offset now use the new page (like KSM for anonymous
              pages today).

  Resume regular code path:
        mkwrite /|| set pte ...

Roughly the same for write ioctl (other cases go through GUP,
which itself goes through the page fault code path). There is no
need to change the page cache in any way. Just a common code path
that enables writes to file-backed pages.
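
To make the flow above concrete, here is a rough userspace model of
steps 2 to 4. It is only a sketch: every model_* name is hypothetical,
the page cache and ptes are plain arrays, and locking, refcounting and
dropping the saved entry once the page is un-shared are all left out.
It is not code from the patchset.

```c
/* Userspace model of the write fault flow. All names are hypothetical;
 * nothing here is a kernel API. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE  16
#define MODEL_NR_OFFSETS 4

struct model_mapping;                       /* stands in for address_space */

/* fs specific information saved when a page was made ksm (step 4.a) */
struct model_saved {
    struct model_mapping *mapping;
    unsigned long offset;
    unsigned long private_data;             /* stands in for page->private */
};

struct model_page {
    int is_ksm;                             /* stands in for PageKsm() */
    unsigned long private_data;
    unsigned char data[MODEL_PAGE_SIZE];
    struct model_saved saved[MODEL_NR_OFFSETS]; /* only used when is_ksm */
    size_t nr_saved;
};

struct model_mapping {
    struct model_page *cache[MODEL_NR_OFFSETS];  /* toy page cache */
};

struct model_pte {                          /* toy pte: which page is mapped */
    struct model_mapping *mapping;
    unsigned long offset;
    struct model_page *page;
};

/* Step 4: break ksm for one (mapping, offset); other sharers are untouched. */
static struct model_page *model_break_ksm(struct model_page *ksm,
                                          struct model_mapping *mapping,
                                          unsigned long offset,
                                          struct model_pte *ptes, size_t nr_ptes)
{
    const struct model_saved *s = NULL;
    struct model_page *fresh;

    assert(ksm->is_ksm);

    /* 4.a: look up the fs specific information by (mapping, offset) */
    for (size_t i = 0; i < ksm->nr_saved; i++)
        if (ksm->saved[i].mapping == mapping && ksm->saved[i].offset == offset)
            s = &ksm->saved[i];
    if (!s)
        return NULL;

    /* 4.b: new page gets the shared content plus the saved information */
    fresh = calloc(1, sizeof(*fresh));
    if (!fresh)
        return NULL;
    memcpy(fresh->data, ksm->data, MODEL_PAGE_SIZE);
    fresh->private_data = s->private_data;

    /* update the page cache and every pte for this mapping and offset */
    mapping->cache[offset] = fresh;
    for (size_t i = 0; i < nr_ptes; i++)
        if (ptes[i].mapping == mapping && ptes[i].offset == offset &&
            ptes[i].page == ksm)
            ptes[i].page = fresh;
    return fresh;
}

/* Steps 2 and 3; the caller then resumes the regular write fault path. */
static struct model_page *model_write_fault(struct model_mapping *mapping,
                                            unsigned long offset,
                                            struct model_pte *ptes, size_t nr_ptes)
{
    struct model_page *page;

    if (offset >= MODEL_NR_OFFSETS)
        return NULL;
    page = mapping->cache[offset];          /* step 2: find page in page cache */
    if (page && page->is_ksm)               /* step 3: ksm check */
        page = model_break_ksm(page, mapping, offset, ptes, nr_ptes);
    return page;
}
```

The point of the model is that the filesystem never appears in it:
the fault only needs (mapping, offset) and the information saved at
merge time.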

The fs specific information is page->private, some of the flags
(page->flags) and page->index (file offset). Every time a page
is deduplicated, a copy of that information is saved in an alias
struct which you can get to from the shared KSM page (page->mapping
is a pointer to the ksm root struct, which has a pointer to the
list of all aliases).
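
As an illustration of that layout, here is a small userspace sketch.
The sketch_* names, the singly linked list and the field types are all
assumptions made for the example; the real structures in the patchset
may look quite different.

```c
/* Userspace sketch of the alias bookkeeping. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_mapping { int id; };      /* stands in for address_space */

/* One alias per (mapping, index) that was merged into the shared page. */
struct sketch_alias {
    struct sketch_mapping *mapping;
    unsigned long index;                /* copy of page->index */
    unsigned long private_data;         /* copy of page->private */
    unsigned long flags;                /* the saved subset of page->flags */
    struct sketch_alias *next;
};

struct sketch_ksm_root {
    struct sketch_alias *aliases;       /* list of all aliases */
};

struct sketch_page {
    void *mapping;      /* a mapping for a regular page, the root for ksm */
    unsigned long index, private_data, flags;
    int is_ksm;
};

/* At deduplication time: save the information of 'dup' as a new alias. */
static int sketch_record_alias(struct sketch_page *ksm,
                               const struct sketch_page *dup)
{
    struct sketch_ksm_root *root = ksm->mapping;
    struct sketch_alias *a;

    assert(ksm->is_ksm && !dup->is_ksm);
    a = malloc(sizeof(*a));
    if (!a)
        return -1;
    a->mapping = dup->mapping;
    a->index = dup->index;
    a->private_data = dup->private_data;
    a->flags = dup->flags;
    a->next = root->aliases;
    root->aliases = a;
    return 0;
}

/* At break time: get back the information for one (mapping, index). */
static const struct sketch_alias *
sketch_find_alias(const struct sketch_page *ksm,
                  const struct sketch_mapping *mapping, unsigned long index)
{
    const struct sketch_ksm_root *root = ksm->mapping;

    assert(ksm->is_ksm);
    for (const struct sketch_alias *a = root->aliases; a; a = a->next)
        if (a->mapping == mapping && a->index == index)
            return a;
    return NULL;
}

/* Release every alias hanging off the root. */
static void sketch_free_aliases(struct sketch_ksm_root *root)
{
    while (root->aliases) {
        struct sketch_alias *next = root->aliases->next;
        free(root->aliases);
        root->aliases = next;
    }
}
```

Walking the alias list is also how you would find all the copies of
a shared page in this sketch, without touching the page cache.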

> 
> +++ b/mm/filemap.c
> @@ -1622,6 +1622,9 @@ struct page *find_lock_entry(struct address_space *mapping, pgoff_t index)
>                 lock_page(page);
>                 /* Has the page been truncated? */
>                 if (unlikely(page->mapping != mapping)) {
> +                       if (PageKsm(page)) {
> +                               ...
> +                       }
>                         unlock_page(page);
>                         put_page(page);
>                         goto repeat;
> @@ -1655,6 +1658,7 @@ struct page *find_lock_entry(struct address_space *mapping, pgoff_t index)
>   * * %FGP_WRITE - The page will be written
>   * * %FGP_NOFS - __GFP_FS will get cleared in gfp mask
>   * * %FGP_NOWAIT - Don't get blocked by page lock
> + * * %FGP_KSM - Return KSM pages
>   *
>   * If %FGP_LOCK or %FGP_CREAT are specified then the function may sleep even
>   * if the %GFP flags specified for %FGP_CREAT are atomic.
> @@ -1687,6 +1691,11 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
>  
>                 /* Has the page been truncated? */
>                 if (unlikely(page->mapping != mapping)) {
> +                       if (PageKsm(page)) {
> +                               if (fgp_flags & FGP_KSM)
> +                                       return page;
> +                               ...
> +                       }
>                         unlock_page(page);
>                         put_page(page);
>                         goto repeat;
> 
> I don't know what you want to do when you find a KSM page, so I just left
> an ellipsis.
> 
