lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181023174148.GX18839@dhcp22.suse.cz>
Date:   Tue, 23 Oct 2018 19:41:48 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        stable@...r.kernel.org
Subject: Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

On Tue 23-10-18 10:30:44, Mike Kravetz wrote:
> On 10/23/18 12:43 AM, Michal Hocko wrote:
> > On Wed 17-10-18 21:10:22, Mike Kravetz wrote:
> >> Some test systems were experiencing negative huge page reserve
> >> counts and incorrect file block counts.  This was traced to
> >> /proc/sys/vm/drop_caches removing clean pages from hugetlbfs
> >> file pagecaches.  When non-hugetlbfs explicit code removes the
> >> pages, the appropriate accounting is not performed.
> >>
> >> This can be recreated as follows:
> >>  fallocate -l 2M /dev/hugepages/foo
> >>  echo 1 > /proc/sys/vm/drop_caches
> >>  fallocate -l 2M /dev/hugepages/foo
> >>  grep -i huge /proc/meminfo
> >>    AnonHugePages:         0 kB
> >>    ShmemHugePages:        0 kB
> >>    HugePages_Total:    2048
> >>    HugePages_Free:     2047
> >>    HugePages_Rsvd:    18446744073709551615
> >>    HugePages_Surp:        0
> >>    Hugepagesize:       2048 kB
> >>    Hugetlb:         4194304 kB
> >>  ls -lsh /dev/hugepages/foo
> >>    4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
> >>
> >> To address this issue, dirty pages as they are added to pagecache.
> >> This can easily be reproduced with fallocate as shown above. Read
> >> faulted pages will eventually end up being marked dirty.  But there
> >> is a window where they are clean and could be impacted by code such
> >> as drop_caches.  So, just dirty them all as they are added to the
> >> pagecache.
> >>
> >> In addition, it makes little sense to even try to drop hugetlbfs
> >> pagecache pages, so disable calls to these filesystems in drop_caches
> >> code.
> >>
> >> Fixes: 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()")
> >> Cc: stable@...r.kernel.org
> >> Signed-off-by: Mike Kravetz <mike.kravetz@...cle.com>
> > 
> > I do agree with others that HUGETLBFS_MAGIC check in drop_pagecache_sb
> > is wrong in principal. I am not even sure we want to special case memory
> > backed filesystems. What if we ever implement MADV_FREE on fs? Should
> > those pages be dropped? My first idea take would be yes.
> 
> Ok, I have removed that hard coded check.  Implementing MADV_FREE on
> hugetlbfs would take some work, but it could be done.
> 
> > Acked-by: Michal Hocko <mhocko@...e.com> to the set_page_dirty dirty
> > part.
> > 
> > Although I am wondering why you haven't covered only the fallocate path
> > wrt Fixes tag. In other words, do we need the same treatment for the
> > page fault path? We do not set dirty bit on page there as well. We rely
> > on the dirty bit in pte and only for writable mappings. I have hard time
> > to see why we have been safe there as well. So maybe it is your Fixes:
> > tag which is not entirely correct, or I am simply missing the fault
> > path.
> 
> No, you are not missing anything.  In the commit log I mentioned that this
> also does apply to the fault path.  The change takes care of them both.
> 
> I was struggling with what to put in the fixes tag.  As mentioned, this
> problem also exists in the fault path.  Since 3.16 is the oldest stable
> release, I went back and used the commit next to the add_to_page_cache code
> there.  However, that seems kind of random.  Is there a better way to say
> the patch applies to all stable releases?

OK, good, I was afraid I was missing something, well except for not
reading the changelog properly. I would go with

Cc: stable # all kernels with hugetlb

> Here is updated patch without the drop_caches change and updated fixes tag.
> 
> From: Mike Kravetz <mike.kravetz@...cle.com>
> 
> hugetlbfs: dirty pages as they are added to pagecache
> 
> Some test systems were experiencing negative huge page reserve
> counts and incorrect file block counts.  This was traced to
> /proc/sys/vm/drop_caches removing clean pages from hugetlbfs
> file pagecaches.  When non-hugetlbfs explicit code removes the
> pages, the appropriate accounting is not performed.
> 
> This can be recreated as follows:
>  fallocate -l 2M /dev/hugepages/foo
>  echo 1 > /proc/sys/vm/drop_caches
>  fallocate -l 2M /dev/hugepages/foo
>  grep -i huge /proc/meminfo
>    AnonHugePages:         0 kB
>    ShmemHugePages:        0 kB
>    HugePages_Total:    2048
>    HugePages_Free:     2047
>    HugePages_Rsvd:    18446744073709551615
>    HugePages_Surp:        0
>    Hugepagesize:       2048 kB
>    Hugetlb:         4194304 kB
>  ls -lsh /dev/hugepages/foo
>    4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
> 
> To address this issue, dirty pages as they are added to pagecache.
> This can easily be reproduced with fallocate as shown above. Read
> faulted pages will eventually end up being marked dirty.  But there
> is a window where they are clean and could be impacted by code such
> as drop_caches.  So, just dirty them all as they are added to the
> pagecache.
> 
> Fixes: 6bda666a03f0 ("hugepages: fold find_or_alloc_pages into huge_no_page()")
> Cc: stable@...r.kernel.org
> Signed-off-by: Mike Kravetz <mike.kravetz@...cle.com>

Acked-by: Mihcla Hocko <mhocko@...e.com>

> ---
>  mm/hugetlb.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5c390f5a5207..7b5c0ad9a6bd 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3690,6 +3690,12 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
>  		return err;
>  	ClearPagePrivate(page);
>  
> +	/*
> +	 * set page dirty so that it will not be removed from cache/file
> +	 * by non-hugetlbfs specific code paths.
> +	 */
> +	set_page_dirty(page);
> +
>  	spin_lock(&inode->i_lock);
>  	inode->i_blocks += blocks_per_huge_page(h);
>  	spin_unlock(&inode->i_lock);
> -- 
> 2.17.2

-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ