Message-ID: <be7778b9-58de-3717-0da5-e88fc5ec5542@alibaba-inc.com>
Date: Thu, 04 Jan 2018 16:17:50 +0800
From: "夷则(Caspar)" <jinli.zjl@...baba-inc.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...hsingularity.net>
Cc: <green@...uxhacker.ru>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>,
"杨勇(智彻)" <zhiche.yy@...baba-inc.com>,
"十刀" <shidao.ytt@...baba-inc.com>
Subject: Re: [PATCH] mm/fadvise: discard partial pages iff endbyte is also eof
On 2018/1/4 08:17, Andrew Morton wrote:
> On Wed, 3 Jan 2018 10:48:00 +0000 Mel Gorman <mgorman@...hsingularity.net> wrote:
>
>> On Wed, Jan 03, 2018 at 02:53:43PM +0800, 夷则(Caspar) wrote:
>>>
>>>
>>>> On 2017-12-23 12:16, 十刀 <shidao.ytt@...baba-inc.com> wrote:
>>>>
>>>> From: "shidao.ytt" <shidao.ytt@...baba-inc.com>
>>>>
>>>> In commit 441c228f817f7 ("mm: fadvise: document the
>>>> fadvise(FADV_DONTNEED) behaviour for partial pages") Mel Gorman
>>>> explained why partial pages should be preserved instead of discarded
>>>> when using fadvise(FADV_DONTNEED). However, the code that calculated
>>>> end_index was wrong, so its behavior did not match the statement in
>>>> the comments. Luckily, in commit 18aba41cbf
>>>> ("mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED")
>>>> Oleg Drokin fixed this behavior.
>>>>
>>>> Here is a new idea: we can still discard the last partial page iff
>>>> the page-unaligned endbyte is also the end of the file, since no one
>>>> else will use the rest of that page, so it should be safe to discard.
>>>
>>> +akpm...
>>>
>>> Hi Mel, Andrew:
>>>
>>> Would you please take a look at this patch to see whether this
>>> proposal is reasonable? Thanks in advance!
>>>
>>
>> I'm backlogged after being out for Christmas. Superficially the patch
>> looks ok, but I wondered how often it happens in practice, as we would
>> already discard files smaller than a page on DONTNEED. It also requires
>> that the caller pass exactly the right file size, and it would not
>> discard if off + len was past the end of the file for whatever reason
>> (e.g. a stat to read the size, a truncate in parallel, and fadvise using
>> stale data from stat); that's why the patch looked like it might have
>> no impact in practice. Is the patch known to help a real workload, or is
>> it motivated by code inspection?
>
> The current whole-pages-only logic was introduced (accidentally, I
> think) by yours truly when fixing a bug in the initial fadvise()
> commit in 2003.
>
> https://kernel.opensuse.org/cgit/kernel/commit/?h=v2.6.0-test4&id=7161ee20fea6e25a32feb91503ca2b7c7333c886
>
> Namely:
>
> : invalidate_mapping_pages() takes start/end, but fadvise is currently passing
> : it start/len.
> :
> :
> :
> : mm/fadvise.c | 8 ++++++--
> : 1 files changed, 6 insertions(+), 2 deletions(-)
> :
> : diff -puN mm/fadvise.c~fadvise-fix mm/fadvise.c
> : --- 25/mm/fadvise.c~fadvise-fix 2003-08-14 18:16:12.000000000 -0700
> : +++ 25-akpm/mm/fadvise.c 2003-08-14 18:16:12.000000000 -0700
> : @@ -26,6 +26,8 @@ long sys_fadvise64(int fd, loff_t offset
> : struct inode *inode;
> : struct address_space *mapping;
> : struct backing_dev_info *bdi;
> : + pgoff_t start_index;
> : + pgoff_t end_index;
> : int ret = 0;
> :
> : if (!file)
> : @@ -65,8 +67,10 @@ long sys_fadvise64(int fd, loff_t offset
> : case POSIX_FADV_DONTNEED:
> : if (!bdi_write_congested(mapping->backing_dev_info))
> : filemap_flush(mapping);
> : - invalidate_mapping_pages(mapping, offset >> PAGE_CACHE_SHIFT,
> : - (len >> PAGE_CACHE_SHIFT) + 1);
> : + start_index = offset >> PAGE_CACHE_SHIFT;
> : + end_index = (offset + len + PAGE_CACHE_SIZE - 1) >>
> : + PAGE_CACHE_SHIFT;
> : + invalidate_mapping_pages(mapping, start_index, end_index);
> : break;
> : default:
> : ret = -EINVAL;
> :
>
> So I'm not sure that the whole "don't discard partial pages" thing is
> well-founded and I see no reason why we cannot alter it.
>
> So, thinking caps on: why not just discard them? After all, that's
> what userspace asked us to do.
Hi Andrew, I doubt that "just discard them" matches what userspace
expects. Perhaps we can never fully meet userspace's expectation, since
the kernel works in whole pages while userspace passes byte offsets and
lengths. Note that Mel Gorman has already documented the page-unaligned
behavior in the posix_fadvise() man page[1], but obviously not everyone
(including me) reads the _latest_ version, so some people may still call
the syscall with a page-unaligned offset/length. Userspace might only be
asking to discard certain *bytes*, not *pages*.
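To make the bytes-versus-pages mismatch concrete, here is a minimal userspace sketch, assuming 4 KiB pages. The helpers touched_pages() (drop every page the byte range touches, i.e. "just discard them") and whole_pages() (drop only fully covered pages, the current behavior) are hypothetical illustrations, not kernel functions; both produce inclusive page indices, the same convention invalidate_mapping_pages() uses for its end argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (the kernel uses PAGE_SHIFT / PAGE_SIZE). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical helper: every page that [offset, offset + len) touches,
 * even partially ("just discard them"). Indices are inclusive. */
static void touched_pages(uint64_t offset, uint64_t len,
                          uint64_t *first, uint64_t *last)
{
    assert(len > 0); /* fadvise's len == 0 ("to EOF") is not modelled */
    *first = offset >> SKETCH_PAGE_SHIFT;
    *last = (offset + len - 1) >> SKETCH_PAGE_SHIFT;
}

/* Hypothetical helper: only the pages lying entirely inside the range
 * ("whole pages only"). Returns false if no page is fully covered. */
static bool whole_pages(uint64_t offset, uint64_t len,
                        uint64_t *first, uint64_t *last)
{
    assert(len > 0);
    uint64_t endbyte = offset + len - 1;              /* inclusive last byte */
    uint64_t start = (offset + SKETCH_PAGE_SIZE - 1)  /* round start up      */
                     >> SKETCH_PAGE_SHIFT;
    uint64_t end_excl = (endbyte + 1) >> SKETCH_PAGE_SHIFT; /* pages that end
                                                                by endbyte  */
    if (end_excl <= start)
        return false;
    *first = start;
    *last = end_excl - 1;
    return true;
}
```

For example, offset 100 and length 8000 touch pages 0 and 1, but neither page is fully covered, so the whole-pages rule drops nothing at all.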
And I think we first need to look back at why we thought "preserving is
better than discarding". If we throw away the whole page, the rest of the
page might still be needed (consider an offset and length in the middle
of a file), because that part is untagged:
...|------------ PAGE --------------|...
...| DONTNEED |------ UNTAGGED -----|...
But once the page is gone, the next access to the untagged part faults,
and we need to reload the page from disk, so performance degrades.
Maybe that's why we previously preferred to preserve the whole page.
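The cost shown in the diagram above can be quantified with another hypothetical helper, again assuming 4 KiB pages: if every touched page were dropped, the untagged bytes at the head and tail of the advised range would be evicted too. This sketch ignores the file size, so bytes past EOF in the last page are also counted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on a typical x86-64 configuration. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical helper: bytes outside [offset, offset + len) that would be
 * evicted if every touched page were dropped, i.e. the UNTAGGED parts of
 * the head and tail pages. The file size is deliberately ignored. */
static uint64_t untagged_bytes_evicted(uint64_t offset, uint64_t len)
{
    assert(len > 0);
    uint64_t first_page_start = offset & ~(SKETCH_PAGE_SIZE - 1);
    uint64_t endbyte = offset + len - 1;                      /* inclusive */
    uint64_t last_page_end = endbyte | (SKETCH_PAGE_SIZE - 1); /* inclusive */
    uint64_t head = offset - first_page_start;  /* untagged bytes before */
    uint64_t tail = last_page_end - endbyte;    /* untagged bytes after  */
    return head + tail;
}
```

With offset 0 and length 6000, the 2192 untagged bytes 6000..8191 of page 1 would be evicted along with the advised part.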
But if we never discard the partial page, and the tail partial page is
_exactly the end of the file_, a page that was advised as DONTNEED is
left in memory, even though we all know it is safe to throw it away.
So we came up with this patch: keep partial pages as before, but add a
special case so that when the partial page is the end of the file, we
throw it away. I think this may be a better solution.
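A minimal sketch of the proposed rule follows, under the same 4 KiB page assumption. dontneed_pages() and i_size (the file size in bytes) are illustrative names; this models the idea only, not the submitted change to mm/fadvise.c, which may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical helper: inclusive page range to drop for DONTNEED on
 * [offset, offset + len) in a file of i_size bytes. Partial pages are
 * kept, except that a partial tail page is dropped when the range ends
 * exactly at EOF (endbyte == i_size - 1). Returns false if nothing is
 * to be dropped. */
static bool dontneed_pages(uint64_t offset, uint64_t len, uint64_t i_size,
                           uint64_t *first, uint64_t *last)
{
    assert(len > 0); /* fadvise's len == 0 ("to EOF") is not modelled */
    uint64_t endbyte = offset + len - 1;
    uint64_t start = (offset + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
    uint64_t end = endbyte >> SKETCH_PAGE_SHIFT;   /* page holding endbyte */
    bool tail_partial =
        (endbyte & (SKETCH_PAGE_SIZE - 1)) != SKETCH_PAGE_SIZE - 1;
    bool ends_at_eof = i_size > 0 && endbyte == i_size - 1;

    if (tail_partial && !ends_at_eof) {
        if (end == 0)      /* end - 1 would wrap around (unsigned) */
            return false;
        end--;             /* keep the partial tail page */
    }
    if (end < start)
        return false;
    *first = start;
    *last = end;
    return true;
}
```

For a 10000-byte file, advising bytes 0..9999 drops pages 0 to 2, while advising bytes 0..10999 (a stale, too-large size, as in Mel's stat/truncate example) keeps the partial page 2, because endbyte no longer equals i_size - 1.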
One thing I'm worried about is that this patch introduces a new
undocumented behavior, so maybe we need to document this special case in
the posix_fadvise() man page too? Hmm...
Thanks,
Caspar