linux-ext4 - Re: page eviction from the buddy cache

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <3C8EEEF8-C1EB-4E3D-8DE6-198AB1BEA8C0@gmail.com>
Date:	Thu, 28 Mar 2013 07:34:03 +0200
From:	Alexey Lyahkov <alexey.lyashkov@...il.com>
To:	Hugh Dickins <hughd@...gle.com>
Cc:	Theodore Ts'o <tytso@....edu>, Andrew Perepechko <anserper@...ru>,
	linux-ext4@...r.kernel.org, akpm@...ux-foundation.org,
	linux-mm@...ck.org
Subject: Re: page eviction from the buddy cache

Hi Hugh,

"immediately" say in ~1s after allocation /via krobes/ftrace logs/, 
and you are correct - that is in case large streaming io in Lustre - like 3-4GB/s in read.
ftrace logs (with additional trace points) say page allocated, mark page accessed..
and nothing until that page will found in isolate_lru_page in shrink_inactive_list 
/that point to set kprobe/
if someone need a logs i may provide it's as it's easy to collect. 

But may be that is more generic question when ext4 code, some important metadata exist 
in block device page cache in that case calling lru_page_drain() here move these pages
in active LRU so will accessible easy.


On Mar 27, 2013, at 21:24, Hugh Dickins wrote:

> [Cc'ing linux-mm: "buddy cache" here is cache of some ext4 metadata]
> 
> On Wed, 27 Mar 2013, Theodore Ts'o wrote:
>> Hi Andrew,
>> 
>> Thanks for your analysis!  Since I'm not a mm developer, I'm not sure
>> what's the best way to more aggressively mark a page as one that we'd
>> really like to keep in the page cache --- whether it's calling
>> lru_add_drain(), or calling activate_page(page), etc.
>> 
>> So I've added Andrew Morton and Hugh Dickens to the cc list as mm
>> experts in the hopes they could give us some advice about the best way
>> to achieve this goal.  Andrew, Hugh, could you give us some quick
>> words of wisdom?
> 
> Hardly from me: I'm dissatisfied with answer below, Cc'ed linux-mm.
> 
>> 
>> Thanks,
>> 
>> 					- Ted
>> On Mon, Mar 25, 2013 at 04:59:44PM +0400, Andrew Perepechko wrote:
>>> Hello!
>>> 
>>> Our recent investigation has found that pages from
>>> the buddy cache are evicted too often as compared
>>> to the expectation from their usage pattern. This
>>> introduces additional reads during large writes under
>>> our workload and really hurts overall performance.
>>> 
>>> ext4 uses find_get_page() and find_or_create_page()
>>> to look for buddy cache pages, but these pages don't
>>> get a chance to become activated until the following
>>> lru_add_drain() call, because mark_page_accessed()
>>> does not activate pages which are not PageLRU().
>>> 
>>> As can be found from a kprobe-based test, these pages
>>> are often moved on the inactive LRU as a result of
>>> shrink_inactive_list()->lru_add_drain() and immediately
>>> evicted.
> 
> Not quite like that, I think.
> 
> Cache pages are intentionally put on the inactive list initially,
> so that streaming I/O does not push out more useful pages: it is
> intentional that the first call to mark_page_accessed() merely
> marks the page referenced, but does not move it to active LRU.
> 
> You're right that the pagevec confuses things here, but I'm
> surprised if these pages are "immediately evicted": they won't
> be evicted while they remain on a pagevec, and can only be evicted
> after reaching the LRU.  And they should be put on the hot end of
> the inactive LRU, and only evicted once they reach the cold end.
> 
> But maybe you have lots of dirty or otherwise-un-immediately-evictable
> data pages in between, so that page reclaim reaches these ones too soon.
> 
> IIUC the pages you are discussing here are important metadata pages,
> which you would much prefer to retain longer than streaming data.
> 
> While I question "immediately evicted", I don't doubt that they
> get evicted sooner than you wish: one way or another, they arrive
> at the cold end of the inactive LRU too soon.
> 
> You would like a way to mark these as more important to retain than
> data pages: you would like to put them directly on the active list,
> but are frustrated by the pagevec.
> 
>>> 
>>> From a quick look into linux-2.6.git, the issue seems
>>> to exist in the current code as well.
>>> 
>>> A possible and, perhaps, non-optimal solution would be
>>> to call lru_add_drain() each time a buddy cache page
>>> is used.
> 
> mark_page_accessed() should be enough each time one is actually used,
> but yes, it looks like you need more than that when first added to cache.
> 
> It appears that at the moment you need to do:
> 
> 	mark_page_accessed(page);	/* to SetPageReferenced */
> 	lru_add_drain();		/* to SetPageLRU */
> 	mark_page_accessed(page);	/* to SetPageActive */
> 
> but I agree that we would really prefer a filesystem not to have to
> call lru_add_drain().
> 
> I quite like the idea of
> 	mark_page_accessed(page);
> 	mark_page_accessed(page);
> as a sequence to use on important metadata (nicely reminiscent of
> "sync; sync;"), but maybe not everybody will agree with me on that!
> 
> As currently implemented, a page is put on to a pagevec specific to
> the LRU it is destined for, and we cannot change that destination
> before it is flushed to that LRU.  But at this moment I cannot see
> a fundamental reason why we should not allow PageActive to be set
> while in the pagevec, and destination LRU adjusted accordingly.
> 
> However, I could easily be missing something (probably some VM_BUG_ONs
> at the least); and changing this might uncover unwanted side-effects -
> perhaps some code paths which already call mark_page_accessed() twice
> in quick succession unintentionally, and would now be given an Active
> page when Inactive has actually been more appropriate.
> 
> Though I'd like to come back to this, I am very unlikely to find time
> for it in the near future: perhaps someone else might take it further.
> 
> Hugh
> 
>>> 
>>> Any other suggestions?
>>> 
>>> Thank you,
>>> Andrew

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html