lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 04 Apr 2013 09:24:53 +0800
From:	Will Huck <>
To:	Alexey Lyahkov <>
CC:	Hugh Dickins <>, Theodore Ts'o <>,
	Andrew Perepechko <>,,,
Subject: Re: page eviction from the buddy cache

Hi Alexey,
On 03/28/2013 01:34 PM, Alexey Lyahkov wrote:
> Hi Hugh,
> "immediately" say in ~1s after allocation /via krobes/ftrace logs/,
> and you are correct - that is in case large streaming io in Lustre - like 3-4GB/s in read.
> ftrace logs (with additional trace points) say page allocated, mark page accessed..
> and nothing until that page will found in isolate_lru_page in shrink_inactive_list
> /that point to set kprobe/
> if someone need a logs i may provide it's as it's easy to collect.

I don't need the log, but could you show me how you trace?

> But may be that is more generic question when ext4 code, some important metadata exist
> in block device page cache in that case calling lru_page_drain() here move these pages
> in active LRU so will accessible easy.
> On Mar 27, 2013, at 21:24, Hugh Dickins wrote:
>> [Cc'ing linux-mm: "buddy cache" here is cache of some ext4 metadata]
>> On Wed, 27 Mar 2013, Theodore Ts'o wrote:
>>> Hi Andrew,
>>> Thanks for your analysis!  Since I'm not a mm developer, I'm not sure
>>> what's the best way to more aggressively mark a page as one that we'd
>>> really like to keep in the page cache --- whether it's calling
>>> lru_add_drain(), or calling activate_page(page), etc.
>>> So I've added Andrew Morton and Hugh Dickens to the cc list as mm
>>> experts in the hopes they could give us some advice about the best way
>>> to achieve this goal.  Andrew, Hugh, could you give us some quick
>>> words of wisdom?
>> Hardly from me: I'm dissatisfied with answer below, Cc'ed linux-mm.
>>> Thanks,
>>> 					- Ted
>>> On Mon, Mar 25, 2013 at 04:59:44PM +0400, Andrew Perepechko wrote:
>>>> Hello!
>>>> Our recent investigation has found that pages from
>>>> the buddy cache are evicted too often as compared
>>>> to the expectation from their usage pattern. This
>>>> introduces additional reads during large writes under
>>>> our workload and really hurts overall performance.
>>>> ext4 uses find_get_page() and find_or_create_page()
>>>> to look for buddy cache pages, but these pages don't
>>>> get a chance to become activated until the following
>>>> lru_add_drain() call, because mark_page_accessed()
>>>> does not activate pages which are not PageLRU().
>>>> As can be found from a kprobe-based test, these pages
>>>> are often moved on the inactive LRU as a result of
>>>> shrink_inactive_list()->lru_add_drain() and immediately
>>>> evicted.
>> Not quite like that, I think.
>> Cache pages are intentionally put on the inactive list initially,
>> so that streaming I/O does not push out more useful pages: it is
>> intentional that the first call to mark_page_accessed() merely
>> marks the page referenced, but does not move it to active LRU.
>> You're right that the pagevec confuses things here, but I'm
>> surprised if these pages are "immediately evicted": they won't
>> be evicted while they remain on a pagevec, and can only be evicted
>> after reaching the LRU.  And they should be put on the hot end of
>> the inactive LRU, and only evicted once they reach the cold end.
>> But maybe you have lots of dirty or otherwise-un-immediately-evictable
>> data pages in between, so that page reclaim reaches these ones too soon.
>> IIUC the pages you are discussing here are important metadata pages,
>> which you would much prefer to retain longer than streaming data.
>> While I question "immediately evicted", I don't doubt that they
>> get evicted sooner than you wish: one way or another, they arrive
>> at the cold end of the inactive LRU too soon.
>> You would like a way to mark these as more important to retain than
>> data pages: you would like to put them directly on the active list,
>> but are frustrated by the pagevec.
>>>>  From a quick look into linux-2.6.git, the issue seems
>>>> to exist in the current code as well.
>>>> A possible and, perhaps, non-optimal solution would be
>>>> to call lru_add_drain() each time a buddy cache page
>>>> is used.
>> mark_page_accessed() should be enough each time one is actually used,
>> but yes, it looks like you need more than that when first added to cache.
>> It appears that at the moment you need to do:
>> 	mark_page_accessed(page);	/* to SetPageReferenced */
>> 	lru_add_drain();		/* to SetPageLRU */
>> 	mark_page_accessed(page);	/* to SetPageActive */
>> but I agree that we would really prefer a filesystem not to have to
>> call lru_add_drain().
>> I quite like the idea of
>> 	mark_page_accessed(page);
>> 	mark_page_accessed(page);
>> as a sequence to use on important metadata (nicely reminiscent of
>> "sync; sync;"), but maybe not everybody will agree with me on that!
>> As currently implemented, a page is put on to a pagevec specific to
>> the LRU it is destined for, and we cannot change that destination
>> before it is flushed to that LRU.  But at this moment I cannot see
>> a fundamental reason why we should not allow PageActive to be set
>> while in the pagevec, and destination LRU adjusted accordingly.
>> However, I could easily be missing something (probably some VM_BUG_ONs
>> at the least); and changing this might uncover unwanted side-effects -
>> perhaps some code paths which already call mark_page_accessed() twice
>> in quick succession unintentionally, and would now be given an Active
>> page when Inactive has actually been more appropriate.
>> Though I'd like to come back to this, I am very unlikely to find time
>> for it in the near future: perhaps someone else might take it further.
>> Hugh
>>>> Any other suggestions?
>>>> Thank you,
>>>> Andrew
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to  For more info on Linux MM,
> see: .
> Don't email: <a href=ilto:""> </a>

To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to
More majordomo info at

Powered by blists - more mailing lists