Message-ID: <515ADFCF.4010209@gmail.com>
Date: Tue, 02 Apr 2013 21:40:31 +0800
From: Simon Jeons <simon.jeons@...il.com>
To: Hugh Dickins <hughd@...gle.com>
CC: Minchan Kim <minchan@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Dan Magenheimer <dan.magenheimer@...cle.com>,
Seth Jennings <sjenning@...ux.vnet.ibm.com>,
Nitin Gupta <ngupta@...are.org>,
Konrad Rzeszutek Wilk <konrad@...nok.org>,
Shaohua Li <shli@...nel.org>,
Kamezawa Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [RFC] mm: remove swapcache page early
Hi Hugh,
On 03/28/2013 05:41 AM, Hugh Dickins wrote:
> On Wed, 27 Mar 2013, Minchan Kim wrote:
>
>> Swap subsystem does lazy swap slot free with expecting the page
>> would be swapped out again so we can't avoid unnecessary write.
> so we can avoid unnecessary write.
If the page can be swapped out again, which code avoids the unnecessary write?
Could you point it out to me? Thanks in advance. ;-)
>> But the problem with in-memory swap is that it consumes memory space
>> until the vm_swap_full() condition (i.e., half of all swap space is used)
>> is met. It could be bad if we use multiple swap devices, such as a small
>> in-memory swap plus a big storage swap, or in-memory swap alone.
> That is a very good realization: it's surprising that none of us
> thought of it before - no disrespect to you, well done, thank you.
>
> And I guess swap readahead is utterly unhelpful in this case too.
>
>> This patch changes the vm_swap_full() logic slightly so it can free the
>> swap slot early if the backing device is really fast.
>> For that, I used SWP_SOLIDSTATE, but it might be controversial.
> But I strongly disagree with almost everything in your patch :)
> I disagree with addressing it in vm_swap_full(), I disagree that
> it can be addressed by device, I disagree that it has anything to
> do with SWP_SOLIDSTATE.
>
> This is not a problem with swapping to /dev/ram0 or to /dev/zram0,
> is it? In those cases, a fixed amount of memory has been set aside
> for swap, and it works out just like with disk block devices. The
> memory set aside may be wasted, but that is accepted upfront.
>
> Similarly, this is not a problem with swapping to SSD. There might
> or might not be other reasons for adjusting the vm_swap_full() logic
> for SSD or generally, but those have nothing to do with this issue.
>
> The problem here is peculiar to frontswap, and the variably sized
> memory behind it, isn't it? We are accustomed to using swap to free
> up memory by transferring its data to some other, cheaper but slower
> resource.
>
> But in the case of frontswap and zmem (I'll say that to avoid thinking
> through which backends are actually involved), it is not a cheaper and
> slower resource, but the very same memory we are trying to save: swap
> is stolen from the memory under reclaim, so any duplication becomes
> counter-productive (if we ignore cpu compression/decompression costs:
> I have no idea how fair it is to do so, but anyone who chooses zmem
> is prepared to pay some cpu price for that).
>
> And because it's a frontswap thing, we cannot decide this by device:
> frontswap may or may not stand in front of each device. There is no
> problem with swapcache duplicated on disk (until that area approaches
> being full or fragmented), but at the higher level we cannot see what
> is in zmem and what is on disk: we only want to free up the zmem dup.
>
> I believe the answer is for frontswap/zmem to invalidate the frontswap
> copy of the page (to free up the compressed memory when possible) and
> SetPageDirty on the PageUptodate PageSwapCache page when swapping in
> (setting page dirty so nothing will later go to read it from the
> unfreed location on backing swap disk, which was never written).
>
> We cannot rely on freeing the swap itself, because in general there
> may be multiple references to the swap, and we only satisfy the one
> which has faulted. It may or may not be a good idea to use rmap to
> locate the other places to insert pte in place of swap entry, to
> resolve them all at once; but we have chosen not to do so in the
> past, and there's no need for that, if the zmem gets invalidated
> and the swapcache page set dirty.
>
> Hugh
>
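To make the proposal above concrete, here is a minimal user-space sketch in C.
It is a toy model, not kernel code: struct toy_slot, struct toy_page,
toy_swapin() and toy_reclaim_needs_write() are all hypothetical names invented
for illustration, and the real frontswap and swapcache paths are considerably
more involved. The sketch only shows the intended state change: on swap-in
from zmem, drop the compressed copy and mark the page dirty so that nothing
later trusts the never-written location on the backing swap disk.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one swap slot (hypothetical, for illustration only). */
struct toy_slot {
    bool in_zmem;     /* frontswap/zmem holds a compressed copy */
    bool on_disk;     /* backing swap device holds a valid copy */
    int  swap_count;  /* references (ptes) still pointing at this swap entry */
};

/* Toy model of the swapcache page (hypothetical). */
struct toy_page {
    bool uptodate;
    bool swapcache;
    bool dirty;
};

/* Swap in one page on behalf of a single faulting reference. */
static void toy_swapin(struct toy_slot *slot, struct toy_page *page)
{
    assert(slot->swap_count > 0);

    page->uptodate = true;
    page->swapcache = true;
    page->dirty = false;

    /* Only the reference that faulted is resolved; others may remain. */
    slot->swap_count--;

    if (slot->in_zmem) {
        /* Free the compressed copy: it duplicates memory we want back. */
        slot->in_zmem = false;
        /*
         * Mark the page dirty so reclaim writes it out again instead of
         * assuming a valid copy exists on the backing swap disk.
         */
        page->dirty = true;
    }
}

/* A clean swapcache page can be dropped without I/O; a dirty one cannot. */
static bool toy_reclaim_needs_write(const struct toy_page *page)
{
    return page->dirty;
}
```

With a slot that lives in zmem and has two references, one toy_swapin() call
leaves swap_count at 1 (the swap entry itself is not freed), clears in_zmem,
and leaves the page dirty, so a later reclaim of that page has to write it out
again. A slot that is only on disk stays clean and keeps its disk copy.
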
>> So let's Cc Shaohua and Hugh.
>> If it's a problem for SSD, I'd like to create a new type, SWP_INMEMORY
>> or something, for the z* family.
>>
>> The other problem is that zram is a block device, so it can set SWP_INMEMORY
>> or SWP_SOLIDSTATE easily (actually, zram already does so), but
>> I have no idea how to use it for frontswap.
>>
>> Any idea?
>>
>> Another possible optimization is to remove it unconditionally when we
>> find it is exclusive at swap-in time.
>> It could help the frontswap family, too.
>> What do you think about it?
>>
>> Cc: Hugh Dickins <hughd@...gle.com>
>> Cc: Dan Magenheimer <dan.magenheimer@...cle.com>
>> Cc: Seth Jennings <sjenning@...ux.vnet.ibm.com>
>> Cc: Nitin Gupta <ngupta@...are.org>
>> Cc: Konrad Rzeszutek Wilk <konrad@...nok.org>
>> Cc: Shaohua Li <shli@...nel.org>
>> Signed-off-by: Minchan Kim <minchan@...nel.org>
>> ---
>> include/linux/swap.h | 11 ++++++++---
>> mm/memory.c | 3 ++-
>> mm/swapfile.c | 11 +++++++----
>> mm/vmscan.c | 2 +-
>> 4 files changed, 18 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 2818a12..1f4df66 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -359,9 +359,14 @@ extern struct page *swapin_readahead(swp_entry_t, gfp_t,
>> extern atomic_long_t nr_swap_pages;
>> extern long total_swap_pages;
>>
>> -/* Swap 50% full? Release swapcache more aggressively.. */
>> -static inline bool vm_swap_full(void)
>> +/*
>> + * Swap 50% full or fast backed device?
>> + * Release swapcache more aggressively.
>> + */
>> +static inline bool vm_swap_full(struct swap_info_struct *si)
>> {
>> + if (si->flags & SWP_SOLIDSTATE)
>> + return true;
>> return atomic_long_read(&nr_swap_pages) * 2 < total_swap_pages;
>> }
>>
>> @@ -405,7 +410,7 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
>> #define get_nr_swap_pages() 0L
>> #define total_swap_pages 0L
>> #define total_swapcache_pages() 0UL
>> -#define vm_swap_full() 0
>> +#define vm_swap_full(si) 0
>>
>> #define si_swapinfo(val) \
>> do { (val)->freeswap = (val)->totalswap = 0; } while (0)
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 705473a..1ca21a9 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3084,7 +3084,8 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
>> mem_cgroup_commit_charge_swapin(page, ptr);
>>
>> swap_free(entry);
>> - if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
>> + if (likely(PageSwapCache(page)) && (vm_swap_full(page_swap_info(page))
>> + || (vma->vm_flags & VM_LOCKED) || PageMlocked(page)))
>> try_to_free_swap(page);
>> unlock_page(page);
>> if (page != swapcache) {
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 1bee6fa..f9cc701 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -293,7 +293,7 @@ checks:
>> scan_base = offset = si->lowest_bit;
>>
>> /* reuse swap entry of cache-only swap if not busy. */
>> - if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
>> + if (vm_swap_full(si) && si->swap_map[offset] == SWAP_HAS_CACHE) {
>> int swap_was_freed;
>> spin_unlock(&si->lock);
>> swap_was_freed = __try_to_reclaim_swap(si, offset);
>> @@ -382,7 +382,8 @@ scan:
>> spin_lock(&si->lock);
>> goto checks;
>> }
>> - if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
>> + if (vm_swap_full(si) &&
>> + si->swap_map[offset] == SWAP_HAS_CACHE) {
>> spin_lock(&si->lock);
>> goto checks;
>> }
>> @@ -397,7 +398,8 @@ scan:
>> spin_lock(&si->lock);
>> goto checks;
>> }
>> - if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
>> + if (vm_swap_full(si) &&
>> + si->swap_map[offset] == SWAP_HAS_CACHE) {
>> spin_lock(&si->lock);
>> goto checks;
>> }
>> @@ -763,7 +765,8 @@ int free_swap_and_cache(swp_entry_t entry)
>> * Also recheck PageSwapCache now page is locked (above).
>> */
>> if (PageSwapCache(page) && !PageWriteback(page) &&
>> - (!page_mapped(page) || vm_swap_full())) {
>> + (!page_mapped(page) ||
>> + vm_swap_full(page_swap_info(page)))) {
>> delete_from_swap_cache(page);
>> SetPageDirty(page);
>> }
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index df78d17..145c59c 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -933,7 +933,7 @@ cull_mlocked:
>>
>> activate_locked:
>> /* Not a candidate for swapping, so reclaim swap space. */
>> - if (PageSwapCache(page) && vm_swap_full())
>> + if (PageSwapCache(page) && vm_swap_full(page_swap_info(page)))
>> try_to_free_swap(page);
>> VM_BUG_ON(PageActive(page));
>> SetPageActive(page);
>> --
>> 1.8.2
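
For reference, the behavioural change in vm_swap_full() made by the patch
above can be modelled in user space as follows. This is only a sketch: the
toy_ names and the flag value are assumptions made for illustration, and the
two globals stand in for nr_swap_pages (free swap pages) and
total_swap_pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bit; assumed here, not taken from the kernel headers. */
#define TOY_SWP_SOLIDSTATE 0x10UL

/* Stand-in for struct swap_info_struct (hypothetical, flags only). */
struct toy_swap_info {
    unsigned long flags;
};

static long toy_nr_swap_pages;     /* free swap pages, all devices */
static long toy_total_swap_pages;  /* total swap pages, all devices */

/* Before the patch: a purely global "more than half used" check. */
static bool toy_vm_swap_full_old(void)
{
    return toy_nr_swap_pages * 2 < toy_total_swap_pages;
}

/* After the patch: a flagged device always counts as "full". */
static bool toy_vm_swap_full_new(const struct toy_swap_info *si)
{
    assert(si != NULL);

    if (si->flags & TOY_SWP_SOLIDSTATE)
        return true;
    return toy_nr_swap_pages * 2 < toy_total_swap_pages;
}
```

With 900 of 1000 swap pages free, the old check and the new check on an
unflagged device both return false, while the new check on a flagged device
returns true regardless of how much swap is in use.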