linux-kernel - Re: [Question] ksm: rmap_item pointing to some stale vmas

Message-ID: <55879AD2.30507@codeaurora.org>
Date:	Mon, 22 Jun 2015 10:49:14 +0530
From:	Susheel Khiani <skhiani@...eaurora.org>
To:	Hugh Dickins <hughd@...gle.com>
CC:	akpm@...ux-foundation.org, peterz@...radead.org, neilb@...e.de,
	dhowells@...hat.com, paulmcquad@...il.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [Question] ksm: rmap_item pointing to some stale vmas

On 6/9/2015 11:56 PM, Susheel Khiani wrote:
> On 4/30/2015 11:37 AM, Susheel Khiani wrote:
>>> But if I've misunderstood, and you think that what you're seeing
>>> fits with the transient forking bugs I've (not quite) described,
>>> and you can explain why even the transient case is important for
>>> you to have fixed, then I really ought to redouble my efforts.
>>>
>>> Hugh
>
> I was able to root-cause the issue, since we hit several instances of it
> and it reproduced frequently under stress tests. It matters to us
> because failing to unmap a KSM page results in CMA allocation
> failures.
>
> In cases like fork, what we observed for privately mapped file pages is
> that the stable_node pointed to by the KSM page does not cover all the
> mappings until ksmd completes one full scan. Only after that scan do the
> new rmap_items pointing to the mappings in the child process come into
> existence. So in cases like CMA allocation, where we can't wait for ksmd
> to complete a full cycle, we can traverse the anon_vma tree from the
> parent's anon_vma to find all the vmas where the CMA page is mapped.
>
> I have tested the following patch on a 3.10 kernel, and with this change
> I am able to avoid the CMA allocation failures that we were otherwise
> seeing frequently because the KSM page could not be unmapped.
>
> Please review and let me know the feedback.
>
>
>
> [PATCH] ksm: Traverse through parent's anon_vma while unmapping
>
> In try_to_unmap_ksm(), we traverse the rmap_item list to
> find all the anon_vmas from which the page needs to be
> unmapped.
>
> By design, KSM builds up its data structures by looking
> into each mm, and comes back a cycle later to find out
> which of them are outdated and need to be updated. So in
> cases like fork, what we observe for privately mapped file
> pages is that the stable_node pointed to by the KSM page
> does not cover all the mappings until ksmd completes one
> full scan. Only after that scan do the new rmap_items
> pointing to the mappings in the child process come into
> existence.
>
> As a result, a stable page cannot be fully unmapped until
> ksmd has completed one full scan. This becomes an issue
> for CMA, where we need to unmap and move a CMA page and
> can't wait for ksmd to complete a cycle. Because the
> rmap_items for the new mappings have not been created yet,
> we are unable to unmap the CMA page from all the vmas where
> it is mapped, which results in frequent CMA allocation
> failures.
>
> So instead of relying only on the rmap_item list, which we
> know can be incomplete, we also scan the anon_vma tree from
> the parent's anon_vma to find all the vmas where the CMA
> page is mapped, and thereby successfully unmap the page and
> migrate it to a new page.
>
> Change-Id: I97cacf6a73734b10c7098362c20fb3f2d4040c76
> Signed-off-by: Susheel Khiani <skhiani@...eaurora.org>
> ---
>   mm/ksm.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 11f6293..10d5266 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1956,6 +1956,7 @@ int page_referenced_ksm(struct page *page, struct mem_cgroup *memcg,
>       unsigned int mapcount = page_mapcount(page);
>       int referenced = 0;
>       int search_new_forks = 0;
> +    int search_from_root = 0;
>
>       VM_BUG_ON(!PageKsm(page));
>       VM_BUG_ON(!PageLocked(page));
> @@ -1968,9 +1969,20 @@ again:
>           struct anon_vma *anon_vma = rmap_item->anon_vma;
>           struct anon_vma_chain *vmac;
>           struct vm_area_struct *vma;
> +        struct rb_root rb_root;
> +
> +        if (!search_from_root) {
> +            if (anon_vma)
> +                rb_root = anon_vma->rb_root;
> +        }
> +        else {
> +            if (anon_vma && anon_vma->root) {
> +                rb_root = anon_vma->root->rb_root;
> +            }
> +        }
>
>           anon_vma_lock_read(anon_vma);
> -        anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +        anon_vma_interval_tree_foreach(vmac, &rb_root,
>                              0, ULONG_MAX) {
>               vma = vmac->vma;
>               if (rmap_item->address < vma->vm_start ||
> @@ -1999,6 +2011,11 @@ again:
>       }
>       if (!search_new_forks++)
>           goto again;
> +
> +    if (!search_from_root++) {
> +        search_new_forks = 0;
> +        goto again;
> +    }
>   out:
>       return referenced;
>   }
> @@ -2010,6 +2027,7 @@ int try_to_unmap_ksm(struct page *page, enum ttu_flags flags,
>       struct rmap_item *rmap_item;
>       int ret = SWAP_AGAIN;
>       int search_new_forks = 0;
> +    int search_from_root = 0;
>
>       VM_BUG_ON(!PageKsm(page));
>       VM_BUG_ON(!PageLocked(page));
> @@ -2028,9 +2046,20 @@ again:
>           struct anon_vma *anon_vma = rmap_item->anon_vma;
>           struct anon_vma_chain *vmac;
>           struct vm_area_struct *vma;
> +        struct rb_root rb_root;
> +
> +        if (!search_from_root) {
> +            if (anon_vma)
> +                rb_root = anon_vma->rb_root;
> +        }
> +        else {
> +            if (anon_vma && anon_vma->root) {
> +                rb_root = anon_vma->root->rb_root;
> +            }
> +        }
>
>           anon_vma_lock_read(anon_vma);
> -        anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +        anon_vma_interval_tree_foreach(vmac, &rb_root,
>                              0, ULONG_MAX) {
>               vma = vmac->vma;
>               if (rmap_item->address < vma->vm_start ||
> @@ -2056,6 +2085,11 @@ again:
>       }
>       if (!search_new_forks++)
>           goto again;
> +
> +    if (!search_from_root++) {
> +        search_new_forks = 0;
> +        goto again;
> +    }
>   out:
>       return ret;
>   }
> @@ -2068,6 +2102,7 @@ int rmap_walk_ksm(struct page *page, int (*rmap_one)(struct page *,
>       struct rmap_item *rmap_item;
>       int ret = SWAP_AGAIN;
>       int search_new_forks = 0;
> +    int search_from_root = 0;
>
>       VM_BUG_ON(!PageKsm(page));
>       VM_BUG_ON(!PageLocked(page));
> @@ -2080,9 +2115,21 @@ again:
>           struct anon_vma *anon_vma = rmap_item->anon_vma;
>           struct anon_vma_chain *vmac;
>           struct vm_area_struct *vma;
> +        struct rb_root rb_root;
> +
> +        if (!search_from_root) {
> +            if (anon_vma)
> +                rb_root = anon_vma->rb_root;
> +        }
> +        else {
> +            if (anon_vma && anon_vma->root) {
> +                rb_root = anon_vma->root->rb_root;
> +            }
> +        }
> +
>
>           anon_vma_lock_read(anon_vma);
> -        anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +        anon_vma_interval_tree_foreach(vmac, &rb_root,
>                              0, ULONG_MAX) {
>               vma = vmac->vma;
>               if (rmap_item->address < vma->vm_start ||
> @@ -2107,6 +2154,11 @@ again:
>       }
>       if (!search_new_forks++)
>           goto again;
> +
> +    if (!search_from_root++) {
> +        search_new_forks = 0;
> +        goto again;
> +    }
>   out:
>       return ret;
>   }

Reminder ping: did you get a chance to look at
the previous mail?
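
For readers skimming the thread, the two-pass idea in the quoted patch can
be illustrated with a small userspace model. This is only a sketch under
simplifying assumptions, not kernel code: the toy_* types and functions are
hypothetical stand-ins, a plain array replaces the anon_vma interval tree,
and locking is omitted entirely. The first pass walks each rmap_item's own
anon_vma; if mappings remain, the second pass retries from the root
anon_vma.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VMAS 8

/* Hypothetical, simplified stand-in for vm_area_struct. */
struct toy_vma {
    unsigned long start, end;   /* address range [start, end) */
    bool maps_page;             /* true while this vma still maps the page */
};

/* Hypothetical stand-in for anon_vma; an array replaces the interval tree. */
struct toy_anon_vma {
    struct toy_anon_vma *root;  /* root of the fork hierarchy */
    struct toy_vma *vmas[MAX_VMAS];
    size_t nr_vmas;
};

/* Hypothetical stand-in for KSM's rmap_item. */
struct toy_rmap_item {
    struct toy_anon_vma *anon_vma;
    unsigned long address;
    struct toy_rmap_item *next;
};

/* Unmap the page from every vma of 'av' that covers 'address'. */
static size_t toy_unmap_in(struct toy_anon_vma *av, unsigned long address)
{
    size_t unmapped = 0;

    assert(av->nr_vmas <= MAX_VMAS);
    for (size_t i = 0; i < av->nr_vmas; i++) {
        struct toy_vma *vma = av->vmas[i];

        if (address < vma->start || address >= vma->end)
            continue;
        if (vma->maps_page) {
            vma->maps_page = false;
            unmapped++;
        }
    }
    return unmapped;
}

/*
 * Pass 0 searches each item's own anon_vma; pass 1 searches from the
 * root anon_vma, which may reach vmas that no rmap_item covers yet.
 * Returns the number of mappings removed.
 */
size_t toy_unmap_ksm(struct toy_rmap_item *items, size_t mapcount)
{
    size_t unmapped = 0;

    for (int search_from_root = 0; search_from_root < 2; search_from_root++) {
        for (struct toy_rmap_item *it = items; it; it = it->next) {
            struct toy_anon_vma *av = it->anon_vma;

            if (unmapped == mapcount)
                return unmapped;    /* every mapping is gone */
            if (av && search_from_root)
                av = av->root;
            if (!av)
                continue;           /* nothing to search for this item */
            unmapped += toy_unmap_in(av, it->address);
        }
    }
    return unmapped;
}
```

In this model, a page mapped by two vmas of which only one is reachable
from the rmap_item's own anon_vma is left half-mapped after the first pass
and fully unmapped after the second.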

-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation