linux-kernel - Re: [Question] ksm: rmap_item pointing to some stale vmas

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 30 Apr 2015 11:37:41 +0530
From:	Susheel Khiani <skhiani@...eaurora.org>
To:	Hugh Dickins <hughd@...gle.com>
CC:	akpm@...ux-foundation.org, peterz@...radead.org, neilb@...e.de,
	dhowells@...hat.com, paulmcquad@...il.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [Question] ksm: rmap_item pointing to some stale vmas

On 04/15/15 11:52, Hugh Dickins wrote:
>> We are using kernel-3.10.49 and I have gone through patches of ksm above this
>> >kernel version but didn't find anything relevant w.r.t issue. The latest
>> >patch which we have for KSM on our tree is
>> >
>> >668f9abb: mm: close PageTail race
> I agree, I don't think 3.10.49 would be missing any relevant fix -
> unless there's a later fix to some "random" corruption which happens
> to hit you here in KSM.
>
> I wonder how you identified that this issue of un-unmappable pages
> is peculiar to KSM.  Have you established that ordinary anon pages
> (we need not worry about file pages here) are always successfully
> unmappable?  KSM is reliant upon anon_vmas working as intended
> (but then makes use of them in its own peculiar way).
>


We identified issue in try_to_unmap_ksm as part of debugging CMA 
allocation failures. During alloc_contig_range we do migrate_pages, 
where we were failing to migrate a specific page even after all the 
retries which we make in migrate_pages function. Digging deeper we were 
able to conclude that we were failing in try_to_unmap_ksm where we 
failed to find valid ptes.


>> >
>> >The issue otherwise is difficult to reproduce and is appearing after days of
>> >testing on 512MB Android platform. What I am not able to figure out is which
>> >code path in ksm could actually land us in situation where in stable_node we
>> >still have stale rmap_items with old vmas which are now unmapped.
> Whether that's something to worry about depends on what you mean.
>
> It's normal for a stable_node to have some stale rmap_items attached,
> now pointing to pages different from the stable page, or pointing to none.
> That's in the nature of KSM, the way ksmd builds up its structures by
> peeking at what's in each mm, moving on, and coming back a cycle later
> to discover what's changed.
>
> But the anon_vma which such a stale rmap_item points to should remain
> valid (KSM holds an additional reference to it), even if its interval
> tree is now empty, or none of the vmas that it holds now cover this
> mm,address (but any vmas held should still be valid vmas).
>
> I was concerned, not that the stable_node has stale rmap_items attached,
> but that you know the page to be mapped, yet try_to_unmap_ksm is unable
> to locate its mappings.
>
>> >
>> >In the dumps we can see the new vmas mapping to the page but the new
>> >rmap_items with these new vmas which maps the page are still not updated in
>> >stable_node.
> "still not updated" after how long?
> I assume you to mean that, how ever long you wait (but at least
> one full scan), the stable_node is not updated with an rmap_item
> pointing to an anon_vma whose interval tree contains one of these
> new vmas which maps the page?


I have not yet concluded if we are waiting for one full scan or not. 
Since I was debugging this w.r.t CMA allocation failure by saying "still 
not updated" , I meant that even after all the number of retries which 
we make in CMA allocation path to migrate pages, the stable_node was not 
updated with rmap_item. But now I understand that we need to wait for at 
least one full ksm scan to see the update.


>
> (When setting up a new stable node, it will take several scans to
> establish, and can be delayed by various races, such as shifts in
> the unstable tree, and the trylock_page in try_to_merge_one_page.
> But I think that once you can see a stable ksm page mapped somewhere,
> all pointers to it should be captured within a single scan.)


I am actually thinking the reason for my issue could be that we might 
have not waited sufficient time to ensure that ksm scan ran once. The 
reason for this is I was able to track down mm_slot structure which we 
create in __ksm_enter and it contained mm_struct which had vma where our 
page is mapped. But rmap_list of this mm_slot was still NULL which I 
guess would get populate once ksm_do_scan runs.


>
> That's bad, but I have no idea of the cause.  I mention corruption
> above, because that would be one possibility; though unlikely if
> it always hits you here in KSM only.


Yes, even we have ruled out corruption since now we have seen multiple 
instances with similar symptoms.


>
> Whereas if you mean that a new mapping of the stable page may not
> be unmapped until ksmd has completed a full scan, that is also
> wrong, but not so serious.  Or would even that be a serious issue
> for you?  Please describe how this comes to be a problem for you.


Right now I don't have enough data points to claim that new mapping of 
the stable page may not be unmapped until ksmd has completed a full 
scan. But I am debugging in this direction and would get back once I 
have sufficient data.


>
> I believe I have found two bugs that would explain the latter case;
> but both of them require fork, and legend has it that Android avoids
> fork (correct me if wrong); so I doubt they're responsible for your
> case, and expect both to be corrected within one full scan.
>
> The lesser of the bugs is this: KSM reclaim (dependent on anon_vmas)
> was introduced in 2.6.33, but then anon_vma_chains were introduced
> in 2.6.34, and I suspect that the conversion ought to have updated
> try_to_merge_with_ksm_page, to take rmap_item->anon_vma from page
> instead of from vma.  I believe that some fork-connected mappings
> may be missed for a scan because of that.
>
> But fixing it doesn't help much: because the greater bug (mine) is
> that the search_new_forks code is not working as well as intended.
> It relies on using one rmap_item's anon_vma to locate the page in
> newer mappings forked from it, before ksmd reaches them to create
> their own rmap_items; but we're doing nothing to prevent that
> earlier rmap_item from being removed too soon.
>
> I would much rather be sending a patch, than trying to describe
> this so obscurely; but I have not succeeded and time has run out.
>
> I got far enough, I think, to confirm that this happens for me,
> and can be fixed by delaying the removal of such rmap_items.
> But I did not get far enough to stop them from leaking wildly;
> and although I've searched for quick and easy ways to do it,
> have come to the conclusion that fixing it safely without leaks
> will require more time and care than I can afford at present.
>
> (And even with those fixed, there would still be rare cases when
> a new mapping could not immediately be unmapped: for example,
> replace_page increments kpage's mapcount, but a racing
> try_to_unmap_ksm may hold kpage's page lock, preventing the
> relevant rmap_item from being appended to the stable tree.)
>
> I do hate to put down half-finished work, and would have liked
> to send you a patch, even if only to confirm that my problem
> is actually not your problem.  But I now see no alternative to
> merely informing you of this, and wishing you luck in your own
> investigation: I'm sorry, I just don't know.
>
> But if I've misunderstood, and you think that what you're seeing
> fits with the transient forking bugs I've (not quite) described,
> and you can explain why even the transient case is important for
> you to have fixed, then I really ought to redouble my efforts.
>
> Hugh


-- 
Susheel Khiani QUALCOMM INDIA, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/