linux-kernel - Re: [Xen-devel] [PATCH v2 0/2] xen: vnuma introduction for pv guest

Message-ID: <CAEr7rXiOeBCzSmbkKDPqaSt3H85wciOiG0z=-Lbi73_Lx6AkVg@mail.gmail.com>
Date:	Fri, 20 Dec 2013 02:39:52 -0500
From:	Elena Ufimtseva <ufimtseva@...il.com>
To:	Dario Faggioli <dario.faggioli@...rix.com>
Cc:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	akpm@...ux-foundation.org, wency@...fujitsu.com,
	Stefano Stabellini <stefano.stabellini@...citrix.com>,
	x86@...nel.org, linux-kernel@...r.kernel.org,
	tangchen@...fujitsu.com, mingo@...hat.com,
	David Vrabel <david.vrabel@...rix.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	xen-devel <xen-devel@...ts.xenproject.org>,
	Boris Ostrovsky <boris.ostrovsky@...cle.com>,
	tglx@...utronix.de, Ian Campbell <ian.campbell@...rix.com>
Subject: Re: [Xen-devel] [PATCH v2 0/2] xen: vnuma introduction for pv guest

On Wed, Dec 4, 2013 at 8:13 PM, Dario Faggioli
<dario.faggioli@...rix.com> wrote:
> On mer, 2013-12-04 at 01:20 -0500, Elena Ufimtseva wrote:
>> On Tue, Dec 3, 2013 at 7:35 PM, Elena Ufimtseva <ufimtseva@...il.com> wrote:
>> > Oh guys, I feel really bad about not replying to these emails... Somehow these
>> > replies all got deleted.. weird.
>> >
> No worries... You should see *my* backlog. :-P
>
>> > Ok, about that automatic balancing. At the moment of the last patch
>> > automatic NUMA balancing seemed to
>> > work, but after rebasing on top of 3.12-rc2 I see similar issues.
>> > I will try to figure out what commits broke and will contact Ingo
>> > Molnar and Mel Gorman.
>> >
>> As of now I have patch v4 for reviewing. Not sure if it will be
>> beneficial to post it for review
>> or look closer at the current problem.
>>
> You mean the Linux side? Perhaps stick somewhere a reference to the git
> tree/branch where it lives, but, before re-sending, let's wait for it to
> be as issue free as we can tell?
>
>> The issue I am seeing right now is different from what was happening before.
>> The corruption happens on the change_prot_numa path:
>>
> Ok, so, I think I need to step back a bit from the actual stack trace
> and look at the big picture. Please, Elena or anyone, correct me if I'm
> saying something wrong about how Linux's autonuma works and interacts
> with Xen.
>
> The way it worked when I last looked at it was sort of like this:
>  - there was a kthread scanning all the pages, removing the PAGE_PRESENT
>    bit from actually present pages, and adding a new special one
>    (PAGE_NUMA or something like that);
>  - when a page fault is triggered and the PAGE_NUMA flag is found, it
>    figures out the page is actually there, so no swap or anything.
>    However, it tracks which node the access to that page came from,
>    matches it with the node where the page actually is and collects some
>    statistics about that;
>  - at some point (and here I don't remember the exact logic, since it
>    changed quite a few times) pages ranking badly in the stats above are
>    moved from one node to another.

Hello Dario, Konrad.

- Yes, there is a kernel worker that runs on each node, scans some
pages' stats, marks the pages as _PROT_NONE and clears _PAGE_PRESENT.
The next access to such a page triggers a page fault, and control is
returned to the Linux PV kernel, which processes it in handle_mm_fault
and then in the NUMA page fault handler if it finds that this was a
NUMA pmd/pte with the present flag cleared.
About the stats, I will have to collect some sensible information.
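
The flow described above can be sketched as a toy model. This is not
kernel code: the helper names mirror the kernel's pte helpers only for
readability, the bit position used for the NUMA flag is an assumption
based on this thread, and the `stats` dictionary is purely illustrative.

```python
# Toy model of NUMA hinting faults; an illustration, not kernel code.
PAGE_PRESENT = 1 << 0  # x86 PTE "present" bit
PAGE_NUMA = 1 << 8     # assumption: NUMA flag shares the x86 "global" bit


def pte_mknuma(pte):
    """Scanner side: set the NUMA flag and clear the present bit."""
    return (pte | PAGE_NUMA) & ~PAGE_PRESENT


def pte_numa(pte):
    """True only for a NUMA hinting pte: NUMA flag set, present bit clear."""
    return (pte & (PAGE_NUMA | PAGE_PRESENT)) == PAGE_NUMA


def pte_mknonnuma(pte):
    """Fault side: restore the pte to a normal present mapping."""
    return (pte & ~PAGE_NUMA) | PAGE_PRESENT


def handle_fault(pte, page_node, cpu_node, stats):
    """Count a hinting fault and whether the access was node-local."""
    if not pte_numa(pte):
        return pte  # some other kind of fault; not modelled here
    stats["numa_hint_faults"] += 1
    if page_node == cpu_node:
        stats["numa_hint_faults_local"] += 1
    return pte_mknonnuma(pte)
```

In this model a page that lives on node 0 and is touched from node 1
counts as one hinting fault and zero local faults, and the pte comes
back with its original flags.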

>
> Is this description still accurate? If yes, here's what I would (double)
> check, when running this in a PV guest on top of Xen:
>
>  1. the NUMA hinting page fault, are we getting and handling them
>     correctly in the PV guest? Are the stats in the guest kernel being
>     updated in a sensible way, i.e., do they make sense and properly
>     relate to the virtual topology of the guest?
>     At some point we thought it would have been necessary to intercept
>     these faults and make sure the above is true with some help from the
>     hypervisor... Is this the case? Why? Why not?

The real help needed from the hypervisor is to allow _PAGE_NUMA flags on
pte/pmd entries.
I have done so in the hypervisor by using the same _PAGE_NUMA bit and
including it in the allowed bit mask.
As this bit is the same as PAGE_GLOBAL in the hypervisor, that may induce
some other errors. So far I have not seen any,
and I will double check on this.
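
To illustrate the concern about the shared bit, here is a minimal
sketch of a flag check against a "disallowed bits" mask. The mask
contents and the function name are hypothetical and do not reproduce
the actual Xen code; only the x86 bit positions for present, writable
and global are standard.

```python
# Hypothetical flag validation; names and mask contents are assumptions.
PAGE_PRESENT = 1 << 0
PAGE_RW = 1 << 1
PAGE_GLOBAL = 1 << 8
PAGE_NUMA = PAGE_GLOBAL  # per this thread: the same bit


def flags_allowed(flags, disallow_mask):
    """Accept a pte only if it sets none of the disallowed bits."""
    return (flags & disallow_mask) == 0


# A mask that rejects the global bit, and a relaxed one that lets
# the NUMA flag through.
STRICT_MASK = PAGE_GLOBAL
RELAXED_MASK = STRICT_MASK & ~PAGE_NUMA

hinting_pte = PAGE_RW | PAGE_NUMA            # present bit clear
global_pte = PAGE_PRESENT | PAGE_RW | PAGE_GLOBAL
```

With these definitions the hinting pte is rejected by `STRICT_MASK` and
accepted by `RELAXED_MASK`, but so is `global_pte`: a mask-only check
cannot tell the two uses of the bit apart without also looking at the
present bit.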

>
>  2. what happens when autonuma tries to move pages from one node to
>     another? For us, that would mean moving from one virtual node
>     to another... Is there a need to do anything at all? I mean, is
>     this, from our perspective, just copying the content of an MFN from
>     node X into another MFN on node Y, or do we need to update some of
>     our vnuma tracking data structures in Xen?
>
> If we have this figured out already, then I think we just chase bugs and
> repost the series. If not, well, I think we should. :-D
>
Here is the best part :)

After a fresh look at NUMA autobalancing, applying recent patches,
talking a bit with riel, who now works on mm NUMA autobalancing, and
running some tests including dd, ltp, kernel compiling and my own
tests, autobalancing is now working
correctly with vnuma. Now I can see successfully migrated pages in /proc/vmstat:

numa_pte_updates 39
numa_huge_pte_updates 0
numa_hint_faults 36
numa_hint_faults_local 23
numa_pages_migrated 4
pgmigrate_success 4
pgmigrate_fail 0
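
As a quick way to read these counters, a small sketch that parses
/proc/vmstat-style text and summarizes the NUMA balancing numbers. The
function names and the summary fields are illustrative choices, not an
existing tool; the sample text is the output quoted above.

```python
def parse_vmstat(text):
    """Parse 'name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def numa_summary(counters):
    """Summarize hinting-fault locality and migration results."""
    faults = counters.get("numa_hint_faults", 0)
    local = counters.get("numa_hint_faults_local", 0)
    return {
        # Share of hinting faults where the page was already node-local.
        "local_fault_ratio": local / faults if faults else None,
        "pages_migrated": counters.get("numa_pages_migrated", 0),
        "migrate_fail": counters.get("pgmigrate_fail", 0),
    }


SAMPLE = """\
numa_pte_updates 39
numa_huge_pte_updates 0
numa_hint_faults 36
numa_hint_faults_local 23
numa_pages_migrated 4
pgmigrate_success 4
pgmigrate_fail 0
"""

summary = numa_summary(parse_vmstat(SAMPLE))
```

For the numbers above, 23 of the 36 hinting faults were local, and all
4 migration attempts succeeded. On a live guest the same functions can
be fed the contents of /proc/vmstat instead of `SAMPLE`.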

I will be running some tests with transparent huge pages, as I expect
the migration of those to fail.
It is probably possible to find all the patches related to NUMA
autobalancing and figure out the possible reasons
why balancing was not working previously. Given the amount of work
kernel folks have spent recently on fixing
issues with NUMA and the significance of the changes themselves, I might
need a few more attempts to understand it.

I am going to test THP and if that works will follow up with patches.

Dario, what tools did you use to test NUMA on Xen? Maybe there is
something I can use as well?
Here http://lwn.net/Articles/558593/ Mel Gorman uses specjbb and jvm;
I thought I could run something similar.

> Thanks and Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>



-- 
Elena