linux-kernel - Re: [PATCH] mm: numa: Do not trap faults on shared data section pages.


Message-Id: <2BEFC6DE-7A47-4CB9-AAE5-CEF70453B46F@oracle.com>
Date:   Thu, 18 Jan 2018 17:06:54 -0800
From:   Henry Willard <henry.willard@...cle.com>
To:     Christopher Lameter <cl@...ux.com>
Cc:     Mel Gorman <mgorman@...e.de>, akpm@...ux-foundation.org,
        kstewart@...uxfoundation.org, zi.yan@...rutgers.edu,
        pombredanne@...b.com, aarcange@...hat.com,
        gregkh@...uxfoundation.org, aneesh.kumar@...ux.vnet.ibm.com,
        kirill.shutemov@...ux.intel.com, jglisse@...hat.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: numa: Do not trap faults on shared data section
 pages.

> On Jan 17, 2018, at 10:23 AM, Christopher Lameter <cl@...ux.com> wrote:
> 
> On Tue, 16 Jan 2018, Mel Gorman wrote:
> 
>> My main source of discomfort is the fact that this is permanent as two
>> processes perfectly isolated but with a suitably shared COW mapping
>> will never migrate the data. A potential improvement to get the reported
>> bandwidth up in the test program would be to skip the rest of the VMA if
>> page_mapcount != 1 in a COW mapping as it would be reasonable to assume
>> the remaining pages in the VMA are also affected and the scan is wasteful.
>> There are counter-examples to this but I suspect that the full VMA being
>> shared is the common case. Whether you do that or not;
> 
> Same concern here. Typically CAP_SYS_NICE will bypass the check that the
> page is only mapped to a single process and the check looks exactly like
> the ones for manual migration. Using CAP_SYS_NICE would be surprising
> here since autonuma is not triggered by the currently running process.
> 
> Can we configure this somehow via sysfs?

If I understand the code correctly, CAP_SYS_NICE allows MPOL_MF_MOVE_ALL to be set with mbind() or used with move_pages(). CAP_SYS_NICE also causes migrate_pages() to behave as if MPOL_MF_MOVE_ALL were specified. The checks in those paths require either MPOL_MF_MOVE_ALL or page_mapcount(page) == 1. The normal case does not call change_prot_numa(); it is only called when MPOL_MF_LAZY is specified, and at the moment MPOL_MF_LAZY is not recognized as a valid flag. So it looks to me that, as things stand now, change_prot_numa() is only called from task_numa_work().
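To make the rule above concrete, here is a minimal user-space sketch of the check as I read it. This is not kernel code: the sketch_* names and the flag bit values are made up for illustration and are not taken from the kernel headers.

```c
#include <stdbool.h>

/* Illustrative flag bits for this model only; NOT the kernel's values. */
#define SKETCH_MF_MOVE      (1u << 0)   /* move pages mapped by one process */
#define SKETCH_MF_MOVE_ALL  (1u << 1)   /* move pages regardless of sharing */

/*
 * Hypothetical helper: may the caller request these flags?
 * Models "CAP_SYS_NICE allows MPOL_MF_MOVE_ALL".
 */
static bool sketch_flags_permitted(unsigned int flags, bool has_cap_sys_nice)
{
    return !(flags & SKETCH_MF_MOVE_ALL) || has_cap_sys_nice;
}

/*
 * Hypothetical helper: may a page with this mapcount be migrated?
 * Models "either MPOL_MF_MOVE_ALL or page_mapcount(page) == 1".
 */
static bool sketch_may_migrate(unsigned int flags, int mapcount)
{
    if (flags & SKETCH_MF_MOVE_ALL)
        return true;
    if (flags & SKETCH_MF_MOVE)
        return mapcount == 1;
    return false;
}
```

With this model, a page mapped by two processes is refused under SKETCH_MF_MOVE and accepted under SKETCH_MF_MOVE_ALL, and only a caller with the capability may ask for the latter.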

If MPOL_MF_LAZY were allowed and specified, things would not work correctly: change_pte_range() is unaware of the difference between MPOL_MF_MOVE_ALL and MPOL_MF_MOVE, so it can’t honor it.

For the case of auto numa balancing, it may be undesirable for shared pages to be migrated whether they are also copy-on-write or not. The copy-on-write test was added to restrict the effect of the patch to the specific situation we observed. Perhaps I should remove it. I don’t understand why it would be desirable to modify the behavior via sysfs.
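The two variants under discussion can be sketched as a single predicate. Again, this is a user-space model and not the patch itself: the struct and function names are hypothetical, and I am assuming "shared" means a mapcount other than 1.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the page and mapping state the scan looks at. */
struct sketch_page {
    int mapcount;   /* number of processes mapping the page */
};

struct sketch_vma {
    bool is_cow;    /* true for a copy-on-write mapping */
};

/*
 * Should the NUMA hinting scan leave this page alone?
 * require_cow == true  models the restricted form (shared AND copy-on-write).
 * require_cow == false models dropping the copy-on-write test (any shared page).
 */
static bool sketch_skip_numa_hint(const struct sketch_vma *vma,
                                  const struct sketch_page *page,
                                  bool require_cow)
{
    if (page->mapcount == 1)
        return false;               /* private page: still eligible */
    if (require_cow)
        return vma->is_cow;         /* shared: skip only in a COW mapping */
    return true;                    /* shared: always skip */
}
```

The only case where the two forms differ is a shared page in a mapping that is not copy-on-write: the restricted form still marks it, the unrestricted form skips it.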

Thanks,
Henry