linux-kernel - Re: [PATCH] mm, numa: Fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot

lists.openwall.net
Open Source and information security mailing list archives



Message-ID: <20170410121931.ytuebes7lvjbwfim@techsingularity.net>
Date:   Mon, 10 Apr 2017 13:19:31 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Vlastimil Babka <vbabka@...e.cz>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Rik van Riel <riel@...hat.com>,
        Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm, numa: Fix bad pmd by atomically check for
 pmd_trans_huge when marking page tables prot_numa

On Mon, Apr 10, 2017 at 12:03:20PM +0200, Vlastimil Babka wrote:
> On 04/10/2017 11:48 AM, Mel Gorman wrote:
> > A user reported a bug against a distribution kernel, which is expected
> > to apply to mainline kernels as well, while running a proprietary workload
> > described as "memory intensive that is not swapping". The workload
> > reads, writes and modifies ranges of memory and checks the contents. They
> > reported that within a few hours a bad PMD would be reported, followed
> > by memory corruption where the expected data was all zeros. A partial
> > report of the bad PMD looked like this:
> > 
> > [ 5195.338482] ../mm/pgtable-generic.c:33: bad pmd ffff8888157ba008(000002e0396009e2)
> > [ 5195.341184] ------------[ cut here ]------------
> > [ 5195.356880] kernel BUG at ../mm/pgtable-generic.c:35!
> > ....
> > [ 5195.410033] Call Trace:
> > [ 5195.410471]  [<ffffffff811bc75d>] change_protection_range+0x7dd/0x930
> > [ 5195.410716]  [<ffffffff811d4be8>] change_prot_numa+0x18/0x30
> > [ 5195.410918]  [<ffffffff810adefe>] task_numa_work+0x1fe/0x310
> > [ 5195.411200]  [<ffffffff81098322>] task_work_run+0x72/0x90
> > [ 5195.411246]  [<ffffffff81077139>] exit_to_usermode_loop+0x91/0xc2
> > [ 5195.411494]  [<ffffffff81003a51>] prepare_exit_to_usermode+0x31/0x40
> > [ 5195.411739]  [<ffffffff815e56af>] retint_user+0x8/0x10
> > 
> > Decoding revealed that the PMD was a valid prot_numa PMD and the bad PMD
> > was a false detection. The bug does not trigger if automatic NUMA balancing
> > or transparent huge pages are disabled.
> > 
> > The bug is due to a race in change_pmd_range between the pmd_trans_huge and
> > pmd_none_or_clear_bad checks, which run without any locks held. Between the
> > two checks, a parallel protection update running under a lock can clear the
> > PMD and refill it with a prot_numa entry, so pmd_trans_huge sees one value,
> > pmd_none_or_clear_bad sees another, and the valid PMD is reported as bad.
> > 
> > While this could be fixed with heavy locking, it's only necessary to
> > make a copy of the PMD on the stack during change_pmd_range and base all
> > checks on that copy. A new helper is created for this as the check is quite
> > subtle and the existing similar helper is not suitable. This passed 154 hours
> > of testing (the bug usually triggers within 20 minutes to 24 hours) without
> > detecting bad PMDs or corruption. A basic test of an autonuma-intensive
> > workload showed no significant change in behaviour.
> > 
> > Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
> > Cc: stable@...r.kernel.org
> 
> It would be better if there was a Fixes: tag, or at least a version hint. I
> assume it's been there since autonuma balancing was merged?
> 

Fair point. It goes back to 3.15 rather than all the way back to the
introduction of automatic NUMA balancing, so:

Cc: stable@...r.kernel.org # 3.15+

-- 
Mel Gorman
SUSE Labs