Message-ID: <CAPvkgC3vbAG9f-MnVoZ_VwQMZNTbhS6Vx0FgcVGSCcOKgmqWkA@mail.gmail.com>
Date:	Thu, 25 Feb 2016 16:08:28 +0000
From:	Steve Capper <steve.capper@...aro.org>
To:	"Kirill A. Shutemov" <kirill@...temov.name>
Cc:	Will Deacon <will.deacon@....com>,
	Gerald Schaefer <gerald.schaefer@...ibm.com>,
	Christian Borntraeger <borntraeger@...ibm.com>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Michael Ellerman <mpe@...erman.id.au>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Paul Mackerras <paulus@...ba.org>,
	linuxppc-dev@...ts.ozlabs.org,
	Catalin Marinas <catalin.marinas@....com>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	linux-s390@...r.kernel.org,
	Sebastian Ott <sebott@...ux.vnet.ibm.com>,
	Steve Capper <steve.capper@....com>
Subject: Re: [BUG] random kernel crashes after THP rework on s390 (maybe also
 on PowerPC and ARM)

On 25 February 2016 at 16:01, Kirill A. Shutemov <kirill@...temov.name> wrote:
> On Thu, Feb 25, 2016 at 03:49:33PM +0000, Steve Capper wrote:
>> On 23 February 2016 at 18:47, Will Deacon <will.deacon@....com> wrote:
>> > [adding Steve, since he worked on THP for 32-bit ARM]
>>
>> Apologies for my late reply...
>>
>> >
>> > On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
>> >> On Tue, 23 Feb 2016 13:32:21 +0300
>> >> "Kirill A. Shutemov" <kirill@...temov.name> wrote:
>> >> > The theory is that the splitting bit effectively masked bogus pmd_present():
>> >> > we had pmd_trans_splitting() in all code paths and that prevented mm from
>> >> > touching the pmd. Once pmd_trans_splitting() has gone, mm proceeds with the
>> >> > pmd where it shouldn't and here's a boom.
>> >>
>> >> Well, I don't think pmd_present() == true is bogus for a trans_huge pmd under
>> >> splitting, after all there is a page behind the pmd. Also, if it was
>> >> bogus, and it would need to be false, why should it be marked !pmd_present()
>> >> only at the pmdp_invalidate() step before the pmd_populate()? It clearly
>> >> is pmd_present() before that, on all architectures, and if there was any
>> >> problem/race with that, setting it to !pmd_present() at this stage would
>> >> only (marginally) reduce the race window.
>> >>
>> >> BTW, PowerPC and Sparc seem to do the same thing in pmdp_invalidate(),
>> >> i.e. they do not set pmd_present() == false, only mark it so that it would
>> >> not generate a new TLB entry, just like on s390. After all, the function
>> >> is called pmdp_invalidate(), and I think the comment in mm/huge_memory.c
>> >> before that call is just a little ambiguous in its wording. When it says
>> >> "mark the pmd notpresent" it probably means "mark it so that it will not
>> >> generate a new TLB entry", which is also what the comment is really about:
>> >> prevent huge and small entries in the TLB for the same page at the same
>> >> time.
>> >>
>> >> FWIW, and since the ARM arch-list is already on cc, I think there is
>> >> an issue with pmdp_invalidate() on ARM, since it also seems to clear
>> >> the trans_huge (and formerly trans_splitting) bit, which actually makes
>> >> the pmd !pmd_present(), but it violates the other requirement from the
>> >> comment:
>> >> "the pmd_trans_huge and pmd_trans_splitting must remain set at all times
>> >> on the pmd until the split is complete for this pmd"
>> >
>> > I've only been testing this for arm64 (where I'm yet to see a problem),
>> > but we use the generic pmdp_invalidate implementation from
>> > mm/pgtable-generic.c there. On arm64, pmd_trans_huge will return true
>> > after pmd_mknotpresent. On arm, it does look to be buggy, since it nukes
>> > the entire entry... Steve?
>>
>> pmd_mknotpresent on arm looks inconsistent with the other
>> architectures and can be changed.
>>
>> Having had a look at the usage, I can't see it causing an immediate
>> problem (that needs to be addressed by an emergency patch).
>> We don't have a notion of splitting pmds (so there is no splitting
>> information to lose), and the only usage I could see of
>> pmd_mknotpresent was:
>>
>> pmdp_invalidate(vma, haddr, pmd);
>> pmd_populate(mm, pmd, pgtable);
>>
>> In mm/huge_memory.c, around line 3588.
>>
>> So we invalidate the entry (which puts down a faulting entry from
>> pmd_mknotpresent and invalidates tlb), then immediately put down a
>> table entry with pmd_populate.
>>
>> I have run a 32-bit ARM test kernel and exacerbated THP splits (that's
>> what took me time), and I didn't notice any problems with 4.5-rc5.
>
> If I read code correctly, your pmd_mknotpresent() makes the pmd
> pmd_none(), right? If yes, it's a problem.
>
> It introduces the race I've described here:
>
> https://marc.info/?l=linux-mm&m=144723658100512&w=4
>
> Basically, if zap_pmd_range() would see pmd_none() between
> pmd_mknotpresent() and pmd_populate(), we're screwed.
>
> The race window is small, but it's there.

Ahhhh, okay, thank you Kirill.
I agree, I'll get a patch out.

Cheers,
--
Steve
