lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20151013171246.0a7b255e@mschwide>
Date:	Tue, 13 Oct 2015 17:12:46 +0200
From:	Martin Schwidefsky <schwidefsky@...ibm.com>
To:	Vlastimil Babka <vbabka@...e.cz>
Cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Yong Sun <yosun@...e.com>, linux390@...ibm.com,
	linux-s390@...r.kernel.org, Zhang Yi <wetpzy@...il.com>,
	Mel Gorman <mgorman@...e.de>,
	Andrea Arcangeli <aarcange@...hat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Hugh Dickins <hughd@...gle.com>,
	Dominik Dingel <dingel@...ux.vnet.ibm.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Christian Borntraeger <borntraeger@...ibm.com>
Subject: Re: [RFC] futex: prevent endless loop on s390x with emulated
 hugepages

On Tue, 13 Oct 2015 15:51:22 +0200
Vlastimil Babka <vbabka@...e.cz> wrote:

> On 10/13/2015 01:48 PM, Vlastimil Babka wrote:
> > On 09/28/2015 01:49 PM, Martin Schwidefsky wrote:
> >> On Thu, 24 Sep 2015 17:05:48 +0200
> >> Vlastimil Babka <vbabka@...e.cz> wrote:
> >
> > [...]
> >
> >>> However, __get_user_pages_fast() is still broken. The get_user_pages_fast()
> >>> wrapper will hide this in the common case. The other user of the __ variant
> >>> is kvm, which is mentioned as the reason for removal of emulated hugepages.
> >>> The call of page_cache_get_speculative() looks also broken in this scenario
> >>> on debug builds because of VM_BUG_ON_PAGE(PageTail(page), page). With
> >>> CONFIG_TINY_RCU enabled, there's plain atomic_inc(&page->_count) which also
> >>> probably shouldn't happen for a tail page...
> >>
> >> It boils down to __get_user_pages_fast being broken for emulated large pages,
> >> doesn't it? My preferred fix would be to get __get_user_page_fast to work
> >> in this case.
> >
> > I agree, but didn't know enough of the architecture to attempt such fix
> > :) Thanks!
> >
> >> For 3.12 a patch would look like this (needs more testing
> >> though):
> >
> > FWIW it works for me in the particular LTP test, but as you said, it
> > needs more testing and breaking stable would suck.
> 
> I'm trying to break the patch on 3.12 with trinity, let's see...
> Tried also to review it, although it's unlikely I'll catch some 
> s390x-specific gotchas. For example, can't say what the effect of 
> _SEGMENT_ENTRY_CO removal will be - before, the bit was set for 
> non-emulated hugepages, and now the same bit is set for emulated ones?
> Or if pmd_bad() was also broken before, and now isn't? But otherwise the 
> change seems OK, besides some nitpick below.

The _SEGMENT_ENTRY_CO is the segment change-override bit. This allows the
machine to skip storage-key updates for the dirty bit. Linux uses the 
storage keys only for KVM which does not allow any kind of large page
to be present. The latest PoP remove the change-override bits again,
it never had an effect. As the bit is ignored I can reuse it as the
software large page bit.

> >> @@ -103,7 +104,7 @@ static inline int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr,
> >>    		unsigned long end, int write, struct page **pages, int *nr)
> >>    {
> >>    	unsigned long next;
> >> -	pmd_t *pmdp, pmd;
> >> +	pmd_t *pmdp, pmd, pmd_orig;
> >>
> >>    	pmdp = (pmd_t *) pudp;
> >>    #ifdef CONFIG_64BIT
> >> @@ -112,7 +113,7 @@ static inline int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr,
> >>    	pmdp += pmd_index(addr);
> >>    #endif
> >>    	do {
> >> -		pmd = *pmdp;
> >> +		pmd = pmd_orig = *pmdp;
> >>    		barrier();
> >>    		next = pmd_addr_end(addr, end);
> >>    		/*
> >> @@ -127,8 +128,9 @@ static inline int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr,
> >>    		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
> >>    			return 0;
> >>    		if (unlikely(pmd_large(pmd))) {
> >> -			if (!gup_huge_pmd(pmdp, pmd, addr, next,
> >> -					  write, pages, nr))
> >> +			if (!gup_huge_pmd(pmdp, pmd_orig,
> >> +					  pmd_swlarge_deref(pmd),
> >> +					  addr, next, write, pages, nr))
> >>    				return 0;
> >>    		} else if (!gup_pte_range(pmdp, pmd, addr, next,
> >>    					  write, pages, nr))
> 
> The "pmd" variable isn't changed anywhere in this loop after the initial 
> assignment, so the extra "pmd_orig" variable isn't needed.

That is true, I will remove the pmd_orig variable.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ