Message-ID: <20160512135253.GA17039@gmail.com>
Date:	Thu, 12 May 2016 15:52:53 +0200
From:	Jerome Glisse <j.glisse@...il.com>
To:	Nicolas Morey-Chaisemartin <devel@...ey-chaisemartin.com>
Cc:	Hugh Dickins <hughd@...gle.com>,
	Mel Gorman <mgorman@...hsingularity.net>,
	Andrea Arcangeli <aarcange@...hat.com>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Alex Williamson <alex.williamson@...hat.com>,
	One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [Question] Missing data after DMA read transfer - mm issue with
 transparent huge page?

On Thu, May 12, 2016 at 03:30:24PM +0200, Nicolas Morey-Chaisemartin wrote:
> On 05/12/2016 at 11:36 AM, Jerome Glisse wrote:
> > On Thu, May 12, 2016 at 08:07:59AM +0200, Nicolas Morey-Chaisemartin wrote:
> >>
> >> On 05/11/2016 at 04:51 PM, Jerome Glisse wrote:
> >>> On Wed, May 11, 2016 at 01:15:54PM +0200, Nicolas Morey Chaisemartin wrote:
> >>>> On 05/10/2016 at 12:01 PM, Jerome Glisse wrote:
> >>>>> On Tue, May 10, 2016 at 09:04:36AM +0200, Nicolas Morey Chaisemartin wrote:
> >>>>>> On 05/03/2016 at 12:11 PM, Jerome Glisse wrote:
> >>>>>>> On Mon, May 02, 2016 at 09:04:02PM -0700, Hugh Dickins wrote:
> >>>>>>>> On Fri, 29 Apr 2016, Nicolas Morey Chaisemartin wrote:
> >>>> [...]
> >>>>>> Hi,
> >>>>>>
> >>>>>> I backported the patch to 3.10 (had to copy-paste the pmd_protnone definition from 4.5) and it's working!
> >>>>>> I'll open a ticket in the Red Hat tracker to try and get this fixed in RHEL7.
> >>>>>>
> >>>>>> I have a dumb question though: how can we end up in NUMA/misplaced memory code on a single-socket system?
> >>>>>>
> >>>>> This patch is not a fix. Do you see a bug message in the kernel log? If
> >>>>> you do, it means we have a bigger issue.
> >>>>>
> >>>>> You did not answer one of my previous questions: do you call get_user_pages
> >>>>> with write = 1 as a parameter?
> >>>>>
> >>>>> Also, it would be a lot easier if you were testing with the latest 4.6 or 4.5,
> >>>>> not the RHEL kernel, as they are far apart and what might look like the same
> >>>>> issue on both might be totally different bugs.
> >>>>>
> >>>>> If you only really care about the RHEL kernel, then open a bug with Red Hat
> >>>>> and you can add me in bug-cc <jglisse@...hat.com>
> >>>>>
> >>>>> Cheers,
> >>>>> Jérôme
> >>>> I finally managed to get a proper setup.
> >>>> I built a vanilla 4.5 kernel from the git tree using the CentOS 7 config; my test fails as usual.
> >>>> I applied your patch and rebuilt => still fails and no new messages in dmesg.
> >>>>
> >>>> Now that I don't have to go through the RPM repackaging, I can try out things much quicker if you have any ideas.
> >>>>
> >>> Is it still an issue if you boot with transparent_hugepage=never?
> >>>
> >>> Also, to simplify the investigation, force write to 1 all the time, no matter what.
> >>>
> >>> Cheers,
> >>> Jérôme
> >> With transparent_hugepage=never I can't see the bug anymore.
> >>
> > Can you test https://patchwork.kernel.org/patch/9061351/ with 4.5
> > (does not apply to 3.10) and without transparent_hugepage=never?
> >
> > Jérôme
> 
> Fails with 4.5 + this patch and with 4.5 + this patch + yours
> 

There must be some bug in your code: we have an upstream user that works
fine with the above combination (see drivers/vfio/vfio_iommu_type1.c).
I suspect you might be releasing the page pin too early (put_page()).

If you really believe it is a bug upstream, we would need a dumb kernel
module that does gup like you do and that shows the issue. Right now,
looking at the code (assuming the above patches are applied), I can't see
anything that can go wrong with THP.

Cheers,
Jérôme
