Message-ID: <20121002124814.GA31316@avionic-0098.mockup.avionic-design.de>
Date: Tue, 2 Oct 2012 14:48:14 +0200
From: Thierry Reding <thierry.reding@...onic-design.de>
To: Mel Gorman <mgorman@...e.de>
Cc: Peter Ujfalusi <peter.ujfalusi@...com>,
Minchan Kim <minchan@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Michal Nazarewicz <mina86@...a86.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
Kyungmin Park <kyungmin.park@...sung.com>,
Mark Brown <broonie@...nsource.wolfsonmicro.com>
Subject: Re: CMA broken in next-20120926

On Mon, Oct 01, 2012 at 04:24:29PM +0200, Thierry Reding wrote:
> On Fri, Sep 28, 2012 at 01:43:32PM +0100, Mel Gorman wrote:
> > On Fri, Sep 28, 2012 at 01:39:24PM +0200, Thierry Reding wrote:
> > > On Fri, Sep 28, 2012 at 12:07:12PM +0100, Mel Gorman wrote:
> > > > On Fri, Sep 28, 2012 at 12:51:13PM +0200, Thierry Reding wrote:
> > > > > On Fri, Sep 28, 2012 at 12:38:15PM +0200, Thierry Reding wrote:
> > > > > > On Fri, Sep 28, 2012 at 12:32:07PM +0200, Thierry Reding wrote:
> > > > > > > On Fri, Sep 28, 2012 at 11:27:28AM +0100, Mel Gorman wrote:
> > > > > > > > On Fri, Sep 28, 2012 at 11:48:25AM +0300, Peter Ujfalusi wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On 09/28/2012 11:37 AM, Mel Gorman wrote:
> > > > > > > > > >> I hope this patch fixes the bug. If this patch fixes the problem
> > > > > > > > > >> but has some problem about description or someone has better idea,
> > > > > > > > > >> feel free to modify and resend to akpm, Please.
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > > A full revert is overkill. Can the following patch be tested as a
> > > > > > > > > > potential replacement please?
> > > > > > > > > >
> > > > > > > > > > ---8<---
> > > > > > > > > > mm: compaction: Iron out isolate_freepages_block() and isolate_freepages_range() -fix1
> > > > > > > > > >
> > > > > > > > > > CMA is reported to be broken in next-20120926. Minchan Kim pointed out
> > > > > > > > > > that this was due to nr_scanned != total_isolated in the case of CMA
> > > > > > > > > > because PageBuddy pages are one scan but many isolations in CMA. This
> > > > > > > > > > patch should address the problem.
> > > > > > > > > >
> > > > > > > > > > This patch is a fix for
> > > > > > > > > > mm-compaction-acquire-the-zone-lock-as-late-as-possible-fix-2.patch
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Mel Gorman <mgorman@...e.de>
> > > > > > > > >
> > > > > > > > > linux-next + this patch alone also works for me.
> > > > > > > > >
> > > > > > > > > Tested-by: Peter Ujfalusi <peter.ujfalusi@...com>
> > > > > > > >
> > > > > > > > Thanks Peter. I expect it also works for Thierry as I expect you were
> > > > > > > > suffering the same problem but obviously confirmation of that would be nice.
> > > > > > >
> > > > > > > I've been running a few tests and indeed this solves the obvious problem
> > > > > > > that the coherent pool cannot be created at boot (which in turn caused
> > > > > > > the ethernet adapter to fail on Tegra).
> > > > > > >
> > > > > > > However I've been working on the Tegra DRM driver, which uses CMA to
> > > > > > > allocate large chunks of framebuffer memory and these are now failing.
> > > > > > > I'll need to check if Minchan's patch solves that problem as well.
> > > > > >
> > > > > > Indeed, with Minchan's patch the DRM can allocate the framebuffer
> > > > > > without a problem. Something else must be wrong then.
> > > > >
> > > > > However, depending on the size of the allocation it also happens with
> > > > > Minchan's patch. What I see is this:
> > > > >
> > > > > [ 60.736729] alloc_contig_range test_pages_isolated(1e900, 1f0e9) failed
> > > > > [ 60.743572] alloc_contig_range test_pages_isolated(1ea00, 1f1e9) failed
> > > > > [ 60.750424] alloc_contig_range test_pages_isolated(1ea00, 1f2e9) failed
> > > > > [ 60.757239] alloc_contig_range test_pages_isolated(1ec00, 1f3e9) failed
> > > > > [ 60.764066] alloc_contig_range test_pages_isolated(1ec00, 1f4e9) failed
> > > > > [ 60.770893] alloc_contig_range test_pages_isolated(1ec00, 1f5e9) failed
> > > > > [ 60.777698] alloc_contig_range test_pages_isolated(1ec00, 1f6e9) failed
> > > > > [ 60.784526] alloc_contig_range test_pages_isolated(1f000, 1f7e9) failed
> > > > > [ 60.791148] drm tegra: Failed to alloc buffer: 8294400
> > > > >
> > > > > I'm pretty sure this did work before next-20120926.
> > > > >
> > > >
> > > > Can you double check this please?
> > > >
> > > > This is a separate bug but may be related to the same series. However, CMA should
> > > > be ignoring the "skip" hints and because it's sync compaction it should
> > > > not be exiting due to lock contention. Maybe Marek will spot it.
> > >
> > > I've written a small test module that tries to allocate growing blocks
> > > of contiguous memory and it seems like with your patch this always fails
> > > at 8 MiB.
> >
> > You earlier said it also happens with Minchan's but your statement here
> > is less clear. Does Minchan's also fail on the 8MiB boundary? Second,
> > did the test module work with next-20120926?
>
> The cmatest module that I use tries to allocate blocks from 4 KiB to 256
> MiB (in increments of powers of two). With next-20120926 this always
> fails at 8 MiB, independent of the CMA size setting (though I didn't
> test setting the CMA size to <= 8 MiB, I assumed that would make the 8
> MiB allocation fail anyway). Note that I had to apply the attached patch
> which fixes a build failure on next-20120926. I believe that Mark Brown
> posted a similar fix a few days ago. I'm also attaching a log from the
> module's test run. There's also an interesting page allocation failure
> at the very end of that log which I have not seen with next-20120925.
>
> I've run the same tests on next-20120925 with the CMA size set to 256
> MiB and only the 256 MiB allocation fails. This is normal since there
> are other modules that already allocate smaller buffers from CMA, so a
> whole 256 MiB won't be available.
>
> Vanilla 3.6-rc6 shows the same behaviour as next-20120925. I will try
> 3.6-rc7 next since that's what next-20120926 is based on. If that
> succeeds I'll try to bisect between 3.6-rc7 and next-20120926 to find
> the culprit, but that will probably take some more time as I need to
> apply at least one other commit on top to get the board to boot at all.
>
> So this really isn't all that new, but I just wanted to confirm my
> results from last week. We'll see if bisection shows up something
> interesting.
I just finished bisecting this and git reports:

	3750280f8bd0ed01753a72542756a8c82ab27933 is the first bad commit

I'm attaching the complete bisection log and a diff of all the changes
applied on top of the bad commit to make it compile and run on my board.
Most of the patch is probably not important, though. There are two hunks
which have the pageblock changes I already posted and two other hunks
with the patch you posted earlier.

I hope this helps. If you want me to run any other tests, please let me
know.

Thierry
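
For readers following the thread: the cmatest module itself is not included
in this message, so the following is only a user-space sketch of the test
loop described in the quoted text (allocate blocks from 4 KiB to 256 MiB,
doubling each time, and report the first size that fails). The names
first_failing_size() and capped_alloc() are hypothetical, and malloc()/free()
stand in for whatever contiguous allocator the real module calls.

```c
#include <stddef.h>
#include <stdlib.h>

#define MIN_SIZE ((size_t)4 << 10)    /* 4 KiB */
#define MAX_SIZE ((size_t)256 << 20)  /* 256 MiB */

/* Allocator hooks; assumptions for illustration, not a kernel API. */
typedef void *(*alloc_fn)(size_t size);
typedef void (*free_fn)(void *ptr);

/*
 * Try every power-of-two size from MIN_SIZE to MAX_SIZE inclusive.
 * Each block is released before the next, larger one is requested.
 * Returns the first size that could not be allocated, or 0 if every
 * size succeeded.
 */
static size_t first_failing_size(alloc_fn alloc, free_fn release)
{
	size_t size;

	for (size = MIN_SIZE; size <= MAX_SIZE; size <<= 1) {
		void *buf = alloc(size);

		if (!buf)
			return size;

		release(buf);
	}

	return 0;
}

/*
 * Hypothetical allocator that refuses anything of 8 MiB or more, to
 * imitate the failure pattern reported for next-20120926.
 */
static void *capped_alloc(size_t size)
{
	if (size >= ((size_t)8 << 20))
		return NULL;

	return malloc(size);
}
```

Calling first_failing_size(capped_alloc, free) returns 8 MiB with this
stand-in allocator, which corresponds to the "always fails at 8 MiB" result
described above. A real test would substitute the kernel allocator under
test for the two hooks.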
View attachment "bisect.log" of type "text/plain" (3130 bytes)
View attachment "bisect.patch" of type "text/plain" (16073 bytes)