lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 2 Jun 2012 00:20:13 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
	Kyungmin Park <kyungmin.park@...sung.com>,
	Marek Szyprowski <m.szyprowski@...sung.com>,
	Mel Gorman <mgorman@...e.de>, Minchan Kim <minchan@...nel.org>,
	Rik van Riel <riel@...hat.com>,
	Dave Jones <davej@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Cong Wang <amwang@...hat.com>,
	Markus Trippelsdorf <markus@...ppelsdorf.de>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: WARNING: at mm/page-writeback.c:1990
 __set_page_dirty_nobuffers+0x13a/0x170()

On Fri, 1 Jun 2012, Linus Torvalds wrote:
> On Fri, Jun 1, 2012 at 9:40 PM, Hugh Dickins <hughd@...gle.com> wrote:
> >
> > Move the lock after the loop, I think you meant.
> 
> Well, I wasn't sure if anything inside the loop might need it. I don't
> *think* so, but at the same time, what protects "page_order(page)"
> (or, indeed PageBuddy()) from being stable while that loop content
> uses them?

Yes, I believe you're right, page_order(page) could supply nonsense
if it's not stabilized under zone->lock along with PageBuddy(page).

Though if this rescue_unmovable_pageblock() is just best-effort,
with a little more care we can probably avoid the lock in there.

> 
> I don't understand that code at all. It does that crazy iteration over
> page, and changes "page" in random ways,

I don't think they're random ways: when buddy it uses the order to skip
that block, otherwise it goes page by page, considering a free (I guess
on pcp) page or an lru page as good for movable.

> and then finishes up with a
> totally new "page" value that is some random thing that is *after* the
> end_page thing. WHAT?
> 
> The code makes no sense. It tests all those pages within the
> page-block, but then after it has done all those tests, it does the
> final
> 
>   set_pageblock_migratetype(..)
>   move_freepages_block(..)
> 
> using a page that is *beyond* the pageblock (and with the whole
> page_order() thing, who knows just how far beyond it?)

I totally missed that, thank goodness you did not.  Yes, it's rubbish.
It goes to this effort to find a suitable pageblock, then chooses the
next one instead (or possibly another).  Perhaps it would get even
better results using a random number generator in there.

> 
> It looks entirely too much like random-monkey code to me.

Presumably it should be passing start_page instead of page
to set_pageblock_migratetype() and move_freepages_block().

But this does seem to be code of the kind, that the longer you look
at it, the more bugs you find.  And I worry about what trouble it
might then cause, if it actually started to work in the way it was
intending.  I don't think fixing it up is wise for -rc1.

Commit 5ceb9ce6fe9462a298bb2cd5c9f1ca6cb80a0199
("mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks")
appears to revert cleanly, and I'm running with it reverted now.

I'm not saying it shouldn't come back later, but does anyone see
an argument against reverting it now?

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ