Date:	Thu, 22 May 2014 11:24:23 +0200
From:	Vlastimil Babka <vbabka@...e.cz>
To:	Mel Gorman <mgorman@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Joonsoo Kim <iamjoonsoo.kim@....com>
CC:	Johannes Weiner <hannes@...xchg.org>, Jan Kara <jack@...e.cz>,
	Michal Hocko <mhocko@...e.cz>, Hugh Dickins <hughd@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Dave Hansen <dave.hansen@...el.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>,
	Linux-FSDevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH 09/19] mm: page_alloc: Use word-based accesses for get/set
 pageblock bitmaps

On 05/13/2014 11:45 AM, Mel Gorman wrote:
> The test_bit operations in get/set pageblock flags are expensive. This patch
> reads the bitmap on a word basis and use shifts and masks to isolate the bits
> of interest. Similarly masks are used to set a local copy of the bitmap and then
> use cmpxchg to update the bitmap if there have been no other changes made in
> parallel.
> 
> In a test running dd onto tmpfs the overhead of the pageblock-related
> functions went from 1.27% in profiles to 0.5%.
> 
> Signed-off-by: Mel Gorman <mgorman@...e.de>
> Acked-by: Vlastimil Babka <vbabka@...e.cz>

Hi, I've tested whether this closes the race I had previously been trying to fix
with the series at http://marc.info/?l=linux-mm&m=139359694028925&w=2
Indeed, with this patch I could no longer reproduce it in my stress test
(which adds lots of memory isolation calls). So thanks to Mel I can
dump my series in the trashcan :P

Therefore I believe something like the text below should be added to the
changelog, and the patch sent to stable as well.

Thanks,
Vlastimil

-----8<-----
In addition to the performance benefits, this patch closes races that are
possible between:

a) get_ and set_pageblock_migratetype(), where get_pageblock_migratetype()
   reads part of the bits before and other part of the bits after
   set_pageblock_migratetype() has updated them.

b) set_pageblock_migratetype() and set_pageblock_skip(), where the non-atomic
   read-modify-write set bit operation in set_pageblock_skip() can cause
   lost updates to some bits changed by set_pageblock_migratetype().
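
A minimal userspace sketch of the word-based approach that closes both races,
using C11 atomics in place of the kernel's cmpxchg(). The 4-bit layout, the
mask values and the names get_flags_mask()/set_flags_mask() are assumptions
made for illustration, not the code from the patch:

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed layout for illustration: 4 flag bits per pageblock in one word. */
#define MIGRATETYPE_MASK 0x7ul  /* low 3 bits: migratetype */
#define SKIP_MASK        0x8ul  /* top bit: compaction skip hint */

/* A single load of the word yields a consistent snapshot of all bits. */
static unsigned long get_flags_mask(_Atomic unsigned long *word,
                                    unsigned int shift, unsigned long mask)
{
    return (atomic_load(word) >> shift) & mask;
}

/*
 * Update only the bits under mask. If another writer changed the word in
 * the meantime, the compare-exchange fails, reloads 'old' and we retry,
 * so no concurrent update to neighbouring bits is lost.
 */
static void set_flags_mask(_Atomic unsigned long *word, unsigned int shift,
                           unsigned long flags, unsigned long mask)
{
    unsigned long old = atomic_load(word);
    unsigned long updated;

    assert((flags & ~mask) == 0);
    do {
        updated = (old & ~(mask << shift)) | (flags << shift);
    } while (!atomic_compare_exchange_weak(word, &old, updated));
}
```

In this model a migratetype update and a skip-bit update on the same word
each retry on conflict, so neither can overwrite the other's bits.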

Joonsoo Kim first reported case a) via code inspection. Vlastimil Babka's
testing with a debug patch showed that either a) or b) occurs roughly once per
run of mmtests' stress-highalloc benchmark (although not necessarily in the same
pageblock). Furthermore, during development of unrelated compaction patches,
it was observed that with frequent calls to {start,undo}_isolate_page_range()
the race occurs several thousand times and has resulted in NULL pointer
dereferences in move_freepages() and free_one_page() in places where
free_list[migratetype] is manipulated by e.g. list_move(). Further debugging
confirmed that migratetype had an invalid value of 6, causing out-of-bounds
access to the free_list array.
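
To illustrate how case a) can produce a value such as 6, here is a simplified
model of a getter that reads the field one bit at a time. The old value 2 and
new value 4 used in the note below are assumptions chosen for illustration, and
torn_read() is a hypothetical helper, not kernel code:

```c
#include <assert.h>

#define NR_MT_BITS 3  /* assumed width of the migratetype field */

/* Return bit 'idx' of the field, as a bit-by-bit getter would test it. */
static unsigned int read_bit(unsigned long bits, unsigned int idx)
{
    return (unsigned int)((bits >> idx) & 1ul);
}

/*
 * Model a reader that tested the first 'bits_before' bits before the
 * writer ran (seeing old_bits) and the remaining bits afterwards
 * (seeing new_bits).
 */
static unsigned int torn_read(unsigned long old_bits, unsigned long new_bits,
                              unsigned int bits_before)
{
    unsigned int value = 0;

    for (unsigned int i = 0; i < NR_MT_BITS; i++) {
        unsigned long src = (i < bits_before) ? old_bits : new_bits;

        value |= read_bit(src, i) << i;
    }
    return value;
}
```

With old value 2 (binary 010) and new value 4 (binary 100), a reader
interrupted after two bits combines bit 1 of the old value with bit 2 of the
new one: torn_read(2, 4, 2) returns 6, a value that neither side ever wrote.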

That confirmed that the race exists, although it may be extremely rare, and is
currently only fatal where page isolation is performed due to memory hot remove.
Races on pageblocks being updated by set_pageblock_migratetype(), where both the
old and new migratetype are lower than MIGRATE_RESERVE, currently cannot result
in an invalid value being observed, although theoretically they may still lead
to unexpected creation or destruction of MIGRATE_RESERVE pageblocks. Furthermore,
things could suddenly get worse when memory isolation is used more, or when new
migratetypes are added.

After this patch, the race has no longer been observed in testing.

Reported-by: Joonsoo Kim <iamjoonsoo.kim@....com>
Reported-and-tested-by: Vlastimil Babka <vbabka@...e.cz>
Cc: <stable@...r.kernel.org>
