Message-ID: <20190731053444.GA155569@google.com>
Date:   Wed, 31 Jul 2019 14:34:44 +0900
From:   Minchan Kim <minchan@...nel.org>
To:     Qian Cai <cai@....pw>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...e.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: "mm: account nr_isolated_xxx in [isolate|putback]_lru_page"
 breaks OOM with swap

On Tue, Jul 30, 2019 at 12:25:28PM -0400, Qian Cai wrote:
> OOM workloads with swapping are unable to recover with linux-next since
> next-20190729 due to the commit "mm: account nr_isolated_xxx in
> [isolate|putback]_lru_page" [1].
> 
> [1] https://lore.kernel.org/linux-mm/20190726023435.214162-4-minchan@kernel.org/T/#mdcd03bcb4746f2f23e6f508c205943726aee8355
> 
> For example, the LTP oom01 test case is stuck for hours, while it finishes in a
> few minutes here after reverting the above commit. Sometimes, it prints these
> messages while hanging.
> 
> [  509.983393][  T711] INFO: task oom01:5331 blocked for more than 122 seconds.
> [  509.983431][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
> [  509.983447][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  509.983477][  T711] oom01           D24656  5331   5157 0x00040000
> [  509.983513][  T711] Call Trace:
> [  509.983538][  T711] [c00020037d00f880] [0000000000000008] 0x8 (unreliable)
> [  509.983583][  T711] [c00020037d00fa60] [c000000000023724] __switch_to+0x3a4/0x520
> [  509.983615][  T711] [c00020037d00fad0] [c0000000008d17bc] __schedule+0x2fc/0x950
> [  509.983647][  T711] [c00020037d00fba0] [c0000000008d1e68] schedule+0x58/0x150
> [  509.983684][  T711] [c00020037d00fbd0] [c0000000008d7614] rwsem_down_read_slowpath+0x4b4/0x630
> [  509.983727][  T711] [c00020037d00fc90] [c0000000008d7dfc] down_read+0x12c/0x240
> [  509.983758][  T711] [c00020037d00fd20] [c00000000005fb28] __do_page_fault+0x6f8/0xee0
> [  509.983801][  T711] [c00020037d00fe20] [c00000000000a364] handle_page_fault+0x18/0x38

Thanks for the testing! It's no surprise that the patch introduces some bugs,
because it's rather tricky.

Could you test this patch?

From b31667210dd747f4d8aeb7bdc1f5c14f1f00bff5 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@...nel.org>
Date: Wed, 31 Jul 2019 14:18:01 +0900
Subject: [PATCH] mm: decrease NR_ISOLATED count at successful migration

If migration fails, the page goes back to the LRU list, so putback_lru_page
can handle the NR_ISOLATED count in pair with isolate_lru_page. However,
if migration succeeds, the page will be freed, so there is no need to
add it back to the LRU list. Thus, the NR_ISOLATED count must be
decreased manually in that case.

Signed-off-by: Minchan Kim <minchan@...nel.org>
---
 mm/migrate.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 84b89d2d69065..96ae0c3cada8d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1166,6 +1166,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 {
 	int rc = MIGRATEPAGE_SUCCESS;
 	struct page *newpage;
+	bool is_lru = !__PageMovable(page);
 
 	if (!thp_migration_supported() && PageTransHuge(page))
 		return -ENOMEM;
@@ -1175,17 +1176,10 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		return -ENOMEM;
 
 	if (page_count(page) == 1) {
-		bool is_lru = !__PageMovable(page);
-
 		/* page was freed from under us. So we are done. */
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
-		if (likely(is_lru))
-			mod_node_page_state(page_pgdat(page),
-						NR_ISOLATED_ANON +
-						page_is_file_cache(page),
-						-hpage_nr_pages(page));
-		else {
+		if (unlikely(!is_lru)) {
 			lock_page(page);
 			if (!PageMovable(page))
 				__ClearPageIsolated(page);
@@ -1229,6 +1223,12 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 			if (set_hwpoison_free_buddy_page(page))
 				num_poisoned_pages_inc();
 		}
+
+		if (likely(is_lru))
+			mod_node_page_state(page_pgdat(page),
+					NR_ISOLATED_ANON +
+						page_is_file_cache(page),
+					-hpage_nr_pages(page));
 	} else {
 		if (rc != -EAGAIN) {
 			if (likely(!__PageMovable(page))) {
-- 
2.22.0.709.g102302147b-goog
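
To illustrate the pairing described in the changelog, here is a minimal
userspace sketch. It is not kernel code: every toy_* name, the struct layouts,
and the single nr_isolated counter are hypothetical stand-ins for
isolate_lru_page(), putback_lru_page(), the tail of unmap_and_move() and the
NR_ISOLATED_ANON/NR_ISOLATED_FILE node counters. It only shows why the success
path needs its own decrement when the page is never put back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these exist in the kernel. */
enum toy_result { TOY_MIGRATE_SUCCESS, TOY_MIGRATE_FAIL };

struct toy_node {
	long nr_isolated;	/* stands in for NR_ISOLATED_ANON/FILE */
};

struct toy_page {
	struct toy_node *node;
	bool on_lru;		/* currently linked on the (imaginary) LRU */
	bool is_lru;		/* false would model a non-LRU movable page */
	long nr_pages;		/* 1 for a base page, more for a huge page */
};

/* Models isolation: take the page off the LRU and bump the counter. */
static bool toy_isolate(struct toy_page *page)
{
	if (!page->on_lru)
		return false;
	page->on_lru = false;
	page->node->nr_isolated += page->nr_pages;
	return true;
}

/* Models putback: the decrement that pairs with toy_isolate(). */
static void toy_putback(struct toy_page *page)
{
	assert(!page->on_lru);
	page->on_lru = true;
	page->node->nr_isolated -= page->nr_pages;
}

/* Models the end of a migration attempt for an isolated page. */
static void toy_finish_migration(struct toy_page *page, enum toy_result rc)
{
	if (rc == TOY_MIGRATE_SUCCESS) {
		/*
		 * The old page is freed and never put back, so nothing
		 * else will undo the isolation accounting. Do it here.
		 */
		if (page->is_lru)
			page->node->nr_isolated -= page->nr_pages;
	} else {
		/* Failure: the page returns to the LRU via putback. */
		toy_putback(page);
	}
}
```

With this model, isolating two pages and then finishing one migration
successfully and one unsuccessfully brings nr_isolated back to zero. Dropping
the manual decrement in the success branch would leave the counter permanently
elevated.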
