linux-kernel - Re: sync-Regression in 2.6.28.2?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <200902041717.27320.nickpiggin@yahoo.com.au>
Date:	Wed, 4 Feb 2009 17:17:26 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Federico Cuello <fedux@...men.org.ar>,
	Ralf Hildebrandt <Ralf.Hildebrandt@...rite.de>,
	Artem Bityutskiy <Artem.Bityutskiy@...ia.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: sync-Regression in 2.6.28.2?

On Wednesday 04 February 2009 06:51:16 Federico Cuello wrote:
> Nick Piggin wrote:
> > On Wednesday 28 January 2009 08:09:10 Federico Cuello wrote:
> >> Ralf Hildebrandt escribió:
> >>> I recently installed 2.6.28.2 on our postfix/dovecot-based
> >>> mailboxserver. Previously, 2.6.28 and 2.6.28.1 have been running there
> >>> without a hitch.
> >>>
> >>> Now with 2.6.28.2S I had two major lockups: All writes to the users'
> >>> Maildirs (on ext4) would stall, the load would rise, "sync" would never
> >>> return.
> >>>
> >>> I had to "reboot -f -n" to get the machine back. All hanging processes
> >>> were unkillable, even with kill -9.
> >>> [...]
> >>
> >> The same is happening to me, but I have some logs taken with sysrq.
[...]
> >>  0  2  99028  46536  26368 1569400    0    0     4     0 1407  387  0
> >> 0  0 100
> >>
> >> Notice the 100% iowait.
> >>
> >> I also managed to reproduce it doing a rsync from one partition to a USB
> >> drive. After the lockup I can't read any file from the source partition,
> >> but the other partitions can be accessed normally.
> >
> > Hm, thanks for reporting, can you guys get a sysrq+W trace when the
> > system reaches this state?
>
> Here it is, if you need something else please ask:

Thanks, could you reply-to-all when replying to retain ccs please?

Common theme is ext4, which uses no_nrwrite_index_update, and I introduced
a bug in there which could possibly cause ext4 to go into a loop...

Would it be possible if you can test the following patch?

Thanks,
Nick

---

Commit 05fe478dd04e02fa230c305ab9b5616669821dd3 introduced some
@wbc->nr_to_write breakage. Here is the change from the commit:

>--- a/mm/page-writeback.c
>+++ b/mm/page-writeback.c
>@@ -963,8 +963,10 @@ retry:
>                               }
>                       }
>
>                       if (--nr_to_write <= 0)
>                               done = 1;
>                       if (wbc->sync_mode == WB_SYNC_NONE) {
>                               if (--wbc->nr_to_write <= 0)
>                                       done = 1;
>                       }
>                       if (wbc->nonblocking && bdi_write_congested(bdi)) {
>                               wbc->encountered_congestion = 1;
>                               done = 1
>                       }

It makes the following changes:
1. Decrement wbc->nr_to_write instead of nr_to_write
2. Decrement wbc->nr_to_write _only_ if wbc->sync_mode == WB_SYNC_NONE
3. If synced nr_to_write pages, stop only if if wbc->sync_mode ==
WB_SYNC_NONE, otherwise keep going.

However, according to the commit message, the intention was to
only make change 3. Change 1 is a bug. Change 2 does not seem to be
necessary, and it breaks UBIFS expectations, so if needed, it
should be done separately later. And change 2 does not seem to
be documented in the commit message.

This patch does the following:
1. Undo changes 1 and 2
2. Add a comment explaining change 3 (it very useful to have comments in
_code_, not only in the commit).

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@...ia.com>
Acked-by: Nick Piggin <npiggin@...e.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>
---
 mm/page-writeback.c |   21 +++++++++++++++------
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b493db7..dc32dae 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1051,13 +1051,22 @@ continue_unlock:
 				}
  			}
 
-			if (wbc->sync_mode == WB_SYNC_NONE) {
-				wbc->nr_to_write--;
-				if (wbc->nr_to_write <= 0) {
-					done = 1;
-					break;
-				}
+			if (nr_to_write > 0)
+				nr_to_write--;
+			else if (wbc->sync_mode == WB_SYNC_NONE) {
+				/*
+				 * We stop writing back only if we are not
+				 * doing integrity sync. In case of integrity
+				 * sync we have to keep going because someone
+				 * may be concurrently dirtying pages, and we
+				 * might have synced a lot of newly appeared
+				 * dirty pages, but have not synced all of the
+				 * old dirty pages.
+				 */
+				done = 1;
+				break;
 			}
+
 			if (wbc->nonblocking && bdi_write_congested(bdi)) {
 				wbc->encountered_congestion = 1;
 				done = 1;
.