linux-kernel - Re: sync-Regression in 2.6.28.2?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 10 Feb 2009 00:45:42 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Federico Cuello <fedux@...men.org.ar>
Cc:	Ralf Hildebrandt <Ralf.Hildebrandt@...rite.de>,
	Artem Bityutskiy <Artem.Bityutskiy@...ia.com>,
	linux-kernel@...r.kernel.org
Subject: Re: sync-Regression in 2.6.28.2?

On Thursday 05 February 2009 22:54:16 Federico Cuello wrote:
> Nick Piggin wrote:
> > On Thursday 05 February 2009 04:31:00 Federico Cuello wrote:
> >> Nick Piggin wrote:
> >>> [...]
> >>> Thanks, could you reply-to-all when replying to retain ccs please?
> >>>
> >>> Common theme is ext4, which uses no_nrwrite_index_update, and I
> >>> introduced a bug in there which could possibly cause ext4 to go into a
> >>> loop...
> >>>
> >>> Would it be possible if you can test the following patch?
> >>
> >> I'll test it as soon as I get home.
> >
> > Thanks.
> >
> >> Meanwhile, I think the new patch may be slightly wrong. If I understand
> >> correctly PageWriteback(page) is called before nr_to_write is tested for
> >> being > 0 and then decremented if true, but "done" is  not set to 1
> >> until the next iteration. So another call to PageWriteback(page) while
> >> take place and then "done" will be set to true (if wbc->sync_mode ==
> >> WB_SYNC_NONE).
> >>
> >> If nr_to_write == 1 at the beginning of the loop then two pages will be
> >> written.
> >>
> >> I think the test condition should something like:
> >>
> >> if (--nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) {
> >>     done = 1;
> >>     break;
> >> }
> >
> > I think you're quite right. Good catch. We probably want to prevent
> > nr_to_write from going -ve, though.
> >
> > I think something like this
> >
> >   if (nr_to_write > 0)
> >     nr_to_write--;
> >   if (!nr_to_write && wbc->sync_mode == WB_SYNC_NONE) {
> >     ...
> >
> > Would you care to send a patch?
>
> Ok, after extensive testing I haven't been able to solve the problem.
>
> I'm posting below the patch I used. I tried 3 different patches with one
> successful test run with the one you sent me. I don't know if it was
> just a coincidence as I had no time to test it again.
>
> Now, with the patch below, it stalls with 50% IO-wait (dual core, one
> core at 100%). Perhaps the patch is part of the solution, I don't know.
>
> I also have the sysrq-W logs and I'm also posting them below.
>
> Thanks,
>
>
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 08d2b96..9e2ae50 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -981,13 +981,23 @@ continue_unlock:
>                                 }
>                         }
>
> -                       if (wbc->sync_mode == WB_SYNC_NONE) {
> -                               wbc->nr_to_write--;
> -                               if (wbc->nr_to_write <= 0) {
> +                       if (nr_to_write > 0) {
> +                               nr_to_write--;
> +                               if (nr_to_write == 0 && wbc->sync_mode
> == WB_SYNC_NONE) {
> +                                       /*
> +                                        * We stop writing back only if
> we are not
> +                                        * doing integrity sync. In case
> of integrity
> +                                        * sync we have to keep going
> because someone
> +                                        * may be concurrently dirtying
> pages, and we
> +                                        * might have synced a lot of
> newly appeared
> +                                        * dirty pages, but have not
> synced all of the
> +                                        * old dirty pages.
> +                                        */
>                                         done = 1;
>                                         break;
>                                 }
>                         }
> +
>                         if (wbc->nonblocking && bdi_write_congested(bdi)) {
>                                 wbc->encountered_congestion = 1;
>                                 done = 1;

This patch seems good to me. If you would care to add a changelog and
Signed-off-by: line, then we could get it merged?

I am not too sure about this bug. I have reproduced a strange hang with
ext4 (which does include sys_sync and write_cache_pages traces), and
also turned up a lockdep report. Also, we haven't seen any reports of
this problem on other filesystems. So it could be an ext4 bug.

Your traces also have lots of tasks hung waiting for page lock. It is
possible that wakeups get lost, which is fixed by this commit in
mainline
777c6c5f1f6e757ae49ecca2ed72d6b1f523c007

Which might also be your bug.


Any chance you can test this patch (as well as the existing patches
you are using to fix write_cache_pages?).

Thanks,
Nick

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/