Date:   Wed, 7 Jul 2021 19:52:41 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     Ivan Zahariev <famzah@...zah.net>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: jbd2: fix deadlock while checkpoint thread waits commit thread
 to finish (backport to 4.14)

On Wed, Jul 07, 2021 at 09:42:25PM +0300, Ivan Zahariev wrote:
> Hello,
> 
> We're running Linux kernel 4.14.x and our systems occasionally suffer a bug
> which is already fixed: https://github.com/torvalds/linux/commit/53cf978457325d8fb2cdecd7981b31a8229e446e
> 
> This bugfix hasn't been ported to Linux kernels 4.14 or 4.19. The patch
> applies cleanly. The two files "fs/jbd2/checkpoint.c" and
> "fs/jbd2/journal.c" seem pretty identical in the affected sections compared
> to kernel 5.4 where we have this bugfix already applied.
> 
> Is it on purpose that this bugfix hasn't been ported to 4.14? Is it safe
> that we backport it manually in our kernel 4.14 builds? Or is the "ext4"
> system in 4.14 and 5.4 fundamentally different and this would lead to data
> loss or other problems?

The commit was over two years ago, so my memory is not going to be
perfect.  However, Jan had made a comment suggesting the approach in
this commit because it should be easier to backport into older stable
kernels[1].

   "Since proper locking change is going to be a bit more involved, can you
    perhaps fix this deadlock by just dropping j_checkpoint_mutex in
    log_do_checkpoint() when we are going to wait for transaction commit. I've
    checked and that should be fine and that is going to be much easier change
    to backport into stable kernels..."

[1] https://marc.info/?l=linux-ext4&m=154212553014669&w=2

So I suspect it was just that I failed to remember to add a "Cc:
stable@...nel.org" and so it was never automatically backported into
4.14 or 4.19.

Do you have a reliable reproduction which is triggering the deadlock
on your kernels?  If so, have you tried applying the patch and does it
make the problem go away for you?

Cheers,

						- Ted
