lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230614203740.GE51259@mit.edu>
Date:   Wed, 14 Jun 2023 16:37:40 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     Zhang Yi <yi.zhang@...weicloud.com>
Cc:     Zhihao Cheng <chengzhihao1@...wei.com>, linux-ext4@...r.kernel.org,
        adilger.kernel@...ger.ca, jack@...e.cz, yi.zhang@...wei.com,
        yukuai3@...wei.com
Subject: Re: [PATCH v3 4/6] jbd2: Fix wrongly judgement for buffer head
 removing while doing checkpoint

On Wed, Jun 14, 2023 at 09:25:28PM +0800, Zhang Yi wrote:
> 
> Sorry about the regression, I found that this issue is not introduced
> by the first patch in this patch series ("jbd2: recheck chechpointing
> non-dirty buffer"), is d9eafe0afafa ("jbd2: factor out journal
> initialization from journal_get_superblock()") [1] on your dev branch.
> 
> The problem is the journal super block had been failed to write out
> due to IO fault, it's uptodate bit was cleared by
> end_buffer_write_syn() and didn't reset yet in jbd2_write_superblock().
> And it raced by jbd2_journal_revoke()->jbd2_journal_set_features()->
> jbd2_journal_check_used_features()->journal_get_superblock()->bh_read(),
> unfortunately, the read IO is also fail, so the error handling in
> journal_fail_superblock() clear the journal->j_sb_buffer, finally lead
> to above NULL pointer dereference issue.

Thanks for looking into this.  What I believe you are saying is that
the root cause is that earlier patch, but it is still something about
the patch "jbd2: recheck chechpointing non-dirty buffer" which is
changing the timing enough that we're hitting this buffer (because
without the "recheck checkpointing" patch, I'm not seeing the NULL
pointer dereference.

As far as the e2fsck bug that was causing it to hang in the ext4/adv
test scenario, the patch was a simple one, and I have also checked in
a test case which was a reliable reproducer of the problem.  (See
attached for the patches and more detail.)

It is really interesting that "recheck checkpointing" patch is making
enough of a difference that it is unmasking these bugs.  If you could
take a look at these changes and perhaps think about how this patch
series could be changing the nature of the corruption (specifically,
how symlink inodes referenced from inline directories could be
corupted with "rechecking checkpointing", thus unmasking the
e2fsprogs, I'd really appreciate it.

Thanks,

					- Ted


View attachment "0001-e2fsck-fix-handling-of-a-invalid-symlink-in-an-inlin.patch" of type "text/x-diff" (1609 bytes)

View attachment "0002-tests-add-test-for-handling-an-invalid-symlink-in-an.patch" of type "text/x-diff" (3924 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ