linux-ext4 - Re: [fstests generic/388, 455, 475, 482 ...] Ext4 journal recovery test fails

Message-ID: <ZPjYTDB6x83BIJMc@casper.infradead.org>
Date:   Wed, 6 Sep 2023 20:51:40 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Ritesh Harjani <ritesh.list@...il.com>
Cc:     Theodore Ts'o <tytso@....edu>, Zorro Lang <zlang@...nel.org>,
        linux-ext4@...r.kernel.org, fstests@...r.kernel.org,
        regressions@...ts.linux.dev,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jan Kara <jack@...e.cz>
Subject: Re: [fstests generic/388, 455, 475, 482 ...] Ext4 journal recovery
 test fails

On Wed, Sep 06, 2023 at 01:38:23PM +0100, Matthew Wilcox wrote:
> > Is this code path a possibility, which can cause above logs?
> > 
> >    ptr = jbd2_alloc() -> kmem_cache_alloc()
> >    <..>
> >    new_folio = virt_to_folio(ptr)
> >    new_offset = offset_in_folio(new_folio, ptr)
> > 
> > And then I am still not sure what the problem really is? 
> > Is it because at the time of checkpointing, the path is still not fully
> > converted to folio?
> 
> Oh yikes!  I didn't know that the allocation might come from kmalloc!
> Yes, slab might use high-order allocations.  I'll have to look through
> this and figure out what the problem might be.

I think the probable cause is bh_offset().  Before these patches, if
we allocated a buffer at offset 9kB into an order-2 slab, we'd fill in
b_page with the third page of the slab and calculate bh_offset as 1kB.
With these patches, we set b_page to the first page of the slab, and
bh_offset still comes back as 1kB so we read from / write to entirely
the wrong place.

With this redefinition of bh_offset(), we calculate the offset relative
to the base page if b_page is a tail page, and relative to the folio if
b_page is the head page of a folio.  Works out nicely ;-)
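
To make the arithmetic concrete, here is a small userspace sketch of the two
calculations for the 9kB-into-an-order-2-slab case described above.  It is
not kernel code: SIM_PAGE_SIZE, old_bh_offset(), new_bh_offset() and the
example address are hypothetical names and values chosen for illustration,
the "order" argument stands in for what page_size(bh->b_page) would report,
and the slab is assumed to be naturally aligned to its own size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's PAGE_SIZE / PAGE_MASK (4kB pages). */
#define SIM_PAGE_SIZE 4096UL
#define SIM_PAGE_MASK (~(SIM_PAGE_SIZE - 1))

/* Old macro: offset of b_data within the 4kB page that contains it. */
static inline unsigned long old_bh_offset(uintptr_t b_data)
{
	return b_data & ~SIM_PAGE_MASK;
}

/*
 * Patched version: offset of b_data within whatever b_page covers.
 * 'order' models page_size(b_page): 0 for a base or tail page,
 * 2 for the head page of a 16kB (order-2) allocation.
 * Assumes the allocation is aligned to its own size.
 */
static inline unsigned long new_bh_offset(uintptr_t b_data, unsigned int order)
{
	unsigned long size = SIM_PAGE_SIZE << order;

	return b_data & (size - 1);
}

/* Walk through the example: a buffer 9kB into an order-2 slab. */
static inline void check_9k_in_order2_slab(void)
{
	uintptr_t slab_base = 0x100000UL;	/* 16kB-aligned, made-up address */
	uintptr_t b_data = slab_base + 9 * 1024;

	/* Old macro: 1kB, which is only right if b_page is the third page. */
	assert(old_bh_offset(b_data) == 1024);
	/* b_page is the head page: offset is relative to the whole slab. */
	assert(new_bh_offset(b_data, 2) == 9 * 1024);
	/* b_page is a tail page: falls back to the per-page offset. */
	assert(new_bh_offset(b_data, 0) == 1024);
}
```

Pairing the head page with the old 1kB offset would address slab_base + 1kB
rather than slab_base + 9kB, which is the "entirely the wrong place" case.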

I have three other things I'm trying to debug right now, so this isn't
tested, but if you have time you might want to give it a run.

diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 6cb3e9af78c9..dc8fcdc40e95 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -173,7 +173,10 @@ static __always_inline int buffer_uptodate(const struct buffer_head *bh)
 	return test_bit_acquire(BH_Uptodate, &bh->b_state);
 }
 
-#define bh_offset(bh)		((unsigned long)(bh)->b_data & ~PAGE_MASK)
+static inline unsigned long bh_offset(struct buffer_head *bh)
+{
+	return (unsigned long)(bh)->b_data & (page_size(bh->b_page) - 1);
+}
 
 /* If we *know* page->private refers to buffer_heads */
 #define page_buffers(page)					\