[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <897b426d-7ca7-4bfe-8342-d2af910f8202@roeck-us.net>
Date: Thu, 16 Jan 2025 08:05:59 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Jan Kara <jack@...e.cz>
Cc: Jim Zhao <jimzhao.ai@...il.com>, akpm@...ux-foundation.org,
willy@...radead.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] mm/page-writeback: Consolidate wb_thresh bumping logic
into __wb_calc_thresh
On Thu, Jan 16, 2025 at 03:56:57PM +0100, Jan Kara wrote:
...
> >
> > Interesting. Is there some endianness issue, by any chance ? I only see the problem
> > with sheb (big endian), not with sh (little endian). I'd suspect that it is an
> > emulation bug, but it is odd that the problem did not show up before.
>
> So far I don't have a good explanation. Let me write down here the facts,
> maybe it will trigger the aha effect.
>
> 1) Ext2 stores the metadata in little endian ordering. We observe the
> problem with the first directory entry in the folio. Both entry->rec_len
> (16-bit) and entry->inode (32-bit) appear to be seen in wrong endianity
>
> 2) The function that fails is ext2_check_folio(). We kmap_local() the folio
> in ext2_get_folio(), then in ext2_check_folio() we do:
>
> ext2_dirent *p;
>
> p = (ext2_dirent *)(kaddr + 0);
> rec_len = ext2_rec_len_from_disk(p->rec_len);
> ^^^ value 3072 == 0x0c00 seen here instead of correct 0x000c
> this value is invalid so we go to:
> ext2_error(sb, __func__, "bad entry in directory #%lu: : %s - "
> "offset=%llu, inode=%lu, rec_len=%d, name_len=%d",
> dir->i_ino, error, folio_pos(folio) + offs,
> (unsigned long) le32_to_cpu(p->inode),
> rec_len, p->name_len);
>
> Here rec_len is printed so we see the wrong value. Also
> le32_to_cpu(p->inode) is printed which also shows number with swapped byte
> ordering (the message contains inode number 27393 == 0x00006b01 but the
> proper inode number is 363 == 0x0000016b). This actually releals more about
> the problem because only the two bytes were swapped in the inode number
> although we always treat it as 32-bit entity. So this would indeed point
> more at some architectural issue rather than a problem in the filesystem /
> MM.
>
> Note that to get at this point in the boot we must have correctly
> byteswapped many other directory entries in the filesystem. So the problem
> must be triggered by some parallel activity happening in the system or
> something like that.
>
> 3) The problem appears only with MTD storage, not with IDE/SATA on the same
> system + filesystem image. It it unclear how the storage influences the
> reproduction, rather than that it influences timing of events in the
> system.
>
> 4) The problem reliably happens with "mm/page-writeback: Consolidate wb_thresh
> bumping logic into __wb_calc_thresh", not without it. All this patch does
> is that it possibly changes a limit at which processes dirtying pages in
> the page cache get throttled. Thus there are fairly limited opportunities
> for how it can cause damage (I've checked for possible UAF issues or memory
> corruption but I don't really see any such possibility there, it is just
> crunching numbers from the mm counters and takes decision based on the
> result). This change doesn't have direct on the directory ext2 code. The only
> thing it does is that it possibly changes code alignment of ext2 code if it
> gets linked afterwards into vmlinux image (provided ext2 is built in). Another
> possibility is again that it changes timing of events in the system due to
> differences in throttling of processes dirtying page cache.
>
> So at this point I don't have a better explanation than blame the HW. What
> really tipped my conviction in this direction is the 16-bit byteswap in a
> 32-bit entity. Hence I guess I'll ask Andrew to put Jim's patch back into
> tree if you don't object.
I agree that this is most likely an emulation problem (it would not be the
first one), so please go ahead.
Thanks,
Guenter
Powered by blists - more mailing lists