linux-ext4 - Re: [PATCH] ext4: add barrier info if journal device write cache is not enabled

Message-ID: <20221128151551.fo6ct7nbozlqjvci@quack3>
Date:   Mon, 28 Nov 2022 16:15:51 +0100
From:   Jan Kara <jack@...e.cz>
To:     Zhang Yi <yi.zhang@...wei.com>
Cc:     Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org, tytso@....edu,
        adilger.kernel@...ger.ca, yukuai3@...wei.com
Subject: Re: [PATCH] ext4: add barrier info if journal device write cache is
 not enabled

On Mon 28-11-22 21:01:07, Zhang Yi wrote:
> On 2022/11/28 18:11, Jan Kara wrote:
> > On Thu 24-11-22 21:57:44, Zhang Yi wrote:
> >> The block layer checks for and suppresses flush bios if the device write
> >> cache is not enabled, so the journal barrier does not take effect even
> >> if the user specifies the 'barrier=1' mount option. This is dangerous if
> >> the reported write cache state is a false negative, and such a case
> >> cannot be distinguished easily. So just print an info message and provide
> >> an inquiry interface to let the sysadmin know that the barrier is
> >> suppressed because the write cache is not enabled.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> > 
> > Hum, so have you seen a situation when write cache information is incorrect
> > in the block layer? Does it happen often enough that it warrants an extra
> > sysfs file?
> > 
> 
> Thanks for the response. Yes, it often happens on some SCSI devices behind
> a RAID card: the disks below the RAID card have their write cache enabled,
> but the RAID driver declares the write cache disabled when probing, and the
> RAID card apparently cannot guarantee that data is written back to the disk
> medium on power failure. So the ext4 filesystem will probably be corrupted
> at the next startup. It's difficult to tell whether it's a hardware or a
> software problem. I am not familiar with the RAID card, so I don't know why
> the cache state is incorrect (maybe a misconfiguration or a firmware bug).

OK, thanks for the info. I believe you're usually expected to disable the
write cache on the disks themselves and leave caching to the RAID card. But
I'm not an expert here and it's a bit beside the point anyway ;)

> > After all, you should be able to query what the block layer thinks about the
> > write cache - you definitely can for SCSI devices, I'm not sure about
> > others. So you can have a look there. Providing this info in the filesystem
> > seems like doing it in the wrong layer - I don't see anything jbd2/ext4
> > specific here...
> > 
> 
> Yes, the best way is to figure out the RAID card problem.
> This patch does not aim to fix anything in ext4. The reason I want to add
> this to ext4 is just to give a hint from the filesystem barrier's point of
> view: it shows the barrier's running state at mount time, which could help
> us narrow down the cache problem more easily when we find ext4 corruption
> after a power failure. Before this patch, we could do that through the SCSI
> probing info and /sys/block/sda/queue/write_cache (maybe others?), but it's
> not quite clear.
> 
>   [    2.520176] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> 
>   [root@...alhost ~]# cat /sys/block/sda/queue/write_cache
>   write back

Yes. /sys/block/<device>/queue/write_cache is what you should query to find
out whether barriers will be ignored or not. My point is this: you need this
for ext4; if you start using XFS you'd need a similar patch for XFS, and if
you then transition to btrfs you'd need it for btrfs as well. All this
duplication exists because you are querying, through the filesystem, a
property of the underlying block device. So why not ask the block device
directly?
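As a rough illustration of asking the block device directly, here is a
minimal POSIX shell sketch that walks /sys/block/*/queue/write_cache and
reports which devices the block layer treats as write-through, i.e. the case
where, per the patch description quoted above, flush requests are suppressed.
The function name check_write_cache and the SYSFS_ROOT override are
hypothetical, added only for illustration and testing. /sys/block lists whole
devices only, so for a filesystem on a partition you would check the parent
disk's entry.

```shell
# Hypothetical helper (not an existing tool): report, per block device,
# whether the block layer believes a volatile write cache is present.
# SYSFS_ROOT is an assumed override so the function can read a fake tree.
check_write_cache() {
    root="${SYSFS_ROOT:-/sys}"
    for f in "$root"/block/*/queue/write_cache; do
        [ -r "$f" ] || continue          # glob matched nothing: skip
        dev=${f#"$root"/block/}          # drop the leading sysfs path
        dev=${dev%%/*}                   # keep only the device name
        mode=$(cat "$f")                 # "write back" or "write through"
        case "$mode" in
            "write back")    echo "$dev: write back (flushes are issued)" ;;
            "write through") echo "$dev: write through (flushes are dropped)" ;;
            *)               echo "$dev: unknown ($mode)" ;;
        esac
    done
}
```

Calling check_write_cache with no arguments on a live system reads the real
/sys tree. Note that it only reports what the block layer believes; if a
RAID driver misreports its cache, this output will be wrong in the same way.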

I understand it may be more *convenient* to grab the information from the
filesystem given the infrastructure you have for gathering filesystem
information. But carrying around various sysfs files has its cost as well.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR