linux-ext4 - Re: [PATCH] ext4: add barrier info if journal device write cache is not enabled

Message-ID: <91aff807-ecde-b37f-444c-010276fd09f7@huawei.com>
Date:   Tue, 29 Nov 2022 14:16:47 +0800
From:   Zhang Yi <yi.zhang@...wei.com>
To:     Jan Kara <jack@...e.cz>
CC:     <linux-ext4@...r.kernel.org>, <tytso@....edu>,
        <adilger.kernel@...ger.ca>, <yukuai3@...wei.com>
Subject: Re: [PATCH] ext4: add barrier info if journal device write cache is
 not enabled

On 2022/11/28 23:15, Jan Kara wrote:
> On Mon 28-11-22 21:01:07, Zhang Yi wrote:
>> On 2022/11/28 18:11, Jan Kara wrote:
>>> On Thu 24-11-22 21:57:44, Zhang Yi wrote:
>>>> The block layer checks for and suppresses flush bios if the device write
>>>> cache is not enabled, so the journal barrier does not take effect even
>>>> if the user specifies the 'barrier=1' mount option. This is dangerous if
>>>> the reported write cache state is a false negative, and we cannot easily
>>>> detect such a case. So print an info message and add a query interface to
>>>> let the sysadmin know that the barrier is suppressed when the write cache
>>>> is not enabled.
>>>>
>>>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>>>
>>> Hum, so have you seen a situation when write cache information is incorrect
>>> in the block layer? Does it happen often enough that it warrants an extra
>>> sysfs file?
>>>
>>
>> Thanks for the response. Yes, it often happens on some SCSI devices behind
>> a RAID card: the disks below the RAID card have their write cache enabled,
>> but the RAID driver declares the write cache disabled when probing, and the
>> RAID card apparently cannot guarantee that data is written back to the disk
>> medium on power failure. So the ext4 filesystem will probably be corrupted
>> at the next startup. It's difficult to tell whether it's a hardware or a
>> software problem.
>> I am not familiar with the RAID card, so I don't know why the cache state
>> is incorrect (maybe a misconfiguration or a firmware bug).
> 
> OK, thanks for the info. I believe usually you're expected to disable write
> cache on the disks themselves and leave caching to the RAID card. But I'm
> not an expert here and it's a bit beside the point anyway ;)
> 
>>> After all you should be able to query what the block layer thinks about the
>>> write cache - you definitely can for SCSI devices, I'm not sure about
>>> others. So you can have a look there. Providing this info in the filesystem
>>> seems like doing it in the wrong layer - I don't see anything jbd2/ext4
>>> specific here...
>>>
>>
>> Yes, the best way is to figure out the RAID card problem.
>> This patch does not aim to fix anything in ext4. The reason I want to add
>> this to ext4 is just to give a hint from the fs barrier's point of view: it
>> shows the barrier's running state at mount time, which could help us narrow
>> down the cache problem more easily when we find ext4 corruption after a power
>> failure. Before this patch, we could do that through the SCSI probing info
>> and /sys/block/sda/queue/write_cache (maybe some others?), but that is not
>> quite clear.
>>
>>   [    2.520176] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>
>>   [root@...alhost ~]# cat /sys/block/sda/queue/write_cache
>>   write back
> 
> Yes. /sys/block/<device>/queue/write_cache is what you should query to find
> whether barriers will be ignored or not. My point is: you need this for
> ext4, but if you start using the XFS filesystem you'd need a similar patch
> for XFS, and then if you transition to btrfs you'd need this for btrfs as well
> and all this duplication is there because you are querying through the
> filesystem a property of the underlying block device. So why not ask the
> block device directly?
> 
> I understand it may be more *convenient* to grab the information from the
> filesystem given the infrastructure you have for gathering filesystem
> information. But carrying around various sysfs files has its cost as well.
> 
OK, it's fine, let's keep querying the block layer.

Thanks,
Yi.
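
For reference, querying the block layer directly, as agreed above, could be
sketched roughly as follows. This is only an illustration and not part of the
patch or of any existing tool: the helper names read_write_cache() and
flushes_suppressed() and the sysfs_root parameter are hypothetical. It assumes
the queue/write_cache attribute reports either "write back" or "write through",
and that "write through" is the state in which the block layer drops flush
requests, which is the case discussed in this thread.

```python
from pathlib import Path


def read_write_cache(device: str, sysfs_root: str = "/sys") -> str:
    """Return the block layer's view of the write cache for `device`.

    Reads <sysfs_root>/block/<device>/queue/write_cache. The sysfs_root
    parameter is a hypothetical convenience so the helper can be pointed
    at a fake tree for testing.
    """
    path = Path(sysfs_root) / "block" / device / "queue" / "write_cache"
    return path.read_text().strip()


def flushes_suppressed(mode: str) -> bool:
    """Return True if the block layer is expected to drop flush requests.

    Assumption: "write back" means a volatile write cache is reported, so
    flushes are issued; "write through" means no write cache is reported,
    so barriers (flushes) are suppressed even with 'barrier=1'.
    """
    if mode == "write back":
        return False
    if mode == "write through":
        return True
    raise ValueError(f"unexpected write_cache value: {mode!r}")


# Example use on a real system (the device name "sda" is an assumption):
#   mode = read_write_cache("sda")
#   if flushes_suppressed(mode):
#       print("sda: block layer reports no write cache, flushes are dropped")
```

Note that this only reports what the block layer believes. If the RAID driver
declares the wrong cache state at probe time, the value read here is wrong in
the same way, so it helps to narrow down the problem but cannot detect it.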