linux-kernel - Re: [PATCH] f2fs: f2fs supports uncached buffered I/O


Message-ID: <4366bf0f-64a1-44ae-8f81-301af2d179d8@vivo.com>
Date: Wed, 16 Jul 2025 16:27:49 +0800
From: hanqi <hanqi@...o.com>
To: Jens Axboe <axboe@...nel.dk>, jaegeuk@...nel.org, chao@...nel.org
Cc: linux-f2fs-devel@...ts.sourceforge.net, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] f2fs: f2fs supports uncached buffered I/O



On 2025/7/16 11:43, Jens Axboe wrote:
> On 7/15/25 9:34 PM, hanqi wrote:
>>
>> On 2025/7/15 22:28, Jens Axboe wrote:
>>> On 7/14/25 9:10 PM, Qi Han wrote:
>>>> Jens has already completed the development of uncached buffered I/O
>>>> in git [1], and in f2fs, the feature can be enabled simply by setting
>>>> the FOP_DONTCACHE flag in f2fs_file_operations.
>>> You need to ensure that for any DONTCACHE IO that the completion is
>>> routed via non-irq context, if applicable. I didn't verify that this is
>>> the case for f2fs. Generally you can deduce this as well through
>>> testing, I'd say the following cases would be interesting to test:
>>>
>>> 1) Normal DONTCACHE buffered read
>>> 2) Overwrite DONTCACHE buffered write
>>> 3) Append DONTCACHE buffered write
>>>
>>> Test those with DEBUG_ATOMIC_SLEEP set in your config, and if that
>>> doesn't complain, that's a great start.
>>>
>>> For the above test cases as well, verify that page cache doesn't grow as
>>> IO is performed. A bit is fine for things like meta data, but generally
>>> you want to see it remain basically flat in terms of page cache usage.
>>>
>>> Maybe this is all fine; like I said, I didn't verify. Just mentioning it
>>> for completeness' sake.
>> Hi, Jens
>> Thanks for your suggestion. As I mentioned earlier in [1], in f2fs,
>> the regular buffered write path invokes folio_end_writeback from a
>> softirq context. Therefore, it seems that f2fs may not be suitable
>> for DONTCACHE I/O writes.
>>
>> I'd like to ask a question: why is DONTCACHE I/O write restricted to
>> non-interrupt context only? Is it because dropping the page might be
>> too time-consuming to be done safely in interrupt context? This might
>> be a naive question, but I'd really appreciate your clarification.
>> Thanks in advance.
> Because (as of right now, at least) the code doing the invalidation
> needs process context. There are various reasons for this, which you'll
> see if you follow the path off folio_end_writeback() ->
> filemap_end_dropbehind_write() -> filemap_end_dropbehind() ->
> folio_unmap_invalidate(). unmap_mapping_folio() is one case, and while
> that may be doable, the inode i_lock is not IRQ safe.
>
> Most file systems have a need to punt some writeback completions to
> non-irq context, eg for file extending etc. Hence for most file systems,
> the dontcache case just becomes another case that needs to go through
> that path.
>
> It'd certainly be possible to improve upon this, for example by having
> an opportunistic dontcache unmap from IRQ/soft-irq context, and then
> punting to a workqueue if that doesn't pan out. But this doesn't exist
> as of yet, hence the need for the workqueue punt.

Hi, Jens
Thank you for your response. I tested uncached buffered I/O reads with
a 50GB dataset on a local F2FS filesystem, and the page cache size
only increased slightly, which I believe aligns with expectations.
After dropping the page cache, its size returned to roughly its
initial value. The test results are as follows:

stat 50G.txt
   File: 50G.txt
   Size: 53687091200      Blocks: 104960712       IO Blocks: 512  regular file

[read before]:
echo 3 > /proc/sys/vm/drop_caches
01:48:17  kbmemfree    kbavail  kbmemused   %memused  kbbuffers   kbcached   kbcommit    %commit   kbactive    kbinact    kbdirty
01:50:59    6404648    8149508    2719384      23.40        512    1898092  199384760     823.75    1846756     466832         44

./uncached_io_test 8192 1 1 50G.txt
Starting 1 threads
reading bs 8192, uncached 1
   1s: 754MB/sec, MB=754
   ...
  64s: 844MB/sec, MB=262144

[read after]:
01:52:33    6326664    8121240    2747968      23.65        728    1947656  199384788     823.75    1887896     502004         68
echo 3 > /proc/sys/vm/drop_caches
01:53:11    6351136    8096936    2772400      23.86        512    1900500  199385216     823.75    1847252     533768        104
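
For anyone who wants to repeat this kind of check without sar, here is a
minimal sketch in C. It is not the uncached_io_test tool used above. The
helper names (parse_cached_kb, read_cached_kb, dontcache_read_file) are
hypothetical, and the RWF_DONTCACHE fallback value is an assumption taken
from the uapi header, defined only so the sketch builds against older
headers. It reads the "Cached:" field from /proc/meminfo and reads a file
sequentially with preadv2() and RWF_DONTCACHE; on kernels or filesystems
without support, preadv2() is expected to fail (for example with
EOPNOTSUPP) and the helper returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: value from include/uapi/linux/fs.h on recent kernels;
 * defined here only so the sketch builds against older headers. */
#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080
#endif

/* Parse the "Cached:" value (in kB) from /proc/meminfo-style text.
 * Only a line that starts with "Cached:" matches, so "SwapCached:" is
 * skipped. Returns -1 if the field is absent. */
static long parse_cached_kb(const char *meminfo)
{
    const char *p = meminfo;

    while (p && *p) {
        long kb;

        if (sscanf(p, "Cached: %ld kB", &kb) == 1)
            return kb;
        p = strchr(p, '\n');
        if (p)
            p++;
    }
    return -1;
}

/* Read /proc/meminfo and return the current "Cached:" value in kB. */
static long read_cached_kb(void)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_cached_kb(buf);
}

/* Sequentially read a whole file in bs-sized chunks with RWF_DONTCACHE.
 * Returns the number of bytes read, or -1 on error (errno is left as
 * set by the failing call). */
static long long dontcache_read_file(const char *path, size_t bs)
{
    long long total = 0;
    char *buf;
    int fd;

    assert(bs > 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    buf = malloc(bs);
    if (!buf) {
        close(fd);
        return -1;
    }
    for (;;) {
        struct iovec iov = { .iov_base = buf, .iov_len = bs };
        ssize_t n = preadv2(fd, &iov, 1, total, RWF_DONTCACHE);

        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break;
        total += n;
    }
    free(buf);
    close(fd);
    return total;
}
```

Calling read_cached_kb() before and after dontcache_read_file("50G.txt",
8192) gives the same kind of before/after comparison as the kbcached
column above, where the value went from 1898092 to 1947656 kB across the
read (a difference of 49564 kB).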

Hi Chao,
Given that F2FS currently calls folio_end_writeback in softirq
context for normal writes, could we first support uncached buffered
I/O reads only? For normal uncached buffered I/O writes, would it be
feasible for F2FS to introduce an asynchronous workqueue to handle the
page drop operation in the future? What are your thoughts on this?
Thank you!