Message-ID: <06b9d287-816c-4347-945b-8fda83a6f557@vivo.com>
Date: Fri, 25 Jul 2025 09:44:40 +0800
From: hanqi <hanqi@...o.com>
To: Chao Yu <chao@...nel.org>, Jens Axboe <axboe@...nel.dk>,
 jaegeuk@...nel.org
Cc: linux-f2fs-devel@...ts.sourceforge.net, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] f2fs: f2fs supports uncached buffered I/O



On 2025/7/24 21:09, Chao Yu wrote:
> On 2025/7/16 16:27, hanqi wrote:
>>
>>
>> On 2025/7/16 11:43, Jens Axboe wrote:
>>> On 7/15/25 9:34 PM, hanqi wrote:
>>>>
>>>> On 2025/7/15 22:28, Jens Axboe wrote:
>>>>> On 7/14/25 9:10 PM, Qi Han wrote:
>>>>>> Jens has already completed the development of uncached buffered I/O
>>>>>> in git [1], and in f2fs, the feature can be enabled simply by setting
>>>>>> the FOP_DONTCACHE flag in f2fs_file_operations.
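
For reference, a minimal sketch of the kind of change being described,
assuming the fop_flags mechanism from Jens's series; the surrounding
initializers are abbreviated and only the flag line is the point:

/* fs/f2fs/file.c (abbreviated, illustrative) */
const struct file_operations f2fs_file_operations = {
	.llseek		= f2fs_llseek,
	.read_iter	= f2fs_file_read_iter,
	.write_iter	= f2fs_file_write_iter,
	.mmap		= f2fs_file_mmap,
	.fsync		= f2fs_sync_file,
	/* opt in to uncached buffered I/O (RWF_DONTCACHE) */
	.fop_flags	= FOP_DONTCACHE,
};
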
>>>>> You need to ensure that, for any DONTCACHE IO, the completion is
>>>>> routed via non-irq context, if applicable. I didn't verify that this
>>>>> is the case for f2fs. Generally you can deduce this through testing
>>>>> as well; I'd say the following cases would be interesting to test:
>>>>>
>>>>> 1) Normal DONTCACHE buffered read
>>>>> 2) Overwrite DONTCACHE buffered write
>>>>> 3) Append DONTCACHE buffered write
>>>>>
>>>>> Test those with DEBUG_ATOMIC_SLEEP set in your config, and if that
>>>>> doesn't complain, that's a great start.
>>>>>
>>>>> For the above test cases as well, verify that page cache doesn't grow
>>>>> as IO is performed. A bit is fine for things like meta data, but
>>>>> generally you want to see it remain basically flat in terms of page
>>>>> cache usage.
>>>>>
>>>>> Maybe this is all fine, like I said I didn't verify. Just mentioning
>>>>> it for completeness' sake.
>>>> Hi, Jens
>>>> Thanks for your suggestion. As I mentioned earlier in [1], in f2fs,
>>>> the regular buffered write path invokes folio_end_writeback from a
>>>> softirq context. Therefore, it seems that f2fs may not be suitable
>>>> for DONTCACHE I/O writes.
>>>>
>>>> I'd like to ask a question: why is DONTCACHE I/O write restricted to
>>>> non-interrupt context only? Is it because dropping the page might be
>>>> too time-consuming to be done safely in interrupt context? This might
>>>> be a naive question, but I'd really appreciate your clarification.
>>>> Thanks in advance.
>>> Because (as of right now, at least) the code doing the invalidation
>>> needs process context. There are various reasons for this, which you'll
>>> see if you follow the path off folio_end_writeback() ->
>>> filemap_end_dropbehind_write() -> filemap_end_dropbehind() ->
>>> folio_unmap_invalidate(). unmap_mapping_folio() is one case, and while
>>> that may be doable, the inode i_lock is not IRQ safe.
>>>
>>> Most file systems have a need to punt some writeback completions to
>>> non-irq context, eg for file extending etc. Hence for most file systems,
>>> the dontcache case just becomes another case that needs to go through
>>> that path.
>>>
>>> It'd certainly be possible to improve upon this, for example by having
>>> an opportunistic dontcache unmap from IRQ/soft-irq context, and then
>>> punting to a workqueue if that doesn't pan out. But this doesn't exist
>>> as of yet, hence the need for the workqueue punt.
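
For illustration, a rough sketch of that kind of workqueue punt, using
generic kernel primitives; every name below (dc_end_io, dc_write_end_io,
the choice of workqueue, etc.) is made up for the example and is not
existing f2fs code:

#include <linux/bio.h>
#include <linux/pagemap.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/*
 * Illustrative per-bio completion context. In a real filesystem the
 * work_struct would more likely live in existing per-bio private data.
 */
struct dc_end_io {
	struct bio		*bio;
	struct work_struct	work;
};

/* Runs in process context via the workqueue. */
static void dc_end_io_workfn(struct work_struct *work)
{
	struct dc_end_io *dc = container_of(work, struct dc_end_io, work);
	struct folio_iter fi;

	/*
	 * Safe here: for dropbehind folios, folio_end_writeback() may go
	 * through filemap_end_dropbehind_write() -> folio_unmap_invalidate(),
	 * which needs a sleepable (non-irq) context.
	 */
	bio_for_each_folio_all(fi, dc->bio)
		folio_end_writeback(fi.folio);

	bio_put(dc->bio);
	kfree(dc);
}

/* bio ->bi_end_io handler, possibly invoked from (soft)irq context. */
static void dc_write_end_io(struct bio *bio)
{
	struct dc_end_io *dc = bio->bi_private;

	INIT_WORK(&dc->work, dc_end_io_workfn);
	queue_work(system_unbound_wq, &dc->work);
}
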
>
> Thanks Jens for the detailed explanation.
>
>>
>> Hi, Jens
>> Thank you for your response. I tested uncached buffered I/O reads with
>> a 50GB dataset on a local F2FS filesystem, and the page cache size
>> only increased slightly, which I believe aligns with expectations.
>> After clearing the page cache, the page cache size returned to its
>> initial state. The test results are as follows:
>>
>> stat 50G.txt
>>     File: 50G.txt
>>     Size: 53687091200      Blocks: 104960712       IO Blocks: 512   regular file
>>
>> [read before]:
>> echo 3 > /proc/sys/vm/drop_caches
>> 01:48:17   kbmemfree   kbavail  kbmemused  %memused  kbbuffers  kbcached   kbcommit  %commit  kbactive  kbinact  kbdirty
>> 01:50:59     6404648   8149508    2719384     23.40        512   1898092  199384760   823.75   1846756   466832       44
>>
>> ./uncached_io_test 8192 1 1 50G.txt
>> Starting 1 threads
>> reading bs 8192, uncached 1
>>     1s: 754MB/sec, MB=754
>>     ...
>>    64s: 844MB/sec, MB=262144
>>
>> [read after]:
>> 01:52:33     6326664   8121240    2747968     23.65        728   1947656  199384788   823.75   1887896   502004       68
>> echo 3 > /proc/sys/vm/drop_caches
>> 01:53:11     6351136   8096936    2772400     23.86        512   1900500  199385216   823.75   1847252   533768      104
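
For reference, a minimal standalone sketch of how a single uncached
buffered read can be issued from userspace, via preadv2() with
RWF_DONTCACHE; the fallback flag value below is an assumption taken from
the uapi header of the series and should be verified against <linux/fs.h>:

/* build: gcc -O2 -o dontcache_read dontcache_read.c */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080	/* assumed uapi value; check <linux/fs.h> */
#endif

int main(int argc, char **argv)
{
	struct iovec iov;
	void *buf;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (posix_memalign(&buf, 4096, 8192))
		return 1;
	iov.iov_base = buf;
	iov.iov_len = 8192;

	/* buffered read; the kernel is asked to drop the page cache after use */
	ssize_t ret = preadv2(fd, &iov, 1, 0, RWF_DONTCACHE);
	if (ret < 0)
		perror("preadv2(RWF_DONTCACHE)");
	else
		printf("read %zd bytes uncached\n", ret);

	free(buf);
	close(fd);
	return 0;
}
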
>>
>> Hi Chao,
>> Given that F2FS currently calls folio_end_writeback in the softirq
>> context for normal write scenarios, could we first support uncached
>> buffered I/O reads? For normal uncached buffered I/O writes, would it be
>> feasible for F2FS to introduce an asynchronous workqueue to handle the
>> page drop operation in the future? What are your thoughts on this?
>
> Qi,
>
> Sorry for the delay.
>
> I think it will be good to support uncached buffered I/O in the read path
> first, and then take a look at what we can do for the write path; anyway,
> let's do this step by step.
>
> Can you please update the patch?
> - support read path only
> - include test data in commit message
Hi Chao,

I will re-submit a patch to first enable F2FS support for uncached
buffered I/O reads. Following that, I will work on implementing
asynchronous page dropping in F2FS.

Thank you!
>
>> Thank you!
>>
>>

