Message-ID: <b0167ea3-55ae-5e4e-7022-4105844b0495@kernel.dk>
Date:   Mon, 18 Apr 2022 16:20:20 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Zhihao Cheng <chengzhihao1@...wei.com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Christoph Hellwig <hch@....de>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        yukuai3@...wei.com
Subject: Re: [PATCH v2] fs-writeback: writeback_sb_inodes:Recalculate 'wrote' according skipped pages

On 4/18/22 4:12 PM, Jens Axboe wrote:
> On 4/18/22 4:01 PM, Linus Torvalds wrote:
>> On Mon, Apr 18, 2022 at 2:16 PM Jens Axboe <axboe@...nel.dk> wrote:
>>>
>>> So as far as I can tell, we really have two options:
>>>
>>> 1) Don't preempt a task that has a plug active
>>> 2) Flush for any schedule out, not just going to sleep
>>>
>>> 1 may not be feasible if we're queueing lots of IO, which then leaves 2.
>>> Linus, do you remember what your original patch here was motivated by?
>>> I'm assuming it was an efficiency thing, but do we really see IO
>>> submissions being preempted often enough to make the plug less
>>> efficient than it should be at merging IO? Seems unlikely, but I
>>> could be wrong.
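
(For anyone following along: a rough sketch of where that decision
currently lives in the scheduler path. Illustrative only, with names
and details approximated from memory and simplified; not a patch.)

/* kernel/sched/core.c, heavily simplified for illustration */
static inline void sched_submit_work(struct task_struct *tsk)
{
	/*
	 * Only flush plugged IO when the task is about to go to sleep.
	 * An involuntary preemption leaves the task runnable, so we
	 * return early here and the plug survives the context switch
	 * untouched.
	 */
	if (task_is_running(tsk))
		return;

	/* Submit any plugged IO so we can't deadlock waiting on it. */
	blk_flush_plug(tsk->plug, true);
}

/*
 * Option 1 above would amount to refusing to preempt while tsk->plug
 * is set; option 2 would amount to doing a flush like the above from
 * the involuntary preempt_schedule() path as well, not just from the
 * voluntary schedule() path that calls this helper.
 */
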
>>
>> No, it goes all the way back to 2011; my memory for those kinds of
>> details doesn't go that far back.
>>
>> That said, it clearly is about preemption, and I wonder if we had an
>> actual bug there.
>>
>> IOW, it might well not just be about the "gather up more IO for bigger
>> requests" thing, but about "the IO plug is per-thread and doesn't have
>> locking because of that".
>>
>> So doing plug flushing from a preemptible kernel context might race
>> with it all being set up.
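
(For context, the plug really is just a small per-task staging area
with no locking at all; roughly the following, with fields varying by
kernel version and shown from memory:)

struct blk_plug {
	struct list_head mq_list;	/* staged requests, only touched by the owning task */
	struct list_head cb_list;	/* callbacks to run when the plug is flushed */
	unsigned short rq_count;	/* number of staged requests */
	bool multiple_queues;		/* requests span more than one queue */
};
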
> 
> Hmm yes. But doesn't preemption imply a full barrier? As long as we
> assign the plug at the end, we should be fine. And looking that up just
> now, there's already a comment to that effect in blk_start_plug(). So
> barring any weirdness with that, maybe that's the solution.
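
(Roughly what that looks like in blk_start_plug(), trimmed down, and
with details possibly stale across kernel versions:)

void blk_start_plug(struct blk_plug *plug)
{
	struct task_struct *tsk = current;

	/* Nested plug: keep the outer one. */
	if (tsk->plug)
		return;

	/* Fully initialize the plug before it becomes visible... */
	INIT_LIST_HEAD(&plug->mq_list);
	INIT_LIST_HEAD(&plug->cb_list);
	plug->rq_count = 0;

	/*
	 * ...and only then publish it. No explicit barrier needed:
	 * being scheduled out implies a full memory barrier, so a
	 * flush run from the schedule path can't observe a
	 * half-initialized plug.
	 */
	tsk->plug = plug;
}
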
> 
> Your comment did jog my memory a bit though, and I do in fact think it
> was something related to that which made us change it. I'll dig through
> some old emails and see if I can find it.

Here's the thread:

https://lore.kernel.org/all/1295659049-2688-6-git-send-email-jaxboe@fusionio.com/

I'll dig through it in a bit, but here's your reasoning for why it
should not flush on preemption:

https://lore.kernel.org/all/BANLkTikBEJa7bJJoLFU7NoiEgOjVHVG08A@mail.gmail.com/

-- 
Jens Axboe
