Message-ID: <CA+55aFzHMTdqefmtxcKjiGDC=HsKiimfuDQxZZ-hbu-6f1nL3g@mail.gmail.com>
Date: Sat, 25 Nov 2017 12:55:18 -1000
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: David Howells <dhowells@...hat.com>
Cc: linux-afs@...r.kernel.org,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] afs: Fixes
On Sat, Nov 25, 2017 at 12:35 PM, David Howells <dhowells@...hat.com> wrote:
>
> Doesn't clear_page_dirty_for_io() write-protect the PTE for the page to be
> written out, in which case ->page_mkwrite() will get called again before the
> page is redirtied?
No, it literally just clears the dirty bit (and does accounting).
But I think you may be right that we always write-protect the page when
we move the dirty bit from the page tables to the 'struct page'
(page_mkclean_one()).
However, even when you do that, the page can be writable in other
mappings. At least fork(), for example, only clears the dirty bit,
doesn't mark it write-protected.
So there is some rate-limiting of dirty pages, but I do not believe
that we've ever really *serialized* writes.
>> (b) can cause some really nasty latency issues
>
> True, but I think the most common case is a file being opened, written start
> to finish and then closed. Actually, the worst-handled thing I've seen is a
> shell script appending a bunch of things to a file because ->flush() syncs the
> file each time it is closed:-/
>
> What would you recommend instead? I'm currently trying to keep track of what
> needs to be written so that I only write what's changed to the server, rather
> than writing only whole pages.
I don't think that what you are doing is necessarily wrong, I'm just
pointing out that you may still see mmap'ed pages being modified
concurrently with the actual IO, and that this will potentially mean
(for example) that things like checksums won't be reliable unless you
do the checksum as you copy the data to a network packet or something.
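To illustrate what "checksum as you copy" could look like, here is a
minimal user-space sketch. It assumes an Adler-32 style sum purely for
brevity (nothing in this thread says which checksum, if any, afs or its
transport uses), and copy_and_checksum() and checksum() are hypothetical
names, not kernel or afs functions. The point is only that each source
byte is read once and the sum is computed from the value that went into
the packet, so the checksum describes what is sent even if the mmap'ed
page changes during or after the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy len bytes from a buffer that may be modified
 * concurrently (e.g. a shared mmap'ed page) into a private packet buffer,
 * checksumming the bytes as they are stored.
 *
 * The volatile qualifier makes this sketch load each source byte exactly
 * once; the checksum is fed from that same loaded value.
 */
static uint32_t copy_and_checksum(uint8_t *dst, const volatile uint8_t *src,
				  size_t len)
{
	uint32_t a = 1, b = 0;	/* Adler-32 style running sums */
	size_t i;

	assert(dst != NULL && src != NULL);

	for (i = 0; i < len; i++) {
		uint8_t byte = src[i];	/* single read of the shared data */

		dst[i] = byte;		/* what goes into the packet ...   */
		a = (a + byte) % 65521u;	/* ... is what gets checksummed */
		b = (b + a) % 65521u;
	}
	return (b << 16) | a;
}

/* Same sum over a stable, private buffer (e.g. the packet just built). */
static uint32_t checksum(const uint8_t *buf, size_t len)
{
	uint32_t a = 1, b = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + buf[i]) % 65521u;
		b = (b + a) % 65521u;
	}
	return (b << 16) | a;
}
```

Checksumming the source page in a separate pass, before or after the
copy, is the variant that can go wrong: the page may differ between the
two passes, so the sum and the packet contents can disagree.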
Of course, if that kind of inconsistency happens, a later write-back
will also happen, and eventually fix it. So the server may see
temporarily 'wrong' data, but it won't be final.
I just hope that the inconsistency isn't fatal to the afs client or
server code. For example, if you retry writes forever whenever a
checksum fails to match the data, that would be bad.
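One way to avoid the retry-forever case is to bound the number of
attempts and let the later write-back carry the final data. The sketch
below is only that: send_with_bounded_retry(), the send_fn callback type,
the result codes and the flaky_send() stub are all hypothetical and do
not correspond to any afs or rxrpc interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for one attempt to send a page's data. */
enum send_result { SEND_OK, SEND_CSUM_MISMATCH };

/* Hypothetical callback that performs one send attempt. */
typedef enum send_result (*send_fn)(void *ctx);

/*
 * Try the send at most max_attempts times.  Returns true on success and
 * false if every attempt saw a mismatch.  On false the caller gives up
 * for now instead of spinning: if the page really was modified during
 * the IO, a later write-back will send the data again.
 */
static bool send_with_bounded_retry(send_fn send, void *ctx,
				    unsigned int max_attempts)
{
	unsigned int attempt;

	assert(send != NULL);

	for (attempt = 0; attempt < max_attempts; attempt++) {
		if (send(ctx) == SEND_OK)
			return true;
	}
	return false;
}

/*
 * Example stub: ctx points to an int holding the number of mismatches
 * still to be reported before a send succeeds.
 */
static enum send_result flaky_send(void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return SEND_CSUM_MISMATCH;
	}
	return SEND_OK;
}
```

With the stub, two pending mismatches and a limit of five attempts
succeeds on the third try, while ten pending mismatches and a limit of
three gives up after three attempts.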
And yes, this can be
(a) really hard to trigger in practice
(b) very hard to debug due to a combination of very specific timing
and behavior.
so I just wanted to bring this up as a potential issue, not
necessarily as a big problem.
Linus