Message-ID: <523cad2b-5319-6aa9-65e2-80e91a0bb050@fastmail.fm>
Date: Fri, 1 Sep 2023 19:56:56 +0200
From: Bernd Schubert <aakef@...tmail.fm>
To: Lei Huang <lei.huang@...ux.intel.com>,
Bernd Schubert <bernd.schubert@...tmail.fm>,
linux-kernel@...r.kernel.org
Cc: miklos@...redi.hu, linux-fsdevel@...r.kernel.org,
David Howells <dhowells@...hat.com>
Subject: Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
Hi Lei,
On 8/30/23 03:03, Lei Huang wrote:
> Hi Bernd,
>
> Thank you very much for your reply!
>
> > Hmm, iov_iter_extract_pages has not existed for a long time, and the code
> > in fuse_get_user_pages didn't change much. So if you are right, there
> > would be long-term data corruption with page migrations? And a back
> > port to old kernels would not be obvious?
>
> Right. The issue has been reproduced under various versions of kernels,
> ranging from 3.10.0 to 6.3.6 in my tests. It would be difficult to make
> a patch for older kernels like 3.10.0. One workaround I tested is to query
> the physical pages associated with the read buffer only after the data is
> ready (right before writing the data into the read buffer). This seems
> to resolve the issue in my tests.
>
>
> > What confuses me further is that
> > commit 85dd2c8ff368 does not mention migration or corruption, although
> > it lists several other advantages of iov_iter_extract_pages. Other commits
> > using iov_iter_extract_pages point to fork - could your data
> > corruption possibly be related to that?
>
> As I mentioned above, the issue seems resolved if we query the physical
> pages as late as right before writing the data into the read buffer. I
> think the root cause is page migration.
>
Out of interest, what is your exact reproducer and how much time does it
take? I'm just trying passthrough_hp (*) and ql-fstest (**) and don't get
an issue after about 1h of run time. I'll let it continue over the weekend.
The system is an older dual-socket Xeon.
(*) with a slight modification to passthrough_hp to disable O_DIRECT for
the underlying file system. It is running on XFS on an NVMe.
(**) https://github.com/bsbernd/ql-fstest
Pinning the pages is certainly a good idea; I would just like to
understand how severe the issue is, and would like to test
backports/a different patch on older kernels.
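
For reference, my understanding of the pinning change under discussion is
roughly the sketch below (kernel-style pseudocode, not the actual patch;
the function name is made up, and error handling plus the descriptor
bookkeeping of the real fuse_get_user_pages() are omitted):

/*
 * Sketch only: take FOLL_PIN-style references on the user pages of a
 * direct-io request via iov_iter_extract_pages() instead of the
 * get-style references of iov_iter_get_pages2(). Pinned pages are not
 * migrated out from under the in-flight request, so the reply data
 * lands in the pages the caller actually reads from.
 */
static int fuse_get_user_pages_pinned(struct fuse_args_pages *ap,
				      struct iov_iter *ii, size_t *nbytesp,
				      unsigned int max_pages)
{
	size_t offset;
	ssize_t ret;

	ret = iov_iter_extract_pages(ii, &ap->pages, *nbytesp,
				     max_pages, 0, &offset);
	if (ret < 0)
		return ret;

	*nbytesp = ret;
	/* ... fill ap->descs[] from ret and offset as before ... */

	/*
	 * On request completion, each page must be released with
	 * unpin_user_page() if iov_iter_extract_will_pin(ii) was true,
	 * rather than put_page().
	 */
	return 0;
}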
Thanks,
Bernd