linux-kernel - Re: [PATCH v1] fs/fuse: Fix missing FOLL

Message-ID: <523cad2b-5319-6aa9-65e2-80e91a0bb050@fastmail.fm>
Date:   Fri, 1 Sep 2023 19:56:56 +0200
From:   Bernd Schubert <aakef@...tmail.fm>
To:     Lei Huang <lei.huang@...ux.intel.com>,
        Bernd Schubert <bernd.schubert@...tmail.fm>,
        linux-kernel@...r.kernel.org
Cc:     miklos@...redi.hu, linux-fsdevel@...r.kernel.org,
        David Howells <dhowells@...hat.com>
Subject: Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io

Hi Lei,

On 8/30/23 03:03, Lei Huang wrote:
> Hi Bernd,
> 
> Thank you very much for your reply!
> 
>  > Hmm, iov_iter_extract_pages has not existed for long and the code
>  > in fuse_get_user_pages didn't change much. So if you are right, there
>  > would be a long-term data corruption for page migrations? And a
>  > backport to old kernels would not be obvious?
> 
> Right. The issue has been reproduced under various kernel versions,
> ranging from 3.10.0 to 6.3.6 in my tests. A patch for older kernels
> like 3.10.0 would have to be different. One approach I tested is to
> query the physical pages associated with the read buffer after the data
> is ready (right before writing the data into the read buffer). This
> seems to resolve the issue in my tests.
> 
> 
>  > What confuses me further is that
>  > commit 85dd2c8ff368 does not mention migration or corruption, although
>  > it lists several other advantages for iov_iter_extract_pages. Other
>  > commits using iov_iter_extract_pages point to fork - i.e. would your
>  > data corruption possibly be related to that?
> 
> As I mentioned above, the issue seems resolved if we query the physical
> pages as late as right before writing data into the read buffer. I think
> the root cause is page migration.
> 

Out of interest, what is your exact reproducer and how much time does it
take? I'm just trying passthrough_hp (*) and ql-fstest (**) and don't get
an issue after about 1h of run time. I will let it continue over the
weekend. The system is an older dual-socket Xeon.

(*) with a slight modification to passthrough_hp to disable O_DIRECT for
the underlying file system. It is running on xfs on an NVMe device.

(**) https://github.com/bsbernd/ql-fstest


Pinning the pages is certainly a good idea; I would just like to
understand how severe the issue is. I would also like to test
backports or a different patch on older kernels.


Thanks,
Bernd