linux-kernel - Re: [RFC] Bypass filesystems for reading cached pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200619201216.GA8681@bombadil.infradead.org>
Date:   Fri, 19 Jun 2020 13:12:16 -0700
From:   Matthew Wilcox <willy@...radead.org>
To:     Chaitanya Kulkarni <Chaitanya.Kulkarni@....com>
Cc:     "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "agruenba@...hat.com" <agruenba@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] Bypass filesystems for reading cached pages

On Fri, Jun 19, 2020 at 07:06:19PM +0000, Chaitanya Kulkarni wrote:
> On 6/19/20 8:50 AM, Matthew Wilcox wrote:
> > This patch lifts the IOCB_CACHED idea expressed by Andreas to the VFS.
> > The advantage of this patch is that we can avoid taking any filesystem
> > lock, as long as the pages being accessed are in the cache (and we don't
> > need to readahead any pages into the cache).  We also avoid an indirect
> > function call in these cases.
> 
> I did a testing with NVMeOF target file backend with buffered I/O
> enabled with your patch and setting the IOCB_CACHED for each I/O ored 
> '|' with IOCB_NOWAIT calling call_read_iter_cached() [1].
>
> The name was changed from call_read_iter() -> call_read_iter_cached() [2]).
> 
> For the file system I've used XFS and device was null_blk with memory
> backed so entire file was cached into the DRAM.

Thanks for testing!  Can you elaborate a little more on what the test does?
Are there many threads or tasks?  What is the I/O path?  XFS on an NVMEoF
device, talking over loopback to localhost with nullblk as the server?

The nullblk device will have all the data in its pagecache, but each XFS
file will have an empty pagecache initially.  Then it'll be populated by
the test, so does the I/O pattern revisit previously accessed data at all?

> Following are the performance numbers :-
> 
> IOPS/Bandwidth :-
> 
> default-page-cache:      read:  IOPS=1389k,  BW=5424MiB/s  (5688MB/s)
> default-page-cache:      read:  IOPS=1381k,  BW=5395MiB/s  (5657MB/s)
> default-page-cache:      read:  IOPS=1391k,  BW=5432MiB/s  (5696MB/s)
> iocb-cached-page-cache:  read:  IOPS=1403k,  BW=5481MiB/s  (5747MB/s)
> iocb-cached-page-cache:  read:  IOPS=1393k,  BW=5439MiB/s  (5704MB/s)
> iocb-cached-page-cache:  read:  IOPS=1399k,  BW=5465MiB/s  (5731MB/s)

That doesn't look bad at all ... about 0.7% increase in IOPS.

> Submission lat :-
> 
> default-page-cache:      slat  (usec):  min=2,  max=1076,  avg=  3.71,
> default-page-cache:      slat  (usec):  min=2,  max=489,   avg=  3.72,
> default-page-cache:      slat  (usec):  min=2,  max=1078,  avg=  3.70,
> iocb-cached-page-cache:  slat  (usec):  min=2,  max=1731,  avg=  3.70,
> iocb-cached-page-cache:  slat  (usec):  min=2,  max=2115,  avg=  3.69,
> iocb-cached-page-cache:  slat  (usec):  min=2,  max=3055,  avg=  3.70,

Average latency unchanged, max latency up a little ... makes sense,
since we'll do a little more work in the worst case.

> @@ -264,7 +267,8 @@ static void nvmet_file_execute_rw(struct nvmet_req *req)
> 
>          if (req->ns->buffered_io) {
>                  if (likely(!req->f.mpool_alloc) &&
> -                               nvmet_file_execute_io(req, IOCB_NOWAIT))
> +                               nvmet_file_execute_io(req,
> +                                       IOCB_NOWAIT |IOCB_CACHED))
>                          return;
>                  nvmet_file_submit_buffered_io(req);

You'll need a fallback path here, right?  IOCB_CACHED can get part-way
through doing a request, and then need to be finished off after taking
the mutex.