Message-ID: <20200619201216.GA8681@bombadil.infradead.org>
Date:   Fri, 19 Jun 2020 13:12:16 -0700
From:   Matthew Wilcox <willy@...radead.org>
To:     Chaitanya Kulkarni <Chaitanya.Kulkarni@....com>
Cc:     "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "agruenba@...hat.com" <agruenba@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] Bypass filesystems for reading cached pages

On Fri, Jun 19, 2020 at 07:06:19PM +0000, Chaitanya Kulkarni wrote:
> On 6/19/20 8:50 AM, Matthew Wilcox wrote:
> > This patch lifts the IOCB_CACHED idea expressed by Andreas to the VFS.
> > The advantage of this patch is that we can avoid taking any filesystem
> > lock, as long as the pages being accessed are in the cache (and we don't
> > need to readahead any pages into the cache).  We also avoid an indirect
> > function call in these cases.
> 
> I did a test with the NVMeOF target file backend with buffered I/O
> enabled on top of your patch, setting IOCB_CACHED ORed ('|') with
> IOCB_NOWAIT for each I/O and calling call_read_iter_cached() [1].
>
> The name was changed from call_read_iter() -> call_read_iter_cached() [2].
> 
> For the file system I used XFS, and the device was null_blk with memory
> backing, so the entire file was cached in DRAM.

Thanks for testing!  Can you elaborate a little more on what the test does?
Are there many threads or tasks?  What is the I/O path?  XFS on an NVMeOF
device, talking over loopback to localhost with null_blk as the server?

The null_blk device will have all the data in its pagecache, but each XFS
file will have an empty pagecache initially.  Then it'll be populated by
the test, so does the I/O pattern revisit previously accessed data at all?
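
For context, the fast path boils down to something like the sketch
below.  This is an illustration only, not the actual patch: the helper
name call_read_iter_cached() follows your renaming in [2], and the exact
-EAGAIN bail-out is an assumption.  The point is that it only ever
touches pages which are already present and uptodate in the file's own
pagecache, and hands everything else back to the normal ->read_iter()
path -- which is why the access pattern matters for this test.

#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/uio.h>

/* Sketch only -- not the RFC patch itself. */
static ssize_t call_read_iter_cached(struct file *file, struct kiocb *iocb,
                                     struct iov_iter *iter)
{
        struct address_space *mapping = file->f_mapping;
        loff_t isize = i_size_read(file_inode(file));
        ssize_t copied = 0;

        while (iov_iter_count(iter) && iocb->ki_pos < isize) {
                pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
                size_t offset = iocb->ki_pos & ~PAGE_MASK;
                size_t bytes = min3((loff_t)iov_iter_count(iter),
                                    (loff_t)(PAGE_SIZE - offset),
                                    isize - iocb->ki_pos);
                struct page *page;
                size_t n;

                /*
                 * No filesystem lock, no readahead: the page must already
                 * be cached and uptodate, otherwise bail out and let the
                 * caller fall back to ->read_iter().
                 */
                page = find_get_page(mapping, index);
                if (!page || !PageUptodate(page)) {
                        if (page)
                                put_page(page);
                        return copied ? copied : -EAGAIN;
                }

                n = copy_page_to_iter(page, offset, bytes, iter);
                put_page(page);
                iocb->ki_pos += n;
                copied += n;
                if (n < bytes)
                        break;          /* short copy, e.g. page fault */
        }
        return copied;
}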

> Following are the performance numbers :-
> 
> IOPS/Bandwidth :-
> 
> default-page-cache:      read:  IOPS=1389k,  BW=5424MiB/s  (5688MB/s)
> default-page-cache:      read:  IOPS=1381k,  BW=5395MiB/s  (5657MB/s)
> default-page-cache:      read:  IOPS=1391k,  BW=5432MiB/s  (5696MB/s)
> iocb-cached-page-cache:  read:  IOPS=1403k,  BW=5481MiB/s  (5747MB/s)
> iocb-cached-page-cache:  read:  IOPS=1393k,  BW=5439MiB/s  (5704MB/s)
> iocb-cached-page-cache:  read:  IOPS=1399k,  BW=5465MiB/s  (5731MB/s)

That doesn't look bad at all ... comparing the median runs, 1389k vs
1399k is about a 0.7% increase in IOPS.

> Submission lat :-
> 
> default-page-cache:      slat  (usec):  min=2,  max=1076,  avg=  3.71,
> default-page-cache:      slat  (usec):  min=2,  max=489,   avg=  3.72,
> default-page-cache:      slat  (usec):  min=2,  max=1078,  avg=  3.70,
> iocb-cached-page-cache:  slat  (usec):  min=2,  max=1731,  avg=  3.70,
> iocb-cached-page-cache:  slat  (usec):  min=2,  max=2115,  avg=  3.69,
> iocb-cached-page-cache:  slat  (usec):  min=2,  max=3055,  avg=  3.70,

Average latency unchanged, max latency up a little ... makes sense,
since we'll do a little more work in the worst case.

> @@ -264,7 +267,8 @@ static void nvmet_file_execute_rw(struct nvmet_req *req)
> 
>          if (req->ns->buffered_io) {
>                  if (likely(!req->f.mpool_alloc) &&
> -                               nvmet_file_execute_io(req, IOCB_NOWAIT))
> +                               nvmet_file_execute_io(req,
> +                                       IOCB_NOWAIT | IOCB_CACHED))
>                          return;
>                  nvmet_file_submit_buffered_io(req);

You'll need a fallback path here, right?  A read issued with IOCB_CACHED
can get part-way through a request and then need to be finished off
after taking the mutex.
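
Something along these lines, I mean -- again just a sketch, reusing the
call_read_iter_cached() helper from above (the wrapper name is made up,
and in nvmet you would punt the slow half to the buffered-io workqueue
via nvmet_file_submit_buffered_io() rather than calling ->read_iter()
inline, since this context must not sleep):

/* Sketch: cached fast path first, then finish the remainder slowly. */
static ssize_t read_iter_cached_with_fallback(struct file *file,
                                              struct kiocb *iocb,
                                              struct iov_iter *iter)
{
        ssize_t done, ret;

        iocb->ki_flags |= IOCB_NOWAIT | IOCB_CACHED;
        done = call_read_iter_cached(file, iocb, iter);
        if (done < 0 && done != -EAGAIN)
                return done;
        if (done >= 0 && !iov_iter_count(iter))
                return done;                    /* fully served from cache */

        /*
         * Part-way through (or nothing copied yet): drop the flags and
         * let the filesystem's ->read_iter() take its locks and finish
         * the rest of the request.
         */
        iocb->ki_flags &= ~(IOCB_NOWAIT | IOCB_CACHED);
        ret = call_read_iter(file, iocb, iter);
        if (ret < 0)
                return done > 0 ? done : ret;
        return (done > 0 ? done : 0) + ret;
}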
