linux-kernel - Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close faster

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200705115851.GB1227929@kroah.com>
Date:   Sun, 5 Jul 2020 13:58:51 +0200
From:   Greg KH <gregkh@...uxfoundation.org>
To:     Jan Ziak <0xe2.0x9a.0x9b@...il.com>
Cc:     Matthew Wilcox <willy@...radead.org>, linux-api@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, linux-man@...r.kernel.org,
        mtk.manpages@...il.com, shuah@...nel.org, viro@...iv.linux.org.uk
Subject: Re: [PATCH 0/3] readfile(2): a new syscall to make open/read/close
 faster

On Sun, Jul 05, 2020 at 06:09:03AM +0200, Jan Ziak wrote:
> On Sun, Jul 5, 2020 at 5:27 AM Matthew Wilcox <willy@...radead.org> wrote:
> >
> > On Sun, Jul 05, 2020 at 05:18:58AM +0200, Jan Ziak wrote:
> > > On Sun, Jul 5, 2020 at 5:12 AM Matthew Wilcox <willy@...radead.org> wrote:
> > > >
> > > > You should probably take a look at io_uring.  That has the level of
> > > > complexity of this proposal and supports open/read/close along with many
> > > > other opcodes.
> > >
> > > Then glibc can implement readfile using io_uring and there is no need
> > > for a new single-file readfile syscall.
> >
> > It could, sure.  But there's also a value in having a simple interface
> > to accomplish a simple task.  Your proposed API added a very complex
> > interface to satisfy needs that clearly aren't part of the problem space
> > that Greg is looking to address.
> 
> I believe that we should look at the single-file readfile syscall from
> a performance viewpoint. If an application is expecting to read a
> couple of small/medium-size files per second, then neither readfile
> nor readfiles makes sense in terms of improving performance. The
> benefits start to show up only in case an application is expecting to
> read at least a hundred of files per second. The "per second" part is
> important, it cannot be left out. Because readfile only improves
> performance for many-file reads, the syscall that applications
> performing many-file reads actually want is the multi-file version,
> not the single-file version.

It also is a measurable increase over reading just a single file.
Here's my really really fast AMD system doing just one call to readfile
vs. one call sequence to open/read/close:

	$ ./readfile_speed -l 1
	Running readfile test on file /sys/devices/system/cpu/vulnerabilities/meltdown for 1 loops...
	Took 3410 ns
	Running open/read/close test on file /sys/devices/system/cpu/vulnerabilities/meltdown for 1 loops...
	Took 3780 ns

370ns isn't all that much, yes, but it is 370ns that could have been
used for something else :)

Look at the overhead these days of a syscall using something like perf
to see just how bad things have gotten on Intel-based systems (above was
AMD which doesn't suffer all the syscall slowdowns, only some).

I'm going to have to now dig up my old rpi to get the stats on that
thing, as well as some Intel boxes to show the problem I'm trying to
help out with here.  I'll post that for the next round of this patch
series.

> I am not sure I understand why you think that a pointer to an array of
> readfile_t structures is very complex. If it was very complex then it
> would be a deep tree or a large graph.

Of course you can make it more complex if you want, but look at the
existing tools that currently do many open/read/close sequences.  The
apis there don't lend themselves very well to knowing the larger list of
files ahead of time.  But I could be looking at the wrong thing, what
userspace programs are you thinking of that could be easily converted
into using something like this?

thanks,

greg k-h