Message-ID: <Yoe7ykNXjUerhywY@zx2c4.com>
Date: Fri, 20 May 2022 18:03:22 +0200
From: "Jason A. Donenfeld" <Jason@...c4.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: Theodore Ts'o <tytso@....edu>, Christoph Hellwig <hch@....de>,
LKML <linux-kernel@...r.kernel.org>,
Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH v4 0/3] random: convert to using iters, for Al Viro
Hi Jens,
On Fri, May 20, 2022 at 09:58:28AM -0600, Jens Axboe wrote:
> On 5/20/22 9:55 AM, Jason A. Donenfeld wrote:
> > Hi Jens,
> >
> > On Fri, May 20, 2022 at 09:44:25AM -0600, Jens Axboe wrote:
> >> Ran 32, 1k, 4k here and it does seem to be down about 3%. Which is
> >> definitely bigger than I expected, particularly for larger reads. If
> >> anything, the 32b read seems comparably better than eg 1k or 4k, which
> >> is also unexpected. Let me do a bit of profiling to see what is up.
> >
> > Something to keep in mind wrt 32b is that for complicated crypto
> > reasons, the function has this logic:
> >
> > - If len <= 32, generate one 64 byte block and give <= 32 bytes of it to
> > the caller.
> >
> > - If len > 32, generate one 64 byte block, but give 0 of it to the
> > caller. Then generate ⌈len/64⌉ blocks for the caller.
> >
> > Put together, this means:
> >
> > - 1..32, 1 block
> > - 33..64, 2 blocks
> > - 65..128, 3 blocks
> > - 129..192, 4 blocks
> >
> > So you get this sort of shelf where the amortization benefits don't
> > really kick in until after 3 blocks.
>
> Ah I see, I can see if 64b is closer to the change for eg 1k.
What I meant by providing all that detail is that from a cycles-per-byte
perspective, smaller=more expensive. So it's possible that the
difference in the patchset is less visible as it gets lost in the more
expensive operation.
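
To put rough numbers on that, here is a minimal sketch of the block-count
rule quoted above. It is not the kernel code: the function and constant
names are hypothetical, the handling of len <= 0 is an assumption, and
"bytes generated per byte returned" is only a crude stand-in for
cycles-per-byte.

```python
BLOCK_SIZE = 64      # bytes produced per generated block, per the rule above
FAST_PATH_MAX = 32   # reads up to this size are served from a single block


def blocks_generated(length: int) -> int:
    """Number of 64-byte blocks generated for a read of `length` bytes."""
    if length <= 0:
        # Assumption for this sketch: an empty read generates nothing.
        return 0
    if length <= FAST_PATH_MAX:
        # One block, of which at most 32 bytes go to the caller.
        return 1
    # One block that gives 0 bytes to the caller, plus ceil(length / 64)
    # blocks of output (integer ceiling, no floating point involved).
    return 1 + (length + BLOCK_SIZE - 1) // BLOCK_SIZE


def generated_per_returned(length: int) -> float:
    """Bytes generated per byte handed back: a rough proxy for cost."""
    return blocks_generated(length) * BLOCK_SIZE / length


if __name__ == "__main__":
    for n in (1, 32, 33, 64, 128, 1024, 4096):
        print(f"{n:5d} bytes: {blocks_generated(n):3d} blocks, "
              f"{generated_per_returned(n):6.2f} generated/returned")
```

By this measure a 32 byte read generates 2 bytes for every byte returned,
a 33 byte read nearly 4, and a 4k read only slightly more than 1, which
is the "smaller=more expensive" effect.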
> >> If you're worried about it, I'd just keep the read/write and add the
> >> iter variants on the side.
> >
> > Not a chance of that. These functions are already finicky as-is; I would
> > really hate to have to duplicate all of these paths.
>
> Then I'd say there are only two options:
>
> - Add a helper that provides splice for something that only has
> read/write set.
That'd be fine with me, but wouldn't it involve bringing back set_fs(),
because of the copy_to_user() in there?
> - Just accept that we're 3% slower reading from /dev/urandom for now,
> and maybe 1-2% for small reads. Can't really imagine this being a huge
> issue, how many high throughput /dev/urandom read situations exist in
> the real world?
An option three might be that eventually the VFS overhead is worked out
and read_iter() reaches parity. One can hope, I guess.
Jason