[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHmME9pDjyL+9HNs4PUgb3dtrH3=zt+YmAmWs+T2b6HnWD4wuw@mail.gmail.com>
Date: Mon, 7 Nov 2016 19:23:05 +0100
From: "Jason A. Donenfeld" <Jason@...c4.com>
To: Eric Biggers <ebiggers@...gle.com>
Cc: David Miller <davem@...emloft.net>,
Herbert Xu <herbert@...dor.apana.org.au>,
linux-crypto@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
Martin Willi <martin@...ongswan.org>,
WireGuard mailing list <wireguard@...ts.zx2c4.com>,
René van Dorst <opensource@...rst.com>
Subject: Re: [PATCH] poly1305: generic C can be faster on chips with slow
unaligned access
On Mon, Nov 7, 2016 at 7:08 PM, Jason A. Donenfeld <Jason@...c4.com> wrote:
> Hmm... The general data flow that strikes me as most pertinent is
> something like:
>
> struct sk_buff *skb = get_it_from_somewhere();
> skb = skb_share_check(skb, GFP_ATOMIC);
> num_frags = skb_cow_data(skb, ..., ...);
> struct scatterlist sg[num_frags];
> sg_init_table(sg, num_frags);
> skb_to_sgvec(skb, sg, ..., ...);
> blkcipher_walk_init(&walk, sg, sg, len);
> blkcipher_walk_virt_block(&desc, &walk, BLOCK_SIZE);
> while (walk.nbytes >= BLOCK_SIZE) {
> size_t chunk_len = rounddown(walk.nbytes, BLOCK_SIZE);
> poly1305_update(&poly1305_state, walk.src.virt.addr, chunk_len);
> blkcipher_walk_done(&desc, &walk, walk.nbytes % BLOCK_SIZE);
> }
> if (walk.nbytes) {
> poly1305_update(&poly1305_state, walk.src.virt.addr, walk.nbytes);
> blkcipher_walk_done(&desc, &walk, 0);
> }
>
> Is your suggestion that that in the final if block, walk.src.virt.addr
> might be unaligned? Like in the case of the last fragment being 67
> bytes long?
In fact, I'm not so sure this happens here. In the while loop, each
new walk.src.virt.addr will be aligned to BLOCK_SIZE or be aligned by
virtue of being at the start of a new page. In the subsequent if
block, walk.src.virt.addr will either be
some_aligned_address+BLOCK_SIZE, which will be aligned, or it will be
a start of a new page, which will be aligned.
So what did you have in mind exactly?
I don't think anybody is running code like:
for (size_t i = 0; i < len; i += 17)
poly1305_update(&poly, &buffer[i], 17);
(And if so, those consumers should be fixed.)
Powered by blists - more mailing lists