lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 13 May 2024 11:47:56 +0200
From: Antonio Quartulli <antonio@...nvpn.net>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
 Sergey Ryazanov <ryazanov.s.a@...il.com>, Paolo Abeni <pabeni@...hat.com>,
 Eric Dumazet <edumazet@...gle.com>, Andrew Lunn <andrew@...n.ch>,
 Esben Haabendal <esben@...nix.com>
Subject: Re: [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP)

On 13/05/2024 11:36, Sabrina Dubroca wrote:
> 2024-05-13, 09:37:06 +0200, Antonio Quartulli wrote:
>> On 12/05/2024 23:35, Sabrina Dubroca wrote:
>>> 2024-05-06, 03:16:22 +0200, Antonio Quartulli wrote:
>>>> +/* send skb to connected peer, if any */
>>>> +static void ovpn_queue_skb(struct ovpn_struct *ovpn, struct sk_buff *skb,
>>>> +			   struct ovpn_peer *peer)
>>>> +{
>>>> +	int ret;
>>>> +
>>>> +	if (likely(!peer))
>>>> +		/* retrieve peer serving the destination IP of this packet */
>>>> +		peer = ovpn_peer_get_by_dst(ovpn, skb);
>>>> +	if (unlikely(!peer)) {
>>>> +		net_dbg_ratelimited("%s: no peer to send data to\n",
>>>> +				    ovpn->dev->name);
>>>> +		goto drop;
>>>> +	}
>>>> +
>>>> +	ret = ptr_ring_produce_bh(&peer->tx_ring, skb);
>>>> +	if (unlikely(ret < 0)) {
>>>> +		net_err_ratelimited("%s: cannot queue packet to TX ring\n",
>>>> +				    peer->ovpn->dev->name);
>>>> +		goto drop;
>>>> +	}
>>>> +
>>>> +	if (!queue_work(ovpn->crypto_wq, &peer->encrypt_work))
>>>> +		ovpn_peer_put(peer);
>>>
>>> I wanted to come back to this after going through the crypto patch,
>>> because this felt like a strange construct when I first looked at this
>>> patch.
>>>
>>> Why are you using a workqueue here? Based on the kdoc for crypto_wq
>>> ("used to schedule crypto work that may sleep during TX/RX"), it's to
>>> deal with async crypto.
>>>
>>> If so, why not use the more standard way of dealing with async crypto
>>> in contexts that cannot sleep, ie letting the crypto core call the
>>> "done" callback asynchronously? You need to do all the proper refcount
>>> handling, but IMO it's cleaner and simpler than this workqueue and
>>> ptr_ring. You can see an example of that in macsec (macsec_encrypt_*
>>> in drivers/net/macsec.c).
>>
>> Aha! You don't know how happy I was when I found the doc describing how to
>> convert the async code into sync-looking :-) With the detail that I had to
>> move to a different context, as the code may want to sleep (hence the
>> introduction of the workqueue).
>>
>> It looks like I am little fan of WQs, while you are telling me to avoid them
>> if possible.
> 
> I'm mainly trying to simplify the code (get rid of some ptr_rings, get
> rid of some ping-pong between functions and changes of context,
> etc). And here, I'm also trying to make it look more like other
> similar pieces of code, because I'm already familiar with a few kernel
> implementations of protocols doing crypto (macsec, ipsec, tls).

Thanks for that, you already helped me getting rid of several constructs 
that weren't really needed.

> 
>> I presume that using WQs comes with a non-negligible cost, therefore if we
>> can just get things done without having to use them, then I should just
>> don't.
> 
> If you're using AESNI for your GCM implementation, the crypto API will
> also be using a workqueue (see crypto/cryptd.c), but only when the
> crypto can't be done immediately (ie, when the FPU is already busing).

Right.

> 
> In the case of crypto accelerators, there might be benefits from
> queueing multiple requests and then letting them live their life,
> instead of waiting for each request separately. I don't have access to
> that HW so I cannot test this.
> 
>> I think I could go back to no-workqueue encrypt/decrypt.
>> Do you think this may have any impact on any future multi-core optimization?
>> Back then I also thought that going through workers may make improvements in
>> this area easier. But I could just be wrong.
> 
> Without thinking about it too deeply, the workqueue looks more like a
> bottleneck that a no-workqueue approach just wouldn't have. You would
> probably need per-CPU WQs (probably separated for encrypt and
> decrypt). cryptd_enqueue_request (crypto/cryptd.c) has an example of
> that.

Ok. Sounds like something we can re-consider later on.

Let's start by killing the wq in the current code.

Thanks for the pointers.


-- 
Antonio Quartulli
OpenVPN Inc.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ