Date: Mon, 13 May 2024 11:36:45 +0200
From: Sabrina Dubroca <sd@...asysnail.net>
To: Antonio Quartulli <antonio@...nvpn.net>
Cc: netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
	Sergey Ryazanov <ryazanov.s.a@...il.com>,
	Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
	Andrew Lunn <andrew@...n.ch>, Esben Haabendal <esben@...nix.com>
Subject: Re: [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP)

2024-05-13, 09:37:06 +0200, Antonio Quartulli wrote:
> On 12/05/2024 23:35, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:22 +0200, Antonio Quartulli wrote:
> > > +/* send skb to connected peer, if any */
> > > +static void ovpn_queue_skb(struct ovpn_struct *ovpn, struct sk_buff *skb,
> > > +			   struct ovpn_peer *peer)
> > > +{
> > > +	int ret;
> > > +
> > > +	if (likely(!peer))
> > > +		/* retrieve peer serving the destination IP of this packet */
> > > +		peer = ovpn_peer_get_by_dst(ovpn, skb);
> > > +	if (unlikely(!peer)) {
> > > +		net_dbg_ratelimited("%s: no peer to send data to\n",
> > > +				    ovpn->dev->name);
> > > +		goto drop;
> > > +	}
> > > +
> > > +	ret = ptr_ring_produce_bh(&peer->tx_ring, skb);
> > > +	if (unlikely(ret < 0)) {
> > > +		net_err_ratelimited("%s: cannot queue packet to TX ring\n",
> > > +				    peer->ovpn->dev->name);
> > > +		goto drop;
> > > +	}
> > > +
> > > +	if (!queue_work(ovpn->crypto_wq, &peer->encrypt_work))
> > > +		ovpn_peer_put(peer);
> > 
> > I wanted to come back to this after going through the crypto patch,
> > because this felt like a strange construct when I first looked at this
> > patch.
> > 
> > Why are you using a workqueue here? Based on the kdoc for crypto_wq
> > ("used to schedule crypto work that may sleep during TX/RX"), it's to
> > deal with async crypto.
> > 
> > If so, why not use the more standard way of dealing with async crypto
> > in contexts that cannot sleep, ie letting the crypto core call the
> > "done" callback asynchronously? You need to do all the proper refcount
> > handling, but IMO it's cleaner and simpler than this workqueue and
> > ptr_ring. You can see an example of that in macsec (macsec_encrypt_*
> > in drivers/net/macsec.c).
> 
> Aha! You don't know how happy I was when I found the doc describing how to
> convert the async code into sync-looking code :-) The catch was that I had
> to move to a different context, as the code may want to sleep (hence the
> introduction of the workqueue).
> 
> It looks like I am a bit of a fan of WQs, while you are telling me to avoid
> them if possible.

I'm mainly trying to simplify the code (get rid of some ptr_rings, get
rid of some ping-pong between functions and changes of context,
etc). And here, I'm also trying to make it look more like other
similar pieces of code, because I'm already familiar with a few kernel
implementations of protocols doing crypto (macsec, ipsec, tls).
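
To make it concrete, the callback-based TX path would be shaped roughly
like this. It's hand-waved and untested; ovpn_encrypt_post(), OVPN_SKB_CB()
and the peer->crypto.encrypt_tfm field are names I'm making up for the
sketch, not anything from your series:

/* Hand-waved sketch, not the real ovpn code: ovpn_encrypt_post(),
 * OVPN_SKB_CB() and peer->crypto.encrypt_tfm are made-up names.
 */
#include <crypto/aead.h>
#include <linux/scatterlist.h>
#include <linux/skbuff.h>
#include <linux/slab.h>

/* state that must outlive ovpn_encrypt_skb() when the crypto is async */
struct ovpn_tx_ctx {
	u8 iv[12];			/* nonce construction elided */
	struct scatterlist sg[MAX_SKB_FRAGS + 2];
	struct aead_request *req;
};

#define OVPN_SKB_CB(skb) (*(struct ovpn_tx_ctx **)(skb)->cb)

/* called by the crypto core once an async request completes */
static void ovpn_encrypt_done(void *data, int err)
{
	struct sk_buff *skb = data;
	struct ovpn_tx_ctx *ctx = OVPN_SKB_CB(skb);

	aead_request_free(ctx->req);
	kfree(ctx);
	/* finish the packet, xmit or drop, release the peer reference */
	ovpn_encrypt_post(skb, err);
}

static int ovpn_encrypt_skb(struct ovpn_peer *peer, struct sk_buff *skb)
{
	struct ovpn_tx_ctx *ctx;
	int ret;

	ctx = kzalloc(sizeof(*ctx), GFP_ATOMIC);
	if (!ctx)
		return -ENOMEM;

	ctx->req = aead_request_alloc(peer->crypto.encrypt_tfm, GFP_ATOMIC);
	if (!ctx->req) {
		kfree(ctx);
		return -ENOMEM;
	}
	OVPN_SKB_CB(skb) = ctx;

	sg_init_table(ctx->sg, ARRAY_SIZE(ctx->sg));
	ret = skb_to_sgvec(skb, ctx->sg, 0, skb->len);
	if (ret < 0)
		goto out;

	/* AD/auth-tag and headroom handling elided */
	aead_request_set_callback(ctx->req, 0, ovpn_encrypt_done, skb);
	aead_request_set_crypt(ctx->req, ctx->sg, ctx->sg, skb->len, ctx->iv);

	ret = crypto_aead_encrypt(ctx->req);
	if (ret == -EINPROGRESS)
		/* async: ovpn_encrypt_done() runs later; the caller's peer
		 * reference must stay held until then */
		return 0;
out:
	/* synchronous completion (the common case with AESNI) or an error */
	aead_request_free(ctx->req);
	kfree(ctx);
	ovpn_encrypt_post(skb, ret);
	return ret;
}

The nice property is that the hot path never leaves the caller's context:
a synchronous algorithm completes inline, and only a genuinely async
implementation pays for the callback, so the ptr_ring and crypto_wq go
away.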

> I presume that using WQs comes with a non-negligible cost, therefore if we
> can just get things done without having to use them, then I should just
> not use them.

If you're using AESNI for your GCM implementation, the crypto API will
also be using a workqueue (see crypto/cryptd.c), but only when the
crypto can't be done immediately (ie, when the FPU is already busy).
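
From memory, the dispatch in the simd wrapper looks more or less like this
(paraphrased from crypto/simd.c, so don't take the details literally):

/* rough paraphrase of the AEAD encrypt path in crypto/simd.c */
static int simd_aead_encrypt(struct aead_request *req)
{
	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
	struct simd_aead_ctx *ctx = crypto_aead_ctx(tfm);
	struct aead_request *subreq = aead_request_ctx(req);
	struct crypto_aead *child;

	*subreq = *req;

	if (!crypto_simd_usable() ||
	    (in_atomic() && cryptd_aead_queued(ctx->cryptd_tfm)))
		/* can't touch the FPU here: punt to the cryptd workqueue */
		child = &ctx->cryptd_tfm->base;
	else
		/* FPU available: run the AESNI code directly, synchronously */
		child = cryptd_aead_child(ctx->cryptd_tfm);

	aead_request_set_tfm(subreq, child);
	return crypto_aead_encrypt(subreq);
}

So with AESNI, the encrypt call you issue from the TX path usually
completes synchronously anyway, and cryptd's own per-CPU workqueue only
kicks in as a fallback.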

In the case of crypto accelerators, there might be benefits from
queueing multiple requests and then letting them live their life,
instead of waiting for each request separately. I don't have access to
that HW so I cannot test this.

> I think I could go back to no-workqueue encrypt/decrypt.
> Do you think this may have any impact on future multi-core optimizations?
> Back then I also thought that going through workers might make improvements
> in this area easier. But I could just be wrong.

Without thinking about it too deeply, the workqueue looks more like a
bottleneck that a no-workqueue approach just wouldn't have. You would
probably need per-CPU WQs (probably separate ones for encrypt and
decrypt). cryptd_enqueue_request (crypto/cryptd.c) has an example of
that.
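
To be concrete, I'd expect something shaped like cryptd's per-CPU queues,
roughly (completely made-up ovpn names, untested):

/* hypothetical per-CPU TX queueing, loosely modeled on
 * cryptd_enqueue_request(); none of these ovpn_* names exist in your series
 */
#include <linux/bottom_half.h>
#include <linux/percpu.h>
#include <linux/skbuff.h>
#include <linux/smp.h>
#include <linux/workqueue.h>

struct ovpn_cpu_queue {
	struct sk_buff_head skbs;	/* init with skb_queue_head_init() */
	struct work_struct work;	/* handler does the encrypt + xmit */
};

struct ovpn_tx_queues {
	struct ovpn_cpu_queue __percpu *cpu_queue;	/* from alloc_percpu() */
	struct workqueue_struct *wq;
};

static void ovpn_tx_enqueue(struct ovpn_tx_queues *q, struct sk_buff *skb)
{
	struct ovpn_cpu_queue *cq;

	local_bh_disable();
	cq = this_cpu_ptr(q->cpu_queue);
	skb_queue_tail(&cq->skbs, skb);
	/* keep the work on the CPU that queued the packet, so traffic from
	 * different CPUs doesn't serialize behind one per-peer work item */
	queue_work_on(smp_processor_id(), q->wq, &cq->work);
	local_bh_enable();
}

(Whether that actually beats just doing the crypto inline in the xmit path
is another question.)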

-- 
Sabrina

