lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Sun, 15 Mar 2020 00:25:13 -0700 (PDT)
From:   David Miller <davem@...emloft.net>
To:     willemdebruijn.kernel@...il.com
Cc:     netdev@...r.kernel.org, jrosen@...co.com, willemb@...gle.com
Subject: Re: [PATCH net] net/packet: tpacket_rcv: avoid a producer race
 condition

From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Date: Fri, 13 Mar 2020 12:18:09 -0400

> From: Willem de Bruijn <willemb@...gle.com>
> 
> PACKET_RX_RING can cause multiple writers to access the same slot if a
> fast writer wraps the ring while a slow writer is still copying. This
> is particularly likely with few, large, slots (e.g., GSO packets).
> 
> Synchronize kernel thread ownership of rx ring slots with a bitmap.
> 
> Writers acquire a slot race-free by testing tp_status TP_STATUS_KERNEL
> while holding the sk receive queue lock. They release this lock before
> copying and set tp_status to TP_STATUS_USER to release to userspace
> when done. During copying, another writer may take the lock, also see
> TP_STATUS_KERNEL, and start writing to the same slot.
> 
> Introduce a new rx_owner_map bitmap with a bit per slot. To acquire a
> slot, test and set with the lock held. To release race-free, update
> tp_status and owner bit as a transaction, so take the lock again.
> 
> This is the one of a variety of discussed options (see Link below):
> 
> * instead of a shadow ring, embed the data in the slot itself, such as
> in tp_padding. But any test for this field may match a value left by
> userspace, causing deadlock.
> 
> * avoid the lock on release. This leaves a small race if releasing the
> shadow slot before setting TP_STATUS_USER. The below reproducer showed
> that this race is not academic. If releasing the slot after tp_status,
> the race is more subtle. See the first link for details.
> 
> * add a new tp_status TP_KERNEL_OWNED to avoid the transactional store
> of two fields. But, legacy applications may interpret all non-zero
> tp_status as owned by the user. As libpcap does. So this is possible
> only opt-in by newer processes. It can be added as an optional mode.
> 
> * embed the struct at the tail of pg_vec to avoid extra allocation.
> The implementation proved no less complex than a separate field.
> 
> The additional locking cost on release adds contention, no different
> than scaling on multicore or multiqueue h/w. In practice, below
> reproducer nor small packet tcpdump showed a noticeable change in
> perf report in cycles spent in spinlock. Where contention is
> problematic, packet sockets support mitigation through PACKET_FANOUT.
> And we can consider adding opt-in state TP_KERNEL_OWNED.
> 
> Easy to reproduce by running multiple netperf or similar TCP_STREAM
> flows concurrently with `tcpdump -B 129 -n greater 60000`.
> 
> Based on an earlier patchset by Jon Rosen. See links below.
> 
> I believe this issue goes back to the introduction of tpacket_rcv,
> which predates git history.
> 
> Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg237222.html
> Suggested-by: Jon Rosen <jrosen@...co.com>
> Signed-off-by: Willem de Bruijn <willemb@...gle.com>

Applied and queued up for -stable, thanks everyone.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ