Date: Tue, 11 Nov 2008 11:05:07 -0800
From: "Lovich, Vitali" <vlovich@...lcomm.com>
To: Johann Baudy <johaahn@...il.com>
CC: Evgeniy Polyakov <zbr@...emap.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH] Packet socket: mmapped IO: PACKET_TX_RING

Hi Johann,

> Ok, so we can skip the block below, according to this assumption:
>
>	for (i = 0; i < req->tp_block_nr; i++) {
>		void *ptr = pg_vec[i];
>		int k;
>
>		for (k = 0; k < rb->frames_per_block; k++) {
>			__packet_set_status(po, ptr, TP_STATUS_KERNEL);
>			ptr += req->tp_frame_size;
>		}
>	}

Yes.

> > Earlier, I think I had said that the mapped variable should be placed
> > in the packet_ring_buffer struct - I was wrong. It's correct where it
> > is. I had forgotten that rx & tx are mapped contiguously and
> > simultaneously. There needs to be a note in the doc that the rings
> > should be set up before mapping (i.e. you cannot set up the rx or tx
> > ring buffer first, mmap, and then try to change it).
>
> Yes, we can also add a check for po->mapped in the
> PACKET_TX_RING/PACKET_RX_RING cases of the packet_setsockopt() switch.

Since those call into packet_set_ring, we can just fix the logic there -
there's no need to complicate packet_setsockopt unnecessarily.

> > We do still need a check for tx_pending_skb when sizing the ring
> > buffer (for the situation where there are unfreed skbs created by a
> > call to send from thread B, while thread A unmaps the ring buffer and
> > tries a resize). Also, in tpacket_snd, we need to increment mapped
> > before allocating the skb and test whether the original value was 0
> > (we decrement it again after incrementing tx_pending_skb). If it was
> > 0, that indicates that the user had unmapped the ring buffer, which
> > means we should stop sending. No additional locking is needed because
> > of the atomic relationship this sets up between tx_pending_skb and
> > mapped.
>
> I'm not sure I understand:

Sorry - I realize I rambled a bit. I'll try to clarify.
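[Editor's note: the mapped/tx_pending_skb handshake described above can be sketched in user space. This is a minimal model with invented names (model_sock, tx_begin, tx_complete), using C11 atomics rather than the kernel's atomic_t API; the real code would live in tpacket_snd() and the skb destructor in net/packet/af_packet.c.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space model of the proposed guard: invented names, C11 atomics. */
struct model_sock {
	atomic_int mapped;          /* how many times the rings are mmap()ed */
	atomic_int tx_pending_skb;  /* frames handed off but not yet freed   */
};

/* Start of a send: pin the mapping before touching the ring, then account
 * the pending frame.  Returns false if the user has already unmapped. */
static bool tx_begin(struct model_sock *po)
{
	/* If the value before the increment was 0, there is no mapping. */
	if (atomic_fetch_add(&po->mapped, 1) == 0) {
		atomic_fetch_sub(&po->mapped, 1);
		return false;           /* stop sending: ring was unmapped */
	}
	atomic_fetch_add(&po->tx_pending_skb, 1);
	atomic_fetch_sub(&po->mapped, 1);  /* pending count now pins the ring */
	return true;
}

/* skb-destructor side: one pending frame is done. */
static void tx_complete(struct model_sock *po)
{
	atomic_fetch_sub(&po->tx_pending_skb, 1);
}
```

The point of the ordering is that between the increment of mapped and the increment of tx_pending_skb there is always at least one nonzero counter holding off a resize.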
> Let me sum up:
>
> * when setting or resizing the ring, we have to return an error if:
>   - someone has mapped the socket (1.a)
>   - sending is ongoing - tx_pending_skb != 0 (1.b)
>   - a send() procedure is ongoing (1.c)

Yes.

> * when closing the socket, we have to return an error if:
>   - sending is ongoing - tx_pending_skb != 0 (2.a)
>   - a send() procedure is ongoing (2.b)

Only 2.a should be a reason to fail to close. With 2.b, send should
abort, recognizing that the socket is closing, without sending anything
out. Realistically, however, it's probably too hard to coordinate, since
2.b causes 2.a. So a more realistic approach is that closing never
fails. Instead, we add a flag field to packet_sock. In send, we
atomically set bit 0. When closing, we atomically set bit 1. If, while
closing, bit 0 is also set, then we defer the actual closing of the
socket to send. If send sees bit 1 & bit 0 set & tx_pending_skb == 0,
then it frees the actual socket. Otherwise, the skb destructor is
responsible for closing the socket once tx_pending_skb drops to 0 & bit
1 is set. Calls to send fail if bit 1 is set.

> * during the do-while loop of send():
>   - protect the ring against simultaneous send() (3.a)
>   - ensure that the ring is not resized (3.b) (linked with 1.c)
>
> Our main issue is when tx_pending_skb == 0 and thread A starts to send
> a packet while thread B unmaps and resizes the ring. Thread B can
> potentially resize the ring up until the increment of tx_pending_skb
> in thread A.
>
>   - the loop can continue even if the ring is not mapped (3.c); this
>     makes no sense, but it is not critical

3.a can be protected through the flag field as above - atomically set
the 0th bit if it's not set (test_and_set_bit); otherwise, return with
-EBUSY or a similar error code. After this protection, atomically
increment mapped (atomic_inc_return). If the result is 1, then abort the
send because the user has unmapped the memory. Don't forget to
atomically decrement it again (and clear the 0th flag bit) regardless,
before exiting from send.
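[Editor's note: the deferred-close flag scheme above (bit 0 = send in progress, bit 1 = closing) can be modeled in user space. All names here (model_sock, model_send, model_close, do_free) are invented for illustration; atomic_fetch_or stands in for the kernel's test_and_set_bit.]

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define F_SENDING 0x1  /* bit 0: a send() is in progress   */
#define F_CLOSING 0x2  /* bit 1: close() has been requested */

/* Hypothetical model; 'flags' stands in for the proposed packet_sock field. */
struct model_sock {
	atomic_int flags;
	atomic_int tx_pending_skb;
	bool freed;
};

static void do_free(struct model_sock *s) { s->freed = true; }

static int model_send(struct model_sock *s)
{
	/* test_and_set_bit() analogue: refuse concurrent senders (3.a). */
	if (atomic_fetch_or(&s->flags, F_SENDING) & F_SENDING)
		return -EBUSY;
	if (atomic_load(&s->flags) & F_CLOSING) {  /* socket is going away */
		int f = atomic_fetch_and(&s->flags, ~F_SENDING);
		if ((f & F_CLOSING) && atomic_load(&s->tx_pending_skb) == 0)
			do_free(s);                /* we were the last user */
		return -EBADF;
	}
	/* ... queue frames here, bumping tx_pending_skb per skb ... */
	atomic_fetch_and(&s->flags, ~F_SENDING);
	return 0;
}

static void model_close(struct model_sock *s)
{
	int f = atomic_fetch_or(&s->flags, F_CLOSING);
	/* Defer the free if a send or pending skbs still hold the socket. */
	if (!(f & F_SENDING) && atomic_load(&s->tx_pending_skb) == 0)
		do_free(s);
}
```

Close never fails in this scheme: whichever of close, send, or the skb destructor observes the closing bit with nothing outstanding performs the actual free.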
> * when executing the destructor callback:
>   - we must ensure that the ring structure has not changed since
>     send() (4.a)

This is a consequence of the above logic.

> (1.a) can be done with a po->mapped check in packet_setsockopt()
> (1.b) and (2.a) can be done as you said: a tx_pending_skb != 0 check
> when sizing the buffer in packet_set_ring()
> (3.a) by a mutex M outside of the do-while loop
> (1.c), (2.b) and (3.b) can be performed thanks to the same mutex M and
> mutex_trylock() in packet_set_ring(). An error is returned if we can't
> take the mutex. The mutex is released at the end of the
> packet_set_ring() call.
> (4.a) is validated by the tx_pending_skb increment, (3.b), (1.b) and
> (2.a)
>
> Am I missing something?

I hope I covered everything above. I'm not sure whether send,
setsockopt, etc. are in an interrupt context or not. My guess is that
they are, and thus mutexes are not allowed - spinlocks only. However,
there's also no need for mutexes or spinlocks, because I believe the
above atomic algorithm should work.

I should be able to post a patch by the end of the day.

Thanks,
Vitali
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
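[Editor's note: checks 1.a and 1.b/2.a from the summary in the message above can be sketched as a user-space gate on ring resizing. ring_sock and ring_resize are invented names; the real packet_set_ring() in net/packet/af_packet.c takes a struct tpacket_req and does the actual reallocation.]

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* User-space sketch with invented names; C11 atomics stand in for atomic_t. */
struct ring_sock {
	atomic_int mapped;          /* 1.a: rings currently mmap()ed      */
	atomic_int tx_pending_skb;  /* 1.b/2.a: frames owned by the stack */
	unsigned int tp_block_nr;   /* current ring geometry              */
};

static int ring_resize(struct ring_sock *po, unsigned int block_nr)
{
	if (atomic_load(&po->mapped) != 0)          /* 1.a */
		return -EBUSY;
	if (atomic_load(&po->tx_pending_skb) != 0)  /* 1.b / 2.a */
		return -EBUSY;
	po->tp_block_nr = block_nr;  /* safe: nothing references the ring */
	return 0;
}
```

Combined with the earlier send-side ordering (mapped pinned before tx_pending_skb is raised), at least one of the two counters is nonzero for the whole lifetime of an in-flight frame, so this gate is sufficient.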