lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YEiLbLiXe6ju/vCO@TonyMac-Alibaba>
Date:   Wed, 10 Mar 2021 17:03:40 +0800
From:   Tony Lu <tonylu@...ux.alibaba.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     davem@...emloft.net, mingo@...hat.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net: add net namespace inode for all net_dev events

On Tue, Mar 09, 2021 at 12:40:11PM -0500, Steven Rostedt wrote:
> On Tue,  9 Mar 2021 12:43:50 +0800
> Tony Lu <tonylu@...ux.alibaba.com> wrote:
> 
> > There are lots of net namespaces on the host runs containers like k8s.
> > It is very common to see the same interface names among different net
> > namespaces, such as eth0. It is not possible to distinguish them without
> > net namespace inode.
> > 
> > This adds net namespace inode for all net_dev events, help us
> > distinguish between different net devices.
> > 
> > Output:
> >   <idle>-0       [006] ..s.   133.306989: net_dev_xmit: net_inum=4026531992 dev=eth0 skbaddr=0000000011a87c68 len=54 rc=0
> > 
> > Signed-off-by: Tony Lu <tonylu@...ux.alibaba.com>
> > ---
> >  include/trace/events/net.h | 35 +++++++++++++++++++++++++----------
> >  1 file changed, 25 insertions(+), 10 deletions(-)
> > 
> > diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> > index 2399073c3afc..a52f90d83411 100644
> > --- a/include/trace/events/net.h
> > +++ b/include/trace/events/net.h
> > @@ -35,6 +35,7 @@ TRACE_EVENT(net_dev_start_xmit,
> >  		__field(	u16,			gso_size	)
> >  		__field(	u16,			gso_segs	)
> >  		__field(	u16,			gso_type	)
> > +		__field(	unsigned int,		net_inum	)
> >  	),
> 
> This patch made me take a look at the net_dev_start_xmit trace event, and I
> see it's very "holy". That is, it has lots of holes in the structure.
> 
> 	TP_STRUCT__entry(
> 		__string(	name,			dev->name	)
> 		__field(	u16,			queue_mapping	)
> 		__field(	const void *,		skbaddr		)
> 		__field(	bool,			vlan_tagged	)
> 		__field(	u16,			vlan_proto	)
> 		__field(	u16,			vlan_tci	)
> 		__field(	u16,			protocol	)
> 		__field(	u8,			ip_summed	)
> 		__field(	unsigned int,		len		)
> 		__field(	unsigned int,		data_len	)
> 		__field(	int,			network_offset	)
> 		__field(	bool,			transport_offset_valid)
> 		__field(	int,			transport_offset)
> 		__field(	u8,			tx_flags	)
> 		__field(	u16,			gso_size	)
> 		__field(	u16,			gso_segs	)
> 		__field(	u16,			gso_type	)
> 		__field(	unsigned int,		net_inum	)
> 	),
> 
> If you look at /sys/kernel/tracing/events/net/net_dev_start_xmit/format
> 
> name: net_dev_start_xmit
> ID: 1581
> format:
> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> 	field:int common_pid;	offset:4;	size:4;	signed:1;
> 
> 	field:__data_loc char[] name;	offset:8;	size:4;	signed:1;
> 	field:u16 queue_mapping;	offset:12;	size:2;	signed:0;
> 	field:const void * skbaddr;	offset:16;	size:8;	signed:0;
> 
> Notice, queue_mapping is 2 bytes at offset 12 (ends at offset 14), but
> skbaddr starts at offset 16. That means there's two bytes wasted.
> 
> 	field:bool vlan_tagged;	offset:24;	size:1;	signed:0;
> 	field:u16 vlan_proto;	offset:26;	size:2;	signed:0;
> 
> Another byte missing above (24 + 1 != 26)
> 
> 	field:u16 vlan_tci;	offset:28;	size:2;	signed:0;
> 	field:u16 protocol;	offset:30;	size:2;	signed:0;
> 	field:u8 ip_summed;	offset:32;	size:1;	signed:0;
> 	field:unsigned int len;	offset:36;	size:4;	signed:0;
> 
> Again another three bytes missing (32 + 1 != 36)
> 
> 	field:unsigned int data_len;	offset:40;	size:4;	signed:0;
> 	field:int network_offset;	offset:44;	size:4;	signed:1;
> 	field:bool transport_offset_valid;	offset:48;	size:1;	signed:0;
> 	field:int transport_offset;	offset:52;	size:4;	signed:1;
> 
> Again, another 3 bytes missing (48 + 1 != 52)
> 
> 	field:u8 tx_flags;	offset:56;	size:1;	signed:0;
> 	field:u16 gso_size;	offset:58;	size:2;	signed:0;
> 
> Another byte missing (56 + 1 != 58)
> 
> 	field:u16 gso_segs;	offset:60;	size:2;	signed:0;	
> 	field:u16 gso_type;	offset:62;	size:2;	signed:0;
> 	field:unsigned int net_inum;	offset:64;	size:4;	signed:0;
> 
> The above shows 10 bytes wasted for this event.
> 
> The order of the fields is important. Don't worry about breaking API by
> fixing it. The parsing code uses this output to find where the binary data
> is.
> 
> Hmm, I should write a script that reads all the format files and point out
> areas that have holes in it.

I use pahole to read vmlinux.o directly with defconfig and
CONFIG_DEBUG_INFO enabled, the result shows 22 structs prefixed with
trace_event_raw_ that have at least one hole.

Command:
	# structs have at least one hole and can be packed
	pahole vmlinux.o -H 1 -R -P -y trace_event_raw_ > output.txt

	# details (result includes above)
	pahole vmlinux.o -H 1 -y trace_event_raw_ > output_detail.txt

Here is the lists (>= 1 hole and can be packed, #1 size, #2 ):

	trace_event_raw_mm_shrink_slab_start    80      72      8
	trace_event_raw_mm_shrink_slab_end      64      56      8
	trace_event_raw_percpu_alloc_percpu     56      48      8
	trace_event_raw_writeback_inode_template        48      40      8
	trace_event_raw_filelock_lock   88      80      8
	trace_event_raw_iomap_apply     72      64      8
	trace_event_raw_ext4__map_blocks_exit   56      48      8
	trace_event_raw_ext4_ext_rm_leaf        64      56      8
	trace_event_raw_ext4_ext_remove_space_done      64      56      8
	trace_event_raw_nfs_rename_event_done   56      48      8
	trace_event_raw_nfs4_rename     56      48      8
	trace_event_raw_net_dev_start_xmit      72      64      8
	trace_event_raw_net_dev_rx_verbose_template     88      72      16
	trace_event_raw_tcp_probe       120     112     8
	trace_event_raw_qdisc_dequeue   64      56      8
	trace_event_raw_rpc_xdr_alignment       88      80      8
	trace_event_raw_rdev_return_int_mesh_config     108     104     4
	trace_event_raw_rdev_update_mesh_config 128     120     8
	trace_event_raw_rdev_get_ftm_responder_stats    120     112     8
	trace_event_raw_drv_bss_info_changed    184     168     16
	trace_event_raw_drv_ampdu_action        88      80      8
	trace_event_raw_drv_tdls_recv_channel_switch    112     104     8


Here is the detail (net_dev_start_xmit as example):

	struct trace_event_raw_net_dev_start_xmit {
	        struct trace_entry         ent;                  /*     0     8 */
	        u32                        __data_loc_name;      /*     8     4 */
	        u16                        queue_mapping;        /*    12     2 */
	
	        /* XXX 2 bytes hole, try to pack */
	
	        const void  *              skbaddr;              /*    16     8 */
	        bool                       vlan_tagged;          /*    24     1 */
	
	        /* XXX 1 byte hole, try to pack */
	
	        u16                        vlan_proto;           /*    26     2 */
	        u16                        vlan_tci;             /*    28     2 */
	        u16                        protocol;             /*    30     2 */
	        u8                         ip_summed;            /*    32     1 */
	
	        /* XXX 3 bytes hole, try to pack */
	
	        unsigned int               len;                  /*    36     4 */
	        unsigned int               data_len;             /*    40     4 */
	        int                        network_offset;       /*    44     4 */
	        bool                       transport_offset_valid; /*    48     1 */
	
	        /* XXX 3 bytes hole, try to pack */
	
	        int                        transport_offset;     /*    52     4 */
	        u8                         tx_flags;             /*    56     1 */
	
	        /* XXX 1 byte hole, try to pack */
	
	        u16                        gso_size;             /*    58     2 */
	        u16                        gso_segs;             /*    60     2 */
	        u16                        gso_type;             /*    62     2 */
	        /* --- cacheline 1 boundary (64 bytes) --- */
	        unsigned int               net_inum;             /*    64     4 */
	        char                       __data[];             /*    68     0 */
	
	        /* size: 72, cachelines: 2, members: 20 */
	        /* sum members: 58, holes: 5, sum holes: 10 */
	        /* padding: 4 */
	        /* last cacheline: 8 bytes */
	};

pahole shows there are 5 holes with 10 bytes in net_dev_start_xmit event.


Cheers,
Tony Lu

> 
> I haven't looked at the other trace events, but they may all have the same
> issues.
> 
> -- Steve
> 
> 
> 
> >  
> >  	TP_fast_assign(
> > @@ -56,10 +57,12 @@ TRACE_EVENT(net_dev_start_xmit,
> >  		__entry->gso_size = skb_shinfo(skb)->gso_size;
> >  		__entry->gso_segs = skb_shinfo(skb)->gso_segs;
> >  		__entry->gso_type = skb_shinfo(skb)->gso_type;
> > +		__entry->net_inum = dev_net(skb->dev)->ns.inum;
> >  	),
> >  
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ