netdev - Re: [PATCH 0/3] increase skb size to prevent dma over skb boundary

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 10 Dec 2009 16:00:46 -0500
From:	Neil Horman <nhorman@...driver.com>
To:	"Tantilov, Emil S" <emil.s.tantilov@...el.com>
Cc:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"e1000-devel@...ts.sourceforge.net" 
	<e1000-devel@...ts.sourceforge.net>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>,
	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
	"Allan, Bruce W" <bruce.w.allan@...el.com>,
	"Waskiewicz Jr, Peter P" <peter.p.waskiewicz.jr@...el.com>,
	"Ronciak, John" <john.ronciak@...el.com>
Subject: Re: [PATCH 0/3] increase skb size to prevent dma over skb boundary

On Thu, Dec 10, 2009 at 11:20:58AM -0700, Tantilov, Emil S wrote:
> Neil Horman wrote:
> > On Tue, Dec 08, 2009 at 04:37:39PM -0700, Tantilov, Emil S wrote:
> >> Neil Horman wrote:
> >>> Hey all-
> >>> 	I was tracking down a memory corruptor lately in which, with
> >>> DEBUG_SLAB enabled we were getting several redzone violoations,
> >>> which was always followed by an oops in the rx receive path when
> >>> using the e1000e driver.  Looking at the cores, the slabs that had
> >>> their redzone violated were always adjacent to a size-2048 slab
> >>> which was allocated by __netdev_alloc_skb.  Doing some
> >>> instrumentation led me to see the following.  When the e1000e
> >>> driver sets up its rctl register, which defines the length of each
> >>> of the buffers in the rx ring, we do this: 
> >>> 
> >>> switch (adapter->rx_buffer_len) {
> >>>         case 256:
> >>>                 rctl |= E1000_RCTL_SZ_256;
> >>>                 rctl &= ~E1000_RCTL_BSEX;
> >>>                 break;
> >>>         case 512:
> >>>                 rctl |= E1000_RCTL_SZ_512;
> >>>                 rctl &= ~E1000_RCTL_BSEX;
> >>>                 break;
> >>>         case 1024:
> >>>                 rctl |= E1000_RCTL_SZ_1024;
> >>>                 rctl &= ~E1000_RCTL_BSEX;
> >>>                 break;
> >>>         case 2048:
> >>>         default:
> >>>                 rctl |= E1000_RCTL_SZ_2048;
> >>>                 rctl &= ~E1000_RCTL_BSEX;
> >>>                 break;
> >>>         case 4096:
> >>>                 rctl |= E1000_RCTL_SZ_4096;
> >>>                 break;
> >>>         case 8192:
> >>>                 rctl |= E1000_RCTL_SZ_8192;
> >>>                 break;
> >>>         case 16384:
> >>>                 rctl |= E1000_RCTL_SZ_16384;
> >>>                 break;
> >>>         }
> >>> }
> >>> 
> >>> This is problematic since rx_buffer_len is set to 1500, which causes
> >>> the rctl value to fall into the default case, which configures the
> >>> hardware to think it can dma up to 2048 bytes into each buffer,
> >>> despite the fact that the buffer is only 1500 bytes long.  If we
> >>> receive a frame that is longer than 1500 bytes, then the dma will
> >>> overrun the 1500 byte limit of the sk_buff's data block spilling
> >>> into the skb_shared_info structure (which is placed immediately
> >>> after the end of the skb->end pointer (or at skb->head + skb->end if
> >>> NET_SKBUFF_DATA_USES_OFFSET is configured).  In either case, the
> >>> spillover corrupts the skb_shared_info structure, and sets us up for
> >>> any number of subsequent corruptions/oopses/failures.  I've fixed
> >>> this by normalizing the rx_buffer_length value in the setup_rctl
> >>> function, which rounds up the length to the configured value of
> >>> rctl, forcing us to allocate buffers of a size that meet the
> >>> hardwares configuration needs. 
> >>> 
> >>> I've tested this on e1000e and confirmed that it fixes my redzone
> >>> violations and the observed oops.  Visual inspection indicates that
> >>> e1000 and ixgb also need this fix.  I've not explicitly tested them
> >>> though, so I've split this into three separate patches
> >>> 
> >>> Regards
> >>> Neil
> >>> 
> >>> Signed-off-by: Neil Horman <nhorman@...driver.com>
> >> 
> >> Hi Neil,
> >> 
> >> I am trying to test the patches you submitted (thanks btw) and so
> >> far am not able to reproduce the panic you described. When MTU is at
> >> 1500 RCTL.LPE (bit 5) is set to 0 and the HW will not allow the
> >> reception of large packets (>1522 bytes, which is what rx_buffer_len
> >> is set to). This is basically what I am seeing in my tests - packets
> >> are discarded by the HW.     
> >> 
> >> Can you provide more details about the setup you used? Steps to
> >> reproduce and also details (lspci -vvv, ethtool -e, ethtool -d) of
> >> the HW you used and kernel config should help as well.  
> >> 
> >> Thanks,
> >> Emil--
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@...r.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> 
> > 
> > 
> > Heres the data you asked for.  The config isn't exact, as the system
> > in question was installed a few times over since I worked on this,
> > but it should be very close.  The only differnce I see on this config
> > is that CONFIG_SLAB_DEBUG is turned on.  Everything else should be
> > identical 
> 
> Thanks for the data. Your config could not apply on my kernel (net-next-2.6 master branch). So I attached the one I used for testing in case you see anything that is different. I made sure DEBUG_SLAB is enabled.
> 
It might not apply cleanly no.  as I said, I had to reconstruct it (it was
originoally from an older kernel that I applied to a newer kernel)

> Is this the connectathon suite you were talking about:
> http://www.connectathon.org/nfstests.html
> 
Yes, thats the one.

> I ran overnight stress (1000 iterations of the suite)  plus some netperf stress and so far no signs of issues on a z-200 platform. 
> 
> BTW the HP z-200 is still in pre-production stage. Looking at the eeprom you provided it seems older than what we have. You should probably look into getting an updated FW.
> 
That might be worth looking into.  Thanks.  That said however, if its possible
for the eeprom to allow overlength frames to get dma-ed, isn't it worthwhile to
ensure that our allocation size matches what we tell the driver it is?  We
already do so in e1000_change_mtu:
...

	if (max_frame <= 256)
                adapter->rx_buffer_len = 256;
        else if (max_frame <= 512)
                adapter->rx_buffer_len = 512;
        else if (max_frame <= 1024)
                adapter->rx_buffer_len = 1024;
        else if (max_frame <= 2048)
                adapter->rx_buffer_len = 2048;
        else
                adapter->rx_buffer_len = 4096;
...

These patches just ensure that the same normalization is done for the initial
size of rx_buffer_len, should userspace never change the mtu of the NIC.

Regards
Neil

> Thanks,
> Emil
> 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html