Message-ID: <20180122123615.GA29827@hmswarspite.think-freely.org>
Date:   Mon, 22 Jan 2018 07:36:16 -0500
From:   Neil Horman <nhorman@...driver.com>
To:     whiteheadm@....org
Cc:     David Miller <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>, nhorman@...hat.com,
        klassert@...hematik.tu-chemnitz.de
Subject: Re: [PATCHv3] 3c59x: fix missing dma_mapping_error check and bad
 ring refill logic

On Mon, Jan 22, 2018 at 01:27:19AM -0500, tedheadster wrote:
> On Wed, Jan 3, 2018 at 1:44 PM, David Miller <davem@...emloft.net> wrote:
> > From: Neil Horman <nhorman@...driver.com>
> > Date: Wed,  3 Jan 2018 13:09:23 -0500
> >
> >> A few spots in 3c59x missed calls to dma_mapping_error checks, causing
> >> WARN_ONS to trigger.  Clean those up.  While we're at it, refactor the
> >> refill code a bit so that if skb allocation or dma mapping fails, we
> >> recycle the existing buffer.  This prevents holes in the rx ring, and
> >> makes for much simpler logic
> >>
> >> Note: This is compile only tested.  Ted, if you could run this and
> >> confirm that it continues to work properly, I would appreciate it, as I
> >> currently don't have access to this hardware
> >>
> 
> Neil,
>   I was able to test this patch. I did not get any WARN_ON messages.
> However, I am getting a lot of dropped receive packets; uptime is 11
> minutes and it has already dropped 214 of 743 receive packets.
> 
> Admittedly this is on a slow i486 regression testing system, but the
> drop rate is approximately 30% which seems high even for this system
> because it is on a very quiet switched network.
> 
> I enabled some debugging messages by setting msglvl to 4 and
> recompiling with DYNAMIC_DEBUG=y. I did not see any messages of the
> form "No memory to allocate a sk_buff of size" so that leaves the
> following two cases:
> 
> boomerang_rx()
> ...
> newskb = netdev_alloc_skb_ip_align(dev, PKT_BUF_SZ);
> if (!newskb) {
>   dev->stats.rx_dropped++;
>   goto clear_complete;
> }
> newdma = pci_map_single(VORTEX_PCI(vp), newskb->data,
>                         PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
> if (dma_mapping_error(&VORTEX_PCI(vp)->dev, newdma)) {
>   dev->stats.rx_dropped++;
>   consume_skb(newskb);
>   goto clear_complete;
> }
> 
> What shall we do to determine if it is hitting the pci_map_single() or
> netdev_alloc_skb_ip_align() failure?
> 
> - Matthew
> 

Well, I would suggest that you either modify and rebuild the kernel to add a
printk in both of those locations or (if you don't want to go to all that
trouble), write a systemtap script to probe both of those locations and print a
warning if those paths are executed.
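
As a rough illustration of the first option, the sketch below is a userspace
model of the two drop paths, not the driver itself.  The struct, the function
and the per-path counters are hypothetical names made up for this example; in
boomerang_rx() the fprintf() calls would instead be something like a
rate-limited printk, and the two booleans stand in for the results of
netdev_alloc_skb_ip_align() and dma_mapping_error().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-path counters; the driver only has dev->stats.rx_dropped. */
struct rx_drop_stats {
	unsigned long rx_dropped;     /* mirrors dev->stats.rx_dropped */
	unsigned long alloc_failures; /* skb allocation failed */
	unsigned long map_failures;   /* DMA mapping failed */
};

/*
 * Model one rx refill attempt.  alloc_ok and map_ok are assumptions that
 * replace the real allocation and mapping calls.  Returns true if the
 * refill succeeded, false if the packet would have been dropped.
 */
static bool refill_one(struct rx_drop_stats *st, bool alloc_ok, bool map_ok)
{
	assert(st != NULL);

	if (!alloc_ok) {
		st->rx_dropped++;
		st->alloc_failures++;
		fprintf(stderr, "rx refill: skb allocation failed\n");
		return false;
	}
	if (!map_ok) {
		st->rx_dropped++;
		st->map_failures++;
		fprintf(stderr, "rx refill: DMA mapping failed\n");
		return false;
	}
	return true;
}
```

With a distinct message (or counter) on each path, a few minutes of traffic
should be enough to show which of the two failures accounts for the drops.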

That said, while I understand it's good for your understanding of the problem,
knowing which case you hit likely won't help you fix much.  Depending on which
path you hit, it means you're either low on RAM from which to allocate skbs, or
you're out of space in your IOMMU (I'm guessing you're using a software IOTLB
lib).  Both are pretty limited resources on the system you describe.

Neil
