lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20101015155956.GA4286@redhat.com>
Date:	Fri, 15 Oct 2010 17:59:56 +0200
From:	Stanislaw Gruszka <sgruszka@...hat.com>
To:	Francois Romieu <romieu@...zoreil.com>
Cc:	netdev@...r.kernel.org, Denis Kirjanov <kirjanov@...il.com>
Subject: Re: [PATCH 6/6] r8169: print errors when dma mapping fail

On Fri, Oct 15, 2010 at 04:52:01PM +0200, Francois Romieu wrote:
> Stanislaw Gruszka <sgruszka@...hat.com> :
> > Print errors because dma mapping failures can cause device to stop
> > working and will need user intervention to recover.
> 
> I am hesitating (overengineered ? bloaty ? not the right place ?).

As someone who seen lot's of bug reports like "my network device stops
working, nothing in dmesg", or like "my network device stops working,
there is NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out in
dmesg" (what is nothing but useful information), I do no think this is
overengineered or bloaty. I could agree for "not the right place", but
even if the error would be reported by upper layers, exact reason of
the problem will be unknown. Regarding lower layers, I don't think iommu
or other dma code print warning with calltrace in case of failure.

> The Tx stats are kept up-to-date : Tx failure will go along a Tx drop
> stat increase.

In current implementation, I stop tx queue on dma errors, if that
happens the queue can never be started again. I will probably change
that as you suggest not returning NETDEV_TX_BUSY, stopping the queue
is also wrong. But I would like to keep this error messages, perhaps
after adding net_ratelimit() check.
 
> Regarding a mapping failure in the Rx path, either it will behave as
> an allocation failure at open / resume time -

Still it's worth to know exact reason of failure.

> and I have no idea how
> the user will recover - or it will happen during a Rx ring refill.

ifconfig eth0 down/up or reloading module

Stanislaw
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ