netdev - Re: [PATCH 6/6] r8169: print errors when dma mapping fail

Date:	Fri, 15 Oct 2010 17:59:56 +0200
From:	Stanislaw Gruszka <sgruszka@...hat.com>
To:	Francois Romieu <romieu@...zoreil.com>
Cc:	netdev@...r.kernel.org, Denis Kirjanov <kirjanov@...il.com>
Subject: Re: [PATCH 6/6] r8169: print errors when dma mapping fail

On Fri, Oct 15, 2010 at 04:52:01PM +0200, Francois Romieu wrote:
> Stanislaw Gruszka <sgruszka@...hat.com> :
> > Print errors because dma mapping failures can cause device to stop
> > working and will need user intervention to recover.
> 
> I am hesitating (overengineered ? bloaty ? not the right place ?).

As someone who has seen lots of bug reports like "my network device stops
working, nothing in dmesg", or "my network device stops working, there is
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out in dmesg" (which
is anything but useful information), I do not think this is overengineered
or bloaty. I could agree with "not the right place", but even if the error
were reported by upper layers, the exact reason for the problem would
remain unknown. Regarding lower layers, I don't think the iommu or other
dma code prints a warning with a call trace in case of failure.

> The Tx stats are kept up-to-date : Tx failure will go along a Tx drop
> stat increase.

In the current implementation, I stop the tx queue on dma errors; if that
happens, the queue can never be started again. I will probably change
that: as you suggest, I will not return NETDEV_TX_BUSY, and stopping the
queue is also wrong. But I would like to keep these error messages,
perhaps after adding a net_ratelimit() check.
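
To make the intended behaviour concrete, here is a rough userspace sketch
(not the actual r8169 code, and not compiled against the kernel): on a
mapping failure the packet is dropped and counted, the queue is left
running, TX_OK is returned instead of TX_BUSY, and the message is rate
limited. Every name below (fake_dev, ratelimit_ok, sketch_start_xmit, the
message text) is a hypothetical stand-in; in the driver the check would be
net_ratelimit() and the return value NETDEV_TX_OK.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for NETDEV_TX_OK / NETDEV_TX_BUSY. */
enum tx_status { TX_OK = 0, TX_BUSY = 1 };

/* Hypothetical, minimal device state for the sketch. */
struct fake_dev {
	unsigned long tx_packets;
	unsigned long tx_dropped;
	unsigned long errors_logged;
	bool queue_stopped;
};

/* Crude stand-in for net_ratelimit(): at most 'burst' messages
 * per 'interval' seconds. */
struct ratelimit {
	long window_start;
	int printed;
	int burst;
	long interval;
};

static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->window_start >= rl->interval) {
		/* A new window begins: reset the message budget. */
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

/* 'map_failed' simulates a failed dma mapping of the tx buffer. */
static enum tx_status sketch_start_xmit(struct fake_dev *dev,
					struct ratelimit *rl,
					bool map_failed, long now)
{
	if (map_failed) {
		if (ratelimit_ok(rl, now)) {
			fprintf(stderr, "eth0: failed to map TX DMA!\n");
			dev->errors_logged++;
		}
		/* Drop the packet and account for it; do not stop the
		 * queue and do not ask the stack to retry. */
		dev->tx_dropped++;
		return TX_OK;
	}

	dev->tx_packets++;
	return TX_OK;
}
```

With this shape a burst of mapping failures shows up both in the drop
counter and (a bounded number of times) in the log, and the queue needs no
restart afterwards.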

> Regarding a mapping failure in the Rx path, either it will behave as
> an allocation failure at open / resume time -

Still, it's worth knowing the exact reason for the failure.

> and I have no idea how
> the user will recover - or it will happen during a Rx ring refill.

ifconfig eth0 down/up, or reloading the module.

Stanislaw