netdev - Re: [PATCH net-next] net: Use rx_nohandler for unhandled packets

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAPRrrxVJjKJ0eE_Xd78DNfyixjr=iTSfVQ1FsAssek-+XMWKUQ@mail.gmail.com>
Date:   Mon, 16 Dec 2019 15:07:14 +0100
From:   Patrick Talbert <ptalbert@...hat.com>
To:     David Miller <davem@...emloft.net>
Cc:     netdev@...r.kernel.org
Subject: Re: [PATCH net-next] net: Use rx_nohandler for unhandled packets

On Sun, Dec 15, 2019 at 9:41 PM David Miller <davem@...emloft.net> wrote:
>
> From: Patrick Talbert <ptalbert@...hat.com>
> Date: Wed, 11 Dec 2019 17:21:07 +0100
>
> > Since caf586e5f23c ("net: add a core netdev->rx_dropped counter") incoming
> > packets which do not have a handler cause a counter named rx_dropped to be
> > incremented. This can lead to confusion as some see a non-zero "drop"
> > counter as cause for concern.
> >
> > To avoid any confusion, instead use the existing rx_nohandler counter. Its
> > name more closely aligns with the activity being tracked here.
> >
> > Signed-off-by: Patrick Talbert <ptalbert@...hat.com>
>
> I disagree with this change.

Thank you for the review and consideration.

>
> When deliver_exact is false we try to deliver the packet to an appropriate
> ptype handler.  And we end up in the counter bump if no ptype handler
> exists.

If we get to this point in __netif_receive_skb_core() it means the
packet was not passed on to any handler or socket but it is not
because anything went wrong. It simply means no one cared about it. I
suppose a counter named rx_nothandled or rx_ignored would fit better
but for now, the existing rx_nohandler counter much more closely
describes what has happened. Lumping these discards into rx_dropped
along with buffer errors and that ilk is misleading at best.

>
> Therefore, the two counters allow to distinguish two very different
> situations and providing that distinction is quite valuable.

As is, we have packets which are not handled due to no fault and
packets which were discarded due to some fault lumped into rx_dropped.
I think it is much more useful to clarify that distinction by
utilizing separate counters versus worrying that someone might miss
the distinction between an inactive slave drop and any other drop.

>
> I think this distinction was very much intentional.  Having people
> understand that rx_dropped can have this meaning is merely a matter of
> education.

>From what I recall, the patch to add rx_nohandler was a reaction to
caf586e5f23c ("net: add a core netdev->rx_dropped counter") being
backported to RHEL7 and customers suddenly seeing rx_dopped piling up
on inactive bond slaves. I wish it had been more broad to trap all the
unhandled traffic, not just inactive slave traffic, but it wasn't.

I appreciate netdev rx_dropped by itself should not be considered a
count of errors but I argue that is not at all common knowledge. At
Red Hat it is a weekly, if not daily, occurrence that a customer opens
a case to ask about a non-zero rx_dropped counter. Whether they
noticed it themselves or some 3rd party monitoring software
highlighted it. Sometimes it reflects a real problem but more often
than not it turns out there is some benign LLC traffic (or the like)
arriving on the interface. That's a seemingly simple and nice answer
but it often takes a good bit of time to convince people they can
ignore a non-zero rx_dropped (if not ignore, at least appreciate it is
not a fault of the NIC/OS). For example, with pcaps and by monitoring
the counter you can show a 1:1 correlation between specific types of
traffic which do not match /proc/net/ptype and increases in
rx_dropped. Its neat but it takes time and often distracts from
whatever the real problem is.

>
> I'm not applying this patch, sorry.
>

If you do not agree with adjusting the purpose of rx_nohandler to
track these events then would you agree to a new counter?