Message-ID: <CAHNKnsQGwV9Z9dSrKusLV7qE+Xw_4eqEDtHKTVJxuuy6H+pWRA@mail.gmail.com>
Date: Sun, 11 Sep 2022 02:38:38 +0300
From: Sergey Ryazanov <ryazanov.s.a@...il.com>
To: Thorsten Glaser <t.glaser@...ent.de>
Cc: netdev@...r.kernel.org
Subject: Re: RFH, where did I go wrong?
Hello Thorsten,
On Fri, Sep 9, 2022 at 12:19 AM Thorsten Glaser <t.glaser@...ent.de> wrote:
> under high load, my homegrown qdisc causes a system crash,
> but I’m a bit baffled at the message and location. Perhaps
> anyone has directly an idea where I could have messed up?
>
> Transcription of the most relevant info from the screen photo:
>
> virt_to_cache: Object is not a Slab page!
> … at mm/slab.h:435 kmem_cache_free+…
>
> Call Trace:
> __rtnl_unlock+0x34/0x40
> netdev_run_todo+…
> rtnetlink_rcv_msg
> ? _copy_to_iter
> ? __free_one_page
> ? rtnl_calcit.isra.0
> netlink_rcv_skb
> netlink_unicast
> netlink_sendmsg
> sock_sendmsg
> ____sys_sendmsg
> […]
>
> The trace is followed by two…
>
> BUG: Bad rss-counter state mm:0000000001b817b09
> first one is type:MM_FILEPAGES val:81
> second one is type:MM_ANONPAGES val:30
>
>
> I guess I either messed up with pointers or locking, but I don’t
> have the Linux kernel coding experience to know where to even start
> looking for causes.
>
> Source in question is…
> https://github.com/tarent/sch_jens/blob/iproute2_5.10.0-4jens14/janz/sch_janz.c
> … though I don’t exactly ask for someone to solve this for me (though
> that would, obviously, also be welcome ☺) but to get to know enough
> for me to figure out the bug.
>
> I probably would start by adding lots of debugging printks, but the
> problem occurs when throwing iperf with 40 Mbit/s on this set to limit
> to 20 Mbit/s, which’d cause a lot of information — plus I don’t even
> know what kind of error “Object is not a Slab page” is (i.e. what wrong
> thing is passed where or written to where).
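At first glance, this looks like a memory corruption issue. The warning
itself means that kmem_cache_free() was handed a pointer that does not
point into a slab page, e.g. a double free, a corrupted pointer, or
memory that was never allocated from a slab cache. Try enabling KASAN
(CONFIG_KASAN); it may be able to provide more details about the source
of the issue.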
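A minimal sketch of a .config fragment enabling generic KASAN, assuming the kernel is rebuilt from source (option names as in mainline Kconfig around 5.10):

```
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
# Inline instrumentation: faster than outline mode, but a larger kernel image
CONFIG_KASAN_INLINE=y
```

After booting the instrumented kernel and reproducing the load, a KASAN report for a bad slab access normally includes the stack traces where the object was allocated and where it was freed, which usually points straight at the offending code path.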
BTW, the stack backtrace contains only RTNL-related functions. Does
this warning appear when you reconfigure the qdisc? If so, the error
is probably somewhere in the qdisc configuration code, which is much
easier to debug with printk than the packet processing path.
If you still need tracing support, take a look at the kernel's
tracepoint infrastructure. See Documentation/trace/tracepoints.rst for
documentation and, for example, net/mac80211/trace.h for a reference
implementation.
--
Sergey