Message-ID: <CAHNKnsQGwV9Z9dSrKusLV7qE+Xw_4eqEDtHKTVJxuuy6H+pWRA@mail.gmail.com>
Date:   Sun, 11 Sep 2022 02:38:38 +0300
From:   Sergey Ryazanov <ryazanov.s.a@...il.com>
To:     Thorsten Glaser <t.glaser@...ent.de>
Cc:     netdev@...r.kernel.org
Subject: Re: RFH, where did I go wrong?

Hello Thorsten,

On Fri, Sep 9, 2022 at 12:19 AM Thorsten Glaser <t.glaser@...ent.de> wrote:
> under high load, my homegrown qdisc causes a system crash,
> but I’m a bit baffled at the message and location. Perhaps
> anyone has directly an idea where I could have messed up?
>
> Transcription of the most relevant info from the screen photo:
>
> virt_to_cache: Object is not a Slab page!
> … at mm/slab.h:435 kmem_cache_free+…
>
> Call Trace:
> __rtnl_unlock+0x34/0x40
> netdev_run_todo+…
> rtnetlink_rcv_msg
> ? _copy_to_iter
> ? __free_one_page
> ? rtnl_calcit.isra.0
> netlink_rcv_skb
> netlink_unicast
> netlink_sendmsg
> sock_sendmsg
> ____sys_sendmsg
> […]
>
> The trace is followed by two…
>
> BUG: Bad rss-counter state mm:0000000001b817b09
> first one is type:MM_FILEPAGES val:81
> second one is type:MM_ANONPAGES val:30
>
>
> I guess I either messed up with pointers or locking, but I don’t
> have the Linux kernel coding experience to know where to even start
> looking for causes.
>
> Source in question is…
> https://github.com/tarent/sch_jens/blob/iproute2_5.10.0-4jens14/janz/sch_janz.c
> … though I don’t exactly ask for someone to solve this for me (though
> that would, obviously, also be welcome ☺) but to get to know enough
> for me to figure out the bug.
>
> I probably would start by adding lots of debugging printks, but the
> problem occurs when throwing iperf with 40 Mbit/s on this set to limit
> to 20 Mbit/s, which’d cause a lot of information — plus I don’t even
> know what kind of error “Object is not a Slab page” is (i.e. what wrong
> thing is passed where or written to where).

At first glance, this looks like a memory access issue. Try enabling
KASAN; it may be able to provide more details about the source of the
issue.
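
For reference, a kernel config fragment for a KASAN-enabled build might
look roughly like the one below. This is only a sketch: it assumes an
architecture that supports generic KASAN (e.g. x86_64), and the exact
set of options available depends on your kernel version, so check
Documentation/dev-tools/kasan.rst in your tree.

```
# Enable the Kernel Address Sanitizer (generic, software-only mode).
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
# Inline instrumentation: faster at runtime, at the cost of a larger image.
CONFIG_KASAN_INLINE=y
# Include alloc/free stack traces of the affected object in reports.
CONFIG_STACKTRACE=y
```

With such a kernel, an out-of-bounds or use-after-free access is
normally reported at the point of the bad access, which is usually
closer to the real bug than a later crash in kmem_cache_free().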

BTW, the stack backtrace contains only RTNL-related functions. Does
this warning appear when you try to reconfigure the qdisc? If so, then
the error is probably somewhere in the qdisc configuration code. Such
code is much easier to debug with printk than the packet processing
path.

If you still need some tracing support, take a look at the kernel
tracing capabilities. See Documentation/trace/tracepoints.rst for
documentation and, for example, net/mac80211/trace.h for reference.

-- 
Sergey
