netdev - Re: Kernel Panics in the network stack

Message-Id: <1261482533.29570.31.camel@pc1117.cambridge.arm.com>
Date:	Tue, 22 Dec 2009 11:48:53 +0000
From:	Catalin Marinas <catalin.marinas@....com>
To:	Russell King - ARM Linux <linux@....linux.org.uk>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Kevin Constantine <kevin.constantine@...il.com>,
	netdev@...r.kernel.org,
	linux kernel <linux-kernel@...r.kernel.org>,
	Rusty Russell <rusty@...tcorp.com.au>
Subject: Re: Kernel Panics in the network stack

On Tue, 2009-12-22 at 11:25 +0000, Russell King - ARM Linux wrote:
> On Tue, Dec 22, 2009 at 11:08:25AM +0000, Catalin Marinas wrote:
> > On Tue, 2009-12-22 at 10:09 +0000, Eric Dumazet wrote:
> > > I found an old commit mentioning a problem with LDM instruction that
> > > could be interrupted/ restarted with a base register already changed
> > > -> we load registers with garbage.
> > [...]
> > > If the low interrupt latency mode is enabled for the CPU (from ARMv6
> > > onwards), the ldm/stm instructions are no longer atomic. An ldm instruction
> > > restoring the sp and pc registers can be interrupted immediately after sp
> > > was updated but before the pc. If this happens, the CPU restores the base
> > > register to the value before the ldm instruction but if the base register
> > > is not sp, the interrupt routine will corrupt the stack and the restarted
> > > ldm instruction will load garbage.
> > [...]
> > > I found one instance of LDM instruction in 2.6.30 that could have same problem :
> > >
> > > __switch_to:
> > >
> > > ...
> > >         ldm r4, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
> >
> > It looks to me like it is possible to get an interrupt after SP was
> > loaded but before PC, the stack could be corrupted and PC would be
> > loaded with garbage. One instance of your oops messages looks like PC
> > corruption but the other may be caused by something else. What ARM CPU
> > are you using?
> >
> > I'm cc'ing Russell as well, it's strange that we haven't got any issue
> > with this so far.
> 
> We don't see the issue because we explicitly disable low latency
> interrupt mode.

I think there are some processors where this is always on (but I think
only the no-MMU ones).

But looking at this again, I don't think it actually matters, since R4
doesn't point to the current stack but to the cpu_context in
thread_info. Even if an interrupt occurs after SP was loaded and before
PC, it doesn't corrupt the thread_info structure, so the restarted LDM
re-reads the same values.
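
To illustrate the difference between the two cases, here is a toy Python
model. It is only a sketch under simplifying assumptions: memory is
word-addressed, the "interrupt" just scribbles a fixed-size frame below
the current SP, and all names (ldm_sp_pc, interrupt_handler, run) are
made up for illustration. It does not model real ARM behaviour in detail.

```python
GARBAGE = 0xDEADBEEF

def interrupt_handler(mem, sp, frame_words=8):
    """Model an IRQ that pushes frame_words words below the current sp."""
    for addr in range(sp - frame_words, sp):
        mem[addr] = GARBAGE

def ldm_sp_pc(mem, base, interrupt_midway):
    """Model `ldm base, {sp, pc}` where base is not sp.

    If interrupt_midway is true, an IRQ fires after sp was loaded but
    before pc, and the instruction then restarts from the same base.
    """
    sp = mem[base]
    if interrupt_midway:
        interrupt_handler(mem, sp)   # runs on the newly loaded sp
        sp = mem[base]               # restart: re-read everything
    pc = mem[base + 1]
    return sp, pc

def run(base, saved_sp, saved_pc, interrupt_midway=True):
    """Build a tiny memory image holding {sp, pc} at base and run the load."""
    mem = {base: saved_sp, base + 1: saved_pc}
    return ldm_sp_pc(mem, base, interrupt_midway)

# Case 1: saved registers live on the stack just below the new sp, so the
# interrupt frame overwrites them before the restart.
on_stack = run(base=100, saved_sp=102, saved_pc=0x8000)

# Case 2: saved registers live in a separate structure (like cpu_context
# in thread_info), far away from where the new sp points.
in_context = run(base=1000, saved_sp=500, saved_pc=0x8000)
```

In this model the first case reloads garbage, while the second reloads
the original values, which is the __switch_to situation.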
> 
> > You could try #undef'ing __ARCH_WANT_INTERRUPTS_ON_CTXSW in
> > arch/arm/include/asm/system.h as a sanity check for your aborts.
> 
> Unfortunately, we can't do that for older ARM architectures without
> severely impacting the interrupt latency there.  Not only that, but
> the interrupt latency will be increased during any context switch.

I didn't say we should have this all the time, just as a check for
Eric's problem. But I don't think it's even needed.

-- 
Catalin
