Message-ID: <52FB9A01.8060601@sandeen.net>
Date:	Wed, 12 Feb 2014 09:57:53 -0600
From:	Eric Sandeen <sandeen@...deen.net>
To:	Dave Chinner <david@...morbit.com>, Dave Jones <davej@...hat.com>,
	Al Viro <viro@...IV.linux.org.uk>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>, xfs@....sgi.com
Subject: Re: 3.14-rc2 XFS backtrace because irqs_disabled.

On 2/12/14, 12:10 AM, Dave Chinner wrote:
> On Wed, Feb 12, 2014 at 12:50:27AM -0500, Dave Jones wrote:
>> On Wed, Feb 12, 2014 at 04:40:43PM +1100, Dave Chinner wrote:
>>
>>  > None of the XFS code disables interrupts in that path, nor does it
>>  > call outside XFS except to dispatch IO. The stack is pretty deep at
>>  > this point and I know that the standard (non stacked) IO stack can
>>  > consume >3kb of stack space when it gets down to having to do memory
>>  > reclaim during GFP_NOIO allocation at the lowest level of SCSI
>>  > drivers. Stack overruns typically show up with symptoms like we are
>>  > seeing.
>>  > ..
>>  > 
>>  > Dave, before chasing ghosts, can you (like Eric originally asked)
>>  > turn on stack overrun detection?
>>
>> CONFIG_DEBUG_STACKOVERFLOW ? Already turned on.
> 
> That only checks stack usage when an interrupt is taken. If no
> interrupts are taken when stack usage is within 128 bytes of
> overflow, then it doesn't catch it.
> 
> I tend to use CONFIG_DEBUG_STACK_USAGE=y as it records the maximum
> stack usage of a process via canary overwrites and reports it in
> do_exit(). I also use the stack tracer to record the largest stack
> usage seen so I know exactly what code paths are approaching stack
> overruns...
> 
> Cheers,
> 
> Dave.
> 


I'm not sure if I'm off base here, but maybe this would make sense: check
for a corrupted stack in __might_sleep.  Compile tested only,
possibly inelegant, and/or completely wrong, but:


From: Eric Sandeen <sandeen@...hat.com>

sched: Test for corrupted task_struct in __might_sleep

If a thread overruns the stack, it may corrupt the task_struct,
leading to false positives on tests like irqs_disabled().

Warn if this seems to be the case.

Signed-off-by: Eric Sandeen <sandeen@...hat.com>
---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b46131e..6920c3c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6934,6 +6934,8 @@ static inline int preempt_count_equals(int preempt_offset)
 
 void __might_sleep(const char *file, int line, int preempt_offset)
 {
+	struct task_struct *tsk = current;
+	unsigned long *stackend;
 	static unsigned long prev_jiffy;	/* ratelimiting */
 
 	rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
@@ -6952,6 +6954,12 @@ void __might_sleep(const char *file, int line, int preempt_offset)
 			in_atomic(), irqs_disabled(),
 			current->pid, current->comm);
 
+	/* A corrupted stack can cause a false positive on irqs_disabled() etc. */
+	stackend = end_of_stack(tsk);
+	/* init_task does not go through dup_task_struct(), which sets the canary */
+	if (tsk != &init_task && *stackend != STACK_END_MAGIC)
+		printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");
+
 	debug_show_held_locks(current);
 	if (irqs_disabled())
 		print_irqtrace_events(current);
