linux-kernel - Re: Filesystem lockup with CONFIG_PREEMPT_RT

Message-ID: <CANGgnMYCFVQ7aR2Qjo+c5B=Jyi3QfyhAHd6y3imSN_7a3Y4Ekg@mail.gmail.com>
Date:	Wed, 21 May 2014 14:59:30 -0700
From:	Austin Schuh <austin@...oton-tech.com>
To:	John Blackwood <john.blackwood@...r.com>
Cc:	Richard Weinberger <richard.weinberger@...il.com>,
	linux-kernel@...r.kernel.org, xfs <xfs@....sgi.com>,
	linux-rt-users@...r.kernel.org
Subject: Re: Filesystem lockup with CONFIG_PREEMPT_RT

On Wed, May 21, 2014 at 12:30 PM, John Blackwood
<john.blackwood@...r.com> wrote:
>> Date: Wed, 21 May 2014 03:33:49 -0400
>> From: Richard Weinberger <richard.weinberger@...il.com>
>> To: Austin Schuh <austin@...oton-tech.com>
>> CC: LKML <linux-kernel@...r.kernel.org>, xfs <xfs@....sgi.com>, rt-users
>>       <linux-rt-users@...r.kernel.org>
>> Subject: Re: Filesystem lockup with CONFIG_PREEMPT_RT
>
>>
>> CC'ing RT folks
>>
>> On Wed, May 21, 2014 at 8:23 AM, Austin Schuh <austin@...oton-tech.com>
>> wrote:
>> > > On Tue, May 13, 2014 at 7:29 PM, Austin Schuh
>> > > <austin@...oton-tech.com> wrote:
>> >> >> Hi,
>> >> >>
>> >> >> I am observing a filesystem lockup with XFS on a CONFIG_PREEMPT_RT
>> >> >> patched kernel.  I have currently only triggered it using dpkg.
>> >> >> Dave Chinner on the XFS mailing list suggested that it was a rt-kernel
>> >> >> workqueue issue as opposed to a XFS problem after looking at the
>> >> >> kernel messages.
>> >> >>
>> >> >> The only modification to the kernel besides the RT patch is that I
>> >> >> have applied tglx's "genirq: Sanitize spurious interrupt detection
>> >> >> of threaded irqs" patch.
>> > >
>> > > I upgraded to 3.14.3-rt4, and the problem still persists.
>> > >
>> > > I turned on event tracing and tracked it down further.  I'm able to
>> > > lock it up by scping a new kernel debian package to /tmp/ on the
>> > > machine.  scp is locking the inode, and then scheduling
>> > > xfs_bmapi_allocate_worker in the work queue.  The work then never gets
>> > > run.  The kworkers then lock up waiting for the inode lock.
>> > >
>> > > Here are the relevant events from the trace.  ffff8803e9f10288
>> > > (blk_delay_work) gets run later on in the trace, but ffff8803b4c158d0
>> > > (xfs_bmapi_allocate_worker) never does.  The kernel then warns about
>> > > blocked tasks 120 seconds later.
>
> Austin and Richard,
>
> I'm not 100% sure that the patch below will fix your problem, but we
> saw something that sounds pretty familiar to your issue involving the
> nvidia driver and the preempt-rt patch.  The nvidia driver uses the
> completion support to create their own driver's notion of an internally
> used semaphore.
>
> Some tasks were failing to ever wakeup from wait_for_completion() calls
> due to a race in the underlying do_wait_for_common() routine.

Hi John,

Thanks for the suggestion and patch.  The issue is that the work never
gets run, not that the work finishes but the waiter never gets woken.
I applied it anyways to see if it helps, but I still get the lockup.

Thanks,
    Austin
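
The distinction drawn above (the work item never starts, as opposed to
the work finishing without its waiter being woken) can be illustrated
with a small userspace toy model. This is only a sketch under loud
assumptions: it uses Python threads and a one-thread pool as a stand-in
for a kworker, it is not the kernel workqueue code, and it does not
claim to reproduce the actual cause of the lockup, which this message
does not establish. The names simulate, needs_inode_lock and
allocate_work are hypothetical.

```python
import concurrent.futures as cf
import threading


def simulate(wait_seconds=0.5):
    """Toy model: queued work cannot start while the only worker is
    blocked on a lock that the submitter still holds."""
    inode_lock = threading.Lock()                # stand-in for the inode lock
    lock_held = threading.Event()
    pool = cf.ThreadPoolExecutor(max_workers=1)  # a single toy "kworker"

    def needs_inode_lock():
        # Earlier work item: blocks until the submitter drops the lock.
        lock_held.wait()
        with inode_lock:
            return "got lock"

    def allocate_work():
        # Hypothetical stand-in for the work item that never gets run.
        return "allocated"

    pool.submit(needs_inode_lock)
    with inode_lock:
        lock_held.set()
        second = pool.submit(allocate_work)
        try:
            second.result(timeout=wait_seconds)
            timed_out = False
        except cf.TimeoutError:
            timed_out = True
        # Still queued: neither running nor done, so no wakeup was missed.
        started_while_locked = second.running() or second.done()

    # Only this toy submitter gives up and unlocks; the worker then drains.
    ran_after_unlock = second.result(timeout=5) == "allocated"
    pool.shutdown(wait=True)
    return {
        "timed_out": timed_out,
        "started_while_locked": started_while_locked,
        "ran_after_unlock": ran_after_unlock,
    }


if __name__ == "__main__":
    print(simulate())
```

In this model the submitter times out and releases the lock, so the
queue eventually drains. A submitter that waits without a timeout would
hang with the second item still queued, which is the "work never gets
run" shape rather than a lost wakeup.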