[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <363DA0ED52042842948283D2FC38E4649C3135C3@IRSMSX106.ger.corp.intel.com>
Date: Tue, 6 Nov 2018 17:14:58 +0000
From: "Hunter, Adrian" <adrian.hunter@...el.com>
To: "Theodore Y. Ts'o" <tytso@....edu>, Jan Kara <jack@...e.cz>
CC: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: RE: [PATCH] jbd2: Avoid long hold times of j_state_lock while
committing a transaction
> -----Original Message-----
> From: Theodore Y. Ts'o [mailto:tytso@....edu]
> Sent: Tuesday, November 6, 2018 6:48 PM
> To: Jan Kara <jack@...e.cz>
> Cc: linux-ext4@...r.kernel.org; Hunter, Adrian <adrian.hunter@...el.com>
> Subject: Re: [PATCH] jbd2: Avoid long hold times of j_state_lock while
> committing a transaction
>
> On Tue, Nov 06, 2018 at 11:22:30AM +0100, Jan Kara wrote:
> >> So the buffer is on BJ_Shadow list while the assertion in
> > jbd2_journal_dirty_metadata() expects it to be in BJ_Metadata list.
> > This is really weird as we have also checked that jh->b_transaction ==
> > handle->h_transaction so the transaction couldn't have passed to
> > handle->commit
> > phase... Oh, I see, the code in start_this_handle() got racy with the
> > removal of j_state_lock protection from journal_commit_transaction()
> > so now transaction can start even though there are handles
> > outstanding! I'll think about the best solution for this. Thanks for report!
>
> Thanks for the analysis! I finished the bisection last night and it was too late
> for me to dive into how this was going on. I should have realized this before I
> had suggested the approach in the patch.
>
> The original complaint which Andrian made was that the long hold times of
> j_state_lock at the beginning of the commit. What he didn't mention was
> what the other "high priority tasks" were blocked on,
The high priority task does not use any file system I/O.
It was not able to run because the CPU had preemption disabled, because
J_state_lock is a spinning lock.
Have you considered using a non-spinning lock instead? E.g. rw_semaphore
> but they were almost
> certainly start_this_handle. And that's fundamental; when we are trying to
> at the beginning of the commit process is waiting for the outstanding handles
> to close; and so we can't let new handles start.
>
> What we can do is to try to decrease the handle hold times. This is why we
> track the handle type and we have the jbd2_handle_stats tracepoint. If we
> can find that handles of a particular type and a particular line number are the
> ones which are taking more time than other handles, we can try to make
> them run faster; for example, by pre-reading blocks aren't in the buffer
> cache before starting the handle.
>
> The other thing which we can probably do is to for truncate handles, if we
> notice that current_transaction->t_state == T_LOCKED, we can suspend the
> truncation activity and close the handle, and then resume it (which will block
> until a new transaction is ready to be started).
>
> - Ted
Powered by blists - more mailing lists