linux-kernel - Re: [Ocfs2-devel] [PATCH v3] ocfs2/journal: fix umount hang after flushing journal failure

Open Source and information security mailing list archives

Message-ID: <8a3b9067-2b3b-6a34-dcd7-ffe961ef061a@gmail.com>
Date:   Mon, 16 Jan 2017 08:50:01 +0800
From:   Joseph Qi <jiangqi903@...il.com>
To:     Eric Ren <zren@...e.com>, Changwei Ge <ge.changwei@....com>,
        "ocfs2-devel@....oracle.com" <ocfs2-devel@....oracle.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [Ocfs2-devel] [PATCH v3] ocfs2/journal: fix umount hang after
 flushing journal failure


On 17/1/13 20:37, Eric Ren wrote:
> On 01/13/2017 10:52 AM, Changwei Ge wrote:
>> Hi Joseph,
>>
>> Do you think the latest version of my patch to fix the umount hang after
>> journal flushing failure is OK?
>>
>> If so, I'd like to ask for Andrew's help to merge this patch into his
>> test tree.
>>
>>
>> Thanks,
>>
>> Br.
>>
>> Changwei
>
> The message above should not appear in a formal patch. If you want to
> say something to the other developers, put it in a cover letter; see
> "git format-patch --cover-letter".
>
>>
>>
>>
>>  From 686b52ee2f06395c53e36e2c7515c276dc7541fb Mon Sep 17 00:00:00 2001
>> From: Changwei Ge <ge.changwei@....com>
>> Date: Wed, 11 Jan 2017 09:05:35 +0800
>> Subject: [PATCH] fix umount hang after journal flushing failure
>
> A commit message is needed here! It should describe the problem, how to
> reproduce it, your solution, and so on.
>
>>
>> Signed-off-by: Changwei Ge <ge.changwei@....com>
>> ---
>>   fs/ocfs2/journal.c |   18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>> index a244f14..5f3c862 100644
>> --- a/fs/ocfs2/journal.c
>> +++ b/fs/ocfs2/journal.c
>> @@ -2315,6 +2315,24 @@ static int ocfs2_commit_thread(void *arg)
>>                               "commit_thread: %u transactions pending on "
>>                               "shutdown\n",
>>                               atomic_read(&journal->j_num_trans));
>> +
>> +                       if (status < 0) {
>> +                               mlog(ML_ERROR, "journal is already abort
>> and cannot be "
>> +                                        "flushed any more. So ignore
>> the pending "
>> +                                        "transactions to avoid blocking
>> ocfs2 unmount.\n");
>
> Can you find any example in the kernel source that prints a message like
> that?!
>
> I saw Joseph showed you the right way in a previous email:
> "
> if (status < 0) {
>      mlog(ML_ERROR, "journal is already abort and cannot be "
>              "flushed any more. So ignore the pending "
>              "transactions to avoid blocking ocfs2 unmount.\n");
> "
> So please be careful, and learn from the kernel source and from the way
> other developers do their patch work. Otherwise, it wastes others' time
> on such basic issues.
>
>> +                               /*
>> +                                * This may be a little hacky; however, there is
>> +                                * no chance for ocfs2/journal to decrease this
>> +                                * variable through the commit thread. I have to
>> +                                * do so to avoid the umount hang after journal
>> +                                * flushing failure. Since the journal has been
>> +                                * marked ABORT within jbd2_journal_flush, commit
>> +                                * cache will never do any real work to flush the
>> +                                * journal to disk. Set it to ZERO so that umount
>> +                                * will continue while shutting down the journal.
>> +                                */
>> +                               atomic_set(&journal->j_num_trans, 0);
> Doing it this way can corrupt data. Why not just crash the kernel when
> jbd2 aborts, and let the other nodes do the journal recovery? That is
> the strength of a cluster filesystem.
We shouldn't crash the kernel directly; that would enlarge the impact of
the issue. For example, we may have mounted multiple volumes, and only
one of them has hit this error.
But I do agree with you that we have to let other nodes know about the
abnormal exit and do the recovery, which ensures data consistency.

Thanks,
Joseph
>
> Anyway, I'm glad to see you guys making contributions!
>
> Thanks,
> Eric
>
>
>> +                       }
>>                  }
>>          }
>>
>> -- 
>> 1.7.9.5
>>
>> ------------------------------------------------------------------------------------------------------------------------------------- 
>> This e-mail and its attachments contain confidential information from
>> H3C, which is intended only for the person or entity whose address is
>> listed above. Any use of the information contained herein in any way
>> (including, but not limited to, total or partial disclosure,
>> reproduction, or dissemination) by persons other than the intended
>> recipient(s) is prohibited. If you receive this e-mail in error, please
>> notify the sender by phone or email immediately and delete it!
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel@....oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
>
