Date:   Fri, 17 Nov 2023 12:09:37 +0800
From:   ChenXiaoSong <chenxiaosongemail@...mail.com>
To:     Trond Myklebust <trondmy@...merspace.com>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>
Cc:     "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "chenxiaosong@...inos.cn" <chenxiaosong@...inos.cn>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        "huangjinhui@...inos.cn" <huangjinhui@...inos.cn>,
        "liuzhengyuan@...inos.cn" <liuzhengyuan@...inos.cn>,
        "liuyun01@...inos.cn" <liuyun01@...inos.cn>,
        "huhai@...inos.cn" <huhai@...inos.cn>,
        "sashal@...nel.org" <sashal@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Anna.Schumaker@...app.com" <Anna.Schumaker@...app.com>
Subject: Re: Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt
 file writeout due to fatal errors"

On 2023/10/30 22:56, Trond Myklebust wrote:
> A refactoring is by definition a change that does not affect code
> behaviour. It is obvious that this was never intended to be such a
> patch.
>
> The reason that the bug is occurring in 4.19.x, and not in the latest
> kernels, is because the former is missing another bugfix (one which
> actually is missing a "Fixes:" tag).
>
> Can you therefore please check if applying commit 22876f540bdf ("NFS:
> Don't call generic_error_remove_page() while holding locks") fixes the
> issue.
>
> Note that the latter patch is needed in any case in order to fix a read
> deadlock (as indicated on the label).
>
> Thanks,
>    Trond

Sorry, the previous email had formatting issues. I'll resend it.


After applying commit 22876f540bdf ("NFS: Don't call
generic_error_remove_page() while holding locks"), I hit an
infinite loop:

write
   ...
   nfs_updatepage
     nfs_writepage_setup
       nfs_setup_write_request
         nfs_try_to_update_request
           nfs_wb_page
             if (clear_page_dirty_for_io(page)) // true
             nfs_writepage_locked // return 0
               nfs_do_writepage // return 0
                 nfs_page_async_flush // return 0
                   nfs_error_is_fatal_on_server
                   nfs_write_error_remove_page
                     SetPageError // instead of generic_error_remove_page
             // loop begin
             if (clear_page_dirty_for_io(page)) // false
             if (!PagePrivate(page)) // false
             ret = nfs_commit_inode = 0
             // loop again, never quit


Before applying commit 22876f540bdf ("NFS: Don't call
generic_error_remove_page() while holding locks"),
generic_error_remove_page() cleared PG_private, so the infinite loop
could not happen:

generic_error_remove_page
   truncate_inode_page
     truncate_cleanup_page
       do_invalidatepage
         nfs_invalidate_page
           nfs_wb_page_cancel
             nfs_inode_remove_request
               ClearPagePrivate(head->wb_page)
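
The two traces above can be condensed into a small user-space model. This is only a sketch of the control flow as I understand it, not kernel code: struct model_page, model_write_error() and model_wb_page() are hypothetical names, the two booleans stand in for PG_dirty and PG_private, and the pass limit exists only so that the model terminates.

```c
#include <stdbool.h>

/* Toy stand-in for the two page flags that matter in the traces. */
struct model_page {
    bool dirty;        /* models PG_dirty */
    bool has_private;  /* models PG_private */
};

/*
 * Models the fatal-error path taken during writeout.
 * clears_private == true:  behaves like generic_error_remove_page(),
 *                          which ends up in ClearPagePrivate().
 * clears_private == false: behaves like SetPageError() only, so the
 *                          private flag is left set.
 */
static void model_write_error(struct model_page *p, bool clears_private)
{
    if (clears_private)
        p->has_private = false;
}

/*
 * Models the loop in nfs_wb_page() as described in the trace.
 * Returns the pass on which the loop exits, or -1 if it is still
 * spinning after max_passes (the stand-in for "never quits").
 */
static int model_wb_page(struct model_page *p, bool clears_private,
                         int max_passes)
{
    for (int pass = 1; pass <= max_passes; pass++) {
        if (p->dirty) {                 /* clear_page_dirty_for_io() */
            p->dirty = false;
            model_write_error(p, clears_private); /* writeout returns 0 */
            continue;
        }
        if (!p->has_private)            /* !PagePrivate(page) */
            return pass;
        /* nfs_commit_inode() returns 0 and changes nothing: loop again */
    }
    return -1;
}
```

Starting from a page that is both dirty and private, the model exits on the second pass when the error path clears the private flag, and reaches the pass limit when it does not.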


If this patch is applied, are other patches required as well? Also, I
cannot reproduce the read deadlock that the patch is meant to fix. Are
there specific conditions required to reproduce it?

