linux-kernel - Re: [Cluster-devel] [syzbot] WARNING in __set_page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <2b8e94f7a401fc98a7fce3daaa9fd63e7324e426.camel@redhat.com>
Date:   Thu, 22 Jul 2021 14:52:02 +0100
From:   Steven Whitehouse <swhiteho@...hat.com>
To:     Bob Peterson <rpeterso@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        syzbot <syzbot+0d5b462a6f07447991b3@...kaller.appspotmail.com>
Cc:     cluster-devel@...hat.com, linux-mm@...ck.org,
        syzkaller-bugs@...glegroups.com, linux-kernel@...r.kernel.org
Subject: Re: [Cluster-devel] [syzbot] WARNING in __set_page_dirty

Hi,

On Thu, 2021-07-22 at 08:16 -0500, Bob Peterson wrote:
> On 7/21/21 4:58 PM, Andrew Morton wrote:
> > (cc gfs2 maintainers)
> > 
> > On Tue, 20 Jul 2021 19:07:25 -0700 syzbot <
> > syzbot+0d5b462a6f07447991b3@...kaller.appspotmail.com> wrote:
> > 
> > > Hello,
> > > 
> > > syzbot found the following issue on:
> > > 
> > > HEAD commit:    d936eb238744 Revert "Makefile: Enable -Wimplicit-
> > > fallthrou..
> > > git tree:       upstream
> > > console output: 
> > > https://syzkaller.appspot.com/x/log.txt?x=1512834a300000
> > > kernel config:  
> > > https://syzkaller.appspot.com/x/.config?x=f1b998c1afc13578
> > > dashboard link: 
> > > https://syzkaller.appspot.com/bug?extid=0d5b462a6f07447991b3
> > > userspace arch: i386
> > > 
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > > 
> > > IMPORTANT: if you fix the issue, please add the following tag to
> > > the commit:
> > > Reported-by: 
> > > syzbot+0d5b462a6f07447991b3@...kaller.appspotmail.com
> > > 
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 8696 at include/linux/backing-dev.h:283
> > > inode_to_wb include/linux/backing-dev.h:283 [inline]
> > > WARNING: CPU: 0 PID: 8696 at include/linux/backing-dev.h:283
> > > account_page_dirtied mm/page-writeback.c:2435 [inline]
> > > WARNING: CPU: 0 PID: 8696 at include/linux/backing-dev.h:283
> > > __set_page_dirty+0xace/0x1070 mm/page-writeback.c:2483
> >  
> 
> Okay, sorry for the brain fart earlier. After taking a better look, I
> know exactly what this is.
> This goes back to this discussion from April 2018:
> 
> https://listman.redhat.com/archives/cluster-devel/2018-April/msg00017.html
> 
> in which Jan Kara pointed out that:
> 
> "The problem is we really do expect mapping->host->i_mapping ==
> mapping as
> we pass mapping and inode interchangebly in the mm code. The
> address_space
> and inodes are separate structures because you can have many inodes
> pointing to one address space (block devices). However it is not
> allowed
> for several address_spaces to point to one inode!"
> The problem is that GFS2 keeps separate address spaces for its
> glocks, and they
> don't correspond 1:1 to any inode. So mapping->host is not really an
> inode for these,
> and there's really almost no relation between the glock->mapping and
> the inode it
> points to.
> 
> Even in the recent past, GFS2 did this for all metadata for both its
> media-backed glocks:
> resource groups and inodes.
> 
> I recently posted a patch set to cluster-devel ("gfs2: replace
> sd_aspace with sd_inode" -
> https://listman.redhat.com/archives/cluster-devel/2021-July/msg00066.html) in
> which
> I fixed half the problem, which is the resource group case.
> 
> Unfortunately, for inode glocks it gets a lot trickier and I haven't
> found a proper solution.
> But as I said, it's been a known issue for several years now. The
> errors only appear
> if LOCKDEP is turned on. It would be ideal if address spaces were
> treated as fully
> independent from their inodes, but no one seemed to jump on that
> idea, nor even try to
> explain why we make the assumptions Jan Kara pointed out.
> 
> In the meantime, I'll keep looking for a more proper solution. This
> won't be an easy
> thing to fix or I would have already fixed it.
> 
> Regards,
> 
> Bob Peterson
> 
> 

The reason for having address_spaces pointed to by many inodes is to
allow for stackable filesytems so that you can make the file content
available on the upper layer by just pointing the upper layer inode at
the lower layer address_space. That is presumably what Jan is thinking
of.

This however seems to be an issue with a page flag, so it isn't clear
why that would relate to the address_space? If the page is metadata
which would be the most usual case for something being unpinned, then
that page should definitely be up to date.

Looking back at the earlier rgrp fix mentioned above, the fix is not
unreasonable since there only needs to be a single inode to contain all
the rgrps. For the inode metadata that is not the case, there is a one
to one mapping between inodes and metadata address_spaces, and if the
working assumption is that multiple address_spaces per inode is not
allowed, then I think that has changed over time. I'm pretty sure that
I had checked the expectations way back when we adopted this solution,
and that there were no issues with the multiple address_spaces per
inode case. We definitely don't want to go back to adding an additional
struct inode structure (which does nothing except take up space!) to
each "real" inode in cache, because it is a big overhead in case of a
filesystem with many small files.

Still if this is only a lockdep issue, then we likely have some time to
figure out a good long term solution,

Steve.