[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200323215908.tkihojhshwn7kail@naota.dhcp.fujisawa.hgst.com>
Date: Tue, 24 Mar 2020 06:59:08 +0900
From: Naohiro Aota <naohiro.aota@....com>
To: "Darrick J. Wong" <darrick.wong@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Qais Yousef <qais.yousef@....com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Christoph Hellwig <hch@....de>
Subject: Re: lockdep warning in sys_swapon
On Mon, Mar 23, 2020 at 02:12:01PM -0700, Darrick J. Wong wrote:
>On Mon, Mar 23, 2020 at 02:06:15PM -0700, Andrew Morton wrote:
>> On Mon, 23 Mar 2020 15:30:45 +0000 Qais Yousef <qais.yousef@....com> wrote:
>>
>> > On 03/23/20 15:17, Qais Yousef wrote:
>> > > Hi
>> > >
>> > > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform
>> > > (juno-r2), running on v5.6-rc6
>> > >
>> > > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have
>> > > /dev/sda2 as my swap in /etc/fstab
>> > >
>> > > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time,
>> > > yet at least.
>> > >
>> > > /dev/sda2 is a usb flash drive, in case it matters somehow.
>> >
>> > By the way, I noticed that in claim_swapfile() if we fail we don't release the
>> > lock. Shouldn't we release the lock here?
>> >
>> > I tried with that FWIW, but it had no effect on the warnings.
>> >
>>
>> I'll be sending the below into Linus this week.
>>
>> I was hoping to hear from Darrick/Christoph (?) but it looks like the
>> right thing to do. Are you able to test it?
>
>I had questions[1] about the patch, but nobody ever replied.
Sorry, I have overlooked that one. I'm replying here.
> Sorry to wander in late, but I don't see how we unlock the inode in the
> success case. Before this patch, the "if (inode) inode_unlock(inode);"
> below out: would take care of this for both the success case and the
> bad_swap case, but now that's gone, and AFAICT after this patch we only
> unlock the inode when erroring out...
>
> > +bad_swap_unlock_inode:
> > + inode_unlock(inode);
>
> ...since we never goto bad_swap_unlock_inode when error == 0, correct?
Actually, this patch does not touch "if (inode) inode_unlock(inode);" below
the "out:" label, so we still have that "inode_unlock" after this patch and
it unlocks the inode in the success case.
>
>--D
>
>[1] https://lore.kernel.org/linux-fsdevel/20200212164956.GK6874@magnolia/
>
>> I think I'll add a cc:stable to this one.
>>
>>
>>
>> From: Naohiro Aota <naohiro.aota@....com>
>> Subject: mm/swapfile.c: move inode_lock out of claim_swapfile
>>
>> claim_swapfile() currently keeps the inode locked when it is successful,
>> or the file is already swapfile (with -EBUSY). And, on the other error
>> cases, it does not lock the inode.
>>
>> This inconsistency of the lock state and return value is quite confusing
>> and actually causing a bad unlock balance as below in the "bad_swap"
>> section of __do_sys_swapon().
>>
>> This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
>> check out of claim_swapfile(). The inode is unlocked in
>> "bad_swap_unlock_inode" section, so that the inode is ensured to be
>> unlocked at "bad_swap". Thus, error handling codes after the locking now
>> jumps to "bad_swap_unlock_inode" instead of "bad_swap".
>>
>> =====================================
>> WARNING: bad unlock balance detected!
>> 5.5.0-rc7+ #176 Not tainted
>> -------------------------------------
>> swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
>> [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
>> but there are no more locks to release!
>>
>> other info that might help us debug this:
>> no locks held by swapon/4294.
>>
>> stack backtrace:
>> CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
>> Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
>> Call Trace:
>> dump_stack+0xa1/0xea
>> ? __do_sys_swapon+0x94b/0x3550
>> print_unlock_imbalance_bug.cold+0x114/0x123
>> ? __do_sys_swapon+0x94b/0x3550
>> lock_release+0x562/0xed0
>> ? kvfree+0x31/0x40
>> ? lock_downgrade+0x770/0x770
>> ? kvfree+0x31/0x40
>> ? rcu_read_lock_sched_held+0xa1/0xd0
>> ? rcu_read_lock_bh_held+0xb0/0xb0
>> up_write+0x2d/0x490
>> ? kfree+0x293/0x2f0
>> __do_sys_swapon+0x94b/0x3550
>> ? putname+0xb0/0xf0
>> ? kmem_cache_free+0x2e7/0x370
>> ? do_sys_open+0x184/0x3e0
>> ? generic_max_swapfile_size+0x40/0x40
>> ? do_syscall_64+0x27/0x4b0
>> ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> ? lockdep_hardirqs_on+0x38c/0x590
>> __x64_sys_swapon+0x54/0x80
>> do_syscall_64+0xa4/0x4b0
>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> RIP: 0033:0x7f15da0a0dc7
>>
>> Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
>> Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
>> Signed-off-by: Naohiro Aota <naohiro.aota@....com>
>> Reviewed-by: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Darrick J. Wong <darrick.wong@...cle.com>
>> Cc: Christoph Hellwig <hch@...radead.org>
>> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
>> ---
>>
>> mm/swapfile.c | 41 ++++++++++++++++++++---------------------
>> 1 file changed, 20 insertions(+), 21 deletions(-)
>>
>> --- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile
>> +++ a/mm/swapfile.c
>> @@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in
>> p->bdev = inode->i_sb->s_bdev;
>> }
>>
>> - inode_lock(inode);
>> - if (IS_SWAPFILE(inode))
>> - return -EBUSY;
>> -
>> return 0;
>> }
>>
>> @@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use
>> mapping = swap_file->f_mapping;
>> inode = mapping->host;
>>
>> - /* will take i_rwsem; */
>> error = claim_swapfile(p, inode);
>> if (unlikely(error))
>> goto bad_swap;
>>
>> + inode_lock(inode);
>> + if (IS_SWAPFILE(inode)) {
>> + error = -EBUSY;
>> + goto bad_swap_unlock_inode;
>> + }
>> +
>> /*
>> * Read the swap header.
>> */
>> if (!mapping->a_ops->readpage) {
>> error = -EINVAL;
>> - goto bad_swap;
>> + goto bad_swap_unlock_inode;
>> }
>> page = read_mapping_page(mapping, 0, swap_file);
>> if (IS_ERR(page)) {
>> error = PTR_ERR(page);
>> - goto bad_swap;
>> + goto bad_swap_unlock_inode;
>> }
>> swap_header = kmap(page);
>>
>> maxpages = read_swap_header(p, swap_header, inode);
>> if (unlikely(!maxpages)) {
>> error = -EINVAL;
>> - goto bad_swap;
>> + goto bad_swap_unlock_inode;
>> }
>>
>> /* OK, set up the swap map and apply the bad block list */
>> swap_map = vzalloc(maxpages);
>> if (!swap_map) {
>> error = -ENOMEM;
>> - goto bad_swap;
>> + goto bad_swap_unlock_inode;
>> }
>>
>> if (bdi_cap_stable_pages_required(inode_to_bdi(inode)))
>> @@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>> GFP_KERNEL);
>> if (!cluster_info) {
>> error = -ENOMEM;
>> - goto bad_swap;
>> + goto bad_swap_unlock_inode;
>> }
>>
>> for (ci = 0; ci < nr_cluster; ci++)
>> @@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>> p->percpu_cluster = alloc_percpu(struct percpu_cluster);
>> if (!p->percpu_cluster) {
>> error = -ENOMEM;
>> - goto bad_swap;
>> + goto bad_swap_unlock_inode;
>> }
>> for_each_possible_cpu(cpu) {
>> struct percpu_cluster *cluster;
>> @@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use
>>
>> error = swap_cgroup_swapon(p->type, maxpages);
>> if (error)
>> - goto bad_swap;
>> + goto bad_swap_unlock_inode;
>>
>> nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
>> cluster_info, maxpages, &span);
>> if (unlikely(nr_extents < 0)) {
>> error = nr_extents;
>> - goto bad_swap;
>> + goto bad_swap_unlock_inode;
>> }
>> /* frontswap enabled? set up bit-per-page map for frontswap */
>> if (IS_ENABLED(CONFIG_FRONTSWAP))
>> @@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>>
>> error = init_swap_address_space(p->type, maxpages);
>> if (error)
>> - goto bad_swap;
>> + goto bad_swap_unlock_inode;
>>
>> /*
>> * Flush any pending IO and dirty mappings before we start using this
>> @@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>> error = inode_drain_writes(inode);
>> if (error) {
>> inode->i_flags &= ~S_SWAPFILE;
>> - goto bad_swap;
>> + goto bad_swap_unlock_inode;
>> }
>>
>> mutex_lock(&swapon_mutex);
>> @@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use
>>
>> error = 0;
>> goto out;
>> +bad_swap_unlock_inode:
>> + inode_unlock(inode);
>> bad_swap:
>> free_percpu(p->percpu_cluster);
>> p->percpu_cluster = NULL;
>> @@ -3322,6 +3325,7 @@ bad_swap:
>> set_blocksize(p->bdev, p->old_block_size);
>> blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
>> }
>> + inode = NULL;
>> destroy_swap_extents(p);
>> swap_cgroup_swapoff(p->type);
>> spin_lock(&swap_lock);
>> @@ -3333,13 +3337,8 @@ bad_swap:
>> kvfree(frontswap_map);
>> if (inced_nr_rotate_swap)
>> atomic_dec(&nr_rotate_swap);
>> - if (swap_file) {
>> - if (inode) {
>> - inode_unlock(inode);
>> - inode = NULL;
>> - }
>> + if (swap_file)
>> filp_close(swap_file, NULL);
>> - }
>> out:
>> if (page && !IS_ERR(page)) {
>> kunmap(page);
>> _
>>
Powered by blists - more mailing lists