Message-ID: <584ecb5e-b1fc-4b43-ba36-ad396d379fad@amd.com>
Date: Fri, 19 Jul 2024 11:46:09 +0530
From: Bharata B Rao <bharata@....com>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, nikunj@....com,
 "Upadhyay, Neeraj" <Neeraj.Upadhyay@....com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 David Hildenbrand <david@...hat.com>, willy@...radead.org,
 yuzhao@...gle.com, kinseyho@...gle.com, Mel Gorman <mgorman@...e.de>
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system

On 18-Jul-24 5:41 PM, Mateusz Guzik wrote:
> On Thu, Jul 18, 2024 at 11:00 AM Bharata B Rao <bharata@....com> wrote:
>>
>> On 17-Jul-24 4:59 PM, Mateusz Guzik wrote:
>>> As for clear_shadow_entry mentioned in the opening mail, the content is:
>>>           spin_lock(&mapping->host->i_lock);
>>>           xa_lock_irq(&mapping->i_pages);
>>>           __clear_shadow_entry(mapping, index, entry);
>>>           xa_unlock_irq(&mapping->i_pages);
>>>           if (mapping_shrinkable(mapping))
>>>                   inode_add_lru(mapping->host);
>>>           spin_unlock(&mapping->host->i_lock);
>>>
>>> so for all I know it's all about the xarray thing, not the i_lock per se.
>>
>> The soft lockup signature showed _raw_spin_lock and not _raw_spin_lock_irq,
>> hence I concluded it to be i_lock.
> 
> I'm not disputing it was i_lock. I am claiming that the i_pages is
> taken immediately after and it may be that in your workload this is
> the thing with the actual contention problem, making i_lock a red
> herring.
> 
> I tried to match up offsets to my own kernel binary, but things went haywire.
> 
> Can you please resolve a bunch of symbols, like this:
> ./scripts/faddr2line vmlinux clear_shadow_entry+92
> 
> and then paste the source code from reported lines? (I presume you are
> running with some local patches, so opening relevant files in my repo
> may still give bogus results)
> 
> Addresses are: clear_shadow_entry+92 __remove_mapping+98 __filemap_add_folio+332

clear_shadow_entry+92

$ ./scripts/faddr2line vmlinux clear_shadow_entry+92
clear_shadow_entry+92/0x180:
spin_lock_irq at include/linux/spinlock.h:376
(inlined by) clear_shadow_entry at mm/truncate.c:51

42 static void clear_shadow_entry(struct address_space *mapping,
43                                struct folio_batch *fbatch, pgoff_t *indices)
44 {
45         int i;
46
47         if (shmem_mapping(mapping) || dax_mapping(mapping))
48                 return;
49
50         spin_lock(&mapping->host->i_lock);
51         xa_lock_irq(&mapping->i_pages);


__remove_mapping+98

$ ./scripts/faddr2line vmlinux __remove_mapping+98
__remove_mapping+98/0x230:
spin_lock_irq at include/linux/spinlock.h:376
(inlined by) __remove_mapping at mm/vmscan.c:695

684 static int __remove_mapping(struct address_space *mapping, struct folio *folio,
685                             bool reclaimed, struct mem_cgroup *target_memcg)
686 {
687         int refcount;
688         void *shadow = NULL;
689
690         BUG_ON(!folio_test_locked(folio));
691         BUG_ON(mapping != folio_mapping(folio));
692
693         if (!folio_test_swapcache(folio))
694                 spin_lock(&mapping->host->i_lock);
695         xa_lock_irq(&mapping->i_pages);


__filemap_add_folio+332

$ ./scripts/faddr2line vmlinux __filemap_add_folio+332
__filemap_add_folio+332/0x480:
spin_lock_irq at include/linux/spinlock.h:377
(inlined by) __filemap_add_folio at mm/filemap.c:878

851 noinline int __filemap_add_folio(struct address_space *mapping,
852                 struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
853 {
854         XA_STATE(xas, &mapping->i_pages, index);
             ...
874         for (;;) {
875                 int order = -1, split_order = 0;
876                 void *entry, *old = NULL;
877
878                 xas_lock_irq(&xas);
879                 xas_for_each_conflict(&xas, entry) {

> 
> Most notably in __remove_mapping i_lock is conditional:
>          if (!folio_test_swapcache(folio))
>                  spin_lock(&mapping->host->i_lock);
>          xa_lock_irq(&mapping->i_pages);
> 
> and the disasm of the offset in my case does not match either acquire.
> For all I know i_lock in this routine is *not* taken and all the
> queued up __remove_mapping callers increase i_lock -> i_pages wait
> times in clear_shadow_entry.

So all three resolve to the i_pages lock: the first two via xa_lock_irq(&mapping->i_pages) and the last one via xas_lock_irq(&xas).

> 
> To my cursory reading i_lock in clear_shadow_entry can be hacked away
> with some effort, but should this happen the contention is going to
> shift to i_pages presumably with more soft lockups (except on that
> lock). I am not convinced messing with it is justified. From looking
> at other places the i_lock is not a problem in other spots fwiw.
> 
> All that said even if it is i_lock in both cases *and* someone whacks
> it, the mm folk should look into what happens when (maybe i_lock ->)
> i_pages lock is held. To that end perhaps you could provide a
> flamegraph or output of perf record -a -g, I don't know what's
> preferred.

I have attached the flamegraph, but this is for the kernel that has been 
running with all the accumulated fixes so far. The original one (w/o 
fixes) did show considerable time spent in 
native_queued_spin_lock_slowpath, but unfortunately I am unable to 
locate it now.
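The requested capture ("perf record -a -g" per the thread) can be turned into a flamegraph with the usual pipeline; a sketch, where the FlameGraph repo location and the 30-second window are assumptions:

```shell
# System-wide profile with call graphs while the workload is lockup-prone.
perf record -a -g -o perf.data -- sleep 30
# Fold stacks and render; assumes Brendan Gregg's FlameGraph scripts
# are checked out in ./FlameGraph (adjust the path as needed).
perf script -i perf.data \
    | ./FlameGraph/stackcollapse-perf.pl \
    | ./FlameGraph/flamegraph.pl > lockup.svg
```

Time dominated by native_queued_spin_lock_slowpath under clear_shadow_entry or __remove_mapping would show up as wide towers in the resulting SVG.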

Regards,
Bharata.
Download attachment "perf 1.svg" of type "image/svg+xml" (1215900 bytes)
