Message-ID: <19623.48074.873182.970865@fisica.ufpr.br>
Date:	Sat, 2 Oct 2010 20:10:02 -0300
From:	Carlos Carvalho <carlos@...ica.ufpr.br>
To:	Dave Chinner <david@...morbit.com>
Cc:	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/17] fs: Inode cache scalability

We have serious problems with 34.6 on a machine with ~11TiB of xfs and
a lot of simultaneous IO, particularly hundreds of rm's followed by a
sync. Maybe they're related to these issues.

The machine is a file server (almost all via http/apache) and has
several thousand connections at all times. It behaves quite well for
at most 4 days; from then on the kswapd threads start showing up in
top, consuming ever-increasing percentages of cpu. By itself this is
no problem, since the machine has 16 nearly idle cores. However, after
about 5-7 days there's an abrupt transition: within about 30s the load
goes to several thousand, apache shows up consuming all available cpu
and downloads nearly stop. I have to reboot the machine to get service
back. It manages to unmount the filesystems and reboot properly.

Stopping/restarting apache restores the situation but only for
a short while; after about 2-3h the problem reappears. That's why I
have to reboot.
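
To track the kswapd cpu growth over the days instead of watching top,
something like the following could be run from cron. It is only a
minimal sketch, not what was used on this machine: the names
parse_stat and kswapd_cpu_seconds are made up for illustration, and it
assumes the usual /proc/<pid>/stat layout, where utime and stime are
fields 14 and 15.

```python
import os

def parse_stat(line):
    """Parse one /proc/<pid>/stat line into (pid, comm, utime, stime).

    comm is wrapped in parentheses and may itself contain spaces, so
    split on the last ')' rather than on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()
    # tail starts at field 3 (state), so field 14 (utime) is index 11
    # and field 15 (stime) is index 12.
    return int(pid_str), comm, int(fields[11]), int(fields[12])

def kswapd_cpu_seconds(proc="/proc"):
    """Return {thread name: cumulative cpu seconds} for kswapd threads."""
    ticks = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    usage = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                _, comm, utime, stime = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if comm.startswith("kswapd"):
            usage[comm] = (utime + stime) / ticks
    return usage

if __name__ == "__main__":
    for name, secs in sorted(kswapd_cpu_seconds().items()):
        print(f"{name}: {secs:.1f}s")
```

The values are cumulative, so the difference between two samples
divided by the sampling interval gives the average cpu fraction over
that interval.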

With 35.6 the behaviour seems to have changed: now
CONFIG_DETECT_HUNG_TASK often produces this kind of call trace in the log:

[<ffffffff81098578>] ? igrab+0x10/0x30
[<ffffffff811160fe>] ? xfs_sync_inode_valid+0x4c/0x76
[<ffffffff81116241>] ? xfs_sync_inode_data+0x1b/0xa8
[<ffffffff811163e0>] ? xfs_inode_ag_walk+0x96/0xe4
[<ffffffff811163dd>] ? xfs_inode_ag_walk+0x93/0xe4
[<ffffffff81116226>] ? xfs_sync_inode_data+0x0/0xa8
[<ffffffff81116495>] ? xfs_inode_ag_iterator+0x67/0xc4
[<ffffffff81116226>] ? xfs_sync_inode_data+0x0/0xa8
[<ffffffff810a48dd>] ? sync_one_sb+0x0/0x1e
[<ffffffff81116712>] ? xfs_sync_data+0x22/0x42
[<ffffffff810a48dd>] ? sync_one_sb+0x0/0x1e
[<ffffffff8111678b>] ? xfs_quiesce_data+0x2b/0x94
[<ffffffff81113f03>] ? xfs_fs_sync_fs+0x2d/0xd7
[<ffffffff810a48dd>] ? sync_one_sb+0x0/0x1e
[<ffffffff810a48c4>] ? __sync_filesystem+0x62/0x7b
[<ffffffff8108993e>] ? iterate_supers+0x60/0x9d
[<ffffffff810a493a>] ? sys_sync+0x3f/0x53
[<ffffffff81001dab>] ? system_call_fastpath+0x16/0x1b
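
For reading traces like this one, a small parser can turn the lines
into a caller-to-callee chain. This is a minimal sketch and not part
of any kernel tooling; parse_frames and call_chain are made-up names.
It assumes that entries at offset +0x0 are most likely function
pointers left on the stack (e.g. callbacks passed as arguments) rather
than return addresses, and skips them.

```python
import re

# Matches lines such as "[<ffffffff81098578>] ? igrab+0x10/0x30".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

def parse_frames(text):
    """Return one dict per trace line, in the order printed (innermost first)."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "symbol": m.group("symbol"),
                "offset": int(m.group("offset"), 16),
                "size": int(m.group("size"), 16),
                # A "?" marks an address the unwinder could not verify.
                "unreliable": m.group("unreliable") is not None,
            })
    return frames

def call_chain(frames):
    """Outermost caller first, without +0x0 entries or consecutive repeats."""
    chain = []
    for frame in reversed(frames):
        if frame["offset"] == 0:
            continue  # assumed to be a stale function pointer
        if chain and chain[-1] == frame["symbol"]:
            continue
        chain.append(frame["symbol"])
    return chain
```

Feeding it the trace above and joining the result with " -> " shows
the path from sys_sync down to igrab on one line.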

It doesn't seem to cause service disruption (at least the traffic
graphs don't show drops). I haven't seen it happen while I was
watching, so it may be that service degrades for short intervals.
Uptime with 35.6 is only 3d8h, so it's not yet certain that the
breakdown seen with 34.6 is gone, but kswapd cpu usage is very small,
less than with 34.6 at a similar uptime. There are only 2 filesystems,
and the big one has 256 AGs. They're not mounted with delaylog.