Message-ID: <20090621004919.GA6798@mit.edu>
Date: Sat, 20 Jun 2009 20:49:19 -0400
From: Theodore Tso <tytso@....edu>
To: Eric Sandeen <sandeen@...hat.com>
Cc: linux-ext4@...r.kernel.org
Subject: Re: Need to potentially watch stack usage for ext4 and AIO...
On Fri, Jun 19, 2009 at 08:46:12PM -0500, Eric Sandeen wrote:
> if you got within 372 bytes on 32-bit (with 8k stacks) then that's
> indeed pretty worrisome.
Fortunately this was with a 4k stack, but it's still not a good thing;
the 8k stack also has to support interrupts and softirqs, whereas
CONFIG_4KSTACKS uses a separate interrupt stack....
Anyone have statistics on what the worst, most evil proprietary
SCSI/FC/10gigE driver might use in terms of stack space, combined with
say, the most evil proprietary multipath product at interrupt/softirq
time, by any chance?
In any case, here are two stack dumps that I captured, the first using
a 1k blocksize, and the second using a 4k blocksize (not that the
blocksize should make a huge amount of difference).  This time, I got
to within 200 bytes of disaster on the second stack dump.  Worse yet,
the stack usage bloat isn't in any one place; it seems to be fairly
evenly peanut-buttered across the call stack.
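(For anyone who wants to reproduce these, dumps in the "Depth Size
Location" format below come from the kernel's stack tracer; a rough
sketch, assuming a kernel built with CONFIG_STACK_TRACER=y and debugfs
mounted at /sys/kernel/debug:)

```shell
# Turn on the stack tracer (records the deepest stack seen)
echo 1 > /proc/sys/kernel/stack_tracer_enabled

# ... run the AIO write workload here ...

# Worst-case kernel stack depth observed, in bytes
cat /sys/kernel/debug/tracing/stack_max_size

# Per-frame breakdown in the same Depth/Size/Location format
cat /sys/kernel/debug/tracing/stack_trace
```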
I can see some things we can do to optimize stack usage; for example,
struct ext4_allocation_request is allocated on the stack, and the
structure was laid out without any regard to space wastage caused by
alignment requirements.  That won't help on x86 at all, but it will
help substantially on x86_64 (since x86_64 requires that 8 byte
variables must be 8-byte aligned, whereas x86 only requires 4 byte
alignment, even for unsigned long long's).  But it's going to have to
be a whole series of incremental improvements; I don't see any magic
bullet solution to our stack usage.
- Ted
Depth Size Location (38 entries)
----- ---- --------
0) 3064 48 kvm_mmu_write+0x5f/0x67
1) 3016 16 kvm_set_pte+0x21/0x27
2) 3000 208 __change_page_attr_set_clr+0x272/0x73b
3) 2792 76 kernel_map_pages+0xd4/0x102
4) 2716 80 get_page_from_freelist+0x2dd/0x3b5
5) 2636 108 __alloc_pages_nodemask+0xf6/0x435
6) 2528 16 alloc_slab_page+0x20/0x26
7) 2512 60 __slab_alloc+0x171/0x470
8) 2452 4 kmem_cache_alloc+0x8f/0x127
9) 2448 68 radix_tree_preload+0x27/0x66
10) 2380 56 cfq_set_request+0xf1/0x2b4
11) 2324 16 elv_set_request+0x1c/0x2b
12) 2308 44 get_request+0x1b0/0x25f
13) 2264 60 get_request_wait+0x1d/0x135
14) 2204 52 __make_request+0x24d/0x34e
15) 2152 96 generic_make_request+0x28f/0x2d2
16) 2056 56 submit_bio+0xb2/0xba
17) 2000 20 submit_bh+0xe4/0x101
18) 1980 196 ext4_mb_init_cache+0x221/0x8ad
19) 1784 232 ext4_mb_regular_allocator+0x443/0xbda
20) 1552 72 ext4_mb_new_blocks+0x1f6/0x46d
21) 1480 220 ext4_ext_get_blocks+0xad9/0xc68
22) 1260 68 ext4_get_blocks+0x10e/0x27e
23) 1192 244 mpage_da_map_blocks+0xa7/0x720
24) 948 108 ext4_da_writepages+0x27b/0x3d3
25) 840 16 do_writepages+0x28/0x39
26) 824 72 __writeback_single_inode+0x162/0x333
27) 752 68 generic_sync_sb_inodes+0x2b6/0x426
28) 684 20 writeback_inodes+0x8a/0xd1
29) 664 96 balance_dirty_pages_ratelimited_nr+0x12d/0x237
30) 568 92 generic_file_buffered_write+0x173/0x23e
31) 476 124 __generic_file_aio_write_nolock+0x258/0x280
32) 352 52 generic_file_aio_write+0x6e/0xc2
33) 300 52 ext4_file_write+0xa8/0x12c
34) 248 36 aio_rw_vect_retry+0x72/0x135
35) 212 24 aio_run_iocb+0x69/0xfd
36) 188 108 sys_io_submit+0x418/0x4dc
37) 80 80 syscall_call+0x7/0xb
-------------
Depth Size Location (47 entries)
----- ---- --------
0) 3556 8 kvm_clock_read+0x1b/0x1d
1) 3548 8 sched_clock+0x8/0xb
2) 3540 96 __lock_acquire+0x1c0/0xb21
3) 3444 44 lock_acquire+0x94/0xb7
4) 3400 16 _spin_lock_irqsave+0x37/0x6a
5) 3384 28 clocksource_get_next+0x12/0x48
6) 3356 96 update_wall_time+0x661/0x740
7) 3260 8 do_timer+0x1b/0x22
8) 3252 44 tick_do_update_jiffies64+0xed/0x127
9) 3208 24 tick_sched_timer+0x47/0xa0
10) 3184 40 __run_hrtimer+0x67/0x97
11) 3144 56 hrtimer_interrupt+0xfe/0x151
12) 3088 16 smp_apic_timer_interrupt+0x6f/0x82
13) 3072 92 apic_timer_interrupt+0x2f/0x34
14) 2980 48 kvm_mmu_write+0x5f/0x67
15) 2932 16 kvm_set_pte+0x21/0x27
16) 2916 208 __change_page_attr_set_clr+0x272/0x73b
17) 2708 76 kernel_map_pages+0xd4/0x102
18) 2632 32 free_hot_cold_page+0x74/0x1bc
19) 2600 20 __pagevec_free+0x22/0x2a
20) 2580 168 shrink_page_list+0x542/0x61a
21) 2412 168 shrink_list+0x26a/0x50b
22) 2244 96 shrink_zone+0x211/0x2a7
23) 2148 116 try_to_free_pages+0x1db/0x2f3
24) 2032 92 __alloc_pages_nodemask+0x2ab/0x435
25) 1940 40 find_or_create_page+0x43/0x79
26) 1900 84 __getblk+0x13a/0x2de
27) 1816 164 ext4_ext_insert_extent+0x853/0xb56
28) 1652 224 ext4_ext_get_blocks+0xb27/0xc68
29) 1428 68 ext4_get_blocks+0x10e/0x27e
30) 1360 244 mpage_da_map_blocks+0xa7/0x720
31) 1116 32 __mpage_da_writepage+0x35/0x158
32) 1084 132 write_cache_pages+0x1b1/0x293
33) 952 112 ext4_da_writepages+0x262/0x3d3
34) 840 16 do_writepages+0x28/0x39
35) 824 72 __writeback_single_inode+0x162/0x333
36) 752 68 generic_sync_sb_inodes+0x2b6/0x426
37) 684 20 writeback_inodes+0x8a/0xd1
38) 664 96 balance_dirty_pages_ratelimited_nr+0x12d/0x237
39) 568 92 generic_file_buffered_write+0x173/0x23e
40) 476 124 __generic_file_aio_write_nolock+0x258/0x280
41) 352 52 generic_file_aio_write+0x6e/0xc2
42) 300 52 ext4_file_write+0xa8/0x12c
43) 248 36 aio_rw_vect_retry+0x72/0x135
44) 212 24 aio_run_iocb+0x69/0xfd
45) 188 108 sys_io_submit+0x418/0x4dc
46) 80 80 syscall_call+0x7/0xb