linux-kernel - Re: 2.6.22.5-cfs20.2 another page allication failure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20070828140602.9e2f5a26.delist@gmx.net>
Date:	Tue, 28 Aug 2007 14:06:02 +0200
From:	Richard Mittendorfer <delist@....net>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: 2.6.22.5-cfs20.2 another page allication failure

Also sprach Peter Zijlstra <peterz@...radead.org> (Tue, 28 Aug 2007 13:22:17 +0200):
> On Tue, 2007-08-28 at 12:55 +0200, Richard Mittendorfer wrote:
> > Hello,
> > 
> > The box was up for 9 days when this one showed up. Currently rebuilding
> > w/ cfs lastest version.
> 
> I'm curious why you think this is related to the scheduler?

Oh, I'm not. At least I don't know and there' a new version out. 

I maybe had troubles witch ccache/distcc when I builded that one, so
I'll rebuild. I'll call in if it shows up with the rebuild.

> >  kernel: gs: page allocation failure. order:1, mode:0x4020
> >  kernel:  [<c0150070>] __alloc_pages+0x2f0/0x340
> >  kernel:  [<c0169edc>] __slab_alloc+0x1dc/0x860
> >  kernel:  [<c03569ed>] ip_local_deliver+0xfd/0x250
> >  kernel:  [<c0356160>] ip_local_deliver_finish+0x0/0x1b0
> >  kernel:  [<c0326d52>] __netdev_alloc_skb+0x22/0x50
> >  kernel:  [<c016ac7b>] __kmalloc_track_caller+0x6b/0x70
> >  kernel:  [<c0326d52>] __netdev_alloc_skb+0x22/0x50
> >  kernel:  [<c0326a07>] __alloc_skb+0x57/0x120
> >  kernel:  [<c0326d52>] __netdev_alloc_skb+0x22/0x50
> >  kernel:  [<c02a31f4>] e100_rx_alloc_skb+0x24/0xa0
> >  kernel:  [<c02a59a0>] e100_poll+0x1b0/0x430
> >  kernel:  [<c01274a0>] process_timeout+0x0/0x10
> >  kernel:  [<c032d34c>] net_rx_action+0x6c/0x170
> >  kernel:  [<c0123f62>] __do_softirq+0x42/0x90
> >  kernel:  [<c010691c>] do_softirq+0x5c/0xb0
> >  kernel:  [<c016f0a4>] vfs_read+0x154/0x1d0
> >  kernel:  [<c0148460>] handle_edge_irq+0x0/0xf0
> >  kernel:  [<c0123e9a>] irq_exit+0x5a/0x60
> >  kernel:  [<c01069ec>] do_IRQ+0x7c/0xc0
> >  kernel:  [<c016f1e1>] sys_read+0x41/0x70
> >  kernel:  [<c0104b7f>] common_interrupt+0x23/0x28
> >  kernel:  =======================
> >  kernel: Mem-info:
> >  kernel: DMA per-cpu:
> >  kernel: CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> >  kernel: Normal per-cpu:
> >  kernel: CPU    0: Hot: hi:  186, btch:  31 usd:  17   Cold: hi:   62, btch:  15 usd:  57
> >  kernel: Active:106252 inactive:15242 dirty:9376 writeback:0 unstable:0
> >  kernel:  free:1508 slab:3818 mapped:6284 pagetables:453 bounce:0
> >  kernel: DMA free:1988kB min:64kB low:80kB high:96kB active:9244kB inactive:128kB present:16256kB pages_scanned:33 all_unreclaimable? no
> >  kernel: lowmem_reserve[]: 0 491
> >  kernel: Normal free:4044kB min:1980kB low:2472kB high:2968kB active:415764kB inactive:60840kB present:502920kB pages_scanned:140 all_unreclaimable? no
> >  kernel: lowmem_reserve[]: 0 0
> >  kernel: DMA: 21*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1988kB
> >  kernel: Normal: 917*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4044kB
> >  kernel: Swap cache: add 6475, delete 6087, find 125398/125927, race 0+0
> >  kernel: Free swap  = 1171308kB
> >  kernel: Total swap = 1179344kB
> >  kernel: Free swap:       1171308kB
> >  kernel: 130816 pages of RAM
> >  kernel: 0 pages of HIGHMEM
> >  kernel: 2224 reserved pages
> >  kernel: 92116 pages shared
> >  kernel: 388 pages swap cached
> >  kernel: 9376 pages dirty
> >  kernel: 0 pages writeback
> >  kernel: 6284 pages mapped
> >  kernel: 3818 pages slab
> >  kernel: 453 pages pagetables
> 
> Also curious why it failed, looks like it just hit the watermarks.
> Could you provide us with the content of /proc/zoneinfo of that kernel?
> Also, do you use SLUB or SLAB?

I'm using SLUB, .config & dmesg is at 
http://www.mittendorfer.com/rm/temp/config-2.6.22-cfs-v2.20.txt
http://www.mittendorfer.com/rm/temp/dmesg-2.6.22.5-cfs-20.2.txt

(note, this is after I did reboot the machine.)
booze:~# cat /proc/zoneinfo 
Node 0, zone      DMA
  pages free     597
        min      16
        low      20
        high     24
        scanned  0 (a: 20 i: 18)
        spanned  4096
        present  4064
    nr_free_pages 597
    nr_inactive  6
    nr_active    2382
    nr_anon_pages 2278
    nr_mapped    7
    nr_file_pages 110
    nr_dirty     1
    nr_writeback 0
    nr_slab_reclaimable 33
    nr_slab_unreclaimable 8
    nr_page_table_pages 2
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 896
        protection: (0, 491)
  pagesets
    cpu: 0 pcp: 0
              count: 0
              high:  0
              batch: 1
    cpu: 0 pcp: 1
              count: 0
              high:  0
              batch: 1
  all_unreclaimable: 0
  prev_priority:     12
  start_pfn:         0
Node 0, zone   Normal
  pages free     1533
        min      495
        low      618
        high     742
        scanned  0 (a: 0 i: 0)
        spanned  126720
        present  125730
    nr_free_pages 1533
    nr_inactive  17430
    nr_active    100789
    nr_anon_pages 76649
    nr_mapped    6785
    nr_file_pages 41575
    nr_dirty     2068
    nr_writeback 0
    nr_slab_reclaimable 2424
    nr_slab_unreclaimable 1686
    nr_page_table_pages 460
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 2633
        protection: (0, 0)
  pagesets
    cpu: 0 pcp: 0
              count: 12
              high:  186
              batch: 31
    cpu: 0 pcp: 1
              count: 6
              high:  62
              batch: 15
  all_unreclaimable: 0
  prev_priority:     12
  start_pfn:         409

> Normally these errors occur because of jumbo frames, but e100 doesn't do
> that, so it seems an order-1 page was used to back smaller objects.

It's a 100Mbps switched network, MTU 1500. One e100 and one 8139too a
multihomed SOHO server. I think it showed up after a printjob came in
from LAN..., log's confirm that.

?likely unrelated - there's wondershaper on eth1 (not the one to the LAN):

qdisc cbq 1: rate 10000Kbit (bounded,isolated) prio no-transmit
 Sent 22103227 bytes 255795 pkt (dropped 0, overlimits 30959 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
  borrowed 0 overactions 0 avgidle 781 undertime 0
qdisc sfq 10: parent 1:10 limit 128p quantum 1514b perturb 10sec 
 Sent 12179947 bytes 187593 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
qdisc sfq 20: parent 1:20 limit 128p quantum 1514b perturb 10sec 
 Sent 9857424 bytes 66634 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
qdisc sfq 30: parent 1:30 limit 128p quantum 1514b perturb 10sec 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
qdisc ingress ffff: ---------------- 
 Sent 218286672 bytes 310103 pkt (dropped 1679, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
class cbq 1: root rate 10000Kbit (bounded,isolated) prio no-transmit
 Sent 65814 bytes 1567 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
  borrowed 0 overactions 0 avgidle 781 undertime 0
class cbq 1:10 parent 1:1 leaf 10: rate 240000bit prio 1
 Sent 12179947 bytes 187593 pkt (dropped 0, overlimits 12976 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
  borrowed 0 overactions 9967 avgidle 781 undertime 0
class cbq 1:1 parent 1: rate 240000bit (bounded,isolated) prio 5
 Sent 22037371 bytes 254227 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
  borrowed 7874 overactions 0 avgidle 781 undertime 0
class cbq 1:20 parent 1:1 leaf 20: rate 216000bit prio 2
 Sent 9857424 bytes 66634 pkt (dropped 0, overlimits 20637 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
  borrowed 7874 overactions 7229 avgidle 781 undertime 0
class cbq 1:30 parent 1:1 leaf 30: rate 192000bit prio 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
  borrowed 0 overactions 0 avgidle 781 undertime 0

Thanks & Regards,
ritch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/