Date:	Fri, 8 Jan 2010 07:45:39 +0000
From:	Jarek Poplawski <jarkao2@...il.com>
To:	Michael Breuer <mbreuer@...jas.com>
Cc:	Stephen Hemminger <shemminger@...ux-foundation.org>,
	David Miller <davem@...emloft.net>, akpm@...ux-foundation.org,
	flyboy@...il.com, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org
Subject: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit()

On Thu, Jan 07, 2010 at 06:11:34PM -0500, Michael Breuer wrote:
> Results:
> * no MMAP, mtu=1500, neither alternative patch loaded: adapter crashed:
> Jan  7 15:44:23 mail kernel: DRHD: handling fault status reg 2
> Jan  7 15:44:23 mail kernel: DMAR:[DMA Read] Request device [06:00.0] fault addr fffb7bffe000
> Jan  7 15:44:23 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
> Jan  7 15:44:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x80000000
> Jan  7 15:44:23 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
> Jan  7 15:44:24 mail smbd[6572]: [2010/01/07 15:44:24,  0] lib/util_sock.c:539(read_fd_with_timeout)
> Jan  7 15:44:24 mail smbd[6572]: [2010/01/07 15:44:24,  0] lib/util_sock.c:1491(get_peer_addr_internal)
> Jan  7 15:44:24 mail smbd[6572]:   getpeername failed. Error was Transport endpoint is not connected
> Jan  7 15:44:24 mail smbd[6572]:   read_fd_with_timeout: client 0.0.0.0 read error = Connection timed out.
> Jan  7 15:44:44 mail kernel: ------------[ cut here ]------------
> Jan  7 15:44:44 mail kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
> Jan  7 15:44:44 mail kernel: Hardware name: System Product Name
> Jan  7 15:44:44 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
> Jan  7 15:44:44 mail kernel: Modules linked in: ip6table_filter  
> ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat nf_nat  
> iptable_mangle iptable_raw bridge stp appletalk psnap llc nfsd lockd  
> nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc acpi_cpufreq sit  
> tunnel4 ipt_LOG nf_conntrack_netbios_ns nf_conntrack_ftp xt_DSCP xt_dscp  
> xt_MARK nf_conntrack_ipv6 xt_multiport ipv6 dm_multipath kvm_intel kvm  
> snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec  
> snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device  
> snd_pcm gspca_spca505 gspca_main firewire_ohci videodev v4l1_compat  
> firewire_core pcspkr v4l2_compat_ioctl32 snd_timer iTCO_wdt i2c_i801  
> crc_itu_t iTCO_vendor_support snd soundcore snd_page_alloc sky2 wmi  
> asus_atk0110 hwmon fbcon tileblit font bitblit softcursor raid456  
> async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx  
> raid1 ata_generic pata_acpi pata_marvell nouveau ttm drm_kms_helper drm  
> agpgart fb i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfillrect
> [last unloaded: microcode]
> Jan  7 15:44:44 mail kernel: Pid: 0, comm: swapper Tainted: G        W   

BTW, was there any other oops saved before this one?

...
> --- adapter dead after this --- rebooted.
> * no MMAP; alternative 1 patch, mtu=1500; no errors; sustained transfer
> rates about 25% lower than what I saw with MMAP enabled (before the
> MMAP-enabled kernel crashed).

?? Read below...

> * no MMAP mtu=9000; ran ok at low transfer rates - when high rates  
> kicked in, got the sky2 interrupt error & things went south:
> Jan  7 15:09:28 mail kernel: sky2 0000:06:00.0: error interrupt status=0x40000008
> Jan  7 15:09:28 mail kernel: sky2 0000:06:00.0: error interrupt status=0x40000008
> After this, remote connections broke and I rebooted... decided to rerun  
> w/o MMAP again before going back to MMAP and trying those other sky2  
> options...
> * Retest of no MMAP + Alternative 1 - just to confirm consistency.  
> Worked - no errors. Only version so far that allows the win7 backup to  
> complete.

??? Hmm... Neither alternative 1 nor 2 is even compiled in when MMAP is
disabled, so that result definitely needs re-retesting ;-)
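For readers outside netdev: the bug named in the Subject line is af_packet
touching an skb after handing it to dev_queue_xmit(), which consumes the
skb regardless of its return value and may already have freed it. The C
sketch below is a hypothetical userspace illustration of that bug class
only, not the kernel code and not either alternative patch; `struct buf`,
`xmit()` and `send_frame()` are made-up stand-ins, and `xmit()` simply
frees its argument to model the ownership hand-off.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff: only the fields this sketch needs. */
struct buf {
    size_t len;
    char data[64];
};

/* Hypothetical transmit hook, loosely modelled on dev_queue_xmit():
 * once called, it owns the buffer and frees it whether it succeeds or
 * fails. Returns 0 on success, nonzero on error. */
static int xmit(struct buf *b)
{
    int err = (b->len == 0) ? -1 : 0;  /* pretend empty frames are rejected */
    free(b);                            /* caller's ownership ends here */
    return err;
}

/* Sends one frame; returns the number of bytes handed off, or -1.
 * The length is copied out BEFORE the hand-off: reading b->len after
 * xmit() returns would be a use-after-free. */
long send_frame(const char *payload)
{
    struct buf *b = malloc(sizeof(*b));
    if (!b)
        return -1;

    b->len = strlen(payload);
    if (b->len > sizeof(b->data))
        b->len = sizeof(b->data);       /* truncate oversized payloads */
    memcpy(b->data, payload, b->len);

    size_t len = b->len;   /* save what we need while we still own b */
    int err = xmit(b);     /* b must not be dereferenced after this call */
    b = NULL;

    return err ? -1 : (long)len;
}
```

In this sketch `send_frame("hello")` returns 5, and `send_frame("")` returns
-1 because the stand-in `xmit()` rejects empty frames; in both cases the
buffer is freed exactly once and never read afterwards.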

> * MMAP + NO DMAR + disable_msi=1... also works w/o errors... leaving  
> this one running for a while - also completed a backup successfully.  
> Fastest of the lot... about 3x faster than any other version, working or  
> not.

Very interesting. It would be nice to run it for a really long time and,
if it still holds up, to try MMAP + NO DMAR alone (without disable_msi).

>
> I'm leaving this one running for now. Not retesting jumbo for now. Be  
> happy to help dig further.
>
> Tentative recommendations:
>
> 1) The af_packet alternative patch seems to be necessary. The first
> alternative seems to be working; I'd suggest that it be submitted and
> backported to 2.6.32.
> 2) Steven's pskb_may_pull patch also ought to be included and backported.
> 3) Jumbo frame support for yukon2 should probably be disabled until/if  
> fixed.
> 4) When possible I'll test dmar and disable_msi, and no dmar and no  
> disable_msi. When I first hit issues, I was running without DMAR, but  
> also without the above patches. I suppose the non-working permutations  
> need to be either fixed or invalidated (or well documented).
> 5) It would be nice if someone with comparable hardware could reproduce  
> these issues. FWIW, I can only recreate the crash running windows backup  
> to a cifs share. Copying large files doesn't seem to do it.  Could also  
> be some other interaction going on here that perhaps others aren't  
> running - would be happy to compare notes.
>
> Notes:
> This *could* be coincidental, but maybe not...
> With MMAP + NO DMAR + disable_msi there are far fewer (actually almost
> no) bind error reports, and no bind format error messages. With NO MMAP
> and alternative one there are a few more bind error messages and one
> format error message during the several hours that version was up.
> All other configurations, going back perhaps two weeks, have
> significantly more bind error reports, and all versions show an
> increasing frequency of bind format errors (IPv6 only) in the roughly
> 10-15 minutes preceding the lockup/crash/interrupt error messages.
> There are none immediately preceding any crash, but perhaps there is
> some correlation between the network errors and bind IPv6 packets.

OK, for now let's make sure this MMAP + NO DMAR + disable_msi is
really really working.

Thanks,
Jarek P.