2013-02-11 // Intel 82574L NICs: network hangs / ASPM Bug / e1000e driver
A few days ago, I ran into an ugly bug on several Scientific Linux 6.3 hosts (so it should also affect RHEL 6.3 and CentOS 6.3). The network hangs while the system itself is up, running and responsive. “Just” no network. Restarting the affected network interfaces is not enough; only a complete reboot brings the Intel 82574L-based network cards back to life (those NICs are onboard on the Supermicro X9SCM-F and X8SIL mainboards of the affected hosts, so I can't simply replace them). The logs showed entries like the following:
[...]
Jan 24 09:52:35 host2 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Jan 24 09:52:35 host2 kernel: Hardware name: X9SCL/X9SCM
Jan 24 09:52:35 host2 kernel: NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
Jan 24 09:52:35 host2 kernel: Modules linked in: fuse autofs4 sunrpc vboxpci(U) vboxnetadp(U) vboxnetflt(U) vboxdrv(U) cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ext3 jbd uinput raid1 sg microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000e ext4 mbcache jbd2 fpu aesni_intel cryptd aes_x86_64 aes_generic xts gf128mul dm_crypt raid10 sd_mod crc_t10dif ahci video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Jan 24 09:52:35 host2 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-279.19.1.el6.x86_64 #1
Jan 24 09:52:35 host2 kernel: Call Trace:
Jan 24 09:52:35 host2 kernel: <IRQ> [<ffffffff8106a1e7>] ? warn_slowpath_common+0x87/0xc0
Jan 24 09:52:35 host2 kernel: [<ffffffff8101c0fa>] ? intel_pmu_enable_all+0xba/0x160
Jan 24 09:52:35 host2 kernel: [<ffffffff8106a2d6>] ? warn_slowpath_fmt+0x46/0x50
Jan 24 09:52:35 host2 kernel: [<ffffffff8144792d>] ? dev_watchdog+0x26d/0x280
Jan 24 09:52:35 host2 kernel: [<ffffffff814476c0>] ? dev_watchdog+0x0/0x280
Jan 24 09:52:35 host2 kernel: [<ffffffff8107d2c7>] ? run_timer_softirq+0x197/0x340
Jan 24 09:52:35 host2 kernel: [<ffffffff810a0910>] ? tick_sched_timer+0x0/0xc0
Jan 24 09:52:35 host2 kernel: [<ffffffff8102adad>] ? lapic_next_event+0x1d/0x30
Jan 24 09:52:35 host2 kernel: [<ffffffff81072991>] ? __do_softirq+0xc1/0x1e0
Jan 24 09:52:35 host2 kernel: [<ffffffff81095510>] ? hrtimer_interrupt+0x140/0x250
Jan 24 09:52:35 host2 kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
Jan 24 09:52:35 host2 kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
Jan 24 09:52:35 host2 kernel: [<ffffffff81072775>] ? irq_exit+0x85/0x90
Jan 24 09:52:35 host2 kernel: [<ffffffff814f1fa0>] ? smp_apic_timer_interrupt+0x70/0x9b
Jan 24 09:52:35 host2 kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
Jan 24 09:52:35 host2 kernel: <EOI> [<ffffffff812ec17e>] ? acpi_idle_enter_c1+0xa3/0xc1
Jan 24 09:52:35 host2 kernel: [<ffffffff812ec15d>] ? acpi_idle_enter_c1+0x82/0xc1
Jan 24 09:52:35 host2 kernel: [<ffffffff813f6c67>] ? cpuidle_idle_call+0xa7/0x140
Jan 24 09:52:35 host2 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jan 24 09:52:35 host2 kernel: [<ffffffff814d109a>] ? rest_init+0x7a/0x80
Jan 24 09:52:35 host2 kernel: [<ffffffff81c21f7b>] ? start_kernel+0x424/0x430
Jan 24 09:52:35 host2 kernel: [<ffffffff81c2133a>] ? x86_64_start_reservations+0x125/0x129
Jan 24 09:52:35 host2 kernel: [<ffffffff81c21438>] ? x86_64_start_kernel+0xfa/0x109
Jan 24 09:52:35 host2 kernel: ---[ end trace 1f3cc9d5dfc619c0 ]---
Jan 24 09:52:35 host2 kernel: e1000e 0000:02:00.0: eth1: Reset adapter
[...]
After some googling, I found a useful bug report and a mailing list thread. Three postings in particular are quite informative.
It seems that ASPM (Active State Power Management) on the Intel 82574L is broken. The corresponding Linux driver, “e1000e” (see the log above), therefore has this chip on its ASPM blacklist and disables ASPM for it when the system boots. However, some side effect re-enables the NIC's ASPM state L1 after a network connection has been established. This does not happen on all Linux flavors and kernel versions, but it happens at least on Scientific Linux 6.3 with kernel 2.6.32-279.19.1.
Workaround: disable the NIC's ASPM after the system boots
A quick workaround is to manually disable the NIC's ASPM after the system has booted and the network has “stabilized” (e.g. after a few minutes). The following command disables ASPM for a device:
setpci -s <ID-of-device> CAP_EXP+10.b=40
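As far as I understand the PCI Express specification, CAP_EXP+10 addresses the Link Control register: bits 0 and 1 of its low byte select the ASPM state (00 = disabled, 01 = L0s, 10 = L1, 11 = both) and bit 6 is the common clock flag. Writing 40 (hex) therefore clears both ASPM bits and leaves the common clock bit set. The following is only a small sketch to illustrate that decoding; decode_lnkctl is a hypothetical helper name of mine, not part of pciutils, and it does not touch any hardware.

```shell
#!/bin/sh
# Decode the low byte of the PCIe Link Control register.
# Argument: the byte as hex digits without "0x" prefix (e.g. 40).
# decode_lnkctl is a hypothetical helper for illustration only.
decode_lnkctl() {
    val=$((0x$1))

    # Bits 1:0 hold the ASPM control field.
    case $((val & 0x03)) in
        0) aspm="Disabled" ;;
        1) aspm="L0s" ;;
        2) aspm="L1" ;;
        3) aspm="L0s+L1" ;;
    esac

    # Bit 6 is the common clock configuration flag.
    if [ $((val & 0x40)) -ne 0 ]; then
        commclk="+"
    else
        commclk="-"
    fi

    printf 'ASPM=%s CommClk%s\n' "$aspm" "$commclk"
}

decode_lnkctl 40   # the value written by the workaround
decode_lnkctl 42   # same byte with the L1 bit set
```

Note that writing a fixed 40 overwrites the whole byte. That matches the state shown by lspci on my hosts, but on other hardware the remaining bits may differ, so check the current value first.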
You can use lspci -vnn to get the device ID, i.e. the device's PCI address (the first field of the line, 02:00.0 in the following example output):
[root@host2 ~]# lspci -vnn | grep '82574'
02:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network Connection [8086:10d3]
Example: I used /etc/rc.local to disable ASPM on the device with ID 02:00.0 five minutes after the system boots, by putting the following lines at the end of the file:
# workaround for Intel 82574L bug, see http://bit.ly/1565w6I for details
printf '%s\n' 'setpci -s 02:00.0 CAP_EXP+10.b=40' | at now + 5min
Use lspci -vvvv -s <ID-of-device> if you want to check whether ASPM is really disabled (look for “LnkCtl: ASPM Disabled”):
[root@host2 ~]# lspci -vvvv -s 02:00.0
02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
[...]
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
[...]
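If you want to script this check (for example in a cron job or a monitoring plugin), something along these lines might do. It is only a sketch: aspm_state is a hypothetical helper of mine, it assumes the “LnkCtl: ASPM ...;” line format shown above, and the sample line is fed in by hand here so the snippet runs without the actual hardware.

```shell
#!/bin/sh
# aspm_state is a hypothetical helper: it reads lspci -vv style output
# on stdin and prints the ASPM setting found on the first LnkCtl line.
aspm_state() {
    sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p' | head -n 1
}

# Sample line copied from the lspci output above. On a real host you
# would pipe the live output instead:
#   lspci -vv -s 02:00.0 | aspm_state
sample='LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+'
state=$(printf '%s\n' "$sample" | aspm_state)

if [ "$state" = "Disabled" ]; then
    echo "ASPM is off"
else
    echo "ASPM is still active: $state"
fi
```

The helper only matches the LnkCtl line, so the ASPM capabilities listed under LnkCap do not confuse it.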
I hope this helps someone else in some way.
2010-05-28 // Microsoft Internet Explorer 8: Fix the weird textarea scrolling bug
MSIE 8 brings some new, weird bugs. One of the most annoying ones affects <textarea>s with a percentage value for the width CSS property (ironically in standards mode only; the problem disappears in IE7 compatibility view or quirks mode). When a textarea has enough content to show scrollbars and the user has already scrolled a little, every keystroke causes a disturbing auto-scroll effect (even with flickering scrollbars under certain conditions, such as onkeyup/onkeydown events or certain border CSS property values). This is a real show-stopper which makes editing longer texts very uncomfortable for IE8 users.
The following file shows the problem. Simply download it and open it in IE8.
- msie8-textarea-bug.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title>MSIE8 textarea scroll bug/fail demo</title>
  <script type="text/javascript">/*<![CDATA[*/
    // Warn if the page is not rendered in standards mode,
    // because the bug only shows up there.
    function checkMode() {
      var standard = false;
      var m;
      if (document.compatMode) {
        if (document.compatMode === "BackCompat") {
          m = "Quirks";
        } else if (document.compatMode === "CSS1Compat") {
          m = "Standards Compliance";
          standard = true;
        } else {
          m = "Almost Standards Compliance";
        }
        if (standard === false) {
          alert("ATTENTION: The document is being rendered in "+m+" Mode - there will be NO bug!");
        }
        //alert("The document is being rendered in "+m+" Mode.");
      }
    }
    checkMode();
  /*]]>*/</script>
  <style type="text/css">
    textarea {
      height: 180px;
      width: 50%; /* this will trigger the bug */
    }
  </style>
</head>
<body>
<form name="foobar" method="post" action="">
<textarea>
Please scroll down a little bit and try to edit text. E.g. change the line by clicking into the text, type "asdf", go to another line, type "asdf" and so on.

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
</textarea>
</form>
</body>
</html>
I don't know how to get rid of the flickering scrollbars, but there is a workaround for the automatic scroll effect:
- The bug is triggered by a percentage value of the CSS width property.
- To get rid of it, you need to define a fixed px width.
- If you do not want to lose the possibility of using flexible widths for <textarea>, you can get the needed percentage value through the back door by setting min-width and max-width to the same value afterwards. You may use browser-specific hacks to hide/overwrite the fixed width for IE6, which is not affected by this bug but does not understand min-width/max-width.
Example:
textarea {
    /* width for non-IE browsers */
    width: 50%;

    /* width for IE
       NOTE:
       - "\9" at the end is a CSS hack to address only IE (all versions).
       - "#" in front is a CSS hack to address IE6/7 */
    width: 500px\9; /* fix the bug */

    /* get the needed percentage value in IE7 and IE8 */
    min-width: 50%\9;
    max-width: 50%\9;

    /* get the needed percentage value in IE6 */
    #width: 50%;
}
Hope that helps.
2008-02-05 // 204 = 1223
The equation looks odd, is completely wrong and makes no sense. At least to a normal person. The guys who designed Internet Explorer seem to see it differently, though.
Today a bit of Ajax was on the agenda. The script checked the HTTP status code of an XML response which, for a valid request without a result, produces no output and only sends the 204 code. That code stands for no content, so it describes exactly the situation at hand: “valid request but no content”. So far, so good. This case was supposed to be caught in order to fire off the next request right away:
} else if (http_request.status == 204) {
*kabooom*. It worked everywhere, just not in IE. After some debugging I was quite astonished when an
alert(http_request.status);
printed 204 in every browser but produced 1223 in IE. So I quickly fired up Google and once again got confirmation that IE, even in version 7, is a beast. It simply takes the liberty of rewriting the status code from 204 to 1223 in some cases. Sickening. So there was no way around adding an extra
|| http_request.status == 1223
to the script.