Server issues today

Philip
SG VIP
Posts: 11754
Joined: Sat May 08, 1999 5:00 am
Location: Jacksonville, Florida

Post by Philip »

The server shut down for no apparent reason a couple of hours ago, and it wouldn't POST at all. It would power on for about 5 seconds without video coming up or any other indication, then shut down on its own.

I drove down to the NOC and was only able to bring it up after removing a couple of RAM sticks. I switched the 3rd and 4th GB sticks, reseated everything, and ran Memtest for about 30 minutes with no errors.

At this point it's running as it was, with all 4 GB RAM in it. I'm just glad it is close so I could get to it, and that I was able to bring it back up.

The downside is I am not 100% positive what the exact issue was. Hopefully it was a temporary glitch and it won't happen again.

Why is it things like that always happen on long weekends? Argh.

Anyway, back up and running, just got back from the NOC. I'll run some more tests as I can remotely.

Please excuse any inconvenience this downtime may have caused.
Sava700
Posts: 24051
Joined: Wed Feb 27, 2002 7:51 am
Location: Somewhere

Post by Sava700 »

Hell yeah.. I was like WTF, I can't get into Speedguide.. I started to drink a lot of Moonshine 'cause I was losing my mind.. thank God it's back up!!!!!!!!! :thumb:
YARDofSTUF
Posts: 70006
Joined: Sat Nov 11, 2000 12:00 am
Location: USA

Post by YARDofSTUF »

It always happens that way. It's like the neat case disease. You organize all the cables and set up all the front panel connectors, lights, every option, and the damn thing doesn't POST. Then you unplug it all, just throw it on over the top, plug it in in a total mess, and it runs stable as anything.
John
Senior Member
Posts: 1698
Joined: Sun Dec 05, 1999 12:00 pm

Post by John »

YARDofSTUF wrote:It always happens that way. It's like the neat case disease. You organize all the cables and set up all the front panel connectors, lights, every option, and the damn thing doesn't POST. Then you unplug it all, just throw it on over the top, plug it in in a total mess, and it runs stable as anything.



Yeah I always got a higher stable OC when the guts were laid out on the mobo box LOL
YARDofSTUF
Posts: 70006
Joined: Sat Nov 11, 2000 12:00 am
Location: USA

Post by YARDofSTUF »

John wrote:Yeah I always got a higher stable OC when the guts were laid out on the mobo box LOL
:rotfl: :thumb: Is that against NOC policy?
Philip
SG VIP
Posts: 11754
Joined: Sat May 08, 1999 5:00 am
Location: Jacksonville, Florida

Post by Philip »

NOC policy? Hey, it's our server, we can get electrocuted anytime we feel like it. Seriously though, I had it up and running while poking around today.
Disclaimer: Please use caution when opening messages, my grasp on reality may have shaken loose during transmission (going on rusty memory circuits), even though my tin foil hat is regularly audited for potential supply chain tampering. I also eat whatever crayons are put in front of me.
๑۩۞۩๑
TonyT
SG VIP
Posts: 10356
Joined: Fri Jan 28, 2000 12:00 am
Location: Fairfax, VA

Post by TonyT »

Philip wrote:The downside is I am not 100% positive what the exact issue was. Hopefully it was a temporary glitch and it won't happen again.
Did /var/log/* turn up anything? There may be something unusual in the dmesg output. Which kernel is it running?
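For anyone wanting to do that kind of check in one pass, here is a minimal Python sketch that scans log text for a few suspicious keywords. The keyword list and the name find_suspicious are assumptions made up for illustration, not a standard tool, and a clean result does not rule out a hardware fault that never reached the logs.

```python
import re

# Hypothetical keyword list; tune it for the machine in question.
SUSPICIOUS = re.compile(
    r"panic|oops|machine check|segfault|i/o error|out of memory",
    re.IGNORECASE,
)


def find_suspicious(log_text):
    """Return the log lines that contain any of the SUSPICIOUS keywords."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]


if __name__ == "__main__":
    # A few sample lines in the style of a kernel log.
    sample = "\n".join([
        "md: md1: raid array is not clean -- starting background reconstruction",
        "Intel machine check reporting enabled on CPU#0.",
        "EXT3-fs: mounted filesystem with ordered data mode.",
    ])
    for hit in find_suspicious(sample):
        print(hit)
```

Keyword matching like this is crude: the routine "machine check reporting enabled" boot line is flagged too, so every hit still has to be read by a person.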
No one has any right to force data on you
and command you to believe it or else.
If it is not true for you, it isn't true.

LRH
downhill
Posts: 34799
Joined: Sat Jan 15, 2000 12:00 pm
Location: My Own Private Idaho

Post by downhill »

The good news is that it's at least close enough for you to troubleshoot it.

It seems to be running as fast as it did before from my end.
Philip
SG VIP
Posts: 11754
Joined: Sat May 08, 1999 5:00 am
Location: Jacksonville, Florida

Post by Philip »

No logged errors whatsoever, TonyT. The kernel is 2.6.18. I'm using the PAE variant, since without it only ~3.6 GB of RAM is usable.

My reasoning is that if it was a kernel panic or some other software issue, it should've POSTed when they tried rebooting it. It wasn't POSTing; it was just shutting down before even bringing up the BIOS. The fans were spinning up and the LEDs were flashing for about 5 seconds. When I opened the case, the only notable thing was an LED display on the MoBo showing these codes: 87, then it flashed through 44, 28, FF, and back to 87. That's all it was doing before shutting down. I haven't been able to find out what those codes mean yet; there is nothing in the manual. The server model is Tyan GT24 (B2891), the MoBo is Tyan Thunder K8SSRE.

I'm hoping it was just a bad connection on one of the RAM modules...


Here is a dmesg dump (just the RAID array reconstructing, which is understandable after the repeated reboots and parts-pulling I suppose):
Linux version 2.6.18-1.2849.fc6PAE (brewbuilder@hs20-bc2-4.build.redhat.com) (gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)) #1 SMP Fri Nov 10 13:27:10 EST 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000096c00 (usable)
BIOS-e820: 0000000000096c00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000c2000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000cff20000 (usable)
BIOS-e820: 00000000cff20000 - 00000000cff25000 (ACPI data)
BIOS-e820: 00000000cff25000 - 00000000cff80000 (ACPI NVS)
BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000130000000 (usable)
3968MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f71c0
NX (Execute Disable) protection: active
On node 0 totalpages: 1245184
DMA zone: 4096 pages, LIFO batch:0
Normal zone: 225280 pages, LIFO batch:31
HighMem zone: 1015808 pages, LIFO batch:31
DMI present.
Using APIC driver default
IO/L-APIC allowed because system is MP or new enough
ACPI: RSDP (v000 PTLTD ) @ 0x000f7190
ACPI: RSDT (v001 PTLTD RSDT 0x06040000 LTP 0x00000000) @ 0xcff21536
ACPI: FADT (v001 NVIDIA CK8S 0x06040000 PTL_ 0x000f4240) @ 0xcff24d72
ACPI: SRAT (v001 AMD HAMMER 0x06040000 AMD 0x00000001) @ 0xcff24de6
ACPI: SPCR (v001 PTLTD $UCRTBL$ 0x06040000 PTL 0x00000001) @ 0xcff24ef6
ACPI: MADT (v001 PTLTD APIC 0x06040000 LTP 0x00000000) @ 0xcff24f46
ACPI: BOOT (v001 PTLTD $SBFTBL$ 0x06040000 LTP 0x00000001) @ 0xcff24fd8
ACPI: DSDT (v001 NVIDIA CK8 0x06040000 MSFT 0x0100000e) @ 0x00000000
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
Processor #2 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
Processor #3 15:1 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x03] address[0xdf200000] gsi_base[24])
IOAPIC[1]: apic_id 3, version 17, address 0xdf200000, GSI 24-27
ACPI: IOAPIC (id[0x04] address[0xdf201000] gsi_base[28])
IOAPIC[2]: apic_id 4, version 17, address 0xdf201000, GSI 28-31
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
ACPI: IRQ9 used by override.
Enabling APIC mode: Flat. Using 3 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at d1000000 (gap: d0000000:10000000)
Detected 2009.406 MHz processor.
Built 1 zonelists. Total pages: 1245184
Kernel command line: ro root=/dev/md1
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
mapped IOAPIC to ffffb000 (df200000)
mapped IOAPIC to ffffa000 (df201000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c07ae000 soft=c078e000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 4146268k/4980736k available (2136k kernel code, 45900k reserved, 865k data, 244k init, 3275904k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 4019.77 BogoMIPS (lpj=2009888)
Security Framework v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: After vendor identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(2) -> Core 0
CPU: After all inits, caps: 178bfbff e3d3fbff 00000000 00000410 00000001 00000000 00000003
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
CPU0: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 3000
CPU 1 irqstacks, hard=c07af000 soft=c078f000
Initializing CPU#1
Calibrating delay using timer specific routine.. 4017.95 BogoMIPS (lpj=2008978)
CPU: After generic identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: After vendor identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(2) -> Core 1
CPU: After all inits, caps: 178bfbff e3d3fbff 00000000 00000410 00000001 00000000 00000003
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
SMP alternatives: switching to SMP code
Booting processor 2/2 eip 3000
CPU 2 irqstacks, hard=c07b0000 soft=c0790000
Initializing CPU#2
Calibrating delay using timer specific routine.. 4017.96 BogoMIPS (lpj=2008980)
CPU: After generic identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: After vendor identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 2(2) -> Core 0
CPU: After all inits, caps: 178bfbff e3d3fbff 00000000 00000410 00000001 00000000 00000003
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#2.
CPU2: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
SMP alternatives: switching to SMP code
Booting processor 3/3 eip 3000
CPU 3 irqstacks, hard=c07b1000 soft=c0791000
Initializing CPU#3
Calibrating delay using timer specific routine.. 4017.97 BogoMIPS (lpj=2008986)
CPU: After generic identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: After vendor identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 3(2) -> Core 1
CPU: After all inits, caps: 178bfbff e3d3fbff 00000000 00000410 00000001 00000000 00000003
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#3.
CPU3: AMD Dual Core AMD Opteron(tm) Processor 270 stepping 02
Total of 4 processors activated (16073.66 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
checking TSC synchronization across 4 CPUs: passed.
Brought up 4 CPUs
sizeof(vma)=88 bytes
sizeof(page)=32 bytes
sizeof(inode)=424 bytes
sizeof(dentry)=148 bytes
sizeof(ext3inode)=600 bytes
sizeof(buffer_head)=52 bytes
sizeof(skbuff)=172 bytes
sizeof(task_struct)=1392 bytes
migration_cost=1129
checking if image is initramfs... it is
Freeing initrd memory: 1579k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfd76f, last bus=10
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
Boot video device is 0000:01:07.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P2P0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.XVR0._PRT]
ACPI: PCI Interrupt Link [LNK1] (IRQs 16 17 18 19) *0
ACPI: PCI Interrupt Link [LNK2] (IRQs 16 17 18 19) *0
ACPI: PCI Interrupt Link [LNK3] (IRQs 16 17 18 19) *0
ACPI: PCI Interrupt Link [LNK4] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LNK5] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LSMB] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LUS0] (IRQs 20 21 22 23) *0
ACPI: PCI Interrupt Link [LUS2] (IRQs 20 21 22 23) *0
ACPI: PCI Interrupt Link [LMAC] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LACI] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LMCI] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LPID] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LTID] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LSI1] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [APCP] (IRQs 20 21 22 23) *0, disabled.
ACPI: PCI Root Bridge [PCI2] (0000:08)
PCI: Probing PCI hardware (bus 08)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI2.G0PA._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI2.G0PB._PRT]
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 14 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
NetLabel: unlabeled traffic allowed by default
pnp: 00:07: ioport range 0x8000-0x807f could not be reserved
pnp: 00:07: ioport range 0x8080-0x80ff has been reserved
pnp: 00:07: ioport range 0x8400-0x847f has been reserved
pnp: 00:07: ioport range 0x8480-0x84ff has been reserved
pnp: 00:07: ioport range 0x8800-0x887f has been reserved
pnp: 00:07: ioport range 0x8880-0x88ff has been reserved
pnp: 00:07: ioport range 0x5000-0x503f has been reserved
pnp: 00:07: ioport range 0x5040-0x507f has been reserved
PCI: Bridge: 0000:00:09.0
IO window: 2000-2fff
MEM window: dd100000-deffffff
PREFETCH window: d1000000-d10fffff
PCI: Bridge: 0000:00:0e.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Setting latency timer of device 0000:00:09.0 to 64
PCI: Setting latency timer of device 0000:00:0e.0 to 64
PCI: Bridge: 0000:08:0a.0
IO window: 3000-3fff
MEM window: df300000-df3fffff
PREFETCH window: d1100000-d11fffff
PCI: Bridge: 0000:08:0b.0
IO window: disabled.
MEM window: df400000-df4fffff
PREFETCH window: d1200000-d12fffff
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 9, 2621440 bytes)
TCP bind hash table entries: 65536 (order: 8, 1310720 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
Simple Boot Flag at 0x36 set to 0x1
apm: BIOS not found.
audit: initializing netlink socket (disabled)
audit(1164570367.274:1): initialized
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
SELinux: Registering netfilter hooks
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key CA06D81CB13FD94
- User ID: Red Hat, Inc. (Kernel Module GPG key)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
PCI: MSI quirk detected. PCI_BUS_FLAGS_NO_MSI set for subordinate bus.
PCI: MSI quirk detected. PCI_BUS_FLAGS_NO_MSI set for subordinate bus.
PCI: Setting latency timer of device 0000:00:0e.0 to 64
pcie_portdrv_probe->Dev[005d:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0e.:p cie00]
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Real Time Clock Driver v1.12ac
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:03: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE-CK804: IDE controller at PCI slot 0000:00:06.0
NFORCE-CK804: chipset revision 162
NFORCE-CK804: not 100% native mode: will probe irqs later
NFORCE-CK804: 0000:00:06.0 (rev a2) UDMA133 controller
ide0: BM-DMA at 0x1400-0x1407, BIOS settings: hda :p io, hdb :p io
ide1: BM-DMA at 0x1408-0x140f, BIOS settings: hdc :p io, hdd :D MA
Probing IDE interface ide0...
Probing IDE interface ide1...
hdd: CD-224E-N, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
Probing IDE interface ide0...
ide-floppy driver 0.99.newide
usbcore: registered new driver libusual
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
TCP bic registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
powernow-k8: Found 4 Dual Core AMD Opteron(tm) Processor 270 processors (version 2.00.00)
powernow-k8: MP systems not supported by PSB BIOS structure
powernow-k8: MP systems not supported by PSB BIOS structure
powernow-k8: MP systems not supported by PSB BIOS structure
powernow-k8: MP systems not supported by PSB BIOS structure
Using IPI No-Shortcut mode
ACPI: (supports S0 S1 S4 S5)
Time: acpi_pm clocksource has been installed.
Freeing unused kernel memory: 244k freed
Write protecting the kernel read-only data: 385k
USB Universal Host Controller Interface driver v3.0
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ACPI: PCI Interrupt Link [LUS0] enabled at IRQ 23
ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [LUS0] -> GSI 23 (level, high) -> IRQ 193
PCI: Setting latency timer of device 0000:00:02.0 to 64
ohci_hcd 0000:00:02.0: OHCI Host Controller
ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 1
ohci_hcd 0000:00:02.0: irq 193, io mem 0xdd000000
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 10 ports detected
ACPI: PCI Interrupt Link [LUS2] enabled at IRQ 22
ACPI: PCI Interrupt 0000:00:02.1 -> Link [LUS2] -> GSI 22 (level, high) -> IRQ 201
PCI: Setting latency timer of device 0000:00:02.1 to 64
ehci_hcd 0000:00:02.1: EHCI Host Controller
ehci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 2
ehci_hcd 0000:00:02.1: debug port 1
PCI: cache line size of 64 is not supported by device 0000:00:02.1
ehci_hcd 0000:00:02.1: irq 201, io mem 0xdd001000
ehci_hcd 0000:00:02.1: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 10 ports detected
md: raid1 personality registered for level 1
SCSI subsystem initialized
ACPI: PCI Interrupt 0000:09:0a.0[A] -> GSI 26 (level, low) -> IRQ 209
usb 1-2: new low speed USB device using ohci_hcd and address 2
usb 1-2: configuration #1 chosen from 1 choice
input: HID 1267:0103 as /class/input/input0
input: USB HID v1.10 Keyboard [HID 1267:0103] on usb-0000:00:02.0-2
input: HID 1267:0103 as /class/input/input1
input: USB HID v1.10 Device [HID 1267:0103] on usb-0000:00:02.0-2
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 3.0
<Adaptec AIC7901 Ultra320 SCSI adapter>
aic7901: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 101-133Mhz, 512 SCBs

Vendor: SEAGATE Model: ST373207LC Rev: 0004
Type: Direct-Access ANSI SCSI revision: 03
target0:0:0: asynchronous
scsi0:A:0:0: Tagged Queuing enabled. Depth 4
target0:0:0: Beginning Domain Validation
target0:0:0: wide asynchronous
target0:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RDSTRM RTI WRFLOW PCOMP (6.25 ns, offset 63)
target0:0:0: Ending Domain Validation
SCSI device sda: 143374744 512-byte hdwr sectors (73408 MB)
sda: Write Protect is off
sda: Mode Sense: ab 00 10 08
SCSI device sda: drive cache: write back w/ FUA
SCSI device sda: 143374744 512-byte hdwr sectors (73408 MB)
sda: Write Protect is off
sda: Mode Sense: ab 00 10 08
SCSI device sda: drive cache: write back w/ FUA
sda: sda1 sda2 sda3
sd 0:0:0:0: Attached scsi disk sda
Vendor: SEAGATE Model: ST373207LC Rev: 0004
Type: Direct-Access ANSI SCSI revision: 03
target0:0:3: asynchronous
scsi0:A:3:0: Tagged Queuing enabled. Depth 4
target0:0:3: Beginning Domain Validation
target0:0:3: wide asynchronous
target0:0:3: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RDSTRM RTI WRFLOW PCOMP (6.25 ns, offset 63)
target0:0:3: Ending Domain Validation
SCSI device sdb: 143374744 512-byte hdwr sectors (73408 MB)
sdb: Write Protect is off
sdb: Mode Sense: ab 00 10 08
SCSI device sdb: drive cache: write back w/ FUA
SCSI device sdb: 143374744 512-byte hdwr sectors (73408 MB)
sdb: Write Protect is off
sdb: Mode Sense: ab 00 10 08
SCSI device sdb: drive cache: write back w/ FUA
sdb: sdb1 sdb2 sdb3
sd 0:0:3:0: Attached scsi disk sdb
libata version 2.00 loaded.
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb3 ...
md: adding sdb3 ...
md: sdb1 has different UUID to sdb3
md: adding sda3 ...
md: sda1 has different UUID to sdb3
md: created md1
md: bind<sda3>
md: bind<sdb3>
md: running: <sdb3><sda3>
md: md1: raid array is not clean -- starting background reconstruction
raid1: raid set md1 active with 2 out of 2 mirrors
md: considering sdb1 ...
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 70557376 blocks.
md: adding sdb1 ...
md: adding sda1 ...
md: created md0
md: bind<sda1>
md: bind<sdb1>
md: running: <sdb1><sda1>
raid1: raid set md0 active with 2 out of 2 mirrors
md: ... autorun DONE.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: Disabled at runtime.
SELinux: Unregistering netfilter hooks
audit(1164570389.981:2): selinux=0 auid=4294967295
input: PC Speaker as /class/input/input2
EDAC MC: Ver: 2.0.1 Nov 10 2006
EDAC MC0: Giving out device to k8_edac Athlon64/Opteron: DEV 0000:00:18.2
EDAC MC1: Giving out device to k8_edac Athlon64/Opteron: DEV 0000:00:19.2
tg3.c:v3.65 (August 07, 2006)
ACPI: PCI Interrupt 0000:0a:09.0[A] -> GSI 28 (level, low) -> IRQ 217
eth0: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:33:4b:2c
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[769f4000] dma_mask[64-bit]
ACPI: PCI Interrupt 0000:0a:09.1 -> GSI 29 (level, low) -> IRQ 225
eth1: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:33:4b:2d
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
eth1: dma_rwctrl[769f4000] dma_mask[64-bit]
i2c_adapter i2c-0: nForce2 SMBus adapter at 0x5000
i2c_adapter i2c-1: nForce2 SMBus adapter at 0x5040
hdd: ATAPI 24X CD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 0:0:3:0: Attached scsi generic sg1 type 0
floppy0: no floppy controllers found
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378 (0x778), irq 7 [PCSPP,TRISTATE,EPP]
lp0: using parport0 (interrupt-driven).
lp0: console ready
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ibm_acpi: ec object not found
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
device-mapper: multipath: version 1.0.4 loaded
EXT3 FS on md1, internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS on md0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1020116k swap on /dev/sdb2. Priority:-1 extents:1 across:1020116k
Adding 1020116k swap on /dev/sda2. Priority:-2 extents:1 across:1020116k
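
As a rough cross-check of the memory picture, the BIOS-e820 lines at the top of that dump can be summed to see how much RAM sits below and above the 4 GiB boundary. Physical memory above that boundary is what a 32-bit kernel without PAE cannot address. This is a minimal Python sketch: the name usable_ram and the regular expression are assumptions for illustration, and it only counts regions the BIOS marks as usable, so it will not match the kernel's own accounting exactly.

```python
import re

# Matches lines such as:
#   BIOS-e820: 0000000000100000 - 00000000cff20000 (usable)
E820_LINE = re.compile(
    r"BIOS-e820:\s+([0-9a-f]+)\s+-\s+([0-9a-f]+)\s+\(([^)]+)\)"
)
FOUR_GIB = 1 << 32


def usable_ram(dmesg_text):
    """Return (bytes below 4 GiB, bytes at or above 4 GiB) marked usable."""
    low = high = 0
    for match in E820_LINE.finditer(dmesg_text):
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        if match.group(3) != "usable":
            continue
        # Split each region at the 4 GiB line in case it straddles it.
        low += max(0, min(end, FOUR_GIB) - start)
        high += max(0, end - max(start, FOUR_GIB))
    return low, high


if __name__ == "__main__":
    # The usable and one reserved region copied from the dump above.
    sample = "\n".join([
        "BIOS-e820: 0000000000000000 - 0000000000096c00 (usable)",
        "BIOS-e820: 0000000000096c00 - 00000000000a0000 (reserved)",
        "BIOS-e820: 0000000000100000 - 00000000cff20000 (usable)",
        "BIOS-e820: 0000000100000000 - 0000000130000000 (usable)",
    ])
    below, above = usable_ram(sample)
    print("below 4 GiB: %.1f MiB" % (below / 2**20))
    print("above 4 GiB: %.1f MiB" % (above / 2**20))
```

The split matters because the BIOS remaps part of the installed RAM above 4 GiB to make room for PCI and other device windows, which is why the last usable region in the map starts at 0000000100000000.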



The RAID 1 array shows up clean too:

[root@sguide log]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Tue Oct 24 14:38:21 2006
Raid Level : raid1
Array Size : 104320 (101.89 MiB 106.82 MB)
Device Size : 104320 (101.89 MiB 106.82 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Sun Nov 26 14:46:37 2006
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

UUID : d25785e3:77e33d9a:37dec6e3:2a0717e8
Events : 0.10

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1


[root@sguide log]# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Tue Oct 24 14:37:26 2006
Raid Level : raid1
Array Size : 70557376 (67.29 GiB 72.25 GB)
Device Size : 70557376 (67.29 GiB 72.25 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Sun Nov 26 17:43:13 2006
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

UUID : 7a78086a:92c261f4:d1beca4c:5f8dd6cc
Events : 0.16

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
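
If this check needs repeating after every reboot, the same mdadm --detail text can be parsed rather than read by eye. Below is a minimal Python sketch that works on captured output. The names parse_mdadm_detail and array_is_healthy, and the rule used to decide "healthy", are assumptions for illustration and not anything mdadm itself provides.

```python
def parse_mdadm_detail(text):
    """Collect the 'Key : Value' lines of `mdadm --detail` output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def array_is_healthy(fields):
    """Heuristic: not degraded, no failed members, every RAID slot active."""
    states = {s.strip() for s in fields.get("State", "").split(",")}
    return (
        "Raid Devices" in fields
        and "degraded" not in states
        and fields.get("Failed Devices") == "0"
        and fields.get("Active Devices") == fields["Raid Devices"]
    )


if __name__ == "__main__":
    # Trimmed copy of the /dev/md1 output above.
    sample = "\n".join([
        "/dev/md1:",
        "Raid Level : raid1",
        "Raid Devices : 2",
        "State : clean",
        "Active Devices : 2",
        "Failed Devices : 0",
    ])
    info = parse_mdadm_detail(sample)
    print(info["Raid Level"], "healthy" if array_is_healthy(info) else "CHECK ME")
```

The State field can hold several comma-separated words (for example "clean, degraded"), so the sketch looks for "degraded" instead of requiring an exact match on "clean".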
MissTynker2
Posts: 6930
Joined: Sun Oct 19, 2003 12:00 pm
Location: Northern California

Post by MissTynker2 »

Thanks for the update Philip...and for your time spent in correcting it. Now, how do we get that day off back for ya? ;)
Mystical Folding Minx
Philip
SG VIP
Posts: 11754
Joined: Sat May 08, 1999 5:00 am
Location: Jacksonville, Florida

Post by Philip »

That's what I'm here for ;)

Hmm... Here is an idea:

Click on a banner a week, visit some site. Just don't overdo it, especially if you're not interested in it :) Repeated clicks from the same user in the same day can even have a negative effect.
MissTynker2
Posts: 6930
Joined: Sun Oct 19, 2003 12:00 pm
Location: Northern California

Post by MissTynker2 »

Philip wrote:That's what I'm here for ;)

Hmm... Here is an idea:

Click on a banner a week, visit some site. Just don't overdo it, especially if you're not interested in it :) Repeated clicks from the same user in the same day can even have a negative effect.
Ack!!!! Thanks for the reminder!! Premium Membership extended. :p
Mystical Folding Minx
Shinobi
Senior Member
Posts: 4455
Joined: Sat Jan 06, 2001 12:00 am
Location: South Carolina

Post by Shinobi »

Philip, man.. I tried to find any info on the LED display on the mobo..
Found jack.. you'd think that Tyan would have an error code list.
_______________________________________________
Vendor neutral certified in IT Project Management, IT Security, Cisco Networking, Cisco Security, Wide Area Networks, IPv6, IT Hardware, Unix, Linux, and Windows server administration
:thumb:
Rivas
Posts: 10261
Joined: Sat May 11, 2002 3:42 pm
Location: Canada

Post by Rivas »

Sava700 wrote:Hell yeah.. I was like WTF, I can't get into Speedguide.. I started to drink a lot of Moonshine 'cause I was losing my mind.. thank God it's back up!!!!!!!!! :thumb:

:nod:
To be human is to choose.


It is better to die on your feet
than to live on your knees.

- Emiliano Zapata
Philip
SG VIP
Posts: 11754
Joined: Sat May 08, 1999 5:00 am
Location: Jacksonville, Florida

Post by Philip »

Shinobi wrote:Philip, man.. I tried to find any info on the LED display on the mobo..
Found jack.. you'd think that Tyan would have an error code list.

It has a Phoenix BIOS... The Phoenix site has some codes, but I couldn't find anything useful. I may try giving Tyan a call tomorrow, when there's someone to answer the phone.

Thanks for looking.
Philip
SG VIP
Posts: 11754
Joined: Sat May 08, 1999 5:00 am
Location: Jacksonville, Florida

Post by Philip »

MissTynker2 wrote:Ack!!!! Thanks for the reminder!! Premium Membership extended. :p


You didn't have to... Thanks for the support though, it's appreciated.
Shinobi
Senior Member
Posts: 4455
Joined: Sat Jan 06, 2001 12:00 am
Location: South Carolina

Post by Shinobi »

Philip wrote:It has a Phoenix BIOS... The Phoenix site has some codes, but I couldn't find anything useful. I may try giving Tyan a call tomorrow, when there's someone to answer the phone.

Thanks for looking.
Hmm.. I haven't done any business with Tyan since socket 7 days.. :)
Hopefully their support is still good.

Good Luck to you..
_______________________________________________
Vendor neutral certified in IT Project Management, IT Security, Cisco Networking, Cisco Security, Wide Area Networks, IPv6, IT Hardware, Unix, Linux, and Windows server administration
:thumb:
Craig321
Regular Member
Posts: 145
Joined: Fri Jan 27, 2006 8:25 pm
Location: UK

Post by Craig321 »

Philip wrote:The server shut down for no apparent reason a couple of hours ago, and it wouldn't POST at all. It would power on for about 5 seconds without video coming up or any other indication, then shut down on its own.

I drove down to the NOC and was only able to bring it up after removing a couple of RAM sticks. I switched the 3rd and 4th GB sticks, reseated everything, and ran Memtest for about 30 minutes with no errors.

At this point it's running as it was, with all 4 GB RAM in it. I'm just glad it is close so I could get to it, and that I was able to bring it back up.

The downside is I am not 100% positive what the exact issue was. Hopefully it was a temporary glitch and it won't happen again.

Why is it things like that always happen on long weekends? Argh.

Anyway, back up and running, just got back from the NOC. I'll run some more tests as I can remotely.

Please excuse any inconvenience this downtime may have caused.
Maybe heat? Just a guess :)
Philip
SG VIP
Posts: 11754
Joined: Sat May 08, 1999 5:00 am
Location: Jacksonville, Florida

Post by Philip »

It has temperature monitoring/alarms. They're well within specs, and it would've logged something. The DC has some serious cooling in place, low humidity/dust/etc. (the CPU heatsinks were cold to the touch...)

Here are the HDD temps since last boot; they actually go down with time:
Nov 26 14:47:16 speedguide smartd[3036]: Device: /dev/sda, initial Temperature is 31 Celsius
Nov 26 14:47:16 speedguide smartd[3036]: [trip Temperature is 68 Celsius]
Nov 26 14:47:16 speedguide smartd[3036]: Device: /dev/sdb, initial Temperature is 32 Celsius
Nov 26 14:47:16 speedguide smartd[3036]: [trip Temperature is 68 Celsius]
....
Nov 26 15:17:16 speedguide smartd[3055]: Device: /dev/sda, Temperature changed -2 Celsius to 29 Celsius since last report
Nov 26 23:17:16 speedguide smartd[3055]: Device: /dev/sda, Temperature changed -2 Celsius to 27 Celsius since last report
I'll play some more with the lm_sensors configuration tomorrow.

I'm leaning more towards some RAM issue.
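
To back up the "temps go down with time" point, the smartd lines quoted above can be turned into a per-drive series. This is a minimal Python sketch: the name temperatures_by_device and the regular expression are assumptions for illustration, written only against the two message shapes that appear in the excerpt.

```python
import re
from collections import defaultdict

# Handles both message shapes from the excerpt:
#   Device: /dev/sda, initial Temperature is 31 Celsius
#   Device: /dev/sda, Temperature changed -2 Celsius to 29 Celsius since last report
TEMP_LINE = re.compile(
    r"Device: ([^,\s]+), "
    r"(?:initial Temperature is|Temperature changed [+-]?\d+ Celsius to) "
    r"(\d+) Celsius"
)


def temperatures_by_device(log_text):
    """Map each device to its reported temperatures, in log order."""
    readings = defaultdict(list)
    for match in TEMP_LINE.finditer(log_text):
        readings[match.group(1)].append(int(match.group(2)))
    return dict(readings)


if __name__ == "__main__":
    sample = "\n".join([
        "Nov 26 14:47:16 speedguide smartd[3036]: Device: /dev/sda, initial Temperature is 31 Celsius",
        "Nov 26 14:47:16 speedguide smartd[3036]: [trip Temperature is 68 Celsius]",
        "Nov 26 15:17:16 speedguide smartd[3055]: Device: /dev/sda, Temperature changed -2 Celsius to 29 Celsius since last report",
    ])
    for device, temps in temperatures_by_device(sample).items():
        print(device, temps)
```

The trip-temperature lines are skipped on purpose, since they describe the drive's limit and not a reading.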
MadDoctor
New Member
Posts: 5
Joined: Fri Apr 27, 2001 12:00 pm
Location: Looks dark

Post by MadDoctor »

Can you guys use small words like: broke, don't work, bang, fire, smoke... stuff like that. I can't keep up.
People will forget what you said... and people will forget what you did... but people will never forget how you made them feel.
Philip
SG VIP
Posts: 11754
Joined: Sat May 08, 1999 5:00 am
Location: Jacksonville, Florida

Post by Philip »

Damn box broke dangit. Brought the sledge hammer but they wouldn't let me in the NOC, had to leave it in da car. Kick-starting the server didn't work so I had to jump-start it instead. It had some pretty lights blinking when I was banging on it, and sparks/smoke while jump-starting it. How's dat ? :D JK, I'm just glad it's up.
Sava700
Posts: 24051
Joined: Wed Feb 27, 2002 7:51 am
Location: Somewhere

Post by Sava700 »

Well hell, here is your problem:
Allocate Port Service[0000:00:0e. :pcie00]
ide0: BM-DMA at 0x1400-0x1407, BIOS settings: hda :p io, hdb :p io

ide1: BM-DMA at 0x1408-0x140f, BIOS settings: hdc :p io, hdd :D MA
When you have the logs showing those faces in them then there must be something wrong with the server!! :eek: :thumb:
Roody
SG VIP
Posts: 30735
Joined: Sun Nov 19, 2000 12:00 am
Location: East Tennessee

Post by Roody »

Philip wrote:
Click on a banner a week, visit some site. Just don't overdo it, especially if you're not interested in it :) Repeated clicks from the same user in the same day can even have a negative effect.
Man ain't that the truth. Never did get Google to fix that problem and to this day I still have no idea why they kicked me out of that program. :(
Philip
SG VIP
Posts: 11754
Joined: Sat May 08, 1999 5:00 am
Location: Jacksonville, Florida

Post by Philip »

I'd try them again, they've relaxed their policy a lot since then.
Roody
SG VIP
Posts: 30735
Joined: Sun Nov 19, 2000 12:00 am
Location: East Tennessee

Post by Roody »

Philip wrote:I'd try them again, they've relaxed their policy a lot since then.
Just might do that. I gave it a shot about 5-6 months ago and it didn't work, but I will give it another shot.

Thanks for the suggestion. :)
MadDoctor
New Member
Posts: 5
Joined: Fri Apr 27, 2001 12:00 pm
Location: Looks dark

Post by MadDoctor »

Philip wrote:Damn box broke dangit. Brought the sledge hammer but they wouldn't let me in the NOC, had to leave it in da car. Kick-starting the server didn't work so I had to jump-start it instead. It had some pretty lights blinking when I was banging on it, and sparks/smoke while jump-starting it. How's dat ? :D JK, I'm just glad it's up.
There ya go!!!!! :thumb:
People will forget what you said... and people will forget what you did... but people will never forget how you made them feel.
A_old
Posts: 10663
Joined: Sun Jan 30, 2000 12:00 am
Location: Atlanta

Post by A_old »

Thanks Philip. I know what you mean... I have one project team of 5, and NO ONE ELSE SHOWS UP to the meetings. It's stressing me out.