[personal profile] cellio
Thus far I've been unsuccessful in getting the new machine to talk to the digital camera. I'm awaiting a response from tech support for the camera. Aside from that, the new machine is behaving splendidly so far.

My old machine (called, for the nonce, Bouncy) is now failing in the exact same way its predecessor (Doornail) did: after increasingly shorter periods of uptime, it reboots and, more often than not, produces a blue screen. Attempts to reboot at that point always fail; turning the machine off for a couple of hours and then trying again gets a short-lived boot. This says "overheating" to me, but it's not appreciably quieter than normal, so I'm guessing the fan is still running. All the usual precautions have been in place all along -- UPS, antivirus, automatic updates (OS and virus), safe computing practices... I don't get it. If I knew what I was looking for I'd pop the cases and look around. But I'm pretty clueless about hardware. (And we just had Bouncy open a couple of months ago to poke a graphics card, so I know it's not full of dustbunnies. I don't think Doornail was the last time I powered it up, either.)

The questions in my mind right now are: what happened to Doornail and Bouncy, can it be reversed, and what do I do to prevent it from happening to my new machine?

Could I have a faulty UPS? Could a faulty UPS do damage consistent with these symptoms?

(Oh, and just to clarify: this failure pattern is not the only reason I replaced Bouncy; it's just the final step in a series of annoying failures. The CD burner hasn't worked in months... stuff like that. If it were just a hard drive, that'd be different.)

Questions

Date: 2004-12-21 06:48 am (UTC)
From: [identity profile] caryabend.livejournal.com
It sounds like an overheating problem, but I can't be sure. What is the text of the blue screen error? (And which OS?)

When the system is cold (literally) how long until it fails, and then how long until it fails the second and third times?

There may be more than one fan, especially if the processor runs faster than 1 GHz. You'd have to open the machine to know for sure. The loud fan is more likely to be the one in the power supply, and that can mask the sound of a CPU or case fan. If any of the fans goes bad, you'll get temperature problems. Even if there are no dustbunnies, dust can accumulate on fan blades and cause problems.

Under normal conditions, it would be unusual for two successive machines to fail in the same way, so I think we need to look for "abnormalities" especially of the environmental kind.

What is the normal temp/humidity of the room with the computer? The room could be too dry, causing static discharge problems internally with the fan(s). When this problem gets bad, you can smell the ozone or a vague burning smell, usually from the dust, even if there aren't bunnies.

How much does the room vibrate? Do passing trucks make the room or table shake? If the components get just barely unseated they can exhibit odd symptoms, but these are usually permanent failures.

Re: Data

Date: 2004-12-22 05:19 am (UTC)
From: [identity profile] caryabend.livejournal.com
You're going to hate me.

That string of hex numbers really means something. You can actually search the MS database for them, and frequently get useful info. Each blue screen is likely to have similar numbers, if not actually the same ones. If they're radically different each time, it usually points to a driver issue. Was this more noticeable after an update or program installation?

Oh, and what brand/model is it?

This still sounds like an overheating issue, but we need to narrow it down.

At first glance it might be a memory problem, but I need more proof. The memory chips are rather sensitive to the environment and static, and you don't always get a failure message during the boot process - sometimes you get spontaneous reboots and blue screens.

Re: Data

Date: 2004-12-22 03:41 pm (UTC)
From: [identity profile] caryabend.livejournal.com
One of the hex numbers is usually a memory location.

In Win2K the hex numbers are an error (STOP) code (the name in capital letters is the error class) plus various related information. Sometimes the extra numbers are memory locations, pointers, instructions, etc. It depends on the actual STOP code. Once you search on the STOP code, you can get more info about the other numbers. 99 times out of 100, the secondary numbers don't provide additional useful info for troubleshooting.
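
If it helps to see how that line breaks apart, here is a rough Python sketch that splits a STOP line into the code you'd search on and the secondary values. The function name `parse_stop_line` and the sample line are made up for illustration (the sample values are not from a real crash), and it assumes the usual layout of a STOP code followed by a parenthesized, comma-separated list of hex values.

```python
import re

# Assumed layout: "STOP: <hex code> (<hex>, <hex>, ...)".
STOP_LINE = re.compile(r"STOP:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")


def parse_stop_line(line):
    """Return (stop_code, parameters) as integers, or None if no STOP line is found."""
    match = STOP_LINE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    # The secondary values: their meaning depends on the STOP code itself.
    params = [int(p.strip(), 16) for p in match.group(2).split(",") if p.strip()]
    return code, params


# Hypothetical sample line; the values are invented for this example.
sample = "*** STOP: 0x0000000A (0x00000004, 0x00000002, 0x00000000, 0x80412345)"
code, params = parse_stop_line(sample)
print(f"Search on: STOP 0x{code:08X}")
print("Secondary values:", [f"0x{p:08X}" for p in params])
```

The first value is the one worth searching on; the rest only become meaningful once you know what that particular STOP code uses them for.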

(no subject)

Date: 2004-12-22 07:31 pm (UTC)
From: [personal profile] rectangularcat
My step-dad just bought a PC and he found that it was running really hot - seems that some manufacturers just make it last for the warranty period, so he bought a fan that plugs into one of the PCI slots to cool it down.

Re: Here we go!

Date: 2004-12-24 07:02 am (UTC)
From: [identity profile] caryabend.livejournal.com
This may be a silly question, but is your system fully patched?

Re: Here we go!

Date: 2004-12-25 03:51 am (UTC)
From: [identity profile] caryabend.livejournal.com
The automatic updates don't do the "recommended" updates, just the "critical" ones. In itself, this isn't bad, but you should probably use Internet Explorer to go to www.windowsupdate.com. This will tell you what patches are still needed on your machine, critical or not.

I asked this only because a number of Microsoft's internal documents mention the STOP error can occur if a certain patch is not applied. But, like always, Microsoft doesn't claim this is a 100% bandaid.

(no subject)

Date: 2004-12-31 06:44 am (UTC)
From: [personal profile] rectangularcat
me neither until my recent visit to Denver!
