AMBER Archive (2003)

Subject: Re: AMBER: ECC vs non-ECC (non-amber issue)

From: Robert Duke (rduke_at_email.unc.edu)
Date: Mon Oct 20 2003 - 21:40:24 CDT


Yong -
Unfortunately, not all non-ECC machines are parity machines. And parity
only detects errors; it does not correct them (and it cannot detect an even
number of flipped bits in the same byte). There was a huge push to economize,
and a lot of non-parity machines were made (i.e., the parity standard in the
original IBM PC architecture was basically phased out). Also, there are
apparently machines
with something called logic parity or false parity memory that does not do
anything (I think it is a way to cheaply fake out memory controllers with
parity support, or some such). So I think one does have to look out for
this stuff. I know in buying cheap PCs that it is hard to tell what you
are getting. Radiation-induced bit flipping is a significant problem, with
failure rates up around once a month on significant amounts of memory,
according to some work done by IBM (skimming once again, my numbers may be a
little off). I would be more diligent if buying production machines, as
opposed to development machines, and I would not assume that the vendors are
doing the right thing - the PC business is strictly price driven, and
frequent crashes are attributed to (and caused by) software as well as other
hardware problems, so rock solid memory is not something that is a selling
point to 99.99% of the folks buying these things. So, folks buying
production machines should be sure to look for "true" parity, or even
better, ECC (which will keep you running through single bit errors).
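
To make the detect-versus-correct distinction concrete, here is a minimal
Python sketch. It is only an illustration: the function names are made up
for this example, and the Hamming(7,4) code is the small textbook
single-error-correcting code, not the wider code a real ECC memory module
uses. It shows a parity bit catching one flipped bit but missing two, and a
Hamming codeword locating and repairing one flipped bit.

```python
from typing import List, Tuple


def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 when the number of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """True when the byte still agrees with the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


def hamming74_encode(nibble: int) -> List[int]:
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers codeword positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers codeword positions 2, 3, 6, 7
    p3 = d[1] ^ d[2] ^ d[3]  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]


def hamming74_decode(code: List[int]) -> Tuple[int, int]:
    """Return (nibble, position); position is the 1-based bit that was
    repaired, or 0 when the codeword checked out clean."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome spells out the bad position
    if position:
        c[position - 1] ^= 1  # flip the offending bit back
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, position


if __name__ == "__main__":
    byte = 0b10110010
    stored = parity_bit(byte)
    print("one flip detected by parity: ", not parity_ok(byte ^ 0b00000100, stored))
    print("two flips detected by parity:", not parity_ok(byte ^ 0b00010100, stored))

    word = hamming74_encode(0b1011)
    word[4] ^= 1  # simulate a single-bit upset at codeword position 5
    print("decoded nibble, repaired position:", hamming74_decode(word))
```

Note that this small code, like the one sketched above, will silently
miscorrect if two bits of the same codeword flip; handling that case needs
an extra overall parity bit, which is not shown here.
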
Regards - Bob

----- Original Message -----
From: "Yong Duan" <yduan_at_udel.edu>
To: <amber_at_scripps.edu>
Sent: Monday, October 20, 2003 10:04 PM
Subject: AMBER: ECC vs non-ECC (non-amber issue)

>
>
> I do not know much about this business. But I always assume that ECC is
> an on-chip correction mechanism that allows localization of the error
> bit. The traditional non-ECC does error checking by parity bit. So, one
> byte is actually stored as 9 bits, with one parity bit. Normally, memory
> chips are reasonably reliable. Occasional flip of one of the 8 data bits
> would trigger the error signal on the bus logic which triggers re-read.
> So, there is still a bit protection even without ECC. Of course, if two
> of the 9 bits of the same byte get flipped, one has no way to know it.
> This would give just wrong data. For ECC memory, one would have the same
> problem when two bits flip at the same time, because ECC can really
> correct just one bit of error.
>
> Having said this, I must confess all my memory chips are ECC because
> machine stability is a huge concern for me. I just do not like to spend
> my or anybody's time to figure out what's wrong with the machines. Good
> vendors tend to use ECC memory (and ask for higher price).
>
> yong
>
>
> -----------------------------------------------------------------------
> The AMBER Mail Reflector
> To post, send mail to amber_at_scripps.edu
> To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
>
>
