Single Event Effects (SEE)
Malfunctions in integrated circuits (ICs) caused by radiation effects from high-energy neutrons or alpha particles at ground level are becoming a major concern, especially for life-critical and safety-critical applications such as aviation, industrial automation, medical devices, and automotive electronics, and for high-availability, revenue-critical applications such as communication infrastructure.
Where continuous and reliable operation is of utmost importance and integrated circuits must not malfunction, the choice of FPGA technology becomes critical. Because unintended changes in FPGA configuration pose a much more serious threat to the reliable operation of high-reliability or high-availability systems than data corruption does, designers cannot afford the risk of selecting an FPGA that will not meet reliability demands. This is especially true in applications where downtime is not an option, loss of communication is not acceptable, loss of automation and control is not tolerated, and liability is a concern.
- Neutrons and alpha particles cause upsets in memory elements, including SRAM-based FPGA configuration elements
- These upsets occur in ground-based high reliability and high availability systems, as well as airborne electronics
- Mitigation techniques detect and correct malfunctions, but do not prevent them, and come at a cost
Sources of Errors
- Neutrons: High-energy neutrons in the atmosphere are produced when high-energy subatomic particles from the sun and deep space interact with atmospheric gases. When a neutron strikes a silicon atom, the collision can eject heavy ions that generate momentary current pulses, which can change the data held in memory cells or flip-flops.
- Alpha particles: These are emitted by naturally occurring radioactive isotopes present in IC package molding compounds. Even today’s low-alpha compounds in package materials generate sufficient alpha particles to cause a significant rate of upset in state-of-the-art SRAM FPGAs.
Types of Errors
- Configuration Memory Errors: When the interconnect elements used for routing and for configuring logic elements are corrupted by high-energy neutrons, the result can be a functional change in a logic module or misconnected or misrouted signals, eventually leading to system failure.
- Soft Errors: When flip-flops or memory cells change state due to neutron-induced radiation effects, the resulting errors are commonly referred to as soft errors or data errors. These errors can be mitigated using techniques such as local or global triple module redundancy (TMR) or error correcting codes (ECC).
Figure: SRAM-Based FPGA vs. Microsemi's Flash- and Antifuse-Based FPGAs
The design of critical systems is increasingly governed by safety standards. These standards are intended to ensure that safety-critical systems follow guidelines that prevent the equipment from causing harm in normal use. Designers must ensure their systems provide functional safety, which is generally defined as "freedom from unacceptable risk of physical injury or of damage to the health of people, either directly or indirectly as a result of damage to property or to the environment." For more details on how Microsemi SoC FPGAs provide an ideal platform for the implementation of designs incorporating functional safety, see Safety Critical System Solutions.
Microsemi FPGA Benefits
Neutron and alpha radiation do not have adverse effects on the configuration of Microsemi antifuse and flash-based FPGAs. Microsemi offers extremely reliable FPGAs for many applications, including military, aerospace, industrial control, medical, automotive, networking, and communications.
- Microsemi antifuse and flash-based FPGAs are not susceptible to configuration loss due to single event effects (SEE) caused by alpha or neutron radiation
- No SEE mitigation techniques for configuration upsets are required in Microsemi FPGAs, which keeps overall system cost low
- Microsemi FPGAs maintain system integrity at high altitudes and at sea level
Testing and Results
A repeatable accelerated testing methodology was used to obtain a significant number of failures quickly. An independent organization, iRoC Technologies, conducted neutron testing on FPGAs from three major FPGA vendors, covering three different programming technologies and five different architectures. The FPGAs were tested until a significant number of failures were observed, and Failures-In-Time (FIT) rates were calculated from the results.
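The FIT calculation described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the method iRoC used: the per-device cross section is taken as observed failures divided by beam fluence, and the field rate is that cross section multiplied by a reference neutron flux and by 1E9 hours. The default flux of about 13 n/cm²·h (neutrons above 10 MeV, sea level, New York City) is the JEDEC JESD89A reference value; the function names and the beam numbers in the example are hypothetical.

```python
"""Estimate a field FIT rate from an accelerated neutron-beam test (sketch)."""

# Assumed reference flux for neutrons above 10 MeV at sea level in New York City,
# as used by JEDEC JESD89A (about 13 n/cm^2 per hour). Substitute your own environment.
NYC_SEA_LEVEL_FLUX = 13.0  # neutrons / (cm^2 * hour)
HOURS_PER_FIT = 1e9        # 1 FIT = 1 failure per 1e9 device-hours


def cross_section(failures: int, fluence: float) -> float:
    """Per-device failure cross section in cm^2 (failures per n/cm^2 of fluence)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return failures / fluence


def fit_rate(failures: int, fluence: float, flux: float = NYC_SEA_LEVEL_FLUX) -> float:
    """Expected failures per 1e9 device-hours at the given field flux."""
    return cross_section(failures, fluence) * flux * HOURS_PER_FIT


if __name__ == "__main__":
    # Hypothetical beam run: 100 functional failures after a fluence of 1e10 n/cm^2.
    print(f"{fit_rate(100, 1e10):.0f} FIT at sea level, NYC")
```

With these hypothetical inputs the sketch reports about 130 FIT. A real analysis would also account for differences between the beam spectrum and the atmospheric spectrum and would usually quote a confidence bound when few failures are observed; both are omitted here.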
Neutron Radiation Test Results Summary
Equivalent Functional Failure Rate: FIT Rates per Device, Normalized for New York City

|FPGA|Technology|Sea Level, NYC (Ground Level)|5,000', NYC (Ground Level)|40,000', NYC (Aviation, Typical)|50,000', 80° N (Aviation, Worst Case)|
|---|---|---|---|---|---|
|Microsemi M2S050, M2S090, M2S150|65nm Flash|No Failures Detected|No Failures Detected|No Failures Detected|No Failures Detected|
|Microsemi Fusion AFS1500|130nm Flash|No Failures Detected|No Failures Detected|No Failures Detected|No Failures Detected|
|Microsemi ProASIC3 A3PE600|130nm Flash|No Failures Detected|No Failures Detected|No Failures Detected|No Failures Detected|
|Microsemi ProASIC Plus APA1000|220nm Flash|No Failures Detected|No Failures Detected|No Failures Detected|No Failures Detected|
|Microsemi Axcelerator AX1000|150nm Antifuse|No Failures Detected|No Failures Detected|No Failures Detected|No Failures Detected|
|Vendor A 3M Gate FPGA|150nm SRAM|1,150|4,200|592,000|1,145,000|
|Vendor A 1M Gate FPGA|90nm SRAM|320|1,200|165,000|319,000|
|Vendor A 24K Logic Cell FPGA|45nm SRAM|1,180|4,300|608,000|1,175,000|
|Vendor A 44K Logic Cell FPGA|45nm SRAM|2,170|7,900|1,118,000|2,161,000|
|Vendor A 75K Logic Cell FPGA|40nm SRAM|2,527|9,200|1,302,000|2,517,000|
|Vendor A 75K Logic Cell FPGA|28nm SRAM|2,481|9,100|1,278,000|2,471,000|
|Vendor B 1M Gate FPGA|130nm SRAM|460|1,700|237,000|458,000|
|Vendor B 1M Gate FPGA|90nm SRAM|730|2,700|376,000|727,000|
|Vendor B 2M Gate FPGA|90nm SRAM|1,600|5,800|824,000|1,594,000|
|Vendor B 25K Logic Cell FPGA|65nm SRAM|580|2,100|299,000|578,000|
|Vendor B 55K Logic Cell FPGA|65nm SRAM|1,500|5,500|773,000|1,494,000|
|Vendor B 120K Logic Cell FPGA|65nm SRAM|2,900|10,600|1,494,000|2,888,000|
|Vendor B 50K Logic Cell FPGA|60nm SRAM|2,200|8,000|1,133,000|2,191,000|
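The altitude columns in the neutron test results table can be compared with the sea-level column using a short sketch. It copies four of the Vendor A and Vendor B rows from the table and computes the median ratio of each altitude column to the sea-level column; the row selection and the names `ROWS`, `ALTITUDES`, and `altitude_factors` are illustrative.

```python
from statistics import median

# FIT rates per device copied from the table above (subset of rows).
# Tuple order: sea level NYC, 5,000' NYC, 40,000' NYC, 50,000' 80° N.
ROWS = {
    "Vendor A 3M Gate FPGA, 150nm": (1_150, 4_200, 592_000, 1_145_000),
    "Vendor A 1M Gate FPGA, 90nm": (320, 1_200, 165_000, 319_000),
    "Vendor B 1M Gate FPGA, 130nm": (460, 1_700, 237_000, 458_000),
    "Vendor B 120K Logic Cell FPGA, 65nm": (2_900, 10_600, 1_494_000, 2_888_000),
}
ALTITUDES = ("5,000', NYC", "40,000', NYC", "50,000', 80° N")


def altitude_factors(rows: dict[str, tuple[int, ...]]) -> dict[str, float]:
    """Median ratio of each altitude column to the sea-level column."""
    return {
        name: median(fits[col] / fits[0] for fits in rows.values())
        for col, name in enumerate(ALTITUDES, start=1)
    }


if __name__ == "__main__":
    for name, factor in altitude_factors(ROWS).items():
        print(f"{name}: {factor:.1f}x sea level")
```

For these rows the factors come out near 3.7, 515, and 996. Because the ratios are nearly identical across devices, the altitude columns appear to have been derived by scaling the sea-level rate by the relative neutron flux; that is an inference from the numbers, not a statement from the test report.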
- Single Event Effects and FPGA Failures in Ground Level Applications
- Neutron-Induced Single Event Upset (SEU) FAQ (8/2011)
Glossary
- Alpha particle
- A helium nucleus, which contains two protons and two neutrons. Alpha particles are emitted from larger atoms as a result of radioactive decay. An alpha particle will only travel a few centimeters in air, or about 25 microns in silicon, before interacting with the matter it is traveling through.
- Antifuse
- A two-terminal device that is highly resistive in its unprogrammed state and is programmed to a low impedance. Typical programmed impedances range from 25 to 500 ohms, depending on the specific antifuse material, technology, and programming. This element is generally inherently radiation-tolerant; certain versions can be made radiation-hard. The failure mode of these elements during irradiation is rupture from a heavy ion.
- Bathtub Curve
- The bathtub curve describes the failure rate of many manufactured devices over their lifetime. Early in life, the failure rate is elevated because some devices fail within a short period of time due to manufacturing defects. This failure rate decreases with time until a relatively constant failure rate is reached. The constant failure rate applies to the normal working life of the device, after which the failure rate starts to increase as parts wear out.
- CfgSER
- See "Configuration Soft Error Rate."
- Configuration Soft Error Rate (CfgSER)
- The rate at which soft errors occur in the configuration memory of an FPGA. Because errors in the configuration memory can result in far-reaching system problems, they are referred to as firm errors.
- Cosmic radiation
- High-energy particles from space, which in space consist primarily of protons (92%) and alpha particles (6%). In space, cosmic rays arrive from all directions.
- Dose Rate
- The rate at which ionizing radiation is applied. Dose rates greater than 10 rad(Si)/s are considered high, and dose rates less than 0.1 rad(Si)/s are considered low.
- ECC
- Error Correcting Code. An error correcting code specifies how to add extra information to data bits in a way that allows the data to be corrected if one (or possibly more) of the resulting bits is changed. The most common error correcting codes can correct a single bit error and detect double bit errors. Adding SEC-DED (single error correction, double error detection) to a 64-bit word usually takes 8 additional bits.
- EDAC
- See "Error Detection and Correction."
- Electromagnetic Interference (EMI)
- Noise or interference in electric circuits caused by interaction of electric and magnetic fields.
- EMI
- See "Electromagnetic Interference."
- Error Detection and Correction (EDAC)
- The use of Error Correcting Codes (ECC) in applications where data may become corrupted, for example by single event upsets due to radiation effects. See "Error Correcting Code."
- Firm Error
- Corruption of configuration information stored in SRAM based FPGAs may take time to detect and correct. In the meantime, the function of the FPGA may have changed in an unpredictable and uncontrollable manner, causing system failure. For this reason, errors in SRAM FPGA configuration memory are referred to as Firm Errors.
- Firm Error Rate (FER)
- The rate at which firm errors occur in a system, caused by the corruption of SRAM FPGA configuration memory.
- FIT
- Failure in time. One FIT corresponds to one failure per billion (1E9) chip-hours.
- Flash
- A nonvolatile memory element that uses charge stored on a floating gate to indicate a logic 1 or a logic 0. Flash technology has recently been adopted by Microsemi for use as configuration storage for programmable logic, enabling a range of single-chip, nonvolatile yet reprogrammable FPGAs. Flash cells are SEE tolerant.
- Functional Failure
- The point at which the device ceases to operate.
- Hamming code
- Hamming codes are one of the most commonly used types of error correcting codes.
- Hard Error
- A hard error is an error caused by a permanent physical defect in the memory system.
- Hard Error Rate (HER)
- The HER is the frequency of errors caused by permanent physical defects in the memory system. The HER is usually much lower than the soft error rate.
- HER
- See "Hard Error Rate."
- JEDEC specification covering the testing and measurement of radiation-induced soft errors
- Latchup
- A condition where the output of a circuit becomes fixed near one of the two voltage extremes and will not react to changes in the input signal. Latchup can result in high current flowing through the output circuit, with possible permanent damage.
- LET
- See "Linear Energy Transfer."
- LET Threshold
- LET threshold (LET_TH) is the minimum LET that causes an effect. The JEDEC-recommended definition is the LET at which the first effect is observed at a particle fluence of 10⁷ ions/cm².
- MBU
- See "Multiple Bit Upset."
- MTBF
- Mean time between failures.
- Neutron
- A heavy subatomic particle with no electrical charge. Neutrons are produced as a result of collisions between incoming cosmic particles and atoms of oxygen and nitrogen in the atmosphere. These neutrons travel at very high speed and will pass easily through several feet of concrete.
- Neutron Flux
- The rate at which neutrons pass through a unit area, expressed as the number of neutrons passing through 1 cm² per second (n/cm²·s).
- Nonvolatile
- Memory elements that keep their contents when power is removed from the device. The element may be one-time programmable or reprogrammable. Examples of the former include fuses and antifuses. Examples of the latter include EPROM, EEPROM, and Flash storage elements. Programmable devices using Flash memory elements for configuration are both nonvolatile and reprogrammable.
- Parametric Failure
- The point at which the device exceeds its specified limits.
- Parity
- Parity memory is used to detect memory errors. Each byte of data is accompanied by a parity bit, which is determined by the number of ones in the eight data bits. Even (odd) parity ensures that the total number of one bits in the data bits and parity bit is even (odd). Parity memory is most commonly used on microcomputers with a small word size. A parity memory system that uses a 64-bit word requires the same number of bits as error correcting memory, which makes error correcting memory more appealing for 64-bit and larger word sizes.
- Prompt Dose
- Testing at an extremely high dose rate, to simulate the effect of a nuclear weapon detonation.
- Rad
- Basic unit of absorbed dose for ionizing radiation. Rad = "Radiation Absorbed Dose." 1 rad is 100 ergs of energy deposited in 1 gram of material. Because absorption depends on the target material, the radiation dose is denoted as Rad (x), where x is the target material. For work on radiation effects on silicon integrated circuits, scientists describe the radiation dose as Rad (Si).
- RAM
- Random Access Memory (see also DRAM and SRAM). Random access memory should allow equally fast access to any memory location in the system. Modern RAM systems are not quite random access, but compared to disk drives, they provide a very good approximation to random access memory. The term RAM, by itself, usually refers to the VLSI-based main memory of the computer system.
- Reprogrammable
- These devices can have their configuration loaded more than once. SRAM-based devices may be reloaded without restriction. Many other forms of reprogrammable elements have restrictions on the number of write cycles, although the limits are high enough not to be of practical concern for most applications.
- Saturation Cross Section
- See "Asymptotic Cross Section."
- SEB
- See "Single Event Burnout."
- SEDR
- See "Single Event Dielectric Rupture."
- SEE
- See "Single Event Effect."
- SEFI
- See "Single Event Functional Interrupt."
- SEGR
- See "Single Event Gate Rupture."
- SEL
- See "Single Event Latchup."
- Sensitive Volume
- Sensitive volume refers to the device volume affected by SEE-inducing radiation. The geometry of the sensitive volume is not easily known, but some information is gained from test cross section data.
- SER
- See "Soft Error Rate."
- SET
- See "Single Event Transient."
- SEU
- See "Single Event Upset."
- SHE
- See "Single Hard Error."
- Sigma
- See "Cross Section."
- Sigma Sat
- See "Asymptotic Cross Section."
- Single Event Burnout (SEB)
- A highly localized burnout of the drain-source in power MOSFETs. SEB is a destructive condition.
- Single Event Dielectric Rupture (SEDR)
- The rupturing of a dielectric layer, caused by an incoming high-energy particle, resulting in the creation of a conducting path between the conductors on either side of the dielectric.
- Single Event Effect (SEE)
- Generic term for radiation effects on a semiconductor integrated circuit caused by a single energetic particle, such as a single bit upset or a single latchup. Single event effects include SEBs, SEFIs, SEGRs, SELs, SETs, SEUs, and SHEs.
- Single Event Functional Interrupt (SEFI)
- A condition where the device stops operating in its normal mode and usually requires a power reset or other special sequence to resume normal operation. It is a special case of an SEU changing an internal control signal. One example would be a DRAM entering the test mode defined by JEDEC. Another example is a microcircuit with IEEE 1149.1 JTAG circuitry leaving the TEST_LOGIC_RESET state and loading an unintended instruction into the instruction register (IR). As with other SEUs, the system effects must be properly analyzed. For example, a JTAG upset can cause the device to draw high currents or turn inputs into outputs. The latter could, for example, drive a clock line to ground; thus, an independent clock signal should be used for the TCK pin on devices without the optional TRST* pin.
- Single Event Gate Rupture (SEGR)
- The burnout of a gate insulator in a power MOSFET. SEGR is a destructive condition.
- Single Event Latchup (SEL)
- A potentially destructive condition involving parasitic circuit elements forming a silicon controlled rectifier (SCR). In traditional SEL, the device current may destroy the device if it is not current limited and removed "in time." A "microlatch" is a subset of SEL where the device current remains below the maximum specified for the device. In all non-catastrophic SEL conditions, power must be removed from the device to recover normal operation.
- Single Event Transient (SET)
- A current transient induced by the passage of a particle through an integrated circuit. The current can propagate to cause an output error in combinational logic.
- Single Event Upset (SEU)
- A change of state or transient induced in a device by an ionizing particle such as a cosmic ray or proton. This may occur in digital, analog, and optical components or may have effects in surrounding circuitry. These are "soft" bit errors in that a reset or rewriting of the device restores normal behavior. A full SEU analysis considers the system effects of an upset. For example, a single bit flip, while not damaging to the circuitry involved, may damage the subsystem or system (e.g., by initiating a pyrotechnic event).
- Single Hard Error (SHE)
- An SEU that causes a permanent change to the operation of a device. An example is a permanent stuck bit in a memory device.
- Soft Error
- A soft error is an error that is not due to any permanent physical defect in the memory system. Soft errors can be fixed by either writing new data to the invalid memory area or by restarting the computer.
- Soft Error Rate (SER)
- The soft error rate is the frequency of data errors caused by neutrons, alpha particles, cosmic or terrestrial radiation, and other factors that do not permanently damage the memory system.
- SRAM
- Static RAM. SRAM is used for the cache memory and registers in computer systems. SRAM typically requires four or six transistors per bit, making it substantially more expensive than DRAM, which usually requires one transistor per bit. SRAM is able to operate at higher speeds than DRAM and does not require refreshing.
- TID
- Total Ionizing Dose; see "Total Dose."
- TMR
- See "Triple Module Redundancy."
- Total Dose
- The total accumulated amount of absorbed ionizing radiation. Measured in Rads.
- Transition fault
- A transition fault is a fault in which a memory cell or line cannot change from one particular state to a different state.
- Triple Module Redundancy (TMR)
- A method of overcoming single event effects that uses three discrete instances of a circuit, with a majority vote scheme that monitors the data from each of the three circuits. The majority vote circuit outputs data identical to what the majority of the three circuits are outputting. This is an effective way to prevent data corruption due to single event effects; however, it cannot correct situations where more than one of the three discrete circuits experiences an upset. TMR is expensive, since it uses two additional instances of the circuit being protected, in addition to the majority vote circuit.
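The majority vote described in the Triple Module Redundancy entry can be sketched as a bitwise 2-of-3 voter over three copies of a data word. This is a minimal software model, assuming the three module outputs are available as integers; the function name `majority` is hypothetical, and a real FPGA implementation would be voting logic in the fabric rather than Python.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies of a word.

    Each output bit is 1 when at least two of the three input bits are 1.
    """
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    value = 0b1011_0010
    upset = value ^ 0b0000_0100  # one copy suffers a single-bit upset
    print(bin(majority(value, upset, value)))  # the vote restores the original value
    print(bin(majority(upset, upset, value)))  # two copies wrong in the same bit: the upset wins
```

Because this voter works bit by bit, it also masks upsets that hit different bit positions in two different copies; it produces a wrong output only when two copies are wrong in the same bit.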
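The ECC and Hamming code entries in the glossary can be made concrete with a short sketch. `secded_check_bits` applies the Hamming condition (2^r ≥ m + r + 1 for m data bits and r check bits) and adds one overall parity bit, which reproduces the 8 extra bits quoted for a 64-bit SEC-DED word. The encoder and decoder implement the classic Hamming(7,4) code, which corrects a single error but omits the extra parity bit needed for double-error detection. All function names are hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SEC code plus one overall parity bit (SEC-DED)."""
    r = 0
    # r check bits must be able to name every bit position (data + check)
    # plus the "no error" case: 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # the extra overall parity bit adds double-error detection


def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword, listed as positions 1..7.

    Even-parity check bits sit at positions 1, 2 and 4; data bits at 3, 5, 6, 7.
    """
    code = [0] * 8  # index 0 unused so that list index == bit position
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Check bit p covers every position whose binary index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]


def hamming74_decode(word: list[int]) -> tuple[int, int]:
    """Return (corrected 4-bit value, error position or 0) for a 7-bit codeword."""
    code = [0] + list(word)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    if syndrome:
        code[syndrome] ^= 1  # a nonzero syndrome is the position of the flipped bit
    value = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return value, syndrome


if __name__ == "__main__":
    print(secded_check_bits(64))  # check bits needed for a 64-bit word
    word = hamming74_encode(0b1011)
    word[4] ^= 1                  # simulate a single-bit upset at position 5
    print(hamming74_decode(word))  # → (11, 5)
```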
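The Parity entry in the glossary can be illustrated with a sketch that computes one parity bit per byte. It shows that a 64-bit word carries 8 parity bits, the same overhead as 64-bit SEC-DED ECC, and that parity detects a single flipped bit but misses two flips in the same byte. The function names are hypothetical.

```python
def parity_bit(byte: int, odd: bool = False) -> int:
    """Parity bit for one 8-bit byte.

    Even parity makes the total count of 1s (8 data bits + parity bit) even;
    odd parity makes it odd.
    """
    ones = bin(byte & 0xFF).count("1")
    even_bit = ones % 2
    return even_bit ^ 1 if odd else even_bit


def word_parity_bits(word: int, width: int = 64) -> list[int]:
    """One even-parity bit per byte of a `width`-bit word, low byte first."""
    return [parity_bit((word >> shift) & 0xFF) for shift in range(0, width, 8)]


def byte_error_detected(byte: int, stored_parity: int) -> bool:
    """True if the byte no longer matches its stored even-parity bit.

    Parity detects any odd number of flipped bits but cannot locate or correct them.
    """
    return parity_bit(byte) != stored_parity


if __name__ == "__main__":
    data = 0b1011_0000
    p = parity_bit(data)
    print(len(word_parity_bits(0)))             # parity bits for a 64-bit word
    print(byte_error_detected(data ^ 0b01, p))  # one flipped bit is detected
    print(byte_error_detected(data ^ 0b11, p))  # two flipped bits go unnoticed
```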
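The FIT and MTBF entries in the glossary lead to a few standard calculations, sketched here under the usual assumptions of independent devices and a constant failure rate (the flat part of the bathtub curve): FIT rates of devices in series add, expected failures scale with device-hours, and MTBF in hours is 1E9 divided by the FIT rate. The 1,150 FIT input is the Vendor A 3M gate FPGA sea-level value from the test results table; the fleet size, operating hours, and function names are hypothetical.

```python
import math

FIT_HOURS = 1e9  # 1 FIT = 1 failure per 1e9 device-hours


def system_fit(device_fits: list[float]) -> float:
    """FIT of a system that fails if any one device fails.

    Assumes independent devices with constant failure rates, so the rates add.
    """
    return sum(device_fits)


def expected_failures(fit: float, units: int, hours_per_unit: float) -> float:
    """Expected failures across `units` systems, each operated for `hours_per_unit` hours."""
    return fit * units * hours_per_unit / FIT_HOURS


def mtbf_hours(fit: float) -> float:
    """MTBF in hours for a constant failure rate expressed in FIT."""
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return FIT_HOURS / fit


def survival_probability(fit: float, hours: float) -> float:
    """Probability of no failure within `hours`, assuming exponentially distributed failures."""
    return math.exp(-fit * hours / FIT_HOURS)


if __name__ == "__main__":
    sea_level_fit = 1_150  # Vendor A 3M gate FPGA, sea level NYC
    print(f"MTBF: {mtbf_hours(sea_level_fit):,.0f} hours per device")
    yearly = expected_failures(sea_level_fit, 10_000, 8_760)
    print(f"Expected failures per year, 10,000 units running continuously: {yearly:.1f}")
```

With these inputs the sketch reports an MTBF of roughly 870,000 hours per device, yet about 100 expected failures per year across 10,000 continuously operating units, which is why a per-device FIT rate that looks small still matters in large deployments.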
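The three regions of the bathtub curve are often modeled, as an assumption the glossary does not state, by summing a decreasing Weibull hazard (early failures), a constant rate (normal working life), and an increasing Weibull hazard (wear-out). All parameters below are hypothetical and chosen only to make the shape visible: the early-failure term falls below the constant rate after about 25 hours, the constant term is 1E-7 failures per hour (100 FIT), and wear-out takes over after roughly 50,000 hours.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard h(t) = (k / lam) * (t / lam) ** (k - 1), for t > 0.

    shape < 1 gives a falling rate, shape == 1 a constant rate, shape > 1 a rising rate.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_hours: float) -> float:
    """Illustrative failure rate (failures per hour) at age t_hours > 0.

    All parameters are hypothetical.
    """
    early = weibull_hazard(t_hours, shape=0.5, scale=1e12)  # decreasing: manufacturing defects
    constant = 1e-7                                         # flat region: 100 FIT
    wear_out = weibull_hazard(t_hours, shape=5.0, scale=2e5)  # increasing: parts wearing out
    return early + constant + wear_out


if __name__ == "__main__":
    for age in (1, 10, 1_000, 10_000, 50_000, 100_000, 200_000):
        print(f"{age:>7} h: {bathtub_hazard(age) * 1e9:12,.0f} FIT")
```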
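The Dose Rate, Rad, and Total Dose entries in the glossary reduce to simple arithmetic, sketched below. The 10 and 0.1 rad(Si)/s thresholds come from the Dose Rate entry; the label "intermediate" for the band between them is an assumption, since the glossary does not name it. The conversion uses 1 rad = 100 erg/g, which equals 0.01 J/kg, or 0.01 Gy. The function names and the example exposure are hypothetical.

```python
HIGH_DOSE_RATE = 10.0  # rad(Si)/s; rates above this are "high" (Dose Rate entry)
LOW_DOSE_RATE = 0.1    # rad(Si)/s; rates below this are "low" (Dose Rate entry)
ERG_IN_JOULES = 1e-7


def classify_dose_rate(rate: float) -> str:
    """Classify a dose rate in rad(Si)/s using the glossary thresholds.

    "intermediate" is an assumed label for the unnamed band in between.
    """
    if rate > HIGH_DOSE_RATE:
        return "high"
    if rate < LOW_DOSE_RATE:
        return "low"
    return "intermediate"


def total_dose(rate: float, seconds: float) -> float:
    """Total dose in rad(Si) accumulated at a constant dose rate."""
    return rate * seconds


def rad_to_gray(rad: float) -> float:
    """1 rad = 100 erg per gram = 0.01 J/kg = 0.01 Gy."""
    joules_per_gram = rad * 100 * ERG_IN_JOULES
    return joules_per_gram * 1000  # per gram -> per kilogram


if __name__ == "__main__":
    rate = 0.01  # hypothetical exposure, rad(Si)/s
    dose = total_dose(rate, 24 * 3600)
    print(classify_dose_rate(rate), f"{dose:.0f} rad(Si) per day = {rad_to_gray(dose):.2f} Gy(Si)")
```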