Reliability
Overview
When Reliability is Vital

Microsemi FPGAs are designed to address the high-reliability requirements of high-availability, safety-critical, and mission-critical systems in industrial, aviation, military, and communications applications. In many markets, high-reliability requirements are being driven by the growth in safety standards across a wide range of industries.
For example, standards such as IEC 61508 for industrial systems, IEC 61511 for process control, IEC 62304 for medical devices, and ISO 13849 and IEC 62061 for machinery apply to applications such as emergency shut-down systems, fire and gas systems, turbine control, defibrillators, and railway signaling systems. Similar requirements for reliable operation are reaching a growing range of applications as electronics enters every aspect of our lives.
Microsemi FPGA features address critical high-reliability requirements; details are provided on the Features tab:
- Zero Failure in Time (FIT) rate FPGA configuration
- Single Event Upset (SEU) protected memories
- Memory controllers with Single Error Correction, Double Error Detection (SECDED)
- Built-in self test
- No external configuration device required
Functional Safety
Many applications for complex electronic equipment have some degree of safety requirements in their design. For the designer of such systems this usually comes in the form of Functional Safety, generally defined as "Freedom from unacceptable risk" of physical injury or of damage to the health of people, either directly, or indirectly as a result of damage to property or to the environment.
Single Event Effects and FPGA Failures in Ground Level Applications
Malfunctions in integrated circuits (ICs) due to radiation effects from high-energy neutrons or alpha particles at ground level are becoming a major concern, especially for high-impact applications such as industrial automation, life-critical medical devices, power-train automotive electronics, and communication infrastructure. Where continuous and reliable operation is of utmost importance and integrated circuits must not malfunction, careful consideration of FPGA technology becomes critical.
Quality & Reliability Reports
Features
Exceptional Reliability
Microsemi PolarFire, SmartFusion2, and IGLOO2 devices address critical high-reliability requirements with the following features:
- Zero Failure in Time (FIT) rate FPGA configuration memory
- Single Event Upset (SEU) protected memories
- Memory controllers with Single Error Correction, Double Error Detection (SECDED)
- Built-in self test
- No external configuration device required
Zero FIT Rate FPGA Configuration Memory
SEU-immune, zero-FIT-rate FPGA configuration is a critical requirement for high-reliability operation. Microsemi flash FPGA architectures use flash memory to configure the transistors in both the routing matrix and the logic modules, and flash memory is not susceptible to failures from alpha or neutron radiation. Other suppliers' FPGAs that use SRAM configuration memory can have FIT rates from 1K to almost 4K at sea level and as much as 12K at 5,000 feet (1 FIT = one failure in 10^9 device-hours). Acceptable FIT rates are less than 100 for commercial applications and less than 20 for high-reliability applications. Against those limits, it is easy to see why Microsemi flash-based FPGAs are well suited to high-reliability applications.
Caption: SmartFusion2 FPGAs are not susceptible to alpha/neutron failures, while SRAM-based FPGAs are.
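To put the FIT figures above in more familiar terms, the following minimal sketch converts a FIT rate into a mean time between failures (MTBF) and into expected failures per year across a population of devices. It assumes a constant failure rate; the function names are illustrative and the only input figures are the ones quoted in the text.

```python
HOURS_PER_BILLION = 1e9      # FIT counts failures per 10^9 device-hours
HOURS_PER_YEAR = 24 * 365    # 8,760 hours, ignoring leap years


def fit_to_mtbf_hours(fit):
    """Mean time between failures, in hours, for a single device."""
    return HOURS_PER_BILLION / fit


def expected_failures_per_year(fit, devices=1):
    """Expected failures per year across `devices` identical devices."""
    return fit * devices * HOURS_PER_YEAR / HOURS_PER_BILLION


# FIT figures quoted in the text above.
QUOTED_RATES = [
    ("high-reliability limit", 20),
    ("commercial limit", 100),
    ("SRAM configuration, sea level (low end)", 1000),
    ("SRAM configuration, sea level (high end)", 4000),
    ("SRAM configuration, 5,000 feet", 12000),
]

for label, fit in QUOTED_RATES:
    mtbf_years = fit_to_mtbf_hours(fit) / HOURS_PER_YEAR
    fleet = expected_failures_per_year(fit, devices=1000)
    print(f"{label}: MTBF {mtbf_years:.1f} years, "
          f"{fleet:.2f} failures/year across 1,000 devices")
```

Under these assumptions, 1,000 FIT corresponds to an MTBF of 10^6 hours (roughly 114 years) for a single device, yet a population of 1,000 such devices would expect about 8.76 failures per year.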
Single Event Upset Protected Memories
PolarFire, SmartFusion2, and IGLOO2 FPGAs do use SRAM for a variety of blocks within the processor, for peripherals, and even for the block memories within the FPGA. So you might ask, "Don't these SRAMs suffer the same SEU effects as those in SRAM FPGAs?" These FPGAs add special features to each SRAM implementation to protect it from SEU effects. Large SRAM buffers, such as the ARM Cortex-M3 embedded scratch pad memory (SmartFusion2) and those used in complex peripherals (Ethernet, CAN, USB, PCIe, etc.), are implemented with built-in error detection and correction. Specialized Hamming codes add redundancy to each SRAM buffer so that any single-bit error in any data word within the buffer can be corrected and any two-bit error can be detected. Smaller memory blocks (DDR bridges, instruction cache, UART, SPI FIFOs, etc.) are implemented using latches, which are not subject to SEUs.
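The correct-one, detect-two behaviour described above can be illustrated with a textbook extended Hamming code. The sketch below protects an 8-bit word with four Hamming parity bits plus one overall parity bit. It is a generic example only: the word width, bit layout, and function names are assumptions, and it is not the specialized code used inside the devices.

```python
# Codeword layout (13 bits): index 0 holds the overall parity bit,
# indices 1, 2, 4, 8 hold Hamming parity bits, the rest hold data.
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]
PARITY_POSITIONS = [1, 2, 4, 8]
WORD_BITS = 13


def encode(byte):
    """Encode an 8-bit value into a 13-bit SECDED codeword (list of bits)."""
    word = [0] * WORD_BITS
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (byte >> i) & 1
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, WORD_BITS) if i & p) % 2
    word[0] = sum(word[1:]) % 2  # makes the whole codeword even parity
    return word


def decode(word):
    """Return (byte, status): 'ok', 'corrected', or 'double-error'."""
    word = list(word)
    syndrome = 0
    for i in range(1, WORD_BITS):
        if word[i]:
            syndrome ^= i              # XOR of the indices of all set bits
    odd_parity = sum(word) % 2 == 1    # True if an odd number of bits flipped
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity and syndrome < WORD_BITS:
        word[syndrome] ^= 1            # single error; syndrome is its index
        status = "corrected"
    else:
        # Nonzero syndrome with even parity means two bits flipped.
        # (The syndrome >= 13 case can only arise from three or more flips.)
        return None, "double-error"
    byte = sum(word[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return byte, status


codeword = encode(0xA7)
single = list(codeword)
single[6] ^= 1                         # flip one bit
double = list(single)
double[11] ^= 1                        # flip a second bit
print(decode(codeword), decode(single), decode(double))
```

A single flipped bit is located by the syndrome and repaired, while two flipped bits leave the overall parity even and are reported rather than miscorrected.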
SECDED DRAM Controllers
To provide reliable high-bandwidth access to external memories, SmartFusion2 FPGAs have up to two SECDED high-speed memory interfaces (controller and PHY) implemented as 'hard' IP blocks within the device. The Microcontroller Subsystem (MSS)-based DRAM controller (MDDR) connects directly to the MSS; the other is the FPGA fabric-based DDR (FDDR) controller. The SECDED feature can optionally be enabled to apply Hamming codes that correct a single-bit error or detect a double-bit error in any accessed word. Redundant (additional) data bits in the external memories are used to implement this optional feature. Both controllers support LPDDR/DDR2/DDR3 memories and run at a maximum clock rate of 333 MHz. Each controller also includes a variety of advanced features, including:
- Support for a range of DRAM bus widths of x16, x18, x32, x36
- Support for command reordering to optimize memory efficiency
- Support for data reordering returning critical word first for each command
Built-in Self-Test
SmartFusion2 devices have a built-in self-test (BIST) mechanism that can be used (optionally) to check the reliability and security of a device automatically upon power-up, or on-demand. The contents of all the nonvolatile configuration memory segments, including security keys, security settings, and the FPGA fabric configuration, plus any memory pages declared as ROM by the user (all the write-protected pages) are tested. This test provides assurance against both natural and maliciously induced failures.
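The text above does not say how the device tests its memory contents. One common way to build this kind of integrity check is to record a cryptographic digest of each segment when it is programmed and to recompute and compare it at power-up or on demand. The sketch below illustrates that general idea in software; the segment names, the use of SHA-256, and all function names are assumptions for illustration, not a description of the SmartFusion2 mechanism.

```python
import hashlib


def digest(segment):
    """SHA-256 digest of one memory segment (bytes)."""
    return hashlib.sha256(segment).hexdigest()


def record_reference(segments):
    """Digests captured at programming time, keyed by segment name."""
    return {name: digest(data) for name, data in segments.items()}


def self_test(segments, reference):
    """Return the names of segments whose contents no longer match."""
    return sorted(
        name for name, data in segments.items()
        if digest(data) != reference.get(name)
    )


# Hypothetical segments standing in for nonvolatile memory contents.
programmed = {
    "fabric_config": bytes(range(64)),
    "security_settings": b"\x01\x02\x03\x04",
    "user_rom_page_0": b"boot constants",
}
reference = record_reference(programmed)

# Simulate a single flipped bit in one segment.
damaged = bytearray(programmed["fabric_config"])
damaged[10] ^= 0x04
corrupted = dict(programmed, fabric_config=bytes(damaged))

print(self_test(programmed, reference))
print(self_test(corrupted, reference))
```

Because a digest changes when any bit of a segment changes, a check of this style flags accidental corruption and deliberate tampering alike, which matches the stated goal of covering both natural and maliciously induced failures.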
No External Configuration Device Required
Because the configuration memory for flash and antifuse devices is located inside the device, a component and its required interconnects are eliminated from the overall system, which improves overall system reliability. In large systems that use many FPGAs, this can result in a significant increase in reliability. For example, some large passenger airplanes use over 1,000 FPGAs throughout the system, so eliminating their configuration devices brings a significant reliability increase. Additionally, the use of internal flash memory for each configuration transistor means that these devices configure as soon as power is applied. This 'instant on' capability improves system robustness, since the designer need not consider the complexities that arise during power-up when some devices are not yet operational. Flash fabric is also resistant to power 'drop outs' during configuration and to other power variations that can create reliability problems for traditional SRAM-based FPGAs.
System Design
Other Key Considerations when Designing Reliable Systems
A reliable system is one that operates exactly as specified every time it is turned on. The severity of a failure depends on the type of system and its application space, from avionics to industrial automation and from military applications to communications systems. Each of these four application spaces has elevated reliability requirements because the consequence of a failure is potentially catastrophic. System reliability is also closely related to system security, from both an operational and a system design viewpoint. The impact of a failure can be limited in the following ways:
- Reduce the points of possible failure
- Employ redundant systems
- Limit the external effects that may lead to a failure
Reduce Points of Failure
SmartFusion2 devices integrate a microcontroller subsystem, high-performance FPGA fabric, and advanced peripherals into a single device. By integrating several different types of components into a single, monolithic device, SmartFusion2 devices reduce the points of failure by removing the need for separate microcontroller, FPGA, and high-speed interface devices.
PolarFire, SmartFusion2, and IGLOO2 FPGAs also reduce points of failure in two further ways: their nonvolatile flash memory architecture eliminates the configuration device, and fewer traces are needed to connect separate components together. Additionally, these devices are tested prior to delivery, further reducing the possibility of a failure.
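The benefit of removing components can be sketched with the standard series-system model: if the failure of any one part fails the board, and parts fail independently at constant rates, the part failure rates simply add. The FIT values below are hypothetical placeholders chosen only to make the arithmetic visible; they are not measured data for any real device.

```python
import math


def series_fit(component_fits):
    """Failure rate of a series system: constant, independent rates add."""
    return sum(component_fits)


def survival_probability(fit, hours):
    """Probability of no failure over `hours` at a constant FIT rate."""
    return math.exp(-fit * hours / 1e9)


# HYPOTHETICAL FIT values, for illustration only.
DISCRETE_BOARD = {
    "microcontroller": 30,
    "fpga": 30,
    "configuration_memory": 15,
    "inter_chip_traces": 5,
}
INTEGRATED_BOARD = {"single_integrated_device": 40}

TEN_YEARS = 10 * 24 * 365  # hours

for name, parts in (("discrete", DISCRETE_BOARD),
                    ("integrated", INTEGRATED_BOARD)):
    fit = series_fit(parts.values())
    print(f"{name}: {fit} FIT, "
          f"P(no failure in 10 years) = {survival_probability(fit, TEN_YEARS):.4f}")
```

Whatever the real numbers are, the structure of the model is the point: every component or interconnect removed from the series chain removes its term from the sum.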
Employ Redundant Systems
The most common way to protect against a catastrophic failure is to provide functional redundancy. Dual modular redundancy (DMR) and triple modular redundancy (TMR) replicate circuit functionality one (DMR) or two (TMR) additional times within a system, using similar or dissimilar technologies. Switching between the main and backup systems may be performed automatically by some form of monitoring system, or under remote or manual control, depending on the severity of the hazard caused by the faulty operation of a circuit. For more details on redundancy, see the safety critical web page.
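The difference between the two schemes can be shown with a short behavioural sketch: with two copies a monitor can only detect disagreement, while with three copies a majority voter can also mask a single faulty copy. The function names are illustrative, and real designs implement the voter in hardware rather than software.

```python
def dmr_agree(a, b):
    """DMR: detects a mismatch but cannot tell which copy is wrong."""
    return a == b


def majority_vote(a, b, c):
    """TMR: bitwise 2-of-3 majority of three integer outputs."""
    return (a & b) | (a & c) | (b & c)


def tmr(a, b, c):
    """Return the voted value and the names of copies that disagree with it."""
    voted = majority_vote(a, b, c)
    disagreeing = [name for name, value in (("a", a), ("b", b), ("c", c))
                   if value != voted]
    return voted, disagreeing


good = 0b1010
faulty = good ^ 0b0100             # one copy suffers a single-bit upset

print(dmr_agree(good, faulty))     # mismatch detected; a backup must take over
print(tmr(good, good, faulty))     # fault masked and the bad copy identified
```

This is why DMR is usually paired with a monitor that switches to a backup system, whereas TMR can continue operating through a single fault.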
Limit External Effects
In some applications it is possible to eliminate the source of potential errors by protecting against specific environmental effects. Physical shielding is one possible approach, but it can increase cost and weight and complicate power dissipation. Another potential external effect is related to security. Failure to protect designs from cloning and attack could prevent the predictable operation of a reliable system. In some systems (power grid, process control, and medical systems) the danger of an intrusive attack can be much larger than the danger of a reliability failure. Microsemi FPGAs have many advanced security features that help protect a design from these types of failure-inducing attacks. See the Security web page for more information.