International Journal Of Innovative Science And Applied Engineering Research (IJISAER) ISSN: 2349-9389 Volume 13, Issue 44 Ver. I (March. 2015)
IJISAER 3SI011140121 www.ijisaer.com Licensed under a Creative Commons Attribution-Non Commercial 4.0 International License

Improving Memory Reliability Against Multiple Cell Upsets Using Hamming Based Matrix Code

M.Sivasankaran, PG Student, VLSI Design, KCG College of Technology, Chennai, India [email protected]
G.Renganayaki, Assistant Professor, Department of ECE, KCG College of Technology, Chennai, India [email protected]

Abstract: The soft error rate in storage cells is rapidly increasing due to the ionizing effects of atmospheric neutrons, alpha particles and cosmic rays. As a result, Single Cell Upsets (SCUs) and Multiple Cell Upsets (MCUs) occur. Error correction codes (ECCs) are widely applied to protect memories against soft errors. Existing ECCs can correct SCUs and only limited MCUs; hence a more reliable ECC is required. The Decimal Algorithm based Matrix Code (DMC) uses a decimal algorithm to obtain its error detection and correction capability. The Encoder-Reuse Technique (ERT) is employed in DMC to minimize the area. Initially, the data bits are split into symbols and arranged in a 2-D matrix. The Horizontal Redundant Bits (HRB) and Vertical Redundant Bits (VRB) are computed by decimal operations in the DMC encoder. After encoding, the obtained codeword is stored in the memory. If radiation affects the memory, MCUs may occur; these errors are rectified in the decoder. The ECC based DMC has been simulated, implemented and compared to the Hamming Code using Xilinx Design Suite 14.2. From the simulation and implementation analysis, the ECC based DMC yields better performance compared to the Hamming Code.

Keywords: Decimal Matrix Code (DMC), Encoder Reuse Technique, Multiple Cell Upset, Decimal Algorithm, Error Correcting Codes, Hamming Code.

I. INTRODUCTION

Information and coding theory have a wide variety of applications in computer science and communication. Error control is the technique that enables the reliable delivery of digital data over unreliable communication media. Error detection is the detection of errors caused by noise or other impairments during transmission from the transmitter to the destination. Error correction is the detection of errors and reconstruction of the original, error-free data. The general idea for achieving error detection and correction is to add redundant bits (i.e., extra data) to the message. Error detection and correction schemes can be either systematic or non-systematic [2] [9].

In a systematic scheme, the transmitter sends the information and attaches a fixed number of redundant bits, which are derived from the information bits by some deterministic algorithm. In a system that uses a non-systematic code, the information is transformed into an encoded message that has at least as many bits as the original information. Good error control performance requires the scheme to be selected based on the characteristics of the communication channel [6]. The various sources of errors are temperature, humidity, vibrations, aging of components, cosmic radiation and alpha particles, which induce failures in chips having RAM, and incomplete specifications.

There are two types of errors: soft errors and firm errors. A soft error does not damage the hardware of the system; the only damage is to the data that is being processed. An error in a memory element is considered soft because it corrupts only the data. A radiation-induced error in an FPGA is a "firm" error, because it is not just a transient data error. When it occurs, it is the device's configuration or "personality" that is corrupted. This error changes the actual function of the device [3].

Alpha particles from package decay, cosmic rays creating energetic neutrons and protons, thermal neutrons, random noise and signal integrity problems are the sources of soft errors [4]. Usually, only one cell of a memory is affected. Sometimes multiple memory cells are affected by high-energy radiation. A multiple-cell upset leads to only a number of separate single-bit upsets in multiple correction words. So, an error correcting code needs only to cope with a single bit in error in each correction word in order to cope with all likely soft errors. Each error correction code provides a different level of protection against soft errors [5].

Various error detection and correction codes exist, such as Hamming codes, Reed Solomon codes and Difference-Set Cyclic codes. Hamming code is a form of linear error correcting code that can detect up to two-bit errors or correct one-bit errors without detection of uncorrected errors. The Hamming decoder can detect and correct all single-bit errors or detect all double-bit errors. Because error-correction software is permanently stored in the ROM and uses the core resources whenever memory is used, an ECC solution that employs little memory and few processor cycles is preferable [8].

Reed Solomon is an ECC system that was used for correcting multiple errors, especially burst-type errors, in mass storage devices, wireless and mobile communications units, satellite links, digital TV, digital video broadcasting (DVB), and modem technologies. Reed Solomon Code provides significant burst error correcting capability.
The only disadvantage of using Reed Solomon code lies in the lack of an efficient maximum likelihood soft-decision decoding algorithm [7].

Difference-set cyclic codes are a class of random-error-correcting cyclic codes. The correction process includes encoding and decoding of cyclic codes. Difference-set cyclic codes are able to correct a large number of bit flips. They take fewer decoding cycles, use less memory and consume less power. Matrix Code overcomes the above-mentioned disadvantages of these ECCs with lower area and delay overheads [10].

The organization of the paper is as follows. Section I gives a general introduction to error detection and correction. Section II describes the proposed Decimal Matrix Code (DMC) with an example. Section III gives the simulation results of the DMC. Finally, Section V concludes the report.

II. DECIMAL ALGORITHM BASED MATRIX CODE

Multiple cell upsets (MCUs) are becoming a major issue in the reliability of memories exposed to radiation. To prevent data corruption, more complex Error Correction Codes (ECCs) are widely used to protect memory. Their main drawback is that they require higher delay overhead. In this paper, a decimal algorithm based matrix code (DMC) is exploited to enhance memory reliability with lower delay overhead. The ECC based DMC utilizes a decimal algorithm to obtain the maximum error detection capability. Moreover, the Encoder-Reuse Technique (ERT) is used to minimize the area overhead of extra circuits without disturbing the whole encoding and decoding process. In ERT, the circuit used for encoding is also used for decoding.

Fig. 1. Fault tolerant memory

The schematic of the fault-tolerant memory is depicted in Fig. 1. First, during the encoding process, information bits are fed to the DMC encoder, and then the horizontal redundant bits and vertical redundant bits are obtained from the DMC encoder. After encoding, the obtained codeword is stored in the memory.
If radiation affects the memory, MCUs will occur. These can be corrected in the decoding process. Owing to the decimal algorithm, the proposed DMC has higher fault-tolerant capability with lower performance overheads. In the fault-tolerant memory, the ERT is proposed to reduce the area overhead of extra circuits.

A. DMC Encoder

The encoding process is given below.
Step 1: Divide the N-bit word into k symbols of m bits.
Step 2: Arrange them in a k1 × k2 2-D matrix.
Step 3: Calculate the horizontal redundant bits (H) by decimal integer addition among the symbols in each row.
Step 4: Calculate the vertical redundant bits (V) by a binary operation among the bits in each column.
Step 5: Store the resulting codeword in memory (SRAM).

Fig. 2. Block diagram of DMC Encoder

D0 to D31 are the information bits. H0 to H19 are the horizontal redundant bits. V0 to V15 are the vertical redundant bits. U0 to U31 are the copy of the information bits. If the memory is exposed to radiation, multiple cell upsets will occur. These can be eliminated by the decoding process.

B. DMC Decoder

The DMC decoder is made up of the following sub-modules, each of which executes a specific task in the decoding process: syndrome calculator, error locator and error corrector. It can be observed from Fig. 3 that the redundant bits must be recomputed from the received information bits and compared to the stored set of redundant bits in order to obtain the syndrome bits. The error locator then uses the syndrome bits to detect and locate the bits in which errors occur. If the horizontal syndrome bits are non-zero, the decoder can find the symbol in which the error occurs. If the vertical syndrome bits are non-zero, it can find which particular bit is affected by the radiation particle (MCU). Finally, the error corrector corrects the error by inverting the values of the erroneous bits.
The decoding process is given below.
Step 1: Receive the affected information bits.
Step 2: Calculate the horizontal and vertical redundant bits for the received information bits.
Step 3: Calculate the horizontal syndrome bits by decimal integer subtraction.
Step 4: Calculate the vertical syndrome bits by logical EXOR.

If both the horizontal and vertical syndrome bits are zero, the information bits are not affected. If the horizontal syndrome bits are non-zero, the decoder can find the symbol in which the error occurs. If the vertical syndrome bits are non-zero, it can find which particular bit is affected by the radiation particle (MCU). The error is corrected by simply flipping the affected bit or bits.

Fig. 3. Block diagram of DMC Decoder

In this ECC based DMC scheme, the circuit area of DMC is minimized by reusing its encoder. This is called the ERT. The ERT reduces the area overhead of DMC without disturbing the whole encoding and decoding processes. It can be observed from the block diagram of the decoder that the DMC encoder is also reused for obtaining the syndrome bits in the decoder. Therefore, the area of DMC is minimized as a result of using the existing circuits of the encoder.

There are three cases in the decoding process:
Case 1: The horizontal syndrome bits are non-zero, and the symbols are affected by radiation particles.
Case 2: The horizontal syndrome bits are zero, and the symbols are not affected by radiation particles.
Case 3: The horizontal syndrome bits are zero, even though the symbols are affected by radiation particles.

To explain the DMC scheme, take a 32-bit word as an example. The cells from D0 to D31 are information bits.
This 32-bit word is divided into eight symbols of 4 bits, with k1 = 2 and k2 = 4. H0 through H19 are the horizontal redundant bits; V0 through V15 are the vertical redundant bits. The maximum correction capability (i.e., the maximum size of MCU that can be corrected) and the number of redundant bits differ when different values of k and m are chosen. Therefore, k and m should be carefully adjusted to maximize the correction capability and minimize the number of redundant bits. For example, when k = 2 × 2 and m = 8, only a one-bit error can be corrected and the total number of redundant bits is 40. When k = 4 × 4 and m = 2, 3-bit errors can be corrected and the number of redundant bits is reduced to 32. However, when k = 2 × 4 and m = 4, the maximum correction capability is up to 5 bits and the number of redundant bits is 36. In this paper, in order to enhance the reliability of memory, the error correction capability is considered first, so k = 2 × 4 and m = 4 are used to construct the DMC.

The encoding steps are as follows.

Step 1: The 32-bit word is divided into 8 symbols of 4 bits.
Symbol 0 = D3D2D1D0;
Symbol 1 = D7D6D5D4;
Symbol 2 = D11D10D9D8;
Symbol 3 = D15D14D13D12;
Symbol 4 = D19D18D17D16;
Symbol 5 = D23D22D21D20;
Symbol 6 = D27D26D25D24;
Symbol 7 = D31D30D29D28;

Step 2: Calculate the horizontal redundant bits by decimal integer addition.
H4H3H2H1H0 = Symbol 0 + Symbol 2;
H9H8H7H6H5 = Symbol 1 + Symbol 3;
H14H13H12H11H10 = Symbol 4 + Symbol 6;
H19H18H17H16H15 = Symbol 5 + Symbol 7;
where "+" represents decimal integer addition.

Step 3: Calculate the vertical redundant bits using the notation below.
Vn = Dn xor D(n+16);
where n = 0 to 15 if the input data is 32 bits wide.
Encoding is thus performed by decimal integer addition and binary XOR. The encoder computes the redundant bits using multi-bit adders and XOR gates. The information bits and the redundant bits together are called the codeword.

Step 4: Finally, the obtained codeword is stored in memory.
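The encoding steps above, together with the syndrome calculation from the decoding steps described earlier, can be sketched in software as follows. This is only an illustrative Python model, not the authors' hardware design: the function names `dmc_encode` and `dmc_syndrome` and the `PAIRS` table are hypothetical, the vertical bits are assumed to be the bitwise XOR of the lower and upper 16-bit halves of the word, and the stored redundant bits are assumed to be error-free. The syndrome function calls the encoder again on the received word, which mirrors the idea behind the ERT.

```python
# Symbol pairs summed into each 5-bit horizontal group (H4..H0, H9..H5, ...).
PAIRS = [(0, 2), (1, 3), (4, 6), (5, 7)]


def dmc_encode(word):
    """Return (h_groups, v_bits) for a 32-bit word D31..D0.

    h_groups: four integers, each the decimal sum of two 4-bit symbols.
    v_bits:   16-bit integer whose bit n is Dn xor D(n+16).
    """
    if not 0 <= word < (1 << 32):
        raise ValueError("word must fit in 32 bits")
    # Symbol i holds bits D(4i+3)..D(4i).
    symbols = [(word >> (4 * i)) & 0xF for i in range(8)]
    h_groups = [symbols[a] + symbols[b] for a, b in PAIRS]
    v_bits = (word & 0xFFFF) ^ (word >> 16)
    return h_groups, v_bits


def dmc_syndrome(received_word, stored_h, stored_v):
    """Recompute the redundant bits with the encoder and compare them.

    Horizontal syndrome: decimal integer subtraction per group.
    Vertical syndrome:   bitwise XOR.
    """
    new_h, new_v = dmc_encode(received_word)
    delta_h = [n - s for n, s in zip(new_h, stored_h)]
    delta_v = new_v ^ stored_v
    return delta_h, delta_v


if __name__ == "__main__":
    word = 0x00000906            # Symbol 0 = 0110, Symbol 2 = 1001
    h, v = dmc_encode(word)
    received = word ^ 0b10       # flip D1 to model a single upset
    print(dmc_syndrome(received, h, v))
```

In this example the first horizontal group is non-zero, which points at the Symbol 0 / Symbol 2 group, and the single set bit in the vertical syndrome marks column 1 as the position to invert.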
If any radiation particle strikes the memory, the information bits stored in the memory are changed. This can be corrected in the decoding process. The decoding steps are given below.

There are three cases in the decoding process.
Case 1: The horizontal syndrome bits are non-zero, and the symbols are affected by radiation particles.
Case 2: The horizontal syndrome bits are zero, and the symbols are not affected by radiation particles.
Case 3: The horizontal syndrome bits are zero, even though the symbols are affected by radiation particles.

Example for the decoding process

Case 1: Horizontal syndrome bits are non-zero; symbols are affected.
Suppose the recomputed horizontal redundant bits are H4H3H2H1H0 = 10110, whose decimal value is 22, and the stored value is 10010, whose decimal value is 18.
The horizontal syndrome for H4H3H2H1H0 is 10110 - 10010 = 22 - 18 = 4 = 00100 (a non-zero value), where "-" represents the decimal integer difference.
An error is therefore detected in the group formed by symbol 0 and symbol 2. To find the exact bit position of the error, the vertical syndrome bits are used. After the exact position of the error is found, it is corrected by simply inverting the erroneous bit.

Case 2: Horizontal syndrome bits are zero; symbols are not affected.
The horizontal syndrome for H4H3H2H1H0 is 00000 (zero).
The symbols are then not affected by radiation particles, so there is no multiple cell upset.
Case 3: Horizontal syndrome bits are zero; symbols are affected.
Consider symbol 0 = 0110 and symbol 2 = 1001.
H4H3H2H1H0 = symbol 0 + symbol 2 = 0110 + 1001 = 01111.
If symbol 0 and symbol 2 are both affected by radiation particles, an MCU occurs and the symbols are changed to 1001 and 0110. The recomputed value is again H4H3H2H1H0 = 01111, so the horizontal syndrome is 00000 (zero). Although the symbols have been affected by radiation, the horizontal syndrome bits are zero, when they ought to be non-zero. Errors of this type are considered decoding errors. This case is very rare. For example, when m = 4, the probability of decoding errors is 0.001. If m = 8, the probability is 0.0000011.

The binary error detection technique requires few redundant bits, but its error detection capability is limited. The main reason is that its error detection mechanism is based on binary operations. The limits of this simple binary error detection are illustrated with a simple example. Suppose that the bits B3, B2, B1 and B0 are the original information bits and the bits C0 and C1 are redundant bits.
C0 = B0 xor B2 = 1 xor 0 = 1;
C1 = B1 xor B3 = 0 xor 1 = 1;
Now assume that MCUs occur in bits B3, B2 and B0 (i.e., B3' = 0, B2' = 1 and B0' = 0). The redundant bits recomputed from the received bits are
C0' = B0' xor B2' = 0 xor 1 = 1;
C1' = B1' xor B3' = 0 xor 0 = 0;
In order to detect these errors, the syndrome bits S0 and S1 are obtained as follows:
S0 = C0' xor C0 = 1 xor 1 = 0;
S1 = C1' xor C1 = 0 xor 1 = 1;
These results mean that the erroneous bits B2 and B0 are wrongly regarded as the original bits, so these two errors are not corrected. The example illustrates that, with this simple binary operation, an even number of bit errors within the same parity group cannot be detected.

From the previous discussion, it has been shown that error detection based on the binary algorithm can detect only an odd number of errors. When the decimal algorithm is used, it is able to detect both even and odd numbers of errors. The reason is that the operation mechanism of the decimal algorithm is different from that of the binary one.
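The binary example and Case 3 above can be checked with a short script. This is a minimal Python sketch under stated assumptions: the helper names `parity_bits`, `xor_syndrome` and `decimal_syndrome` are hypothetical, and the syndromes are formed by comparing redundant bits recomputed from the stored and the received values, with the stored redundant bits taken to be error-free. The last example is not from the paper; it is an added two-bit error chosen to show a case that a column XOR misses but the decimal sum detects.

```python
def parity_bits(b):
    """Return (C0, C1) for a 4-bit value b = B3 B2 B1 B0."""
    b0, b1, b2, b3 = [(b >> i) & 1 for i in range(4)]
    return b0 ^ b2, b1 ^ b3


def xor_syndrome(stored, received):
    """Binary syndrome (S0, S1): XOR of stored and recomputed parity bits."""
    c0, c1 = parity_bits(stored)
    r0, r1 = parity_bits(received)
    return c0 ^ r0, c1 ^ r1


def decimal_syndrome(stored_pair, received_pair):
    """Decimal syndrome: difference between the recomputed and stored sums."""
    return sum(received_pair) - sum(stored_pair)


if __name__ == "__main__":
    # Binary example from the text: B3 B2 B1 B0 = 1001 becomes 0100.
    print(xor_syndrome(0b1001, 0b0100))

    # Case 3 from the text: the two symbols swap values, the sum is unchanged.
    print(decimal_syndrome((0b0110, 0b1001), (0b1001, 0b0110)))

    # Added example: bit 0 flips from 0 to 1 in both symbols.
    before, after = (0b0110, 0b1000), (0b0111, 0b1001)
    print((before[0] ^ before[1]) ^ (after[0] ^ after[1]))  # column XOR syndrome
    print(decimal_syndrome(before, after))                  # decimal syndrome
```

The first call reproduces S0 = 0 and S1 = 1, so the errors in B2 and B0 go unnoticed. The second call reproduces the rare decoding error of Case 3. In the added example the column XOR syndrome is zero while the decimal syndrome is non-zero.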
DMC uses the ERT, so the area is reduced. The advantages are as follows: it is able to detect both even and odd numbers of errors, the area overhead is reduced by the use of the ERT, and the delay is reduced by the use of the decimal algorithm.

III. IMPLEMENTATION AND ANALYSIS

The DMC is implemented and analyzed in Xilinx ISE Design Suite 14.2. From the results it has been observed that the DMC can correct a maximum of 5 bit errors, as shown in Fig. 5. The existing Hamming Code can correct only 1 bit error, and the results are shown in Fig. 6.

Fig. 5. Simulation result of DMC Correcting Five Cell Upset

Fig. 6. Simulation results of Hamming code correcting only single bit error

TABLE I. COMPARISON OF DMC AND HAMMING CODE
Parameters: No. of errors corrected; Device utilization (LUTs, Slices, Bounded IO); Delay (ns); Power (mW)
Methods: DMC, Hamming Code
Values: 5 1 653 210 2484 708 78 336 6.953 23.25 597 2236