FPGA Design Tips
Programmable Logic Devices
dr inż. Paweł Russek

Xilinx FPGA Build Process
• From design description to a bitstream in eight fundamental steps:
  • pre-build
  • compiling (XST, Synplify, etc.): synthesis
  • physical implementation (ISE, PlanAhead): ngdbuild, map, place-and-route, static timing analysis, bitgen
  • post-build

Build Process Basic Steps
• Synthesis: converts a design written in an HDL into a netlist (NGC or EDIF).
• Netlist translation (NGDBUILD): translates the netlist into a Xilinx Native Generic Database (NGD) file that contains the user constraints and the FPGA part.
• MAP (MAP tool): maps the design onto Xilinx FPGA components (outputs an NCD file).
• Place & Route (PAR tool): outputs an NCD file that contains complete place and route information.
• Static timing analysis
• Bitstream generation

Writing synthesizable code
• Use a synchronous design reset.
• Avoid using latches.
• Use synchronous registers whenever possible.
• Avoid using gated, derived, or divided clocks.
• Use clock enables instead of multiple clocks.
• Implement proper synchronization of all asynchronous signals.

Not supported language constructs
• 4-state values ('0', '1', 'x', 'z')
• delays, wait, initialization

Good coding practices
• full 'case' directives

    CASE sel IS
      WHEN "00"   => output <= in0;
      WHEN "11"   => output <= in1;
      WHEN OTHERS => output <= 'X';
    END CASE;

• complete 'if' clauses

    IF (s0='0' AND s1='0') THEN
      output <= in0;
    ELSIF (s0='1' AND s1='1') THEN
      output <= in1;
    ELSE  -- (s0 is not equal to s1)
      output <= 'X';
    END IF;

Instantiation vs. Inference
• instantiation
  • directly references an FPGA library primitive or macro in HDL
  • provides complete control
  • CoreGen tool (complex arithmetic blocks)
• inference
  • component inference refers to writing a generic RTL description
  • more portable code
  • results vary between different tools
  • complex components can't be inferred (e.g. MMCM)

Inferring registers

    -- Flip-Flop with Positive-Edge Clock
    library ieee;
    use ieee.std_logic_1164.all;

    entity registers_1 is
      port(C, D : in  std_logic;
           Q    : out std_logic);
    end registers_1;

    architecture archi of registers_1 is
    begin
      process (C)
      begin
        if (C'event and C='1') then
          Q <= D;
        end if;
      end process;
    end archi;

Inferring memories
• See the XST User Guide for coding style.

Inferring Shift Registers
• using flip-flops (use reset)
• using SRL (only left shift operation, no reset)

    wire srl1_out;
    reg [7:0] srl1;
    always @(posedge clk)
      srl1 <= srl_enable ? {srl1[6:0], d_in[0]} : srl1;
    assign srl1_out = srl1[7];

• Barrel shifters can be implemented using DSP48.

Shift registers using BRAMs
• The shift operation is performed by reading from port A of the BRAM, concatenating the delayed din with the data output from port A, and writing the result to port B.

Inferring IO
• If a port in a top-level module doesn't have any constraints, it will be implemented using an IO with default characteristics.
• For Xilinx FPGAs, a synthesis tool will automatically infer buffers such as IBUF, OBUF, and OBUFDS.
• Designers can specify port characteristics by using UCF constraints.

Inferring IOB Registers
• Every IO block in Xilinx FPGAs contains storage elements.
• The available options are an input register and an output register for single or dual data rate (DDR) outputs.
• Using IOB registers significantly decreases clock-to-input/output data time, which improves IO performance.
• Placing a register in the IOB is not guaranteed:
  • the register has to have a fanout of 1 (registers with higher fanout have to be replicated)
  • UCF constraint: INST <register_instance_name> IOB = TRUE|FALSE;
  • MAP option: "Pack I/O Registers into IOBs" property, which applies globally to all IOs
  • DDR registers in the IOB cannot be inferred; they require an explicit instantiation of a primitive
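
As a hedged illustration of the IOB register guidelines above, the VHDL sketch below registers one top-level input and one top-level output and requests IOB packing through the IOB attribute. The entity name iob_regs_1 and all port and signal names are hypothetical. The attribute form is assumed to follow XST's VHDL syntax for the IOB constraint, and packing still depends on the fanout and MAP conditions listed above.

```vhdl
-- Minimal sketch: input and output registers intended for IOB packing.
-- Entity, port, and signal names are illustrative assumptions.
library ieee;
use ieee.std_logic_1164.all;

entity iob_regs_1 is
  port(clk   : in  std_logic;
       d_pad : in  std_logic;   -- top-level input pin
       q_pad : out std_logic);  -- top-level output pin
end iob_regs_1;

architecture archi of iob_regs_1 is
  signal d_reg : std_logic := '0';  -- input register, fed only by the pad
  signal q_reg : std_logic := '0';  -- output register, drives only the pad

  -- Request IOB packing (same intent as the UCF constraint IOB = TRUE).
  attribute iob : string;
  attribute iob of d_reg : signal is "TRUE";
  attribute iob of q_reg : signal is "TRUE";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      d_reg <= d_pad;   -- flip-flop directly behind the input buffer
      q_reg <= d_reg;   -- flip-flop directly in front of the output buffer
    end if;
  end process;

  q_pad <= q_reg;
end archi;
```

Both registers have a fanout of 1 and no reset, which keeps them eligible for packing. Whether the tools actually place them in the IOB should be confirmed in the MAP report.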
Inferring latches
• In many cases latches are inferred unintentionally due to poor coding style practices.

    -- Latch with Positive Gate
    library ieee;
    use ieee.std_logic_1164.all;

    entity latches_1 is
      port(G, D : in  std_logic;
           Q    : out std_logic);
    end latches_1;

    architecture archi of latches_1 is
    begin
      process (G, D)
      begin
        if (G='1') then
          Q <= D;
        end if;
      end process;
    end archi;

Clocking Resources
• FPGAs provide dedicated low-skew routing resources for clocks.
• Skew is defined as the difference in arrival times of a clock edge at synchronous logic elements.
• Three types of clocks:
  • global, which can drive synchronous logic on the entire die
  • regional, which can drive logic in specific and adjacent regions
  • IO, which can serve the logic specific to that IO

Mixed-Mode Clock Manager (MMCM)

Asynchronous Reset Schemes
• Asynchronous reset nets are not included in the static timing analysis.
• Using an asynchronous reset may result in sub-optimal logic utilization.
• Asynchronous resets prevent synthesis tools from performing certain logic optimizations, such as taking advantage of the internal registers of DSP48.
• Asynchronous resets can cause problems:
  • synchronization (figure 1)
  • glitches (figure 2)
• However, asynchronous reset nets usually have more relaxed timing constraints compared to synchronous resets.

Examples
• Figure 1: synchronization problem
• Figure 2: glitches

Synchronous Reset Scheme
• Xilinx recommends using a synchronous reset scheme whenever possible.
• High fanout causes problems.
• A no-reset scheme is also possible:
  • By default, the register is initialized with '0' on FPGA power-up.
  • When there is no external reset: the dedicated STARTUP_VIRTEX6 primitive, which Xilinx provides for Virtex-6 FPGAs (but it is not portable).

FPGA configuration architecture

FPGA configuration modes
• In master configuration modes, the FPGA controls the configuration process.
• In slave modes, FPGA configuration is controlled by an external device: CPLD, microcontroller (uC), or another FPGA.
• JTAG
• ICAP: a dedicated ICAP primitive interfaces with the user logic to perform configuration from within the FPGA fabric.

FPGA configuration modes
• In Master Serial Mode, the FPGA controls a Xilinx Platform Flash to provide the configuration data.
• In Master SPI Flash Mode, the FPGA controls a serial SPI Flash.
• In Master SelectMAP Mode, the FPGA controls a Xilinx Platform Flash to provide 8- or 16-bit wide configuration data.
• In Master BPI Mode, the FPGA controls a parallel NOR Flash to provide 8- or 16-bit wide configuration data.

FPGA power consumption
• XPower Estimator spreadsheet
  – Design parameters, such as clocks, IOs, logic and memory resources, and toggle rates, can be either entered directly or imported from an existing MAP report.

Simulation and Synthesis Results Mismatch
• Incorrect simulation environment
  – Incorrect device interface simulation
  – Insufficient simulation coverage
• Incorrect IO constraints (standard, voltage level, drive strength, or slew rate)
• Incorrect timing constraints (clock domain crossing)
• Incorrect project settings (speed grade)
• Asynchronous reset
• Hardware failure
  – invalid voltage
  – wrong frequency
  – misbehaving peripheral device
  – signal integrity issue
• Synthesis or physical implementation tool bug

FPGA Based processors
• Hardwired
  – Intel Atom E6x5C
  – ARM Cortex: Xilinx Zynq-7000, Actel SmartFusion
• Soft processors
  – Actel ProASIC ARM IP core
  – MicroBlaze
  – Nios II

FPGA processor performance
• The most frequently used metrics are Million Instructions per Second (MIPS) and Dhrystone.
• Dhrystone is a synthetic computing benchmark program.

Intel Atom + Altera Arria SiP
• PCIe x1 Gen2

MicroBlaze processor
• System balance problem
  – High bandwidth
  – Small computing power (soft processor)

Area optimization options
• Option opt_mode area
• resource sharing:
  the same operations used by independent functions share hardware
• maximum fanout
• Flattening the design hierarchy is a synthesis option. It allows the tool to perform optimizations across module boundaries.
• FSM encoding (BRAMs)
  – reduces the total number of slices that MAP can target

Area optimization
• Coding style: 'case' instead of 'if'
• Balanced use of FPGA resources (try to convert some of the logic that uses registers and LUTs to BRAMs, DSP, Shift Register LUT)

CFGLUT5
• A dynamically reconfigurable 5-input LUT primitive.
• Using the CDI pin, a new INIT value can be synchronously shifted in serially to change the logical function.
• Example: pattern matcher. The pattern can be directly configured as the truth table.

Timing constraints

    NET "clk20" TNM_NET = "tnm_clk20";
    TIMESPEC "TS_clk20" = PERIOD "tnm_clk20" 20 ns HIGH 50 %;

• PERIOD covers register-to-register paths.
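
To make the PERIOD constraint above concrete, the following worked arithmetic shows what the 20 ns, 50 % specification implies and how slack on a register-to-register path is commonly evaluated. The delay symbols t_cq (clock-to-output), t_logic, t_route, and t_su (setup) are generic labels introduced here for illustration and are not taken from the slides. Clock skew and jitter are ignored in this sketch.

```latex
\begin{align*}
f_{clk}  &= \frac{1}{T_{clk}} = \frac{1}{20\ \text{ns}} = 50\ \text{MHz} \\
t_{high} &= 0.5 \cdot T_{clk} = 10\ \text{ns} \\
\text{slack} &= T_{clk} - \left(t_{cq} + t_{logic} + t_{route} + t_{su}\right) \ge 0
\end{align*}
```

For instance, if the four assumed delays on a path sum to 17 ns, the slack is 3 ns and the path meets the constraint. A sum above 20 ns would give negative slack and a timing violation.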