FPGA Design Tips

Programmable Logic Devices
dr inż. Paweł Russek
Xilinx FPGA Build Process
• From design description to a bitstream
• Eight fundamental steps:
  – pre-build,
  – compiling (XST, Synplify, etc.): synthesis,
  – physical implementation (ISE, PlanAhead): ngdbuild, map, place-and-route, static timing analysis, bitgen,
  – post-build
Build Process Basic Steps Synthesis
• Synthesis: converts a design written in an HDL into a netlist (NGC or EDIF).
• Netlist translation (NGDBUILD): translates the netlist into a Xilinx Native Generic Database (NGD) file that contains the user constraints and the FPGA part.
• MAP (MAP tool): maps the design onto Xilinx FPGA components (outputs an NCD file).
• Place & Route (PAR tool): outputs an NCD file that contains complete place-and-route information.
• Static timing analysis
• Bitstream generation
Writing synthesizable code
• Use a synchronous design reset.
• Avoid latches; use synchronous registers whenever possible.
• Avoid gated, derived, or divided clocks.
• Use clock enables instead of multiple clocks.
• Implement proper synchronization of all asynchronous signals.
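The two rules about clock enables and synchronization can be sketched as follows. This is a minimal illustration, not code from the lecture: the entity name ce_sync_example, the port names, and the divide-by-four ratio are all assumptions chosen for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: one clock, a clock enable instead of a divided
-- clock, and a two-stage synchronizer for an asynchronous input.
entity ce_sync_example is
  port (
    clk      : in  std_logic;  -- the only clock in the design
    rst      : in  std_logic;  -- synchronous reset, active high
    async_in : in  std_logic;  -- signal from a pin or another clock domain
    slow_q   : out std_logic   -- updated once every four clk cycles
  );
end ce_sync_example;

architecture rtl of ce_sync_example is
  signal div_cnt : unsigned(1 downto 0)         := (others => '0');
  signal ce      : std_logic                    := '0';
  signal sync_ff : std_logic_vector(1 downto 0) := (others => '0');
  signal q_int   : std_logic                    := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        div_cnt <= (others => '0');
        ce      <= '0';
        sync_ff <= (others => '0');
        q_int   <= '0';
      else
        -- Two flip-flops in series synchronize async_in to clk.
        sync_ff <= sync_ff(0) & async_in;

        -- Free-running counter; ce is high for one clk cycle in four.
        div_cnt <= div_cnt + 1;
        if div_cnt = 3 then
          ce <= '1';
        else
          ce <= '0';
        end if;

        -- "Slow" logic still runs on clk and is only enabled by ce.
        if ce = '1' then
          q_int <= sync_ff(1);
        end if;
      end if;
    end if;
  end process;

  slow_q <= q_int;
end rtl;
```

Because every register is clocked by clk, the whole block stays inside one clock domain, and only the first synchronizer flip-flop ever sees the asynchronous signal.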
Not supported language constructs
• 4-state values ('0', '1', 'x', 'z').
• delays, wait, initialization
Good coding practices
• full 'case' statements
    CASE sel IS
      WHEN "00" => output <= in0;
      WHEN "11" => output <= in1;
      WHEN OTHERS => output <= 'X';
    END CASE;
• complete 'if' clauses
    IF (s0='0' AND s1='0') THEN
      output <= in0;
    ELSIF (s0='1' AND s1='1') THEN
      output <= in1;
    ELSE
      -- (s0 is not equal to s1)
      output <= 'X';
    END IF;
Instantiation vs. Inference
• Instantiation
  – directly references an FPGA library primitive or macro in HDL,
  – provides complete control,
  – CoreGen Tool (complex arithmetic blocks).
• Inference
  – component inference refers to writing a generic RTL description,
  – more portable code,
  – results vary between different tools,
  – complex components can't be inferred (e.g. MMCM).
Inferring registers
    --
    -- Flip-Flop with Positive-Edge Clock
    --
    library ieee;
    use ieee.std_logic_1164.all;

    entity registers_1 is
      port(C, D : in  std_logic;
           Q    : out std_logic);
    end registers_1;

    architecture archi of registers_1 is
    begin
      process (C)
      begin
        -- Q captures D on the rising edge of C
        if (C'event and C='1') then
          Q <= D;
        end if;
      end process;
    end archi;
Inferring memories
See the XST User Guide for the recommended coding style.
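As an illustration of the kind of description synthesis tools typically recognize as a memory, here is a minimal sketch of a single-port RAM with a synchronous read. The entity name ram_sp, the generics, and their default sizes are assumptions for this example. Whether the array ends up in block RAM or distributed RAM depends on the tool, its settings, and the memory size.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port RAM, written so that it can be inferred.
entity ram_sp is
  generic (
    ADDR_W : natural := 10;  -- 2**10 = 1024 words (assumed size)
    DATA_W : natural := 8
  );
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(ADDR_W-1 downto 0);
    din  : in  std_logic_vector(DATA_W-1 downto 0);
    dout : out std_logic_vector(DATA_W-1 downto 0)
  );
end ram_sp;

architecture rtl of ram_sp is
  type ram_t is array (0 to 2**ADDR_W-1)
    of std_logic_vector(DATA_W-1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
      -- Registered read; during a write the old word is returned
      -- (read-first behaviour).
      dout <= ram(to_integer(unsigned(addr)));
    end if;
  end process;
end rtl;
```

The memory array has no reset, and the read is registered; both properties generally matter for mapping onto dedicated RAM blocks.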
Inferring Shift Registers
• using flip-flops (use reset)
• using SRL (only left shift operation, no reset)
    wire srl1_out;
    reg [7:0] srl1;
    always @(posedge clk)
      srl1 <= srl_enable ? {srl1[6:0], d_in[0]} : srl1;
    assign srl1_out = srl1[7];
• Barrel shifters can be implemented using DSP48
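For readers working in VHDL, the SRL-style shift register from the list above can be sketched as below. The entity and port names are assumptions. There is deliberately no reset, since a reset on the shift chain usually prevents mapping to SRL primitives and forces flip-flops instead.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit shift register without reset (SRL-friendly style).
entity srl_example is
  port (
    clk        : in  std_logic;
    srl_enable : in  std_logic;
    d_in       : in  std_logic;
    srl_out    : out std_logic
  );
end srl_example;

architecture rtl of srl_example is
  signal srl1 : std_logic_vector(7 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if srl_enable = '1' then
        -- Left shift: the new bit enters at index 0.
        srl1 <= srl1(6 downto 0) & d_in;
      end if;
    end if;
  end process;

  -- Only the last stage is observed, as in the SRL use case.
  srl_out <= srl1(7);
end rtl;
```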
Shift registers using BRAMs
The shift operation is performed by reading from port A of the BRAM, concatenating the delayed din with the data output from port A, and writing the result to port B.
Inferring IO
• If a port in a top-level module has no constraints, it is implemented using an IO with default characteristics.
• For Xilinx FPGAs, a synthesis tool automatically infers buffers such as IBUF, OBUF, and OBUFDS.
• Designers can specify port characteristics by using UCF constraints.
Inferring IOB Registers
• Every IO block in Xilinx FPGAs contains storage elements.
• The available options are an input register and an output register for single- or dual-data-rate (DDR) outputs.
• Using IOB registers significantly decreases the clock-to-input/output data time, which improves IO performance.
• Placing a register in the IOB is not guaranteed:
  – the register has to have a fanout of 1 (registers with a higher fanout have to be replicated),
  – UCF constraint: INST <register_instance_name> IOB = TRUE|FALSE;
  – MAP option: "Pack I/O Registers into IOBs" property, which applies globally to all IOs,
  – inferring DDR registers in the IOB requires an explicit instantiation of a primitive.
Inferring latches
• In many cases latches are inferred unintentionally due to poor coding style.
    --
    -- Latch with Positive Gate
    --
    library ieee;
    use ieee.std_logic_1164.all;

    entity latches_1 is
      port(G, D : in  std_logic;
           Q    : out std_logic);
    end latches_1;

    architecture archi of latches_1 is
    begin
      process (G, D)
      begin
        -- no ELSE branch: Q holds its value while G='0', which is a latch
        if (G='1') then
          Q <= D;
        end if;
      end process;
    end archi;
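A typical unintentional latch comes from a combinational process that does not assign its output on every path. The sketch below contrasts the two cases; the entity name no_latch and its ports are assumptions made for this illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: the same selection logic written twice.
entity no_latch is
  port (
    sel     : in  std_logic_vector(1 downto 0);
    a, b    : in  std_logic;
    y_latch : out std_logic;  -- infers a latch
    y_comb  : out std_logic   -- purely combinational
  );
end no_latch;

architecture rtl of no_latch is
begin
  -- Incomplete 'if': for sel = "10" or "11" nothing is assigned,
  -- so y_latch must remember its previous value.
  process (sel, a, b)
  begin
    if sel = "00" then
      y_latch <= a;
    elsif sel = "01" then
      y_latch <= b;
    end if;
  end process;

  -- A default assignment at the top covers every path,
  -- so no storage element is needed.
  process (sel, a, b)
  begin
    y_comb <= '0';
    if sel = "00" then
      y_comb <= a;
    elsif sel = "01" then
      y_comb <= b;
    end if;
  end process;
end rtl;
```

Synthesis tools normally report inferred latches as warnings, so the synthesis log is a quick way to find cases like the first process.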
Clocking Resources
• FPGAs provide dedicated low-skew routing resources for clocks.
• Skew is defined as the difference in arrival times of a clock edge at synchronous logic elements.
• Three types of clocks:
  – global, which can drive synchronous logic on the entire die;
  – regional, which can drive logic in a specific region and in adjacent regions;
  – IO, which can serve the logic specific to that IO.
Mixed-Mode Clock Manager (MMCM)
Asynchronous Reset Schemes
• Asynchronous reset nets are not included in the static timing analysis.
• Using an asynchronous reset may result in sub-optimal logic utilization.
• Asynchronous resets prevent synthesis tools from performing certain logic optimizations, such as taking advantage of the internal registers of DSP48.
• Asynchronous resets can cause problems:
  – synchronization (figure 1),
  – glitches (figure 2).
• However, asynchronous reset nets usually have more relaxed timing constraints compared to synchronous resets.
Examples
Figure 1: Synchronization problem
Figure 2: Glitches
Synchronous Reset Scheme
• Xilinx recommends using a synchronous reset scheme whenever possible.
  – A high reset fanout can cause problems.
• A no-reset scheme is also possible.
  – By default, a register is initialized with '0' on FPGA power-up.
• When there is no external reset:
  – the dedicated STARTUP_VIRTEX6 primitive, which Xilinx provides for Virtex-6 FPGAs (but it is not portable).
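The synchronous-reset and no-reset schemes can be compared side by side in a short sketch. The entity name reset_styles and all signal names are assumptions. The no-reset register relies on its declared initial value being used as the power-up state, which is how Xilinx synthesis tools generally treat signal initializers; this should be checked for the tool in use.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: two registers with different reset schemes.
entity reset_styles is
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;  -- synchronous reset, active high
    d      : in  std_logic;
    q_sync : out std_logic;  -- register with synchronous reset
    q_init : out std_logic   -- register with power-up value only
  );
end reset_styles;

architecture rtl of reset_styles is
  signal r_sync : std_logic := '0';
  signal r_init : std_logic := '0';  -- assumed to become the INIT value
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Synchronous reset: rst is sampled only on the clock edge,
      -- so it is covered by static timing analysis.
      if rst = '1' then
        r_sync <= '0';
      else
        r_sync <= d;
      end if;

      -- No reset at all: the register starts from its initial value
      -- after configuration and simply follows d afterwards.
      r_init <= d;
    end if;
  end process;

  q_sync <= r_sync;
  q_init <= r_init;
end rtl;
```

Dropping the reset from registers that do not need one, such as pipeline stages, also reduces the fanout of the reset net.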
FPGA configuration architecture
FPGA configuration modes
In master configuration modes, the FPGA controls the configuration process.
In slave modes, FPGA configuration is controlled by an external device: a CPLD, a microcontroller (uC), or another FPGA.
JTAG
ICAP: a dedicated ICAP primitive interfaces with the user logic to perform configuration from within the FPGA fabric.
FPGA configuration modes
In Master Serial Mode, the FPGA controls the Xilinx Platform Flash to provide the configuration data.
In Master SPI Flash Mode, the FPGA controls a serial SPI Flash.
In Master SelectMAP Mode, the FPGA controls the Xilinx Platform Flash to provide 8- or 16-bit wide configuration data.
In Master BPI Mode, the FPGA controls a parallel NOR Flash to provide 8- or 16-bit wide configuration data.
FPGA power consumption
XPower Estimator Spreadsheet
– Design parameters, such as clocks, IOs, logic and memory resources, and toggle rates, can be either entered directly or imported from an existing MAP report.
Simulation and Synthesis Results Mismatch
Incorrect simulation environment
– Incorrect device interface simulation
– Insufficient simulation coverage
Incorrect IO constraints (standard, voltage level, drive strength, or slew rate)
Incorrect timing constraints (clock domain crossing)
Incorrect project settings (speed grade)
Asynchronous reset
Hardware failure
– invalid voltage
– wrong frequency
– misbehaving peripheral device
– signal integrity issue
Synthesis or physical implementation tool bug
FPGA Based processors
Hardwired
– Intel Atom E6x5C
– ARM Cortex: Xilinx Zynq-7000, Actel SmartFusion
Soft processors
– Actel ProASIC ARM IP core
– MicroBlaze
– NIOS II
FPGA processor performance
The most frequently used metrics are Million Instructions per Second (MIPS) and Dhrystone.
Dhrystone is a synthetic computing benchmark program.
Intel Atom + Altera Arria
SiP
PCIe x1 Gen2
MicroBlaze processor
System balance problem
– High bandwidth
– Small computing power (soft processor)
Area optimizations options
Option opt_mode area
Resource sharing: the same operations are shared by independent functions
Maximum fanout
Flattening the design hierarchy is a synthesis option. It allows the tool to perform optimizations across module boundaries.
FSM encoding (BRAMs)
– reduces the total number of slices that MAP can target
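Resource sharing from the list above can be pictured with two adders that are never needed at the same time. The sketch below, with assumed entity and signal names, writes the same function in an unshared and a shared form. A synthesis tool with resource sharing enabled may derive the second structure from the first automatically.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: two mutually exclusive additions.
entity share_add is
  port (
    sel        : in  std_logic;
    a, b, c, d : in  unsigned(7 downto 0);
    y_two      : out unsigned(7 downto 0);  -- described with two adders
    y_shared   : out unsigned(7 downto 0)   -- described with one adder
  );
end share_add;

architecture rtl of share_add is
  signal op1, op2 : unsigned(7 downto 0);
begin
  -- Unshared description: two adders, then a multiplexer on the results.
  y_two <= a + b when sel = '1' else c + d;

  -- Shared description: multiplex the operands, then use a single adder.
  op1 <= a when sel = '1' else c;
  op2 <= b when sel = '1' else d;
  y_shared <= op1 + op2;
end rtl;
```

Both outputs always carry the same value; the second form trades one adder for two operand multiplexers, which usually saves area for wide operators.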
Area optimization Coding Style
'case' instead of 'if'
Balanced use of FPGA resources (try to convert some of the logic that uses registers and LUTs to BRAMs, DSPs, or Shift Register LUTs)
CFGLUT5
– a dynamically reconfigurable 5-input LUT primitive
Using the CDI pin, a new INIT value can be synchronously shifted in serially to change the logical function.
Example:
Pattern matcher. The pattern can be configured directly as a truth table.
Timing constraints
NET "clk20" TNM_NET = "tnm_clk20";
TIMESPEC "TS_clk20" = PERIOD "tnm_clk20" 20 ns HIGH 50 %;
PERIOD covers register-to-register paths.