Lesson 10: Reset, Registers, and Counters in Verilog
10.1 Introduction to Reset, Registers, and Counters
In digital design, three building blocks appear in virtually every FPGA project: resets, registers, and counters. Before writing complex state machines or communication protocols, a designer must have a solid understanding of how these primitives behave, how they are described in Verilog, and how synthesis tools map them onto FPGA hardware resources.
This chapter introduces each concept from first principles and progressively builds toward practical, production-ready design patterns used in real FPGA development.
Register
What Is a Register?
A register is a storage element that captures and holds a binary value on the active edge of a clock signal. In FPGA devices, registers are implemented using dedicated flip-flops (FFs) embedded throughout the logic fabric.
In Verilog, a register is inferred whenever a variable of type reg is assigned inside an always @(posedge clk) block. The synthesis tool recognizes the edge-triggered assignment and maps it to a physical flip-flop on the target device.
A minimal D flip-flop in Verilog looks like this:
module dff (
input wire clk,
input wire d,
output reg q
);
always @(posedge clk) begin
q <= d;
end
endmodule
Notice the use of the non-blocking assignment <= inside the always block. This is the correct operator for sequential (clocked) logic and is essential for correct simulation and synthesis behavior.
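A two-stage shift register makes the role of the non-blocking operator concrete. The following is a minimal sketch; the module name shift2 and the signal names stage1 and stage2 are assumptions chosen for illustration and are not part of the course code.

```verilog
// Two-stage shift register: each stage is one flip-flop.
// With non-blocking assignments, both right-hand sides are sampled
// before either register updates, so stage2 receives the OLD stage1.
module shift2 (
    input  wire clk,
    input  wire d,
    output reg  stage1,
    output reg  stage2
);
    always @(posedge clk) begin
        stage1 <= d;       // captures d
        stage2 <= stage1;  // captures the previous value of stage1
    end
endmodule
```

If the blocking operator = were used here instead, stage1 would be updated first and stage2 would then copy that new value in the same time step. Both registers would capture d on the same edge, and the intended two-cycle delay would collapse into one.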
Reset
What Is a Reset?
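A reset is a control signal that forces one or more registers into a known, predictable state, typically all zeros. Without a reset, a register's starting state should not be relied upon: simulators start it as unknown (X), ASIC flip-flops power up in an arbitrary state, and FPGA flip-flops receive only whatever power-up value the configuration bitstream assigns. A design that depends on a particular starting value must therefore establish it deliberately.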
There are two fundamental types of reset:
- Synchronous Reset — the reset takes effect only on the next rising clock edge. It is treated like any other data input to the flip-flop.
- Asynchronous Reset — the reset takes effect immediately, regardless of the clock. It is listed in the always sensitivity list alongside posedge clk.
Both styles are widely used; each has trade-offs in terms of timing, FPGA resource usage, and design safety. These trade-offs are discussed in detail in Sections 10.2-10.4.
Counter
What Is a Counter?
A counter is a register whose stored value increments (or decrements) by a fixed amount on each active clock edge. Counters are among the most frequently used circuits in digital design and appear in applications such as:
- Generating timing delays and periodic events
- Addressing memory locations sequentially
- Controlling loop iterations in hardware state machines
- Baud rate generation in communication interfaces (UART, SPI, I²C)
- PWM signal generation for motor and LED control
A counter is structurally just a register with its output fed back into an adder. In Verilog, this feedback is expressed naturally within the always block:
module counter_4bit (
input wire clk,
input wire rst_n,
output reg [3:0] count
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= 4'b0000;
else
count <= count + 4'b0001;
end
endmodule
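The wrap-around behavior of the 4-bit counter can be checked in simulation. The testbench below is a sketch under stated assumptions: the name tb_counter_4bit, the 20 ns clock period, and the reset release time are all illustrative choices, and the file must be compiled together with the counter_4bit module shown above.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench; compile together with counter_4bit.
module tb_counter_4bit;
    reg        clk   = 1'b0;
    reg        rst_n = 1'b0;   // start with reset asserted
    wire [3:0] count;

    counter_4bit dut (.clk(clk), .rst_n(rst_n), .count(count));

    always #10 clk = ~clk;     // 20 ns period, rising edges at 10, 30, 50 ns ...

    initial begin
        #25 rst_n = 1'b1;            // release reset between clock edges
        repeat (16) @(posedge clk);  // 16 increments: 0 -> 15 -> back to 0
        #1;                          // let the non-blocking update settle
        if (count !== 4'd0) $display("FAIL: count = %d", count);
        else                $display("PASS: counter wrapped to 0");
        $finish;
    end
endmodule
```

Because count is only 4 bits wide, the sixteenth increment overflows from 15 back to 0 without any extra logic. The initial block is appropriate here because a testbench is never synthesized.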
How These Concepts Relate
Resets, registers, and counters are not independent topics — they form a hierarchy of abstraction:
Flip-Flop (FF) — the physical primitive provided by the FPGA fabric. A single-bit storage cell clocked by the system clock.
Register — one or more flip-flops grouped together to store a multi-bit value, usually sharing a common clock and reset signal.
Counter — a register augmented with combinational feedback logic (an adder or subtractor) that modifies its own value on each clock cycle.
Reset — a mechanism applied at the flip-flop level that initializes all registers and counters to a known state at startup or on demand.
Understanding this hierarchy helps designers reason about timing, area usage, and reliability of their FPGA implementations.
FPGA Hardware Context
On Intel MAX-10 FPGAs (used in this course with the Terasic DE10-Lite board), each Logic Element (LE) contains one dedicated flip-flop. The Quartus Prime synthesis tool automatically infers flip-flops from your Verilog code and maps them to these hardware resources.
Key facts relevant to this chapter:
- MAX-10 flip-flops support both synchronous and asynchronous clear (reset) natively — choosing the right reset style affects how efficiently the LE is utilized.
- Multi-bit registers consume one LE per bit. A 16-bit register requires at least 16 LEs.
- Counters synthesize efficiently because Quartus recognizes the increment pattern and applies carry-chain optimization across adjacent LEs.
- The RTL Viewer and Technology Map Viewer in Quartus Prime are valuable tools for verifying that your Verilog code infers the hardware structure you intend.
A Note on the initial Block and Register Initialization
Designers familiar with simulation often ask whether the initial block can be used to set a register's starting value in synthesized designs. The answer depends on the target technology and requires careful understanding.
On FPGAs (Including Intel MAX-10)
Most modern FPGA synthesis tools — including Quartus Prime, Vivado, and ISE — do recognize initial blocks for register initialization. The specified value is programmed into the flip-flop's configuration bitstream and applied at power-up.
reg [7:0] count;
initial begin
count = 8'hFF; // Quartus will honor this as the power-up value
end
However, there are important limitations:
- The initial block is not a reset — it sets the register value only once at power-up or FPGA configuration. It is not re-applied when a reset signal is asserted during normal operation.
- It does not replace a proper reset circuit. If the board is reset via a reset pin rather than a full power cycle, the initial value will not be re-applied.
- Behavior can vary between FPGA families and tool versions. Always verify the inferred hardware using the RTL Viewer or Technology Map Viewer in Quartus Prime.
On ASICs
In ASIC synthesis (e.g., with Synopsys Design Compiler), the synthesizer completely ignores initial blocks. They are treated as simulation-only constructs. This is why professional RTL coding standards typically prohibit initial blocks in synthesizable RTL entirely.
Recommended Practice
Even on FPGAs, always use an explicit reset signal rather than relying on an initial block for register initialization:
// Recommended: explicit reset, works correctly in all contexts
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= 8'hFF;
else
count <= count + 1;
end
// Avoid in synthesizable RTL: simulation-only habit
initial begin
count = 8'hFF;
end
The following table summarizes support across different contexts:
| Context | initial Supported? | Recommended? |
|---|---|---|
| Icarus Verilog (simulation) | ✓ Yes | ✓ Yes |
| Quartus Prime / Intel MAX-10 | ✓ Yes (power-up only) | ⚠ Use with caution |
| Vivado / Xilinx | ✓ Yes (power-up only) | ⚠ Use with caution |
| ASIC Synthesis | ✗ No | ✗ Never |
Safe rule for designers: use the initial block only in testbenches, and always use an explicit reset signal in synthesizable RTL. This habit ensures your code is portable, predictable, and professionally correct across both FPGA and ASIC flows.
The reg Keyword: Combinational vs Sequential Logic
One of the most common points of confusion in Verilog is the meaning of the reg keyword. Many designers initially assume that reg always implies a flip-flop — this is incorrect. Understanding the distinction is fundamental to writing correct, synthesizable Verilog.
The Key Insight: reg Is a Data Type, Not a Hardware Primitive
In Verilog, reg simply means "a variable that can be assigned inside an always block." It does not automatically infer a flip-flop. What determines whether the synthesized hardware is combinational or sequential is the sensitivity list of the always block, not the reg keyword itself.
Combinational Logic Using reg
When a reg variable is assigned inside an always @(*) block, the synthesis tool infers combinational logic gates — no flip-flop is created. The output updates immediately whenever any input signal changes.
// reg here → NO flip-flop inferred, purely combinational
always @(*) begin
if (sel)
y = a; // blocking assignment
else
y = b;
end
• always @(*) instructs the simulator to re-evaluate the block whenever any signal read inside the block changes.
• The blocking assignment = executes immediately, top to bottom, within the same simulation time step.
• The synthesis tool infers a multiplexer, not a flip-flop.
• The output y does not hold its value — it is purely a function of the current inputs.
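The point that reg does not imply storage can be illustrated by writing the same multiplexer twice. This is a sketch only; the module name mux2_styles and the output names y_always and y_assign are assumptions made for the comparison.

```verilog
// Same multiplexer written two ways; both describe combinational logic.
module mux2_styles (
    input  wire a,
    input  wire b,
    input  wire sel,
    output reg  y_always,   // declared reg, yet no flip-flop is created
    output wire y_assign    // declared wire, driven by a continuous assignment
);
    // Procedural form: requires a reg target
    always @(*) begin
        if (sel) y_always = a;
        else     y_always = b;
    end

    // Continuous-assignment form: requires a wire target
    assign y_assign = sel ? a : b;
endmodule
```

Both outputs should appear as the same multiplexer structure in the RTL Viewer. The only reason y_always is declared reg is that it is assigned inside an always block.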
Sequential Logic Using reg
When a reg variable is assigned inside an always @(posedge clk) block, the synthesis tool infers a D flip-flop. The output updates only on the active clock edge and holds its value between edges.
// reg here → flip-flop IS inferred
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
q <= 8'h00; // non-blocking assignment
else
q <= d;
end
• always @(posedge clk or negedge rst_n) instructs the simulator to evaluate the block on the rising clock edge or on the falling edge of rst_n.
• The non-blocking assignment <= schedules the update to occur at the end of the current simulation time step, correctly modeling concurrent flip-flop behavior.
• The synthesis tool infers a D flip-flop with an asynchronous active-low reset.
• The output q holds its value between clock edges.
The Danger Zone: Unintended Latches
If combinational logic is written with an incomplete conditional statement — for example, an if without a matching else — the synthesizer is forced to infer a latch. A latch is a level-sensitive storage element that is neither a clean combinational gate nor a proper flip-flop, and it typically indicates a design error.
// Latch inferred — missing else branch!
always @(*) begin
if (en)
y = d;
// When en = 0, what is y? The tool must hold the value → latch
end
// Correct — output is defined in every branch, no latch
always @(*) begin
if (en)
y = d;
else
y = 8'h00; // explicit default eliminates the latch
end
Quartus Prime will issue a warning whenever a latch is inferred. Treat all unintended latch warnings as errors and resolve them before proceeding.
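For case statements, where it is easy to forget a branch, a common way to guarantee full assignment is to give the output a default value at the top of the block. The sketch below assumes a hypothetical module decode_no_latch with an illustrative 2-bit op input; neither appears in the original lesson.

```verilog
// Default-assignment pattern: y is assigned on every path through the block,
// so no latch can be inferred even though the case is incomplete.
module decode_no_latch (
    input  wire [1:0] op,
    input  wire [7:0] d,
    output reg  [7:0] y
);
    always @(*) begin
        y = 8'h00;            // default value, overridden below when needed
        case (op)
            2'b01: y = d;     // pass through
            2'b10: y = ~d;    // bitwise invert
            // 2'b00 and 2'b11 keep the default assigned above
        endcase
    end
endmodule
```

The default assignment has the same effect as adding an else branch or a default case item, and it scales better when a block drives several outputs.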
Side-by-Side Comparison
| | Combinational | Sequential |
|---|---|---|
| Sensitivity list | always @(*) | always @(posedge clk) |
| Assignment operator | = (blocking) | <= (non-blocking) |
| Hardware inferred | Multiplexer/logic gates | Flip-flop (register) |
| Output updates when | Any input changes | Clock edge arrives |
| Holds value? | ✗ No | ✓ Yes |
| Clock needed? | ✗ No | ✓ Yes |
Design rule: the reg keyword tells Verilog how the variable is driven — the sensitivity list tells the synthesizer what hardware to build. Always use always @(*) with blocking = for combinational logic, and always @(posedge clk) with non-blocking <= for sequential logic. Mixing these conventions is one of the most frequent sources of simulation-synthesis mismatch encountered in FPGA design.
10.2 Reset Fundamentals: Synchronous vs Asynchronous
Reset is one of the most fundamental — and most frequently underestimated — mechanisms in digital system design. A poorly designed reset strategy can cause a system to start from an illegal state after power-up, fail to recover from runtime errors, or even result in bus lockups and damage to external devices. This section examines the behavioral differences, synthesis implications, and practical use cases of synchronous and asynchronous reset from first principles.
Why Is Reset Necessary?
A flip-flop's power-up state cannot be relied upon by itself. Some FPGAs, including the Intel MAX-10, allow the power-up value of each flip-flop to be programmed into the configuration bitstream, but this applies only at the moment of power-up and does not address the following runtime scenarios:
- A runtime error has occurred and the control logic must be forced back to a known safe state.
- An external device requires reinitialization of a communication protocol (e.g., I2C bus reset, UART framing error recovery).
- A watchdog timer has detected a system fault and triggered a system-wide reset.
- A multi-clock-domain system requires all clock domains to be cleared simultaneously.
An explicit reset signal is therefore a mandatory requirement of any reliable digital system design, not an optional feature.
Synchronous Reset
The defining characteristic of a synchronous reset is that its effect takes place only on the active clock edge. In Verilog, a synchronous reset does not appear in the always sensitivity list — it is treated as an ordinary conditional input:
// Synchronous reset example
module sync_reset_dff (
input wire clk,
input wire rst_n, // Active-low synchronous reset
input wire d,
output reg q
);
always @(posedge clk) begin // Only clk in sensitivity list
if (!rst_n)
q <= 1'b0;
else
q <= d;
end
endmodule
After synthesis, a synchronous reset is implemented as a 2-to-1 multiplexer inserted before the D input of the flip-flop — not as a connection to the flip-flop's hardware CLR pin. The hardware structure is as follows:
• A 2-to-1 Mux is placed in front of the flip-flop's D input.
• When rst_n = 0, the Mux selects 0 and drives it to the D input.
• When rst_n = 1, the Mux selects the original data signal d.
• The flip-flop's dedicated CLR/PRE pins are not used, regardless of the reset state.
Timing behavior of synchronous reset:
• Even if rst_n is asserted in the middle of a clock cycle, the output q does not change immediately.
• The output is cleared only on the next rising clock edge after reset is asserted.
• The reset pulse width must be longer than one clock period; otherwise, the reset event may be missed entirely.
Asynchronous Reset
The defining characteristic of an asynchronous reset is that the reset takes effect immediately when asserted, without waiting for a clock edge. In Verilog, an asynchronous reset must appear in the always sensitivity list:
// Asynchronous reset example
module async_reset_dff (
input wire clk,
input wire rst_n, // Active-low asynchronous reset
input wire d,
output reg q
);
always @(posedge clk or negedge rst_n) begin // rst_n is in the sensitivity list
if (!rst_n)
q <= 1'b0; // Takes effect immediately, no clock required
else
q <= d;
end
endmodule
After synthesis, an asynchronous reset is connected directly to the flip-flop's hardware CLR (Clear) pin — a feature natively supported by every Logic Element (LE) in the Intel MAX-10. No additional Mux logic is required.
• The reset signal drives the flip-flop's CLR pin directly.
• The CLR pin is asynchronous: it responds immediately, independent of the clock.
• No additional LUT resources are consumed compared to the synchronous Mux approach.
Timing behavior of asynchronous reset:
• As soon as rst_n is pulled low, the output q is cleared to 0 after the flip-flop's propagation delay (typically a few nanoseconds).
• The reset takes effect even if no clock edge is present.
• The minimum reset pulse width is determined by the flip-flop's CLR minimum pulse width specification — typically far shorter than one clock period.
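The difference in response time between the two styles can be observed side by side in simulation. The testbench below is a self-contained sketch: the name tb_reset_compare, the 20 ns clock period, and the reset timing are assumptions, and the zero-delay model does not show the flip-flop's real propagation delay.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench comparing the two reset styles on identical stimulus.
module tb_reset_compare;
    reg clk   = 1'b0;
    reg rst_n = 1'b1;
    reg q_sync, q_async;

    always #10 clk = ~clk;   // rising edges at 10, 30, 50, 70 ns ...

    // Synchronous reset: rst_n is sampled only at the clock edge
    always @(posedge clk)
        if (!rst_n) q_sync <= 1'b0;
        else        q_sync <= 1'b1;

    // Asynchronous reset: the block also runs on the falling edge of rst_n
    always @(posedge clk or negedge rst_n)
        if (!rst_n) q_async <= 1'b0;
        else        q_async <= 1'b1;

    initial begin
        $monitor("t=%0t rst_n=%b q_sync=%b q_async=%b",
                 $time, rst_n, q_sync, q_async);
        #35 rst_n = 1'b0;    // assert at 35 ns, between the edges at 30 and 50 ns
        #30 rst_n = 1'b1;    // release at 65 ns
        #30 $finish;
    end
endmodule
```

With this stimulus, q_async clears at 35 ns when rst_n falls, while q_sync stays high until the next rising edge at 50 ns. Both outputs return to 1 at the rising edge at 70 ns, after the reset has been released.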
The Risk of Asynchronous Reset: Metastability at Deassertion
The primary risk of asynchronous reset occurs not during assertion but during deassertion, when rst_n transitions from 0 back to 1. If this transition occurs too close to a rising clock edge, violating the flip-flop's recovery or removal time specification, the flip-flop may enter metastability. Even without metastability, skew on the reset net can cause individual flip-flops to exit reset on different clock edges, leaving an FSM in an illegal initial state.
The standard solution is a Reset Synchronizer, which preserves the immediate assertion behavior of the asynchronous reset while ensuring that deassertion is synchronized to the clock:
// Reset Synchronizer: asynchronous assert, synchronous deassert
module reset_synchronizer (
input wire clk,
input wire async_rst_n, // Asynchronous reset from external source
output wire sync_rst_n // Synchronized reset for internal logic
);
reg meta_ff, sync_ff;
always @(posedge clk or negedge async_rst_n) begin
if (!async_rst_n)
{sync_ff, meta_ff} <= 2'b00; // Asynchronous assert: clear immediately
else
{sync_ff, meta_ff} <= {meta_ff, 1'b1}; // Synchronous deassert: two-stage chain
end
assign sync_rst_n = sync_ff;
endmodule
This two-stage synchronizer chain reduces the metastability risk during deassertion to a negligible level. It is the industry-standard reset circuit and should be included in any design that uses asynchronous reset.
When to Use Each Reset Style
Choosing between synchronous and asynchronous reset is an engineering decision driven by design requirements, not personal preference. The following guidelines identify the appropriate choice for each scenario.
Use Synchronous Reset When:
- Designing datapath modules — pipeline registers, data buffers, DSP accumulators, and similar modules do not require a reset in the absence of a clock. Synchronous reset integrates cleanly into Static Timing Analysis (STA), allowing the tool to fully characterize the reset path delay.
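- The reset signal source may contain glitches: a synchronous reset ignores any glitch that does not overlap an active clock edge, making it more robust when the reset source is noisy. A glitch that happens to coincide with a clock edge can still be captured, so a reset that is asynchronous to the system clock should still pass through a synchronizer first.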
- Designing purely for FPGA with no ASIC portability requirement — Quartus Prime optimizes the LUT-Mux structure for synchronous reset efficiently, and the resulting design is straightforward to analyze with timing tools.
- Implementing counters and sequence generators — these modules are typically reset by internal control logic (e.g., when the count reaches a terminal value), which is inherently a synchronous operation.
Use Asynchronous Reset When:
- Designing system control FSMs (CPU datapath controller, bus arbiter) — if the control logic enters an illegal state, it must be cleared immediately without waiting for the next clock edge.
- Implementing communication protocol controllers (I2C, UART, SPI, CAN) — protocol errors such as framing errors, bus lockups, and arbitration loss require recovery within nanoseconds. A synchronous reset delay could result in a protocol violation.
- Handling power-on reset (POR) — the system clock is not yet stable at power-up. Only an asynchronous reset can initialize flip-flops before the clock is running.
- Working with multi-clock-domain designs — a global reset must clear all clock domains simultaneously. Asynchronous reset guarantees this; synchronous reset cannot, because each domain responds only on its own clock edge.
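- Integrating with commercial IP cores (ARM Cortex-M, AXI, PCIe): many commercial IP cores expose a reset that may be asserted asynchronously. The AMBA AXI reset ARESETn, for example, is active-low, may be asserted asynchronously, and must be deasserted synchronously to the clock. Synchronous reset in custom modules requires additional reset bridging logic.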
- Watchdog timer-triggered system resets — a watchdog fires precisely because the system (including possibly the clock) has malfunctioned. Asynchronous reset is the only reliable choice.
Practical Case Studies
Case 1: UART Receive Controller (Asynchronous Reset)
The UART receiver FSM detects the start bit, samples data bits, and verifies the stop bit. Line noise can drive the FSM into an illegal state, locking up the entire receive process. An external reset signal must be able to clear the FSM immediately:
// UART RX FSM — Asynchronous reset
// Protocol errors require immediate recovery; cannot wait for a clock edge
module uart_rx_fsm (
input wire clk,
input wire rst_n, // Asynchronous reset from system Reset Synchronizer
input wire rx,
output reg frame_error
);
localparam IDLE = 2'd0;
localparam START = 2'd1;
localparam DATA = 2'd2;
localparam STOP = 2'd3;
reg [1:0] state;
reg [2:0] bit_cnt;
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
state <= IDLE; // Immediately return to safe state
bit_cnt <= 3'd0;
frame_error <= 1'b0;
end else begin
case (state)
IDLE: if (!rx) state <= START;
START: state <= DATA;
DATA: begin
bit_cnt <= bit_cnt + 1;
if (bit_cnt == 3'd7) state <= STOP;
end
STOP: begin
frame_error <= !rx;
state <= IDLE;
end
endcase
end
end
endmodule
Case 2: I2C Bus Arbitration Controller (Asynchronous Reset)
In the I2C protocol, if the master loses bus arbitration, it must release SDA and SCL immediately. Holding the bus beyond the allowed window locks the entire I2C bus. The reset response must occur well within one SCL period (10 µs at 100 kHz standard mode):
// I2C Master Arbitration Control — Asynchronous reset
// The bus must be released immediately upon arbitration loss
module i2c_arb_ctrl (
input wire clk,
input wire rst_n, // Asynchronous reset
input wire arb_lost, // Arbitration lost flag
output reg sda_oe, // SDA output enable (0 = release bus)
output reg scl_oe // SCL output enable
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
sda_oe <= 1'b0; // Release SDA immediately
scl_oe <= 1'b0; // Release SCL immediately
end else begin
if (arb_lost) begin
sda_oe <= 1'b0;
scl_oe <= 1'b0;
end
end
end
endmodule
Case 3: Pipelined Multiplier Stage Register (Synchronous Reset)
Intermediate pipeline registers in a DSP datapath only need to be initialized at system startup. There is no requirement to reset them while the clock is stopped. Synchronous reset allows the STA tool to fully characterize the reset path and eliminates the timing uncertainty associated with asynchronous deassertion:
// Pipeline Multiplier Stage 1 Register — Synchronous reset
// Datapath registers do not require emergency clearing;
// synchronous reset provides cleaner timing analysis
module pipe_mult_stage1 (
input wire clk,
input wire rst_n, // Synchronous reset
input wire [15:0] a,
input wire [15:0] b,
output reg [31:0] product_r
);
always @(posedge clk) begin // Only clk in sensitivity list
if (!rst_n)
product_r <= 32'd0;
else
product_r <= a * b;
end
endmodule
Case 4: Baud Rate Generator Counter (Synchronous Reset)
The counter inside a baud rate generator is cleared by internal logic when it reaches its terminal count. This is inherently a synchronous operation, making synchronous reset the natural and correct choice:
// Baud Rate Generator Counter — Synchronous reset
// Counter rollover is a synchronous operation; synchronous reset is most appropriate
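module baud_gen #(
parameter CLK_FREQ = 50_000_000,
parameter BAUD_RATE = 115200
) (
input wire clk,
input wire rst_n,
output reg baud_tick
);
// Derived constant: declared in the module body so the code is legal Verilog-2001
// (a localparam inside the #( ) parameter list is accepted only by SystemVerilog)
localparam DIVIDER = CLK_FREQ / BAUD_RATE - 1;
// Width must hold every value from 0 to DIVIDER inclusive
reg [$clog2(DIVIDER + 1)-1:0] count;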
always @(posedge clk) begin // Synchronous reset
if (!rst_n) begin
count <= 'd0;
baud_tick <= 1'b0;
end else if (count == DIVIDER) begin
count <= 'd0;
baud_tick <= 1'b1;
end else begin
count <= count + 1;
baud_tick <= 1'b0;
end
end
endmodule
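As a worked check of the default parameters, the divider value and the resulting baud rate can be computed by hand. The figures below follow from the integer division in the DIVIDER expression and from the fact that the counter runs from 0 to DIVIDER inclusive, which makes one tick period DIVIDER + 1 clock cycles.

```latex
\begin{aligned}
\text{DIVIDER} &= \left\lfloor \frac{50\,000\,000}{115\,200} \right\rfloor - 1 = 434 - 1 = 433 \\
\text{clocks per tick} &= \text{DIVIDER} + 1 = 434 \\
\text{actual baud rate} &= \frac{50\,000\,000}{434} \approx 115\,207.4 \\
\text{relative error} &= \frac{115\,207.4 - 115\,200}{115\,200} \approx 0.0064\,\%
\end{aligned}
```

The count register needs 9 bits to hold the value 433, since 2^9 = 512 is the smallest power of two greater than 433.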
Comparison Summary
| Criteria | Synchronous Reset | Asynchronous Reset |
|---|---|---|
| Trigger condition | Rising clock edge + rst_n = 0 | rst_n = 0 (immediate) |
| Sensitivity list | always @(posedge clk) | always @(posedge clk or negedge rst_n) |
| Synthesized hardware | Mux before D input (uses LUT) | Directly drives FF CLR pin (no extra LUT) |
| Effective when clock is stopped | ✗ No | ✓ Yes |
| Power-on reset support | ✗ Unreliable | ✓ Reliable |
| Reset response time | Up to one full clock period | Immediate (nanoseconds) |
| Glitch filtering | ✓ Natural filtering | ✗ Requires external filtering |
| Multi-clock-domain consistency | ✗ Each domain responds independently | ✓ All domains cleared simultaneously |
| Static Timing Analysis (STA) | ✓ Fully supported by tools | ⚠ Requires recovery/removal constraints |
| Metastability risk | ✗ None | ⚠ Requires Reset Synchronizer at deassertion |
| IP interoperability | ⚠ Depends on IP specification | ✓ Conforms to AMBA/AXI standard |
| Typical applications | Datapath, counters, DSP pipelines | Control FSMs, protocol controllers, POR |
Design Rules Summary
The following four principles guide reset strategy selection in professional FPGA design:
Rule 1: Use asynchronous reset for control logic; use synchronous reset for datapath logic.
Control modules (FSMs, protocol controllers) must recover to a safe state under any condition, including clock failure. Datapath modules only require initialization at startup, and synchronous reset produces cleaner timing analysis results.
Rule 2: Always pair asynchronous reset with a Reset Synchronizer.
The industry-standard pattern is "asynchronous assert, synchronous deassert." This preserves the immediate clearing capability of the asynchronous reset while guaranteeing stable, glitch-free deassertion synchronized to the clock.
Rule 3: Use active-low reset consistently throughout the entire design.
The naming convention rst_n is the industry standard. The hardware CLR pin of the Intel MAX-10 flip-flop is also active-low. Maintaining this convention throughout the design eliminates polarity confusion errors.
Rule 4: Verify that the reset pulse width meets device specifications.
A synchronous reset pulse must be wider than one clock period. An asynchronous reset pulse must meet the flip-flop's minimum CLR pulse width requirement. Both values can be found in the Intel MAX-10 Device Handbook.
10.3 Reset Best Practices in FPGA Design
The previous section established the behavioral and structural differences between synchronous and asynchronous reset. This section translates that theory into concrete, actionable design rules for professional FPGA development. Following these practices consistently will produce reset networks that are reliable, timing-clean, and maintainable across the full project lifecycle.
Active-Low vs Active-High Reset
A reset signal can be designed to assert on a logic low (active-low, rst_n = 0 triggers reset) or on a logic high (active-high, rst = 1 triggers reset). Both are functionally valid, but the industry standard — and the recommendation for this course — is active-low reset for the following reasons:
- Hardware alignment: The dedicated CLR pin of the Intel MAX-10 flip-flop is natively active-low. Using active-low reset in RTL maps directly to this hardware pin without requiring an inverter, saving logic resources and avoiding additional propagation delay.
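- Fail-safe behavior: A floating reset line has no guaranteed logic level, so board designs normally tie it to a defined state. With an active-low reset, a pull-down resistor or an open-drain supervisor chip holds the line low while power ramps up, keeping the system in reset until the supply is valid. With an active-high reset, the reset driver itself must already be powered and driving high to hold the system in reset.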
- Industry convention: The naming suffix _n (e.g., rst_n, nreset) is universally recognized as an active-low signal. Most commercial IP cores, AMBA bus interfaces, and reference designs follow this convention.
The following example demonstrates both polarities side by side. Note that the only difference is the condition used to check the reset state:
// Active-low reset (recommended)
always @(posedge clk or negedge rst_n) begin
if (!rst_n) // Reset is active when rst_n = 0
q <= 8'h00;
else
q <= d;
end
// Active-high reset (avoid unless required by IP interface)
always @(posedge clk or posedge rst) begin
if (rst) // Reset is active when rst = 1
q <= 8'h00;
else
q <= d;
end
Design rule: adopt active-low reset as the project-wide standard. If an external IP core uses active-high reset at its interface, add a single inversion at the boundary rather than changing the internal reset polarity of your design.
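The boundary inversion described in the design rule can be sketched as follows. The wrapper name ip_wrapper, the core name vendor_ip, and its port names are hypothetical placeholders, which is why the instantiation is left as a comment.

```verilog
// Hypothetical wrapper around an active-high-reset IP core.
// The rest of the design keeps using the active-low rst_n.
module ip_wrapper (
    input wire clk,
    input wire rst_n          // project-wide active-low reset
);
    // Single inversion at the IP boundary; nothing else changes polarity
    wire ip_rst = ~rst_n;

    // vendor_ip and its ports are assumed names for illustration:
    // vendor_ip u_ip (.clk(clk), .rst(ip_rst) /* , ... */);
endmodule
```

Keeping the inversion inside one wrapper means the polarity change is visible in exactly one place, which makes it easy to review.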
Reset Synchronizer Design
As established in Section 10.2.4, asynchronous reset deassertion can cause metastability if it occurs too close to a clock edge. The solution is a Reset Synchronizer that implements the industry-standard pattern of asynchronous assert and synchronous deassert.
Why Two Stages?
A single flip-flop synchronizer reduces but does not eliminate the probability of metastability. The two-stage chain reduces the probability to an acceptable level for virtually all practical designs. Adding a third stage provides only marginal improvement and is generally not required for clock frequencies below 500 MHz.
// Standard two-stage Reset Synchronizer
// Place one instance per clock domain in the design
module reset_synchronizer (
input wire clk,
input wire async_rst_n, // Raw asynchronous reset input
output wire sync_rst_n // Synchronized reset output for this clock domain
);
// Declare as registers with no initial value — reset handles initialization
reg stage1_ff;
reg stage2_ff;
always @(posedge clk or negedge async_rst_n) begin
if (!async_rst_n) begin
stage1_ff <= 1'b0; // Asynchronous assert: both stages clear immediately
stage2_ff <= 1'b0;
end else begin
stage1_ff <= 1'b1; // Deassert propagates through stage 1
stage2_ff <= stage1_ff; // Then through stage 2 on next cycle
end
end
assign sync_rst_n = stage2_ff;
endmodule
The behavior of this circuit is as follows:
• Assert (async_rst_n goes low): Both flip-flops are cleared immediately via the asynchronous CLR path. The output sync_rst_n goes low within nanoseconds regardless of the clock state.
• Deassert (async_rst_n goes high): The value 1'b1 propagates through stage1_ff on the first clock edge, then through stage2_ff on the second. The output sync_rst_n therefore rises only in response to a clock edge, so every downstream flip-flop sees a release that is synchronous to its clock.
• Metastability handling: If stage1_ff enters metastability during deassertion, it has one full clock period to resolve before its output is sampled by stage2_ff. The probability that metastability persists for a full clock period is negligibly small at typical FPGA operating frequencies.
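The assert and deassert behavior described above can be observed with a short simulation. The testbench below is a sketch: the name tb_reset_synchronizer and all timing values are assumptions, and it must be compiled together with the reset_synchronizer module shown earlier in this section.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench; compile together with reset_synchronizer.
module tb_reset_synchronizer;
    reg  clk         = 1'b0;
    reg  async_rst_n = 1'b0;   // start with reset asserted
    wire sync_rst_n;

    reset_synchronizer dut (
        .clk         (clk),
        .async_rst_n (async_rst_n),
        .sync_rst_n  (sync_rst_n)
    );

    always #10 clk = ~clk;     // rising edges at 10, 30, 50, 70 ns ...

    initial begin
        $monitor("t=%0t async_rst_n=%b sync_rst_n=%b",
                 $time, async_rst_n, sync_rst_n);
        #25 async_rst_n = 1'b1;   // release at 25 ns, between clock edges
        #50 async_rst_n = 1'b0;   // re-assert at 75 ns, between clock edges
        #20 $finish;
    end
endmodule
```

After the release at 25 ns, stage1_ff goes high at the 30 ns edge and sync_rst_n follows at the 50 ns edge, two rising edges later. When the reset is re-asserted at 75 ns, sync_rst_n drops at 75 ns without waiting for a clock edge.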
Multi-Clock-Domain Reset Distribution
In a design with multiple clock domains, each clock domain requires its own independent Reset Synchronizer instance. All synchronizer instances share the same external async_rst_n input but are clocked by their respective domain clocks:
// Top-level reset distribution for a two-clock-domain design
module top (
input wire clk_50m,
input wire clk_100m,
input wire ext_rst_n // Single external reset button
);
wire rst_n_50m; // Synchronized reset for the 50 MHz domain
wire rst_n_100m; // Synchronized reset for the 100 MHz domain
// Independent synchronizer for each clock domain
reset_synchronizer u_rst_sync_50m (
.clk (clk_50m),
.async_rst_n (ext_rst_n),
.sync_rst_n (rst_n_50m)
);
reset_synchronizer u_rst_sync_100m (
.clk (clk_100m),
.async_rst_n (ext_rst_n),
.sync_rst_n (rst_n_100m)
);
// Each sub-module receives the reset synchronized to its own clock domain
// uart_ctrl u_uart (.clk(clk_50m), .rst_n(rst_n_50m), ...);
// dsp_core u_dsp (.clk(clk_100m), .rst_n(rst_n_100m), ...);
endmodule
Reset Fan-out Management
A reset signal that must drive hundreds or thousands of flip-flops simultaneously creates a high-fan-out net. If this net is routed through the standard interconnect fabric, the routing delay can become so large that some flip-flops receive the reset signal significantly later than others — violating timing and potentially causing inconsistent reset behavior across the design.
Using Quartus Global Signals
Intel MAX-10 provides dedicated Global Signal routing resources that distribute a signal to every logic element on the device with minimal and uniform skew. Reset signals are ideal candidates for global routing. Quartus Prime automatically promotes high-fan-out nets to global signals in most cases, but this can also be specified explicitly in the .qsf assignment file:
# Quartus QSF: promote rst_n to a global signal
set_instance_assignment -name GLOBAL_SIGNAL GLOBAL_CLOCK -to rst_n
After compilation, verify the reset routing by opening the Chip Planner in Quartus Prime and confirming that rst_n is listed as a Global Signal in the Resource Usage Summary.
When to Insert Reset Buffers Manually
In very large designs where the global signal resources are already consumed by clock signals, manual reset buffering may be required. This involves inserting a buffer tree between the Reset Synchronizer output and the downstream logic. However, for the scale of designs in this course (Intel MAX-10, DE10-Lite), automatic global signal promotion is sufficient and manual buffering is not necessary.
Reset Constraints in Quartus Prime
Static Timing Analysis (STA) must be correctly configured to handle reset signals. Without proper constraints, the timing tool may either report false violations on the asynchronous reset path or — more dangerously — fail to check the recovery/removal timing altogether.
Verifying Reset Inference in RTL Viewer
Before applying timing constraints, confirm that Quartus has correctly inferred the intended reset hardware:
• Open Tools → Netlist Viewers → RTL Viewer after compilation.
• For synchronous reset: verify that a Mux is present before the D input of the flip-flop. The reset signal should appear as a Mux select input, not connected to the CLR pin.
• For asynchronous reset: verify that the reset signal connects directly to the CLR pin of the flip-flop symbol, with no Mux on the D path.
• Open Tools → Netlist Viewers → Technology Map Viewer (Post-Fitting) to confirm the same structure after place-and-route.
SDC Timing Constraints for Asynchronous Reset
The path from the external reset pin (ext_rst_n, which drives the Reset Synchronizer's async_rst_n input) to the CLR pins of the synchronizer's flip-flops is asynchronous by nature and cannot be timed as a standard data path. Declare it as a false path in the SDC constraints file so that the timing tool does not report meaningless violations on it:
# SDC constraints file (.sdc)
# Declare the raw asynchronous reset input as a false path
# The Reset Synchronizer handles the timing; no setup/hold check is needed here
set_false_path -from [get_ports {ext_rst_n}] -to [all_registers]
This constraint cuts only the paths that start at the ext_rst_n port. The paths from the synchronizer output (sync_rst_n) to the CLR pins of the downstream flip-flops start at a clocked register, so the timing tool checks their recovery and removal times automatically. No additional SDC entry is required for them.
Checking Recovery and Removal Times
After applying the SDC constraints, verify the asynchronous reset timing as follows:
• Open Tools → Timing Analyzer.
• Run the Report Recovery and Report Removal analyses.
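• Recovery time is the minimum time before the active clock edge by which the reset must already be deasserted for the flip-flop to exit reset reliably on that edge. It is the asynchronous counterpart of setup time.
• Removal time is the minimum time after the active clock edge for which the reset must remain asserted so that the flip-flop reliably stays in reset through that edge. It is the asynchronous counterpart of hold time.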
• Both values must show a positive slack. A negative slack indicates a timing violation that must be resolved before the design is considered timing-clean.
Common Reset Design Errors
The following errors are common in practice and among the most difficult to diagnose because they often cause intermittent, non-reproducible failures rather than obvious, immediate faults.
Error 1: Reset Polarity Inversion
Mixing active-low and active-high reset signals without explicit inversion at the boundary is a common reason a system never initializes correctly. With the polarity inverted, the logic is held in reset whenever the reset button is released and runs only while the button is pressed, so the typical symptom is a design that appears dead in normal operation.
// ERROR: rst_n is active-low but is used without inversion
// This module is always in reset when it should be running
always @(posedge clk or negedge rst_n) begin
if (rst_n) // Bug: should be (!rst_n)
q <= 8'h00;
else
q <= d;
end
// CORRECT: check the inverted value for active-low reset
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
q <= 8'h00;
else
q <= d;
end
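When a block with an active-high reset has to live inside an otherwise active-low design, a single named inversion at the boundary keeps the polarity visible. The following is a minimal sketch under stated assumptions: ip_core is a hypothetical stand-in for a third-party block, and the names ip_core_wrap, rst_hi, and busy are illustrative only.

```verilog
// Hypothetical active-high block (stand-in for a vendor core)
module ip_core (
    input  wire clk,
    input  wire reset,   // Active-high asynchronous reset
    output reg  busy
);
    always @(posedge clk or posedge reset) begin
        if (reset) busy <= 1'b0;
        else       busy <= 1'b1;
    end
endmodule

// Wrapper: the rest of the design only ever sees the active-low rst_n
module ip_core_wrap (
    input  wire clk,
    input  wire rst_n,   // Active-low reset used throughout the design
    output wire busy
);
    wire rst_hi = ~rst_n;   // The one explicit inversion, at the boundary

    ip_core u_ip (
        .clk   (clk),
        .reset (rst_hi),
        .busy  (busy)
    );
endmodule
```

Keeping the inversion in the wrapper means no other module needs to know that the block uses the opposite polarity.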
Error 2: Asynchronous Reset Crossing Clock Domains Without a Synchronizer
Connecting a raw asynchronous reset directly from one clock domain to registers in another clock domain without a Reset Synchronizer is one of the most serious reset design errors. The symptom is intermittent failure at power-up or after reset release, particularly at higher operating frequencies or across temperature and voltage corners.
// ERROR: raw async reset driven directly into a different clock domain
module domain_b (
input wire clk_b,
input wire raw_rst_n, // This signal is asynchronous to clk_b — DANGEROUS
output reg q
);
always @(posedge clk_b or negedge raw_rst_n) begin
if (!raw_rst_n) q <= 1'b0;
else q <= ~q;
end
endmodule
// CORRECT: pass raw_rst_n through a Reset Synchronizer clocked by clk_b first
// reset_synchronizer u_sync (.clk(clk_b), .async_rst_n(raw_rst_n), .sync_rst_n(rst_n_b));
module domain_b (
input wire clk_b,
input wire rst_n_b, // Synchronized reset — safe to use
output reg q
);
always @(posedge clk_b or negedge rst_n_b) begin
if (!rst_n_b) q <= 1'b0;
else q <= ~q;
end
endmodule
Error 3: Reset Pulse Too Narrow
When a reset is generated by software (e.g., a processor writing to a control register bit), the pulse width may be only one or two processor clock cycles, which may be shorter than the required minimum pulse width for downstream reset logic, especially across clock domain boundaries. Always verify the minimum reset pulse width against the specifications of the slowest clock domain in the reset network.
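One way to guarantee a minimum width is to stretch the request in the source clock domain before it is distributed. The following is a minimal sketch, not a definitive implementation: the module name rst_stretch, the rst_req and rst_n_out ports, and the STRETCH default of 16 cycles are all assumptions chosen for illustration. STRETCH should be sized so that the pulse spans several periods of the slowest destination clock.

```verilog
// Stretches a one-cycle software reset request into a longer active-low pulse
module rst_stretch #(
    parameter STRETCH = 16        // Extra cycles to hold reset (assumed value)
) (
    input  wire clk,              // Source-domain (e.g., processor) clock
    input  wire rst_req,          // One-cycle, active-high reset request
    output reg  rst_n_out         // Active-low reset, low for at least STRETCH cycles
);
    localparam CNT_W = $clog2(STRETCH + 1);   // Counter must hold the value STRETCH

    // Power-up value of 0 is assumed: no stretch in progress after configuration
    reg [CNT_W-1:0] cnt = {CNT_W{1'b0}};

    always @(posedge clk) begin
        if (rst_req)
            cnt <= STRETCH;                   // Start or restart the stretch window
        else if (cnt != 0)
            cnt <= cnt - 1'b1;                // Count down to zero, then stop

        // Reset stays asserted while a request is present or the window is open
        rst_n_out <= !(rst_req || (cnt != 0));
    end
endmodule
```

The output is synchronous only to the source clock, so it should still pass through the Reset Synchronizer of each destination domain before it reaches any flip-flop there.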
Error 4: Using Reset Inside a Combinational always @(*) Block
Reset is a sequential concept: it initializes storage elements. It has no meaning inside a purely combinational always @(*) block. A reset check there only adds logic to the combinational path and, when the remaining branches do not assign the output on every path, infers an unintended latch or produces incorrect simulation behavior.
// ERROR: reset check inside combinational always block — infers latch
always @(*) begin
if (!rst_n)
y = 8'h00;
else if (sel)
y = a;
// Missing else: latch inferred for the case where rst_n=1 and sel=0
end
// CORRECT: reset belongs in the clocked always block
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
y <= 8'h00;
else if (sel)
y <= a;
else
y <= b;
end
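If y is meant to stay combinational rather than become a register, the alternative fix is to drop the reset term entirely and give y a default assignment so that every path drives it. The following sketch assumes an 8-bit datapath and a hypothetical module name mux_comb; b is the same fallback input used in the clocked version above.

```verilog
// Purely combinational selection: no reset, no latch
module mux_comb (
    input  wire       sel,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [7:0] y
);
    always @(*) begin
        y = b;          // Default assignment covers every path
        if (sel)
            y = a;      // Override only when sel is asserted
    end
endmodule
```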
Error 5: FSM Has No Default State After Reset
An FSM that does not define a valid initial state in its reset condition may power up or restart in an unencoded state — one that does not correspond to any defined state in the case statement. This causes the FSM to remain inactive or behave unpredictably until an external stimulus forces it into a known state.
// ERROR: reset does not initialize the state register
// After reset, 'state' is undefined — FSM behavior is unpredictable
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
output_reg <= 8'h00; // Data registers cleared, but state is not set
end else begin
case (state)
// ...
endcase
end
end
// CORRECT: always initialize the state register explicitly in the reset branch
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
state <= IDLE; // FSM enters a defined, valid state after reset
output_reg <= 8'h00;
end else begin
case (state)
IDLE: // ...
// ...
endcase
end
end
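The skeleton above can be filled in as a small complete FSM. The following is a sketch under stated assumptions: the three states, the start and done inputs, and the busy output are hypothetical and only serve to show the reset branch and a default branch that returns an unused encoding to IDLE.

```verilog
// Three-state FSM with explicit reset state and a recovery path
module fsm_sketch (
    input  wire clk,
    input  wire rst_n,
    input  wire start,
    input  wire done,
    output reg  busy
);
    localparam [1:0] IDLE = 2'd0,
                     RUN  = 2'd1,
                     STOP = 2'd2;   // 2'd3 is an unused encoding

    reg [1:0] state;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            state <= IDLE;          // Defined state immediately after reset
            busy  <= 1'b0;
        end else begin
            case (state)
                IDLE: if (start) begin
                          state <= RUN;
                          busy  <= 1'b1;
                      end
                RUN:  if (done)
                          state <= STOP;
                STOP: begin
                          state <= IDLE;
                          busy  <= 1'b0;
                      end
                default: begin      // Unused encoding 2'd3: return to IDLE
                          state <= IDLE;
                          busy  <= 1'b0;
                      end
            endcase
        end
    end
endmodule
```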
Best Practices Summary
The following table consolidates all reset best practices covered in this section:
| Practice | Rule |
|---|---|
| Reset polarity | Use active-low reset (rst_n) throughout the design. Invert at IP boundaries only. |
| Reset Synchronizer | Always use a two-stage Reset Synchronizer for asynchronous reset. One instance per clock domain. |
| Assert / Deassert pattern | Asynchronous assert, synchronous deassert. Never deassert an asynchronous reset without clock synchronization. |
| Fan-out management | Use Quartus Global Signal routing for the reset net. Verify promotion in the Resource Usage Summary. |
| SDC constraints | Apply set_false_path to the raw asynchronous reset input. Check recovery and removal slack in Timing Analyzer. |
| RTL verification | Confirm correct reset inference using RTL Viewer and Technology Map Viewer after every compilation. |
| FSM initialization | Always assign the state register to a defined initial state in the reset branch. Never leave state undefined after reset. |
| Reset pulse width | Verify that the reset pulse is wider than the minimum required by the slowest clock domain in the reset network. |
| Reset in combinational logic | Never place reset checks inside always @(*) blocks. Reset belongs exclusively in clocked always @(posedge clk) blocks. |
10.4 Register Design Patterns (Single, Enable, Reset, Shift)
Register Design Patterns (Single, Enable, Reset, Shift)
A register is the fundamental storage primitive of every synchronous digital system. In practice, registers are rarely used in their bare form — they are almost always augmented with control signals such as reset, clock enable, or shift capability to meet the requirements of the surrounding logic. This section presents six progressive register design patterns, from the simplest single-bit flip-flop to the full-featured shift register, each with complete Verilog implementations and their corresponding synthesized hardware structures on the Intel MAX-10 FPGA.
10.4.1 Single-Bit Register (Basic D Flip-Flop)
The most fundamental register is a single D flip-flop — a one-bit storage element that captures its input on every rising clock edge and holds that value until the next active edge. It is the building block from which all other register patterns are derived.
// Single-bit D flip-flop
// Captures input d on every rising clock edge
module dff_single (
input wire clk,
input wire d,
output reg q
);
always @(posedge clk) begin
q <= d;
end
endmodule
On the Intel MAX-10, this maps directly to one Logic Element (LE) using its embedded flip-flop. The LUT portion of the LE is unused for this pattern.
Typical applications:
• Pipeline stage separator between two combinational logic blocks
• Single-bit flag storage (interrupt pending, status bit)
• Input signal registration to eliminate combinational glitches
10.4.2 Register with Synchronous Reset
Adding a synchronous reset gives the register a deterministic initial state that is applied on the next rising clock edge after the reset signal is asserted. The reset condition must appear as the first branch in the if statement to establish its highest priority.
// Register with synchronous active-low reset
// Reset takes effect on the next rising clock edge after rst_n is asserted
module dff_sync_rst (
input wire clk,
input wire rst_n, // Active-low synchronous reset
input wire d,
output reg q
);
always @(posedge clk) begin // Only clk in sensitivity list
if (!rst_n)
q <= 1'b0; // Reset has highest priority
else
q <= d;
end
endmodule
Synthesized hardware structure:
• A 2-to-1 Mux is inserted before the flip-flop's D input. When rst_n = 0, the Mux drives 0 to the D input; when rst_n = 1, it passes the data signal d.
• The flip-flop's dedicated CLR pin is not used. The reset is implemented entirely in the LUT logic feeding the D input.
• One LE is consumed: one LUT (for the Mux) plus one flip-flop.
Typical applications:
• Pipeline datapath registers where reset only occurs at startup
• DSP accumulators and intermediate result registers
• Registers driven by internal control logic where the clock is always active
10.4.3 Register with Asynchronous Reset
An asynchronous reset clears the register immediately when asserted, without waiting for the next clock edge. As established in Sections 10.2 and 10.3, this pattern should always be used together with a Reset Synchronizer at the top level to ensure safe deassertion behavior.
// Register with asynchronous active-low reset
// Reset takes effect immediately when rst_n is asserted, regardless of clk
module dff_async_rst (
input wire clk,
input wire rst_n, // Active-low asynchronous reset
input wire d,
output reg q
);
always @(posedge clk or negedge rst_n) begin // rst_n in sensitivity list
if (!rst_n)
q <= 1'b0; // Immediate clear — no clock edge required
else
q <= d;
end
endmodule
Synthesized hardware structure:
• The reset signal connects directly to the flip-flop's hardware CLR pin — a feature natively supported by every LE in the Intel MAX-10.
• No LUT logic is consumed for the reset path. Only one flip-flop resource is used, making this the most area-efficient reset pattern.
• Compared to the synchronous reset pattern, this implementation saves one LUT input per bit.
Typical applications:
• All control FSM state registers
• Communication protocol controllers (UART, SPI, I2C)
• Any register that must be cleared at power-up or on watchdog timeout
10.4.4 Register with Clock Enable
A clock enable signal (en) allows the register to selectively ignore clock edges — the flip-flop only captures its input when en = 1. When en = 0, the output holds its current value.
// Register with clock enable
// Captures input d only when en = 1; holds current value when en = 0
module dff_enable (
input wire clk,
input wire en, // Clock enable
input wire d,
output reg q
);
always @(posedge clk) begin
if (en)
q <= d;
// Implicit else: q retains its value when en = 0
end
endmodule
Important: Never Gate the Clock Signal Directly
A common mistake among designers new to FPGA design is to control when a register captures data by gating the clock signal itself. This is a serious design error that must be avoided:
// WRONG: gating the clock with combinational logic
// This creates a glitchy, skewed clock — never do this in FPGA design
wire gated_clk = clk & en; // Dangerous: combinational glitch on clock
always @(posedge gated_clk) begin
q <= d;
end
// CORRECT: use a clock enable signal inside the always block
// The flip-flop clock is always the clean system clock
always @(posedge clk) begin
if (en)
q <= d;
end
Clock gating with combinational logic introduces glitches onto the clock network, which causes the flip-flop to trigger at unpredictable times and produces simulation-synthesis mismatches. On Intel MAX-10, the correct approach is to use the en signal inside the always block, which the synthesis tool maps directly to the LE's dedicated Clock Enable (ENA) input pin — a zero-LUT, zero-skew hardware feature available on every flip-flop in the device.
Typical applications:
• Registers that should only update when valid data is available
• Baud rate sampling — capture RX data only on the correct sampling clock tick
• Power optimization — disable unnecessary register toggling
• Write-enable controlled configuration registers
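The sampling-tick application can be sketched as a small counter that produces a one-cycle enable pulse, with the data register clocked by the unmodified system clock. This is an illustration only: the module name sample_on_tick, the DIV parameter, and the tick signal are assumptions, and DIV is assumed to be at least 2.

```verilog
// Captures d once every DIV clock cycles using a clock enable, not a gated clock
module sample_on_tick #(
    parameter DIV = 4               // One enable tick every DIV clocks (assumed, >= 2)
) (
    input  wire clk,
    input  wire rst_n,
    input  wire d,
    output reg  q
);
    localparam CNT_W = $clog2(DIV); // Bits needed to count 0..DIV-1

    reg [CNT_W-1:0] cnt;
    wire tick = (cnt == DIV - 1);   // High for one clk cycle in every DIV

    // Free-running divider counter
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            cnt <= {CNT_W{1'b0}};
        else if (tick)
            cnt <= {CNT_W{1'b0}};
        else
            cnt <= cnt + 1'b1;
    end

    // Data register: clocked by clk on every edge, updated only when tick = 1
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= 1'b0;
        else if (tick)
            q <= d;
    end
endmodule
```

Both registers share the same clean clock; only the enable decides when q changes.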
10.4.5 Register with Enable and Reset (Full-Featured Register)
Combining reset and clock enable produces the most commonly used register pattern in real FPGA designs. The priority ordering is critical: reset must always take precedence over enable. If enable were checked first, a disabled register could not be reset — leaving it in an unknown state after system initialization.
Version A: Asynchronous Reset with Enable (Recommended for Control Logic)
// Full-featured register: asynchronous reset + clock enable
// Priority: reset > enable > hold
module dff_async_rst_en (
input wire clk,
input wire rst_n, // Active-low asynchronous reset (highest priority)
input wire en, // Clock enable
input wire d,
output reg q
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n) // Reset: immediate, unconditional
q <= 1'b0;
else if (en) // Enable: capture data only when enabled
q <= d;
// Implicit else: hold current value when en = 0
end
endmodule
Version B: Synchronous Reset with Enable (Recommended for Datapath Logic)
// Full-featured register: synchronous reset + clock enable
// Priority: reset > enable > hold
module dff_sync_rst_en (
input wire clk,
input wire rst_n, // Active-low synchronous reset (highest priority)
input wire en, // Clock enable
input wire d,
output reg q
);
always @(posedge clk) begin // Only clk in sensitivity list
if (!rst_n) // Reset: cleared on next clock edge
q <= 1'b0;
else if (en) // Enable: capture data only when enabled
q <= d;
// Implicit else: hold current value when en = 0
end
endmodule
Synthesized hardware structure (asynchronous version):
• The rst_n signal drives the flip-flop's hardware CLR pin directly.
• The en signal drives the flip-flop's dedicated ENA (Clock Enable) pin directly.
• Both control signals are handled entirely in dedicated LE hardware — no LUT logic is consumed for either. This is the most resource-efficient implementation of a controlled register on the MAX-10.
Typical applications:
• Configuration registers in peripheral controllers
• FIFO write and read pointer registers
• Accumulator registers with conditional update logic
• Any register requiring both initialization guarantee and selective update
10.4.6 Shift Register
A shift register is a chain of flip-flops in which the output of each stage connects to the input of the next. On each clock edge, the stored bit pattern shifts one position along the chain. Shift registers are categorized by their input and output mode — serial or parallel — giving four standard types.
Type 1: SISO — Serial-In Serial-Out
The simplest shift register. Data enters one bit at a time and exits one bit at a time after propagating through the full chain. The primary application is a fixed-length delay line — the output is exactly N clock cycles behind the input.
// SISO Shift Register — 8-stage delay line
// Output is the input delayed by 8 clock cycles
module siso_shift_reg #(
parameter DEPTH = 8
) (
input wire clk,
input wire rst_n,
input wire s_in, // Serial input
output wire s_out // Serial output (delayed by DEPTH cycles)
);
reg [DEPTH-1:0] shift_reg;
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
shift_reg <= {DEPTH{1'b0}};
else
shift_reg <= {shift_reg[DEPTH-2:0], s_in}; // Shift left, insert at LSB
end
assign s_out = shift_reg[DEPTH-1]; // Output from MSB (oldest bit)
endmodule
Typical applications:
• Fixed-latency pipeline delay compensation
• Edge detection (compare current and delayed versions of a signal)
• Pseudo-Random Bit Sequence (PRBS) generator with feedback taps
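The edge-detection use case needs only a one-stage delay: the current value of a signal is compared with its value one clock earlier. The following is a minimal sketch, assuming that sig is already synchronous to clk; the names rise_detect, sig_d, and rise are illustrative.

```verilog
// Rising-edge detector built from a one-stage delay
module rise_detect (
    input  wire clk,
    input  wire rst_n,
    input  wire sig,        // Assumed already synchronous to clk
    output wire rise        // One-cycle pulse on each 0 -> 1 transition of sig
);
    reg sig_d;              // sig delayed by one clock cycle

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            sig_d <= 1'b0;
        else
            sig_d <= sig;
    end

    // High only in the cycle where sig is 1 and its previous value was 0
    assign rise = sig & ~sig_d;
endmodule
```

Swapping the expression to ~sig & sig_d would give a falling-edge detector instead.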
Type 2: SIPO — Serial-In Parallel-Out
Data enters one bit per clock cycle (serial) and the full N-bit word is available simultaneously at the output (parallel) after N clock cycles. This is the core mechanism of a serial receiver.
// SIPO Shift Register — 8-bit serial receiver
// Converts a serial bit stream into an 8-bit parallel word
module sipo_shift_reg (
input wire clk,
input wire rst_n,
input wire s_in, // Serial input (MSB first)
output reg [7:0] p_out // Parallel output: full 8-bit word
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
p_out <= 8'h00;
else
p_out <= {p_out[6:0], s_in}; // Shift left: the first bit received ends up in p_out[7]
// For an LSB-first stream (e.g., UART), shift right instead: {s_in, p_out[7:1]}
end
endmodule
Typical applications:
• UART receiver — assembles 8 serial data bits into one parallel byte
• SPI receiver — shifts in MISO bits and presents the full word to the data bus
• I2C byte receiver
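The 8-bit receiver above shifts left, so the first bit received ends up in the most significant position. A UART sends the least significant bit first, which calls for shifting right, and a practical receiver also needs to know when a full byte has arrived. The following sketch shows one possible arrangement; the bit_en strobe, the bit_cnt counter, and the valid flag are assumptions added for illustration and are not part of the module above.

```verilog
// LSB-first serial-to-parallel converter with a byte-complete flag
module sipo_lsb_first (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       bit_en,   // Assumed: one-cycle strobe per received bit
    input  wire       s_in,     // Serial input, LSB first
    output reg  [7:0] p_out,    // Assembled byte
    output reg        valid     // One-cycle pulse when p_out holds a full byte
);
    reg [2:0] bit_cnt;          // Counts received bits, 0..7

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            p_out   <= 8'h00;
            bit_cnt <= 3'd0;
            valid   <= 1'b0;
        end else begin
            valid <= 1'b0;                        // Default: no new byte
            if (bit_en) begin
                p_out   <= {s_in, p_out[7:1]};    // Shift right: first bit ends in p_out[0]
                bit_cnt <= bit_cnt + 3'd1;        // Wraps from 7 back to 0
                valid   <= (bit_cnt == 3'd7);     // Eighth bit is being captured
            end
        end
    end
endmodule
```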
Type 3: PISO — Parallel-In Serial-Out
A parallel N-bit word is loaded into the register in one clock cycle, then shifted out one bit per clock cycle. This is the core mechanism of a serial transmitter. A load control signal selects between loading new parallel data and shifting out existing data.
// PISO Shift Register — 8-bit serial transmitter
// Loads an 8-bit parallel word and shifts it out one bit at a time
module piso_shift_reg (
input wire clk,
input wire rst_n,
input wire load, // 1 = load parallel data; 0 = shift out
input wire [7:0] p_in, // Parallel data input
output wire s_out // Serial output (MSB first)
);
reg [7:0] shift_reg;
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
shift_reg <= 8'h00;
else if (load)
shift_reg <= p_in; // Load parallel data in one cycle
else
shift_reg <= {shift_reg[6:0], 1'b0}; // Shift left, output MSB
end
assign s_out = shift_reg[7]; // MSB is transmitted first
endmodule
Typical applications:
• UART transmitter — loads a byte and shifts out each bit at the baud rate
• SPI transmitter — loads a word and shifts out MOSI bits on each SCK edge
• LED driver chain (e.g., 74HC595 serial interface)
Type 4: PIPO — Parallel-In Parallel-Out
All bits are loaded and read simultaneously. While this does not shift data in the traditional sense, it forms the basis of a standard multi-bit pipeline register — the most commonly instantiated register type in FPGA datapath design.
// PIPO Register — 8-bit pipeline stage register
// Captures all 8 input bits simultaneously on the clock edge
module pipo_register (
input wire clk,
input wire rst_n,
input wire en, // Clock enable
input wire [7:0] d, // Parallel data input
output reg [7:0] q // Parallel data output
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
q <= 8'h00;
else if (en)
q <= d;
end
endmodule
Typical applications:
• Pipeline stage separators in multi-stage arithmetic units
• Input/output port registers in bus interfaces
• Data holding registers in memory-mapped peripheral controllers
10.4.7 Design Pattern Summary
The following table provides a quick reference for the register patterns covered in this section, with the four shift register types listed individually:
| Pattern | Sensitivity List | Control Signals | Synthesized Hardware | Typical Use Case |
|---|---|---|---|---|
| Single-Bit DFF | posedge clk | None | 1 FF, 0 LUT | Signal registration, pipeline separator |
| Synchronous Reset | posedge clk | rst_n | 1 FF + Mux (LUT) | Datapath registers, DSP pipeline |
| Asynchronous Reset | posedge clk or negedge rst_n | rst_n (CLR pin) | 1 FF, 0 LUT | Control FSM state registers |
| Clock Enable | posedge clk | en (ENA pin) | 1 FF, 0 LUT | Selective capture, write-enable registers |
| Enable + Async Reset | posedge clk or negedge rst_n | rst_n, en | 1 FF, 0 LUT | Peripheral config registers, FIFO pointers |
| SISO Shift Register | posedge clk or negedge rst_n | rst_n | N FFs, 0 LUT | Delay line, edge detection, PRBS |
| SIPO Shift Register | posedge clk or negedge rst_n | rst_n | N FFs, 0 LUT | UART/SPI receiver, serial-to-parallel |
| PISO Shift Register | posedge clk or negedge rst_n | rst_n, load | N FFs + Mux (LUT) | UART/SPI transmitter, parallel-to-serial |
| PIPO Register | posedge clk or negedge rst_n | rst_n, en | N FFs, 0 LUT | Pipeline stages, bus interface registers |
Design rule: always verify the synthesized hardware structure using the RTL Viewer and Technology Map Viewer in Quartus Prime after compilation. Confirm that control signals are mapped to the correct dedicated LE hardware pins (CLR for reset, ENA for clock enable) rather than being absorbed into LUT logic unnecessarily.
10.5 Parameterized Registers (parameter / localparam)
Most register examples in Section 10.4 used fixed bit-widths: a 1-bit flip-flop, an 8-bit shift register, an 8-bit pipeline register. Each was written as a separate module that cannot be reused at another width. In professional FPGA design, this approach is impractical. A parameterized module is written once and instantiated at any width, depth, or configuration required by the surrounding design, without modifying the source code.
10.5.1 Syntax Quick Reference
This section assumes familiarity with the parameter, localparam, and `define constructs introduced in Lesson 05. The table below provides a brief recap of their key differences for quick reference. For full syntax details and scope rules, refer back to Lesson 05.
| Construct | Scope | Overridable at Instantiation? | Typical Use |
|---|---|---|---|
| `define | Global (entire compilation) | ✗ No | Global constants, conditional compilation |
| parameter | Module-level | ✓ Yes — via #( ) at instantiation | Module configuration (width, depth, value) |
| localparam | Module-level | ✗ No — internal constant only | Derived constants computed from parameters |
The key design rule for this section is:
• Use parameter for values that the instantiating module must be able to configure — such as data width, register depth, or reset value.
• Use localparam for constants derived from those parameters — such as address bit-width calculated from depth using $clog2. These must not be overridden externally because they would break the internal logic.
• Avoid `define for module-level configuration. Its global scope makes it unsuitable for per-instance customization.
10.5.2 Parameterized N-bit Register
The first application is to rewrite the full-featured register from Section 10.4.5 as a parameterized module. The data width is exposed as a parameter so that any instantiating module can specify the exact width it needs without modifying the register source file.
// Parameterized N-bit register with asynchronous reset and clock enable
// Default width is 8 bits; override at instantiation as needed
module reg_n #(
parameter WIDTH = 8, // Data width in bits
parameter RST_VALUE = {WIDTH{1'b0}} // Reset value — default all zeros
) (
input wire clk,
input wire rst_n,
input wire en,
input wire [WIDTH-1:0] d,
output reg [WIDTH-1:0] q
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
q <= RST_VALUE;
else if (en)
q <= d;
end
endmodule
This single module replaces the fixed-width asynchronous-reset register variants from Section 10.4; tying en to 1'b1 gives the version without a clock enable. The width and reset value are both configurable at instantiation time.
Instantiation Examples
// 8-bit register with default reset value (all zeros)
reg_n #(
.WIDTH (8),
.RST_VALUE (8'h00)
) u_data_reg (
.clk (clk),
.rst_n (rst_n),
.en (wr_en),
.d (data_in),
.q (data_out)
);
// 16-bit register with non-zero reset value
reg_n #(
.WIDTH (16),
.RST_VALUE (16'hFFFF)
) u_ctrl_reg (
.clk (clk),
.rst_n (rst_n),
.en (ctrl_wr_en),
.d (ctrl_in),
.q (ctrl_out)
);
// 32-bit register — no source code changes required
reg_n #(
.WIDTH (32)
) u_acc_reg (
.clk (clk),
.rst_n (rst_n),
.en (acc_en),
.d (acc_in),
.q (acc_out)
);
10.5.3 Using localparam for Derived Constants
When a module contains internal constants that are mathematically derived from its parameters, those constants must be computed inside the module using localparam. This guarantees that derived values remain consistent with the primary parameter regardless of the width chosen at instantiation.
A common example is a parameterized shift register where the internal counter that tracks the shift position must be wide enough to count up to DEPTH - 1. The required counter width is $clog2(DEPTH) — the ceiling log base 2 of the depth, which gives the minimum number of bits needed to represent that count.
// Parameterized SISO shift register using localparam for derived width
module siso_n #(
parameter DEPTH = 8 // Shift register depth (number of pipeline stages)
) (
input wire clk,
input wire rst_n,
input wire s_in,
output wire s_out
);
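// localparam: derived from DEPTH, cannot be overridden externally
// $clog2(DEPTH) gives the number of bits needed to index DEPTH stages
// (this minimal module declares ADDR_W only to show the derivation;
// a shift-position counter would be declared as reg [ADDR_W-1:0])
localparam ADDR_W = $clog2(DEPTH);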
reg [DEPTH-1:0] shift_reg;
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
shift_reg <= {DEPTH{1'b0}};
else
shift_reg <= {shift_reg[DEPTH-2:0], s_in};
end
assign s_out = shift_reg[DEPTH-1];
endmodule
If DEPTH is changed from 8 to 32 at instantiation, ADDR_W automatically recalculates from 3 to 5. No internal constants need to be manually updated.
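To show a derived width actually in use, the sketch below adds a fill counter that reports when DEPTH bits have been shifted in since reset. It is an illustration under stated assumptions: the module name siso_fill_n, the filled output, and the fill_cnt register are hypothetical, and DEPTH is assumed to be at least 2. Because this counter must reach the value DEPTH itself, not DEPTH - 1, its width is $clog2(DEPTH + 1).

```verilog
// SISO shift register with a "pipeline filled" indicator
module siso_fill_n #(
    parameter DEPTH = 8                     // Number of stages (assumed >= 2)
) (
    input  wire clk,
    input  wire rst_n,
    input  wire s_in,
    output wire s_out,
    output wire filled                      // High once DEPTH bits have been shifted in
);
    localparam CNT_W = $clog2(DEPTH + 1);   // Wide enough to hold 0..DEPTH

    reg [DEPTH-1:0] shift_reg;
    reg [CNT_W-1:0] fill_cnt;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            shift_reg <= {DEPTH{1'b0}};
            fill_cnt  <= {CNT_W{1'b0}};
        end else begin
            shift_reg <= {shift_reg[DEPTH-2:0], s_in};
            if (!filled)
                fill_cnt <= fill_cnt + 1'b1;   // Stops counting at DEPTH
        end
    end

    assign filled = (fill_cnt == DEPTH);
    assign s_out  = shift_reg[DEPTH-1];
endmodule
```

With DEPTH = 8 the counter is 4 bits wide; sizing it with $clog2(DEPTH) would give only 3 bits, and the value 8 could never be reached.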
Common localparam Patterns
// Common localparam derivation patterns used in register and counter design
// Pattern 1: counter bit-width from maximum count value
parameter MAX_COUNT = 255;
localparam CNT_W = $clog2(MAX_COUNT + 1); // 8 bits for 0..255
// Pattern 2: byte count from total bit-width
parameter DATA_W = 32;
localparam BYTE_CNT = DATA_W / 8; // 4 bytes for 32-bit data
// Pattern 3: FSM state encoding width from number of states
parameter NUM_STATES = 6;
localparam STATE_W = $clog2(NUM_STATES); // 3 bits for 6 states
// Pattern 4: terminal count for a divider counter
parameter CLK_FREQ = 50_000_000;
parameter BAUD_RATE = 115_200;
localparam DIVIDER = CLK_FREQ / BAUD_RATE - 1; // 434 - 1 = 433 (integer division truncates)
localparam DIV_W = $clog2(DIVIDER + 1); // 9 bits for 0..433
10.5.4 Parameterized PIPO Register Bank
A register bank consists of multiple independently addressable registers sharing a common clock and reset. Parameterizing both the data width and the number of registers produces a fully reusable memory-mapped register block suitable for peripheral controller designs.
// Parameterized register bank
// NUM_REGS individually addressable registers, each WIDTH bits wide
module reg_bank #(
parameter WIDTH = 8, // Data width of each register
parameter NUM_REGS = 4 // Number of registers in the bank
) (
input wire clk,
input wire rst_n,
input wire wr_en, // Write enable
input wire [$clog2(NUM_REGS)-1:0] addr, // Register address
input wire [WIDTH-1:0] wr_data, // Write data
output reg [WIDTH-1:0] rd_data // Read data
);
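// localparam: address width derived from number of registers
// (Verilog-2001 does not allow localparam inside the #( ) parameter list, so the
// addr port above repeats the $clog2 expression; ADDR_W is for internal use)
localparam ADDR_W = $clog2(NUM_REGS);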
// Register array: NUM_REGS registers, each WIDTH bits wide
reg [WIDTH-1:0] regs [0:NUM_REGS-1];
integer i;
// Write port: synchronous write with asynchronous reset
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
for (i = 0; i < NUM_REGS; i = i + 1)
regs[i] <= {WIDTH{1'b0}}; // Clear all registers on reset
end else if (wr_en) begin
regs[addr] <= wr_data;
end
end
// Read port: combinational (asynchronous) read
always @(*) begin
rd_data = regs[addr];
end
endmodule
Instantiation Example
// 16-bit wide bank of 8 registers — suitable for a simple peripheral CSR block
reg_bank #(
.WIDTH (16),
.NUM_REGS (8)
) u_csr_bank (
.clk (clk),
.rst_n (rst_n),
.wr_en (csr_wr_en),
.addr (csr_addr),
.wr_data (csr_wr_data),
.rd_data (csr_rd_data)
);
10.5.5 Parameterized PISO Shift Register (Practical Example)
To demonstrate how parameterization applies to the shift register patterns from Section 10.4, the PISO transmitter is rewritten here as a fully parameterized module. Both the data width and the shift direction (MSB-first or LSB-first) are configurable at instantiation.
// Parameterized PISO shift register
// Configurable width and shift direction
// MSB_FIRST = 1: transmit MSB first (SPI default)
// MSB_FIRST = 0: transmit LSB first (UART default)
module piso_n #(
parameter WIDTH = 8, // Data width
parameter MSB_FIRST = 1 // Shift direction: 1 = MSB first, 0 = LSB first
) (
input wire clk,
input wire rst_n,
input wire load, // 1 = load parallel data; 0 = shift
input wire [WIDTH-1:0] p_in, // Parallel data input
output wire s_out // Serial output
);
reg [WIDTH-1:0] shift_reg;
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
shift_reg <= {WIDTH{1'b0}};
else if (load)
shift_reg <= p_in;
else begin
if (MSB_FIRST)
shift_reg <= {shift_reg[WIDTH-2:0], 1'b0}; // Shift left
else
shift_reg <= {1'b0, shift_reg[WIDTH-1:1]}; // Shift right
end
end
assign s_out = MSB_FIRST ? shift_reg[WIDTH-1] : shift_reg[0];
endmodule
Instantiation Examples
// SPI transmitter: 8-bit, MSB first
piso_n #(
.WIDTH (8),
.MSB_FIRST (1)
) u_spi_tx (
.clk (clk),
.rst_n (rst_n),
.load (spi_load),
.p_in (tx_byte),
.s_out (mosi)
);
// UART transmitter: 8-bit, LSB first
piso_n #(
.WIDTH (8),
.MSB_FIRST (0)
) u_uart_tx (
.clk (clk),
.rst_n (rst_n),
.load (uart_load),
.p_in (tx_byte),
.s_out (tx_serial)
);
10.5.6 Best Practices for Parameterized Design
The following practices apply to all parameterized modules in FPGA design:
• Always provide meaningful default values for every parameter. The default should represent the most commonly used configuration so that the module can be instantiated with minimal overrides for typical use cases.
• Never use magic numbers inside parameterized logic. Every constant that depends on a parameter must be derived from that parameter using localparam or a parameter expression. Hard-coded numbers inside parameterized logic will produce incorrect behavior when the parameter is changed.
• Use $clog2 for all address and counter width calculations. Manual calculation of bit-widths is a maintenance hazard: if the depth or count changes, the bit-width must be recalculated by hand. Using $clog2 makes this automatic. Note that $clog2(1) returns 0, so a depth of 1 needs special handling.
• Use named parameter assignment at instantiation. Always use the .PARAM_NAME(value) syntax rather than positional assignment. Named assignment is self-documenting and immune to ordering errors when the parameter list changes.
• Verify the synthesized bit-widths in the Compilation Report. After compiling a parameterized module in Quartus Prime, check the Resource Usage Summary and RTL Viewer to confirm that the synthesized register widths and counter widths match the intended parameter values.
• Add parameter range comments to document valid ranges. Verilog does not enforce parameter value constraints at compile time. Document the valid range of each parameter in a comment immediately above its declaration so that designers instantiating the module know the supported configuration space.
// Example: well-documented parameter declarations with range comments
module reg_n #(
parameter WIDTH = 8, // Data width: 1 to 64 bits
parameter RST_VALUE = {WIDTH{1'b0}} // Reset value: must fit within WIDTH bits
) (
// ...
);
// Derived constant using localparam (cannot be overridden at instantiation)
// ZERO is an all-zero value sized to WIDTH; WIDTH must be at least 1
localparam ZERO = {WIDTH{1'b0}};
// ...
endmodule
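Because range comments are not enforced by the tools, a module can also carry a simulation-time guard that stops the run when a parameter is out of range. The following is a minimal sketch of that idea, not part of the original reg_n: the module name reg_n_checked, the ports d and q, and the use of the translate_off/translate_on pragma comments to hide the check from synthesis are assumptions made for illustration.

```verilog
// Hypothetical variant of reg_n with a simulation-only parameter guard
module reg_n_checked #(
    parameter WIDTH     = 8,               // Data width: 1 to 64 bits
    parameter RST_VALUE = {WIDTH{1'b0}}    // Reset value: must fit within WIDTH bits
) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    // synthesis translate_off
    // Runs once at time 0 in simulation; ignored by synthesis (assumed tool flow)
    initial begin
        if (WIDTH < 1 || WIDTH > 64) begin
            $display("ERROR: %m: WIDTH=%0d is outside the supported range 1..64", WIDTH);
            $finish;
        end
    end
    // synthesis translate_on

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= RST_VALUE;
        else
            q <= d;
    end
endmodule
```

The guard only documents and enforces the range in simulation; the comment above each parameter remains the primary documentation for designers who instantiate the module.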
10.6 Multi-bit Registers
The register patterns introduced in Section 10.4 focused on single instances of fixed-width registers. In practice, designs frequently require collections of registers — arrays of storage elements that share a common clock and reset but are individually addressable or operated on as a group. This section covers the declaration, access, and synthesis of multi-bit register structures, from simple packed arrays to full register file designs.
10.6.1 Packed vs Unpacked Arrays
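Verilog supports two distinct array declaration styles that are often confused with each other. Understanding the difference is essential for writing correct, synthesizable RTL and for predicting how Quartus Prime will map the declarations to hardware. The terms packed and unpacked come from SystemVerilog; the Verilog-2001 standard calls the same two constructs vectors and arrays.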
Packed Arrays
A packed array declares multiple bits as a single multi-bit variable. The bit dimensions appear to the left of the variable name. The entire packed array is treated as one contiguous register and is synthesized as a single group of flip-flops sharing a common clock and reset.
// Packed array examples
reg [7:0] data_byte; // One 8-bit register (8 flip-flops as one unit)
reg [15:0] data_word; // One 16-bit register
reg [31:0] data_dword; // One 32-bit register
// Packed array: entire variable assigned at once
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
data_byte <= 8'h00;
else
data_byte <= data_in;
end
// Packed array: individual bit access
wire msb = data_byte[7]; // Single bit select
wire [3:0] upper = data_byte[7:4]; // Part select (upper nibble)
wire [3:0] lower = data_byte[3:0]; // Part select (lower nibble)
Unpacked Arrays
An unpacked array declares multiple separate variables of the same type. The array dimensions appear to the right of the variable name. Each element is an independent register and is accessed individually using an index. Unpacked arrays are the standard way to declare register arrays and memory structures in synthesizable Verilog.
// Unpacked array examples
reg flags [0:7]; // 8 separate 1-bit registers
reg [7:0] regs [0:15]; // 16 separate 8-bit registers
reg [31:0] mem [0:255]; // 256 separate 32-bit registers (register file / RAM)
// Unpacked array: individual element access using an index
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
regs[0] <= 8'h00; // Reset one specific register
else if (wr_en)
regs[addr] <= wr_data; // Write to register selected by addr
end
// Unpacked array: read access is combinational
assign rd_data = regs[addr];
Note that Verilog does not allow a single assignment statement to initialize or clear an entire unpacked array. Each element must be accessed individually, typically using a for loop inside an always block:
// Correct: use a for loop to reset all elements of an unpacked array
integer i;
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
for (i = 0; i < 16; i = i + 1)
regs[i] <= 8'h00; // Reset each register individually
end else if (wr_en) begin
regs[addr] <= wr_data;
end
end
Side-by-Side Comparison
| Property | Packed Array | Unpacked Array |
|---|---|---|
| Dimension position | Left of variable name | Right of variable name |
| Treated as | Single multi-bit variable | Collection of independent variables |
| Whole-array assignment | data <= 8'h00 ✓ Allowed | ✗ Not allowed — use for loop |
| Individual bit access | data[3], data[7:4] | regs[i], regs[i][3:0] |
| Synthesized as | Single register (flip-flop group) | Register array (multiple flip-flop groups) |
| Quartus inference | Logic Elements (FFs) | Logic Elements or Distributed RAM |
| Typical use | Data bus, status/control register | Register file, lookup table, small RAM |
10.6.2 Multi-bit Register Operations
Several Verilog operators are specifically useful when working with multi-bit registers. Understanding these operators allows complex register manipulations to be expressed concisely and synthesized efficiently.
Bit-Select and Part-Select
reg [15:0] status_reg;
// Bit-select: access a single bit by index
wire tx_busy = status_reg[0]; // Bit 0: TX busy flag
wire rx_ready = status_reg[1]; // Bit 1: RX ready flag
wire error = status_reg[7]; // Bit 7: error flag
// Part-select: access a contiguous range of bits
wire [3:0] irq_flags = status_reg[11:8]; // Bits 11..8: interrupt flags
wire [3:0] mode_field = status_reg[15:12]; // Bits 15..12: mode selection
// Part-select on left-hand side: write to a specific field
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
status_reg <= 16'h0000;
else if (irq_wr_en)
status_reg[11:8] <= irq_data; // Update only the IRQ field
end
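The part-selects above use constant bounds. When the position of a field must be chosen at run time, Verilog-2001 offers the indexed part-select, written base +: width. The sketch below shows the idea on a byte-lane selector; the module and signal names (byte_select, word, lane, byte_out) are hypothetical and are not used elsewhere in this chapter.

```verilog
// Hypothetical byte-lane selector using an indexed part-select
module byte_select (
    input  wire [31:0] word,      // Four bytes held in one 32-bit vector
    input  wire [1:0]  lane,      // 0 selects word[7:0], 3 selects word[31:24]
    output wire [7:0]  byte_out
);
    // base +: width selects 'width' bits starting at 'base' and counting upward.
    // The width must be a constant; only the base may be a signal.
    assign byte_out = word[lane*8 +: 8];
endmodule
```

A synthesis tool would normally implement this as a 4-to-1 multiplexer of 8-bit fields, since the width is fixed and only the starting position varies.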
Concatenation
The concatenation operator {} joins multiple signals or constants into a single wider bus. It is one of the most frequently used operators in register and shift register design.
reg [7:0] high_byte, low_byte;
reg [15:0] combined;
// Concatenation: join two 8-bit registers into one 16-bit register
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
combined <= 16'h0000;
else
combined <= {high_byte, low_byte}; // high_byte becomes [15:8], low_byte [7:0]
end
// Concatenation in shift register: insert new bit at LSB, discard MSB
reg [7:0] shift_reg;
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
shift_reg <= 8'h00;
else
shift_reg <= {shift_reg[6:0], serial_in}; // Shift left, insert at bit 0
end
// Concatenation to swap byte order (endian conversion)
wire [15:0] swapped = {combined[7:0], combined[15:8]};
Replication
The replication operator {N{x}} repeats a signal or constant N times. It is commonly used to initialize multi-bit registers and to perform sign extension.
// Replication for register initialization
reg [31:0] wide_reg;
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
wide_reg <= {32{1'b0}}; // Equivalent to 32'h00000000
else
wide_reg <= data_in;
end
// Replication for sign extension: extend an 8-bit signed value to 16 bits
wire [7:0] signed_byte = 8'hA5; // 1010_0101 = -91 in two's complement
wire [15:0] sign_extended = {{8{signed_byte[7]}}, signed_byte};
// signed_byte[7] = 1 (negative), so upper 8 bits = 8'hFF
// Result: 16'hFFA5
// Replication in parameterized reset
parameter WIDTH = 16;
reg [WIDTH-1:0] param_reg;
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
param_reg <= {WIDTH{1'b0}}; // Works for any WIDTH value
else
param_reg <= d;
end
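The sign-extension pattern above can be combined with parameters so that one module serves any pair of widths. This is a minimal sketch under stated assumptions: the module name sign_extend and the parameters IN_W and OUT_W are illustrative, and OUT_W is assumed to be strictly greater than IN_W because a replication count of zero is not legal in Verilog-2001.

```verilog
// Hypothetical parameterized sign extender built from replication + concatenation
module sign_extend #(
    parameter IN_W  = 8,    // Input width; assumed >= 1
    parameter OUT_W = 16    // Output width; assumed > IN_W
) (
    input  wire [IN_W-1:0]  in,
    output wire [OUT_W-1:0] out
);
    // Copy the sign bit (MSB of the input) into every added upper bit
    assign out = {{(OUT_W-IN_W){in[IN_W-1]}}, in};
endmodule
```

With the default parameters and an input of 8'hA5, the output matches the 16'hFFA5 result worked out above.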
10.6.3 Register Arrays
A register array is an unpacked array of multi-bit registers, declared and accessed as described in Section 10.6.1. It forms the basis of any addressable storage structure: configuration register banks, lookup tables, and small on-chip memories.
// Parameterized register array with synchronous write and asynchronous read
// Models a small block of addressable registers (e.g., peripheral CSR block)
module reg_array #(
parameter DATA_W = 8, // Data width of each register
parameter NUM_REGS = 16, // Number of registers
parameter ADDR_W = $clog2(NUM_REGS) // Address width (derived; do not override)
) (
input wire clk,
input wire rst_n,
// Write port
input wire wr_en,
input wire [ADDR_W-1:0] wr_addr,
input wire [DATA_W-1:0] wr_data,
// Read port
input wire [ADDR_W-1:0] rd_addr,
output wire [DATA_W-1:0] rd_data
);
// Register array declaration
reg [DATA_W-1:0] regs [0:NUM_REGS-1];
integer i;
// Synchronous write with asynchronous reset
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
for (i = 0; i < NUM_REGS; i = i + 1)
regs[i] <= {DATA_W{1'b0}}; // Clear all registers on reset
end else if (wr_en) begin
regs[wr_addr] <= wr_data;
end
end
// Asynchronous (combinational) read
assign rd_data = regs[rd_addr];
endmodule
Quartus Synthesis Inference for Register Arrays
Quartus Prime infers different hardware structures for register arrays depending on the access pattern:
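• Logic Elements (Flip-Flops): inferred when the array is small, has multiple write ports, is cleared by a reset, or has other access patterns that prevent RAM inference. Memory blocks cannot reset their contents, so the arrays in this section, which clear every element in a reset loop, map to flip-flops. Every bit occupies one dedicated flip-flop in an LE.
• Distributed RAM (MLAB): inferred when the array has one synchronous write port and one or more asynchronous read ports, has no reset on its contents, and fits within the MLAB block size on the target device. This is more area efficient than flip-flop inference for larger arrays. MLABs are available only on device families that provide them (for example Cyclone V); the MAX 10 has M9K blocks but no MLABs.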
• Block RAM (M9K): inferred when the array is large and both write and read are synchronous. For the register arrays in this section, asynchronous read is preferred to prevent unintended Block RAM inference.
Always verify the inferred hardware using the Resource Usage Summary and RTL Viewer in Quartus Prime after compilation to confirm whether the array was mapped to flip-flops or to RAM resources.
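For contrast, an array that is meant to be eligible for RAM inference has to omit the reset loop, because memory blocks cannot clear their contents on reset. The following is a minimal sketch of such an array with a registered read; the module name reg_array_noreset and the explicit ADDR_W parameter are assumptions for illustration, and whether Quartus Prime actually maps it to a memory block depends on the array size and the project settings.

```verilog
// Hypothetical reset-free array: one synchronous write port, one registered read port
module reg_array_noreset #(
    parameter DATA_W   = 8,     // Data width of each word
    parameter NUM_REGS = 64,    // Number of words
    parameter ADDR_W   = 6      // Assumed to equal log2(NUM_REGS)
) (
    input  wire              clk,
    input  wire              wr_en,
    input  wire [ADDR_W-1:0] wr_addr,
    input  wire [DATA_W-1:0] wr_data,
    input  wire [ADDR_W-1:0] rd_addr,
    output reg  [DATA_W-1:0] rd_data
);
    reg [DATA_W-1:0] mem [0:NUM_REGS-1];   // No reset: contents are unknown until written

    always @(posedge clk) begin
        if (wr_en)
            mem[wr_addr] <= wr_data;
        // Registered read: data appears one cycle after rd_addr is presented.
        // On a same-address write, the non-blocking assignments return the old word.
        rd_data <= mem[rd_addr];
    end
endmodule
```

The one-cycle read latency is the price of this structure, so surrounding logic must be written to expect it.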
10.6.4 Register File Design
A register file is a structured collection of registers with one or more independent read and write ports, designed to be accessed by index (address). It is the central storage element of a processor datapath — the general-purpose registers of a CPU are implemented as a register file. Understanding register file design is therefore a prerequisite for any CPU or ALU implementation in later chapters.
The standard register file configuration for a simple RISC-style datapath has:
• One synchronous write port — data is written on the rising clock edge when write enable is asserted
• Two asynchronous read ports — two source operands can be read simultaneously in the same cycle, without waiting for a clock edge
• Register 0 hardwired to zero — a common convention in RISC architectures (MIPS, RISC-V) where register 0 always reads as zero regardless of writes
// Dual-read, single-write Register File
// Typical use: RISC processor general-purpose register bank
// Register 0 is hardwired to zero (writes to reg 0 are ignored)
module register_file #(
parameter DATA_W = 32, // Data width (e.g., 32-bit RISC architecture)
parameter NUM_REG = 32, // Number of registers (e.g., 32 for MIPS/RISC-V)
parameter ADDR_W = $clog2(NUM_REG) // Address width (derived; do not override)
) (
input wire clk,
input wire rst_n,
// Write port
input wire wr_en, // Write enable
input wire [ADDR_W-1:0] wr_addr, // Write address (destination register)
input wire [DATA_W-1:0] wr_data, // Write data
// Read port A (source operand 1, e.g., rs1)
input wire [ADDR_W-1:0] rd_addr_a,
output wire [DATA_W-1:0] rd_data_a,
// Read port B (source operand 2, e.g., rs2)
input wire [ADDR_W-1:0] rd_addr_b,
output wire [DATA_W-1:0] rd_data_b
);
// Register storage array
reg [DATA_W-1:0] rf [0:NUM_REG-1];
integer i;
// Synchronous write port with asynchronous reset
// Register 0 is never written — it is permanently zero
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
for (i = 0; i < NUM_REG; i = i + 1)
rf[i] <= {DATA_W{1'b0}};
end else if (wr_en && (wr_addr != {ADDR_W{1'b0}})) begin
rf[wr_addr] <= wr_data; // Write to any register except register 0
end
end
// Asynchronous read ports
// Register 0 always returns zero regardless of stored value
assign rd_data_a = (rd_addr_a == {ADDR_W{1'b0}}) ? {DATA_W{1'b0}} : rf[rd_addr_a];
assign rd_data_b = (rd_addr_b == {ADDR_W{1'b0}}) ? {DATA_W{1'b0}} : rf[rd_addr_b];
endmodule
Read-During-Write Behavior
When a read and a write occur to the same address in the same clock cycle, the behavior depends on whether the read is synchronous or asynchronous:
• Asynchronous read (this implementation): the read returns the old value stored before the write takes effect. The new value written on this clock edge will be available on the next cycle.
• Synchronous read: the read returns either the old or new value depending on the implementation — this must be explicitly specified and verified. Synchronous read also prevents asynchronous MLAB inference and may result in Block RAM inference instead.
For a processor datapath, asynchronous read with old-value behavior is the standard choice because it allows the read and write to be fully decoupled in the pipeline timing.
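Some pipelines instead need a read to return the value that is being written in the same cycle. One way to obtain that behavior without changing the register file itself is a small bypass multiplexer on each read port. The sketch below is illustrative only: the module name rf_read_bypass and the port rf_rd_data (the value coming out of the register file read port) are assumptions, and the register-0 exclusion mirrors the convention used in register_file above.

```verilog
// Hypothetical write-through bypass for one read port of a register file
module rf_read_bypass #(
    parameter DATA_W = 32,
    parameter ADDR_W = 5
) (
    input  wire              wr_en,
    input  wire [ADDR_W-1:0] wr_addr,
    input  wire [DATA_W-1:0] wr_data,
    input  wire [ADDR_W-1:0] rd_addr,
    input  wire [DATA_W-1:0] rf_rd_data,  // Old value from the register array
    output wire [DATA_W-1:0] rd_data      // Value seen by the datapath
);
    // A hit means the same non-zero register is written and read in this cycle
    wire hit = wr_en && (wr_addr == rd_addr) && (rd_addr != {ADDR_W{1'b0}});

    // Forward the new data on a hit; otherwise pass the stored value through
    assign rd_data = hit ? wr_data : rf_rd_data;
endmodule
```

The bypass adds an address comparator and a multiplexer to the read path, so its effect on timing should be checked in the Timing Analyzer.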
10.6.5 Common Mistakes with Multi-bit Registers
Mistake 1: Whole-Array Assignment to an Unpacked Array
Assigning a constant or expression directly to an entire unpacked array is not legal in Verilog; it is rejected at compile time, not only by synthesis. Each element must be individually assigned, typically inside a for loop.
// ERROR: whole-array assignment — not supported in synthesis
reg [7:0] regs [0:15];
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
regs <= 0; // Illegal: cannot assign to an unpacked array as a whole
end
// CORRECT: use a for loop to reset each element individually
integer i;
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
for (i = 0; i < 16; i = i + 1)
regs[i] <= 8'h00;
end
end
Mistake 2: Out-of-Range Array Index
Reading an unpacked array with an index outside the declared range returns X in simulation, and a write to an out-of-range index is ignored. Synthesized hardware has no X state, so it may behave differently from the simulation. Always ensure that the index signal is wide enough to address all elements but narrow enough to prevent out-of-range access.
// Potential issue: addr may be wider than needed, allowing out-of-range access
reg [7:0] regs [0:15]; // 16 registers, valid index: 0..15
reg [7:0] addr; // 8-bit address — can represent 0..255, but only 0..15 are valid
// If addr > 15, a simulation read returns X and a write is ignored
assign rd_data = regs[addr]; // Risk: addr could be 16..255
// CORRECT: use a localparam-derived address width to prevent over-indexing
localparam NUM_REGS = 16;
localparam ADDR_W = $clog2(NUM_REGS); // 4 bits: can represent 0..15 exactly
reg [7:0] regs [0:NUM_REGS-1];
reg [ADDR_W-1:0] addr; // 4-bit address: physically cannot exceed 15
assign rd_data = regs[addr]; // Safe: addr is bounded by its width
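Sizing the address with $clog2 bounds the index only when the number of registers is a power of two. With 10 registers, for example, a 4-bit address can still take the values 10 to 15. The following sketch adds an explicit range check for that case; the module name bounded_read, the addr_ok signal, and the choice to return zero for an invalid address are assumptions made for illustration.

```verilog
// Hypothetical register array with a range guard for a non-power-of-two depth
module bounded_read #(
    parameter DATA_W   = 8,
    parameter NUM_REGS = 10,   // Not a power of two
    parameter ADDR_W   = 4     // Assumed to equal $clog2(NUM_REGS)
) (
    input  wire              clk,
    input  wire              wr_en,
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] wr_data,
    output wire [DATA_W-1:0] rd_data
);
    reg [DATA_W-1:0] regs [0:NUM_REGS-1];

    // High only for addresses that map to a declared element
    wire addr_ok = (addr < NUM_REGS);

    always @(posedge clk) begin
        if (wr_en && addr_ok)
            regs[addr] <= wr_data;      // Out-of-range writes are dropped explicitly
    end

    // Out-of-range reads return a defined value instead of X
    assign rd_data = addr_ok ? regs[addr] : {DATA_W{1'b0}};
endmodule
```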
Mistake 3: Packed/Unpacked Dimension Confusion
Swapping the packed and unpacked dimensions produces a structurally different declaration that synthesizes differently and is accessed differently. This is a common source of subtle bugs that are difficult to trace.
// These two declarations look similar but are completely different:
reg [7:0] A [0:3]; // Unpacked: 4 separate 8-bit registers
// Access: A[0], A[1], A[2], A[3]
// A[0] = one 8-bit register; A[0][3] = bit 3 of register 0
reg [3:0] B [0:7]; // Unpacked: 8 separate 4-bit registers
// Access: B[0]..B[7]
// B[0] = one 4-bit register; B[0][3] = MSB of register 0
// DO NOT confuse with:
reg [7:0] C; // Packed: one single 8-bit register
// Access: C[7:0], C[3], C[7:4]
// No unpacked dimension: C[0] is a bit-select (bit 0 of C), not an array element
Mistake 4: Inferring Unintended Block RAM
If both the write and read of a register array are synchronous (clock-edge triggered), Quartus may infer Block RAM (M9K) instead of distributed flip-flops. Block RAM has a one-cycle read latency that can disrupt datapath timing if not accounted for. To prevent unintended Block RAM inference, use asynchronous (combinational) assign statements for the read port as shown in Sections 10.6.3 and 10.6.4.
// Risk of Block RAM inference: both write and read are synchronous
always @(posedge clk) begin
if (wr_en)
regs[wr_addr] <= wr_data;
end
always @(posedge clk) begin
rd_data <= regs[rd_addr]; // Synchronous read — may infer Block RAM
end
// Prevents Block RAM inference: read is asynchronous
always @(posedge clk) begin
if (wr_en)
regs[wr_addr] <= wr_data; // Synchronous write
end
assign rd_data = regs[rd_addr]; // Asynchronous read — infers flip-flops or MLAB
10.6.6 Summary
| Structure | Declaration Style | Synthesized As | Typical Use |
|---|---|---|---|
| Packed register | reg [N-1:0] name | N flip-flops (single group) | Data bus, status register, counter |
| Register array | reg [W-1:0] name [0:N-1] | FFs or MLAB (address-selected) | CSR block, lookup table |
| Register file | Array + dual read ports | FFs or MLAB | CPU general-purpose registers |
| Concatenation | {a, b} | Wiring (no logic) | Byte assembly, shift register |
| Replication | {N{x}} | Wiring (no logic) | Reset initialization, sign extension |
| Part-select | name[h:l] | Wiring (no logic) | Field extraction, partial update |
10.7 Counter Design Fundamentals
Section 10.8 presents a comprehensive set of counter implementations. Before working through those variants, this section establishes the theoretical and practical foundations that apply to every counter design: the hardware model of a counter, the implications of different count encodings, overflow and wrap-around mechanics, timing analysis of the counter critical path, and how to verify counter behavior using Quartus Prime tools. Designers who understand these fundamentals will be able to select, implement, and debug counter circuits with confidence.
10.7.1 What Is a Counter? (Hardware Perspective)
From a hardware perspective, a counter is not a fundamentally new type of circuit — it is a register augmented with a feedback path through combinational arithmetic logic. Every counter consists of exactly three elements:
• A register — stores the current count value across clock cycles. On the Intel MAX-10, each bit of the counter occupies one flip-flop within a Logic Element.
• An arithmetic unit — computes the next count value from the current count value. For a simple binary up counter, this is an adder that computes count + 1.
• A feedback path — connects the register output back to the arithmetic unit input, forming a closed loop. This feedback loop is the structural feature that distinguishes a counter from a plain register.
// The feedback loop is explicit in Verilog:
// count (register output) appears on both the left-hand side (storage)
// and the right-hand side (feedback into adder)
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= {WIDTH{1'b0}};
else if (en)
count <= count + 1'b1; // count feeds back into the adder
// ^^^^^
// This is the feedback path
end
This feedback loop is clearly visible in the RTL Viewer in Quartus Prime as a wire looping from the register output back to the adder input. Verifying the presence of this loop after synthesis is a reliable way to confirm that the synthesis tool has correctly inferred a counter rather than a combinational function.
Counter vs Register: The Critical Difference
| Property | Register | Counter |
|---|---|---|
| Next-state logic | External — driven by surrounding logic | Internal — derived from current count value |
| Feedback path | None | Output feeds back to arithmetic input |
| Autonomous operation | No — holds value unless driven | Yes — self-advances on every enabled clock edge |
| RTL Viewer appearance | Linear data flow | Closed feedback loop visible |
10.7.2 Binary Encoding vs Other Encodings
The choice of count encoding determines the hardware cost, switching activity, maximum operating frequency, and suitability for specific applications. Three encodings are relevant to FPGA counter design.
Binary Encoding
Binary encoding is the default and most common choice. Each count value is represented as its standard binary equivalent. It is the most area-efficient encoding because the increment operation maps directly to the dedicated carry-chain logic in the Intel MAX-10.
• Area: minimum — N flip-flops for an N-bit counter
• Switching activity: higher bits toggle less frequently than lower bits (bit 0 toggles every cycle, bit N-1 toggles once every 2^(N-1) cycles), but the carry propagates from the LSB to the MSB, so several bits can change on the same clock edge, and at the rollover point all N bits change at once
• Cross-domain safety: not safe — multiple bits can change simultaneously at rollover
• Use when: general-purpose counting, address generation, timing, and all applications where cross-domain transfer is not required
Gray Code Encoding
In Gray code, only one bit changes between consecutive count values. This property eliminates the multi-bit transition problem at rollover and makes Gray-coded counters the standard choice for pointers that must be read across a clock domain boundary.
• Area: N flip-flops plus XOR conversion logic
• Switching activity: minimum — exactly one bit changes per clock cycle at all times
• Cross-domain safety: safe — a single-bit transition sampled across a clock domain boundary always produces a valid adjacent count value
• Use when: asynchronous FIFO read/write pointers, any counter value that must be transferred between clock domains
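The XOR conversion mentioned above is small: a binary value b maps to the Gray value b ^ (b >> 1). The sketch below pairs a binary counter with a registered Gray output to show where that logic sits. It is an illustrative outline under stated assumptions, not the implementation used later in this chapter; the module name gray_counter_sketch and its internal signal names are hypothetical.

```verilog
// Hypothetical Gray-output counter: binary count internally, Gray code at the output
module gray_counter_sketch #(
    parameter WIDTH = 4
) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire             en,
    output reg  [WIDTH-1:0] gray    // Registered Gray-coded count
);
    reg  [WIDTH-1:0] bin;                                   // Internal binary count
    wire [WIDTH-1:0] bin_next  = bin + 1'b1;                // Next binary value
    wire [WIDTH-1:0] gray_next = bin_next ^ (bin_next >> 1); // Binary-to-Gray conversion

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            bin  <= {WIDTH{1'b0}};
            gray <= {WIDTH{1'b0}};
        end else if (en) begin
            bin  <= bin_next;
            gray <= gray_next;   // Exactly one bit differs from the previous gray value
        end
    end
endmodule
```

Registering the Gray value matters for clock domain crossing: the XOR outputs can glitch while the binary adder settles, and only the flip-flop outputs are guaranteed to change one bit per step.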
One-Hot Encoding
In one-hot encoding, exactly one bit is set at any time. An N-state counter requires N flip-flops — one per state — rather than log2(N) flip-flops as in binary. Despite its higher flip-flop count, one-hot encoding often produces faster and simpler decode logic because state detection requires examining only one bit rather than a multi-bit comparison.
• Area: N flip-flops for N states (more than binary)
• Decode logic: trivial — no comparator required, each state output is directly one flip-flop output
• Maximum frequency: often higher than binary for small N because the next-state logic is simpler
• Use when: FSM state registers (Quartus syn_encoding attribute), ring counters, small N where decode speed matters more than flip-flop count
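A ring counter is the simplest one-hot counter: a single set bit rotates through the register, and each state is read directly from one flip-flop. The following is a minimal sketch assuming at least two states; the module name ring_counter_sketch and the parameter N (number of states) are illustrative.

```verilog
// Hypothetical one-hot ring counter with N states
module ring_counter_sketch #(
    parameter N = 4    // Number of states (and flip-flops); assumed >= 2
) (
    input  wire         clk,
    input  wire         rst_n,
    input  wire         en,
    output reg  [N-1:0] state   // Exactly one bit is high at any time
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            state <= {{(N-1){1'b0}}, 1'b1};        // Start in state 0: only bit 0 set
        else if (en)
            state <= {state[N-2:0], state[N-1]};   // Rotate left by one position
    end
endmodule
```

No adder or comparator appears in the next-state logic, which is why this style can run fast for small N; the cost is one flip-flop per state.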
Encoding Comparison Summary
| Encoding | Flip-Flops Required | Bits Changed per Step | CDC Safe? | Typical Application |
|---|---|---|---|---|
| Binary | N | 1 to N (varies) | ✗ No | General purpose, timing, address |
| Gray Code | N + XOR | Always exactly 1 | ✓ Yes | Async FIFO pointers, CDC counters |
| One-Hot | 2^N to cover the same 2^N counts (one per state) | Always exactly 2 | ✗ No | FSM states, ring counter, small state counts |
10.7.3 Overflow and Wrap-around Behavior
When a counter reaches its maximum value and increments one more time, the result wraps around. The wrap-around behavior depends on whether the counter uses natural binary overflow or a forced terminal-count reset, and the choice has a direct impact on the amount of logic required.
Natural Wrap-around (Binary Overflow)
A binary counter naturally wraps from its maximum value (2^N - 1) back to zero due to arithmetic overflow — the carry out of the MSB is simply discarded. No comparison logic is needed. This is the most efficient counter structure because the adder already handles the wrap-around implicitly.
// Natural wrap-around: no comparator, just an adder
// 4-bit counter: counts 0, 1, 2, ..., 14, 15, 0, 1, 2, ...
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= 4'b0000;
else if (en)
count <= count + 1'b1; // At 4'b1111 + 1 = 4'b0000 (overflow discarded)
end
Natural wrap-around is only possible when the terminal count value is exactly 2^N - 1 (a power of two minus one). Any other terminal count requires a comparator.
Forced Reset (Modulo-N, Non-Power-of-Two)
When the desired count range is not a power of two (e.g., count 0 to 9 for BCD, or 0 to 433 for a baud rate generator), a comparator must be added to detect the terminal count and force the counter back to zero. This comparator adds logic to the critical path.
// Forced reset at terminal count: comparator required
// Modulo-10 counter: counts 0, 1, 2, ..., 9, 0, 1, 2, ...
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= 4'b0000;
else if (en) begin
if (count == 4'd9) // Comparator: detect terminal count
count <= 4'b0000; // Forced reset to zero
else
count <= count + 1'b1;
end
end
The comparator (count == 4'd9) adds one level of combinational logic between the register output and the next-state Mux, which increases the critical path delay compared to natural wrap-around.
Hardware Cost Comparison
| Wrap-around Type | Terminal Count | Comparator Needed? | Extra Logic | Critical Path Impact |
|---|---|---|---|---|
| Natural overflow | 2^N - 1 | ✗ No | None | Minimal |
| Forced reset | Any other value (M - 1 for a modulo-M counter) | ✓ Yes | Comparator + Mux | Increased |
10.7.4 Timing Analysis of Counters
The counter is one of the most common sources of critical path violations in FPGA designs. Understanding why, and how Intel MAX-10 mitigates the problem, is essential for designing high-speed counters.
The Critical Path in a Binary Counter
The critical path of a binary counter runs through the carry chain of the adder. In a ripple-carry adder, the carry bit propagates serially from the LSB to the MSB — each stage must wait for the carry from the stage below it before its output is valid. The total combinational delay is therefore proportional to the counter width:
• An 8-bit counter has a carry chain of 8 stages — relatively fast
• A 32-bit counter has a carry chain of 32 stages — significantly slower
• The maximum clock frequency (Fmax) decreases as counter width increases, because a wider adder requires more carry propagation time within one clock period
Intel MAX-10 Carry Chain Optimization
The Intel MAX-10 Logic Element includes a dedicated carry-chain connection that links adjacent LEs in a column. The increment operation of a binary counter maps directly to this carry chain, bypassing the general-purpose routing fabric and achieving much lower carry propagation delay than would be possible with LUT-based logic alone.
Quartus Prime automatically applies this optimization when it recognizes the increment pattern (count + 1 in a clocked always block). The result can be verified by opening the Technology Map Viewer (Post-Fitting) and observing that the counter bits are placed in adjacent LEs within the same LE column, connected by the carry chain rather than by general routing.
Fmax vs Counter Width
Even with carry-chain optimization, a wider counter has a longer critical path. The following guidelines apply to Intel MAX-10 at typical operating conditions:
• Counters up to 16 bits typically meet timing at 50 MHz without any optimization effort.
• Counters of 32 bits may require timing-driven placement or pipelining to meet 100 MHz or higher targets.
• Counters wider than 32 bits should be split into cascaded stages (see Section 10.7.5) to keep the carry chain length within a manageable range for the target clock frequency.
The Comparator as an Additional Critical Path Element
In Modulo-N counters and BCD counters, the comparator that detects the terminal count adds combinational logic alongside the adder. The comparator reads the register output in parallel with the carry chain, and its result drives the select input of the reset Mux. The critical path is therefore the slower of the adder and the comparator, followed by the Mux, and it will typically have a lower Fmax than a plain binary counter of the same width.
One common optimization is to compute the terminal count detection one cycle early using a registered comparator, and use that registered signal to gate the reset on the following cycle. This breaks the critical path at the cost of one additional flip-flop and a one-cycle latency in the terminal count detection.
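// Optimized Modulo-N counter: registered terminal count detection
// tc is a flip-flop that is set on the edge where count becomes MODULO-1,
// so the wrap Mux is driven by a registered flag rather than directly by the comparator
// MODULO must be at least 2 ($clog2(1) = 0 would give a zero-width count)
module modulo_fast #(
parameter MODULO = 100,
parameter WIDTH = $clog2(MODULO) // Derived width; do not override
) (
input wire clk,
input wire rst_n,
input wire en,
output reg [WIDTH-1:0] count,
output reg tc // Registered terminal count: high while count == MODULO-1
);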
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
count <= {WIDTH{1'b0}};
tc <= 1'b0;
end else if (en) begin
if (tc) begin
count <= {WIDTH{1'b0}}; // Reset triggered by registered tc
tc <= (MODULO == 1); // tc stays high only if MODULO = 1
end else begin
count <= count + 1'b1;
tc <= (count == MODULO - 2); // Detect terminal-1 one cycle early
end
end
end
endmodule
10.7.5 Counter Cascading for Wide Counters
When a counter wider than 32 bits is required, or when a very large modulus is needed, the design can be split into cascaded stages. The terminal count output of the lower stage drives the enable input of the upper stage, effectively multiplying the count range without creating an excessively long carry chain.
// Cascaded 48-bit counter built from two 24-bit stages
// Lower stage counts 0..2^24-1; upper stage advances once per lower overflow
module counter_48bit (
input wire clk,
input wire rst_n,
input wire en,
output wire [47:0] count // Full 48-bit count value
);
wire [23:0] count_lo, count_hi;
wire ovf_lo; // Overflow of lower 24 bits
// Lower 24-bit stage: always enabled when en is active
up_counter #(.WIDTH(24)) u_lo (
.clk (clk),
.rst_n (rst_n),
.en (en),
.count (count_lo),
.ovf (ovf_lo)
);
// Upper 24-bit stage: advances only when lower stage overflows
up_counter #(.WIDTH(24)) u_hi (
.clk (clk),
.rst_n (rst_n),
.en (ovf_lo), // Enable from lower stage overflow
.count (count_hi),
.ovf () // Upper overflow not used here
);
assign count = {count_hi, count_lo};
endmodule
Each stage has its own independent carry chain limited to 24 bits, keeping the critical path short. The enable signal (ovf_lo) between stages is a registered signal (the overflow flag of the lower counter), so it does not add combinational delay to either stage's critical path.
10.7.6 Verifying Counter Behavior in Quartus Prime
After synthesizing a counter module, four verification steps should be performed before proceeding to integration or lab testing.
Step 1: Confirm Feedback Loop in RTL Viewer
• Open Tools → Netlist Viewers → RTL Viewer.
• Locate the counter register. Confirm that a wire loops from the register output back through an adder and into the register input — this is the feedback path that defines a counter.
• If no feedback loop is visible, the counter has been incorrectly inferred as a combinational function. Review the always sensitivity list and ensure that the assignment is non-blocking (<=).
Step 2: Confirm Carry Chain in Technology Map Viewer
• Open Tools → Netlist Viewers → Technology Map Viewer (Post-Fitting).
• Locate the counter bits. Confirm that they are placed in adjacent LEs within the same LE column, connected by the dedicated carry chain (shown as a vertical carry connection between LEs).
• If the counter bits are scattered across non-adjacent LEs, the carry chain optimization has not been applied. Check that the increment expression is written as count + 1'b1 and not as a more complex expression that prevents recognition.
Step 3: Check Fmax in Timing Analyzer
• Open Tools → Timing Analyzer and run Report Fmax Summary.
• Locate the clock domain of the counter. The reported Fmax must exceed the target operating frequency (e.g., 50 MHz for the DE10-Lite) with positive slack.
• If Fmax is below the target, examine the Report Critical Path output to identify whether the bottleneck is the carry chain, the terminal count comparator, or the reset Mux.
Step 4: Simulate Count Sequence in Simulation Waveform Editor
• Open File → New → University Program VWF (Vector Waveform File) or use ModelSim with a testbench.
• Apply a clock signal and assert rst_n low for two to three clock cycles, then deassert.
• Assert en and verify that the count sequence advances correctly, produces the expected terminal count, and wraps around to zero at the correct value.
• Test the boundary conditions explicitly: verify the transition from the terminal count value back to zero, and verify that the counter holds its value when en = 0.
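When a testbench is used instead of the waveform editor, the checks in Step 4 can be automated. The following is a minimal self-checking sketch, not a complete verification plan. It wraps the modulo-10 pattern from Section 10.7.3 in a small module so that the example is self-contained; the names mod10_dut and mod10_tb, the 20 ns clock period, and the number of checked cycles are assumptions chosen for illustration.

```verilog
`timescale 1ns/1ps

// Device under test: the modulo-10 pattern from Section 10.7.3 wrapped in a module
module mod10_dut (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       en,
    output reg  [3:0] count
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= 4'd0;
        else if (en)
            count <= (count == 4'd9) ? 4'd0 : count + 1'b1;
    end
endmodule

// Self-checking testbench: compares the DUT against a simple reference model
module mod10_tb;
    reg        clk   = 1'b0;
    reg        rst_n = 1'b0;   // Start with reset asserted
    reg        en    = 1'b0;
    wire [3:0] count;
    reg  [3:0] expected;
    integer    i;
    integer    errors;

    mod10_dut dut (.clk(clk), .rst_n(rst_n), .en(en), .count(count));

    always #10 clk = ~clk;     // 20 ns period (50 MHz)

    initial begin
        errors   = 0;
        expected = 4'd0;
        repeat (3) @(negedge clk);   // Hold reset for a few clock cycles
        rst_n = 1'b1;
        en    = 1'b1;

        // Count through more than two full wraps, checking after every rising edge
        for (i = 0; i < 25; i = i + 1) begin
            @(negedge clk);
            expected = (expected == 4'd9) ? 4'd0 : expected + 4'd1;
            if (count !== expected) begin
                errors = errors + 1;
                $display("Mismatch at %0t: count=%0d expected=%0d", $time, count, expected);
            end
        end

        // Hold check: with en = 0 the count must not change
        en = 1'b0;
        repeat (3) begin
            @(negedge clk);
            if (count !== expected) begin
                errors = errors + 1;
                $display("Hold failure at %0t: count=%0d expected=%0d", $time, count, expected);
            end
        end

        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

Sampling on the falling edge keeps the checks away from the rising edge on which the counter updates, which avoids simulator race conditions between the testbench and the design.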
10.7.7 Design Rules Summary
| Design Decision | Rule |
|---|---|
| Encoding choice | Use binary for general-purpose counting. Use Gray code when the counter value crosses a clock domain boundary. Use one-hot only for small FSM-style counters where decode speed is critical. |
| Wrap-around strategy | Use natural binary overflow when the terminal count is 2^N - 1. Use forced reset with a comparator for all other terminal count values. |
| Counter width | Use $clog2 to calculate the minimum required bit-width. Cascade stages for counters wider than 32 bits. |
| Increment expression | Always write the increment as count + 1'b1 to enable carry-chain optimization in Quartus Prime. |
| Critical path | Verify Fmax in Timing Analyzer. If the comparator is on the critical path, consider registering the terminal count detection one cycle early. |
| Synthesis verification | Confirm the feedback loop in RTL Viewer and the carry chain placement in Technology Map Viewer after every compilation. |
| Simulation | Always simulate the full count sequence including the wrap-around transition and the boundary conditions at terminal count ± 1. |
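The "register the terminal count detection one cycle early" rule in the Critical path row can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's reference design: the module name modulo_counter_regtc is hypothetical, the counter is free-running (no enable input), and WIDTH is passed explicitly rather than computed. The comparator looks for MODULO - 2, so the registered tc flag is high during the cycle in which count equals MODULO - 1, and the wrap decision uses that flip-flop output instead of the comparator.

```verilog
// Sketch: modulo counter with terminal count detected one cycle early.
// Assumptions: free-running (no enable), MODULO >= 2, 2**WIDTH >= MODULO.
module modulo_counter_regtc #(
    parameter MODULO = 10,          // Count modulus; must be >= 2
    parameter WIDTH  = 4            // Must be wide enough to hold MODULO - 1
) (
    input  wire             clk,
    input  wire             rst_n,  // Active-low asynchronous reset
    output reg  [WIDTH-1:0] count,  // Current count value
    output reg              tc      // Registered: high while count == MODULO - 1
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            count <= {WIDTH{1'b0}};
            tc    <= 1'b0;
        end else begin
            // Wrap decision comes from the tc flip-flop, not from a comparator
            count <= tc ? {WIDTH{1'b0}} : count + 1'b1;
            // Detect MODULO - 2 now so that tc lines up with MODULO - 1 next cycle
            tc    <= (count == MODULO - 2);
        end
    end
endmodule
```

Adding an enable input changes the early-detection condition (the flag must only advance when the counter does), so treat this as a starting point and simulate the wrap-around before relying on it.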
10.8 Up, Down, Modulo-N, Gray Code, and BCD Counters
This section implements the most commonly used counter types in FPGA design. Each counter is presented with a complete, synthesizable Verilog module, an explanation of its internal hardware structure, and the practical applications where it is typically deployed. All examples are parameterized using the conventions established in Section 10.5 and follow the reset best practices from Sections 10.2 and 10.3.
10.8.1 Up Counter
The up counter is the most fundamental sequential circuit in digital design. It increments its stored value by one on every active clock edge and wraps back to zero after reaching its maximum value. The wrap-around occurs naturally due to binary arithmetic overflow — no additional comparison logic is required, making this the most area-efficient counter type.
// Parameterized N-bit Up Counter with asynchronous reset
// Counts from 0 to (2^WIDTH - 1), then wraps back to 0 automatically
module up_counter #(
parameter WIDTH = 8 // Counter width in bits; range: 1..32
) (
input wire clk,
input wire rst_n, // Active-low asynchronous reset
input wire en, // Count enable
output reg [WIDTH-1:0] count, // Current count value
output wire ovf // Overflow flag: pulses high for one cycle at wrap
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= {WIDTH{1'b0}};
else if (en)
count <= count + 1'b1;
end
// Overflow: asserted when count is at maximum AND enable is active
assign ovf = en & (&count); // (&count) = reduction AND = 1 only when all bits are 1
endmodule
Hardware structure:
• The increment logic (count + 1) is synthesized as an adder using the dedicated carry-chain logic within the Intel MAX-10 Logic Elements. Quartus automatically chains adjacent LEs to propagate the carry bit across the full WIDTH, producing a highly efficient ripple-carry or fast-carry adder depending on the device family.
• The overflow signal (ovf) is computed using a reduction AND (&count), which evaluates to 1 only when every bit of count is 1 (i.e., the count has reached 2^WIDTH - 1).
Typical applications:
• Clock divider and baud rate generator base counter
• Memory address sequencer for sequential read/write operations
• Event counter for measuring frequency or pulse count
• Pipeline stage cycle counter
10.8.2 Down Counter
The down counter decrements its stored value on every active clock edge. When the count reaches zero, it wraps back to its maximum value through natural binary underflow. A terminal count (TC) flag is commonly added to signal when the counter has reached zero, making it suitable for timeout and delay generation circuits.
// Parameterized N-bit Down Counter with asynchronous reset
// Counts from (2^WIDTH - 1) down to 0, then wraps back to (2^WIDTH - 1)
module down_counter #(
parameter WIDTH = 8 // Counter width in bits; range: 1..32
) (
input wire clk,
input wire rst_n, // Active-low asynchronous reset (loads max value)
input wire en, // Count enable
output reg [WIDTH-1:0] count, // Current count value
output wire tc // Terminal count: high for one cycle when count = 0
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= {WIDTH{1'b1}}; // Reset to maximum value (all ones)
else if (en)
count <= count - 1'b1;
end
// Terminal count: asserted when count is zero AND enable is active
assign tc = en & ~(|count); // (|count) = reduction OR = 0 only when all bits are 0
endmodule
Typical applications:
• Countdown timer — assert TC when the programmed delay has elapsed
• Watchdog timeout detector
• Retry counter — count down the number of allowed retransmissions in a communication protocol
• Hardware loop counter for fixed-iteration operations
10.8.3 Up/Down Counter
The up/down counter counts in either direction under the control of a direction signal (dir). When dir = 1 the counter increments; when dir = 0 it decrements. Both overflow and underflow wrap around naturally.
// Parameterized N-bit Up/Down Counter with asynchronous reset
// dir = 1: count up; dir = 0: count down
module updown_counter #(
parameter WIDTH = 8 // Counter width in bits; range: 1..32
) (
input wire clk,
input wire rst_n, // Active-low asynchronous reset
input wire en, // Count enable
input wire dir, // Direction: 1 = up, 0 = down
output reg [WIDTH-1:0] count, // Current count value
output wire ovf, // Overflow: count was at max and counted up
output wire unf // Underflow: count was at 0 and counted down
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= {WIDTH{1'b0}};
else if (en) begin
if (dir)
count <= count + 1'b1; // Count up
else
count <= count - 1'b1; // Count down
end
end
// Overflow: counting up from maximum value
assign ovf = en & dir & (&count);
// Underflow: counting down from zero
assign unf = en & ~dir & ~(|count);
endmodule
Typical applications:
• Quadrature encoder interface — increment on forward rotation, decrement on reverse rotation to track absolute position
• Volume or brightness control with up/down button inputs
• Servo motor position controller
• FIFO fill-level counter — increment on write, decrement on read to track occupancy
10.8.4 Modulo-N Counter
A Modulo-N counter counts from 0 to N-1 and then resets to 0 on the next clock edge. Unlike the natural wrap-around of a binary counter, the terminal value N-1 is arbitrary and does not need to be a power of two minus one. This requires a comparator to detect when the count has reached N-1 and force a synchronous reset to zero on the following cycle.
The Modulo-N counter is one of the most frequently used counters in practical FPGA design because it generates a precisely controlled periodic event every N clock cycles.
// Parameterized Modulo-N Counter with asynchronous reset
// Counts from 0 to (MODULO - 1), then resets to 0
// The terminal count pulse (tc) is asserted for one cycle when count = MODULO - 1
module modulo_counter #(
parameter MODULO = 10, // Count modulus; must be >= 2
localparam WIDTH = $clog2(MODULO) // Minimum bit-width for the count; a localparam in the parameter port list requires SystemVerilog (IEEE 1800-2009) mode
) (
input wire clk,
input wire rst_n, // Active-low asynchronous reset
input wire en, // Count enable
output reg [WIDTH-1:0] count, // Current count value
output wire tc // Terminal count: high for one cycle at MODULO-1
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= {WIDTH{1'b0}};
else if (en) begin
if (count == MODULO - 1)
count <= {WIDTH{1'b0}}; // Reset to 0 at terminal count
else
count <= count + 1'b1;
end
end
// Terminal count: asserted while count = MODULO - 1 and en is high; the next clock edge wraps count to 0
assign tc = en & (count == MODULO - 1);
endmodule
Practical Example: Baud Rate Generator Using Modulo-N Counter
One of the most common uses of a Modulo-N counter is generating a precise baud rate tick from a higher-frequency system clock. The modulus is calculated as the ratio of the system clock frequency to the desired baud rate:
// Baud rate generator: 50 MHz system clock, 115200 baud
// MODULO = 50,000,000 / 115,200 = 434 (integer division truncates 434.03)
// The baud_tick output pulses high for one cycle every 434 clock cycles
module baud_rate_gen #(
parameter CLK_FREQ = 50_000_000,
parameter BAUD_RATE = 115_200,
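localparam MODULO = CLK_FREQ / BAUD_RATE, // Integer division (truncates)
localparam WIDTH = $clog2(MODULO) // localparam in the parameter port list requires SystemVerilog (IEEE 1800-2009) mode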
) (
input wire clk,
input wire rst_n,
output wire baud_tick // One pulse per baud period
);
wire [WIDTH-1:0] count;
modulo_counter #(
.MODULO (MODULO)
) u_baud_cnt (
.clk (clk),
.rst_n (rst_n),
.en (1'b1), // Always counting
.count (count),
.tc (baud_tick) // tc fires every MODULO cycles = one baud period
);
endmodule
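Because the divisor is an integer, the generated baud rate is only approximately equal to the requested one. The following simulation-only sketch prints the truncated divisor, a round-to-nearest alternative, and the resulting integer baud rate. The module name baud_div_check and the MODULO_TRUNC / MODULO_ROUND names are illustrative assumptions and are not part of the baud_rate_gen module above.

```verilog
// Sketch: compare truncated and round-to-nearest baud divisors (simulation only).
module baud_div_check;
    localparam CLK_FREQ     = 50_000_000;
    localparam BAUD_RATE    = 115_200;
    // Plain integer division truncates toward zero
    localparam MODULO_TRUNC = CLK_FREQ / BAUD_RATE;
    // Adding half the divisor before dividing rounds to the nearest integer
    localparam MODULO_ROUND = (CLK_FREQ + BAUD_RATE / 2) / BAUD_RATE;

    initial begin
        $display("truncated divisor = %0d", MODULO_TRUNC);
        $display("rounded divisor   = %0d", MODULO_ROUND);
        $display("actual baud (integer part) = %0d", CLK_FREQ / MODULO_ROUND);
        $finish;
    end
endmodule
```

For these values both divisors evaluate to 434, and 50,000,000 / 434 gives 115207 as the integer part, roughly 0.006% above the nominal 115200 baud. The two divisors differ whenever the fractional part of CLK_FREQ / BAUD_RATE is 0.5 or more.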
Typical applications:
• Baud rate generator for UART; clock divider for SPI and I2C interfaces
• PWM period counter — sets the PWM frequency
• Sampling clock generator for ADC interfaces
• Display refresh counter — trigger a new display frame every N clock cycles
• Seven-segment display multiplexer timing controller
10.8.5 Gray Code Counter
A Gray code counter produces a sequence in which only one bit changes between consecutive count values. This property is critical in applications where the counter value is read by logic in a different clock domain — if two or more bits changed simultaneously (as in binary counting), a reader in another domain might sample a transitional value where some bits have already changed and others have not, producing a completely incorrect reading.
The standard implementation counts in binary internally and converts to Gray code at the output. The conversion from binary to Gray code is a simple XOR operation:
// Parameterized Gray Code Counter with asynchronous reset
// Internal binary counter converted to Gray code at the output
// Used primarily for asynchronous FIFO read/write pointer generation
module gray_counter #(
parameter WIDTH = 4 // Counter width in bits; range: 2..16
) (
input wire clk,
input wire rst_n, // Active-low asynchronous reset
input wire en, // Count enable
output wire [WIDTH-1:0] gray_out // Gray code output
);
// Internal binary counter
reg [WIDTH-1:0] bin_count;
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
bin_count <= {WIDTH{1'b0}};
else if (en)
bin_count <= bin_count + 1'b1;
end
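// Binary to Gray code conversion: G[i] = B[i] XOR B[i+1]
// MSB of Gray code equals MSB of binary
// Note: gray_out is combinational here; before it crosses a clock domain, register it in the source domain so that XOR glitches cannot be sampled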
assign gray_out = bin_count ^ (bin_count >> 1);
endmodule
Gray Code Sequence (4-bit Example)
| Decimal | Binary | Gray Code | Bits Changed |
|---|---|---|---|
| 0 | 0000 | 0000 | — |
| 1 | 0001 | 0001 | bit 0 |
| 2 | 0010 | 0011 | bit 1 |
| 3 | 0011 | 0010 | bit 0 |
| 4 | 0100 | 0110 | bit 2 |
| 5 | 0101 | 0111 | bit 0 |
| 6 | 0110 | 0101 | bit 1 |
| 7 | 0111 | 0100 | bit 0 |
Notice that in every row, only a single bit changes from the previous value — regardless of the decimal count value. This is the defining property of Gray code.
Typical applications:
• Asynchronous FIFO read/write pointers — the most critical application; Gray code ensures that a pointer sampled across a clock domain boundary is always a valid adjacent count value
• Rotary encoder position decoding
• CDC-safe event counters in multi-clock-domain systems
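For the clock-domain-crossing applications listed above, the Gray value is normally taken from a flip-flop rather than from combinational XOR logic, because the XOR outputs can glitch while the binary counter bits settle. A minimal sketch of that variant follows. The module name gray_counter_reg and the bin_next signal are hypothetical names introduced for illustration; the structure is a common pattern, not a design taken from this chapter.

```verilog
// Sketch: Gray code counter with a registered Gray output.
// gray_out always holds the Gray encoding of bin_count.
module gray_counter_reg #(
    parameter WIDTH = 4             // Counter width in bits; assumed >= 2
) (
    input  wire             clk,
    input  wire             rst_n,  // Active-low asynchronous reset
    input  wire             en,     // Count enable
    output reg  [WIDTH-1:0] gray_out
);
    reg  [WIDTH-1:0] bin_count;
    // Next binary value, computed once and used for both registers
    wire [WIDTH-1:0] bin_next = bin_count + 1'b1;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            bin_count <= {WIDTH{1'b0}};
            gray_out  <= {WIDTH{1'b0}};
        end else if (en) begin
            bin_count <= bin_next;
            // Convert the NEXT binary value so both registers update together
            gray_out  <= bin_next ^ (bin_next >> 1);
        end
    end
endmodule
```

The receiving clock domain would still pass gray_out through a synchronizer; this sketch only covers the source side.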
10.8.6 BCD Counter (Binary Coded Decimal)
A BCD counter counts in decimal — each decade (digit) counts from 0 to 9 and then resets to 0 while generating a carry to the next higher decade. Each decimal digit is represented by a 4-bit binary value, giving a range of 0000 (0) to 1001 (9). The values 1010 (10) through 1111 (15) are illegal in BCD and must never appear.
A multi-digit BCD counter is constructed by cascading single-decade BCD counters, where the terminal count output of one decade drives the enable input of the next.
Single-Decade BCD Counter (0 to 9)
// Single-decade BCD counter: counts 0 to 9, then resets to 0
// tc (terminal count) pulses high when count reaches 9
module bcd_decade (
input wire clk,
input wire rst_n, // Active-low asynchronous reset
input wire en, // Count enable (driven by tc of previous decade)
output reg [3:0] digit, // BCD digit output (0 to 9)
output wire tc // Terminal count: high for one cycle when digit = 9
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
digit <= 4'd0;
else if (en) begin
if (digit == 4'd9)
digit <= 4'd0; // Reset to 0 after reaching 9
else
digit <= digit + 4'd1;
end
end
assign tc = en & (digit == 4'd9);
endmodule
Three-Decade BCD Counter (000 to 999)
// Three-decade BCD counter: counts 000 to 999
// Cascaded from three single-decade modules
// digit0 = ones, digit1 = tens, digit2 = hundreds
module bcd_counter_3digit (
input wire clk,
input wire rst_n,
input wire en,
output wire [3:0] digit2, // Hundreds digit
output wire [3:0] digit1, // Tens digit
output wire [3:0] digit0, // Ones digit
output wire tc // Terminal count: high when count = 999
);
wire tc0, tc1;
// Ones decade: always enabled when en is active
bcd_decade u_ones (
.clk (clk),
.rst_n (rst_n),
.en (en),
.digit (digit0),
.tc (tc0)
);
// Tens decade: enabled only when ones decade reaches 9
bcd_decade u_tens (
.clk (clk),
.rst_n (rst_n),
.en (tc0), // Carry from ones decade
.digit (digit1),
.tc (tc1)
);
// Hundreds decade: enabled only when tens decade reaches 9
bcd_decade u_hundreds (
.clk (clk),
.rst_n (rst_n),
.en (tc1), // Carry from tens decade
.digit (digit2),
.tc (tc)
);
endmodule
Typical applications:
• Seven-segment display counter — each BCD digit drives one seven-segment display digit directly
• Digital clock (seconds, minutes, hours in BCD format)
• Industrial event counter with decimal readout
• Frequency counter display
10.8.7 Ring and Johnson Counters (Optional Reading)
Ring and Johnson counters are shift-register-based counters that generate specific non-binary sequences without requiring adder logic. They are included here as optional reference material for designers who encounter them in timing generation or phase control applications.
Ring Counter
A ring counter circulates a single 1 bit through an N-bit shift register. After N clock cycles, the pattern repeats. Only N states are possible (not 2^N), and each state has exactly one bit set — making it equivalent to a one-hot state machine.
// 4-bit Ring Counter
// Sequence after reset: 0001 → 1000 → 0100 → 0010 → 0001 → ...
// Each output bit is active for exactly one clock cycle per period
module ring_counter #(
parameter WIDTH = 4 // Number of stages; range: 2..16
) (
input wire clk,
input wire rst_n,
input wire en,
output reg [WIDTH-1:0] ring // One-hot ring output
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
ring <= {{WIDTH-1{1'b0}}, 1'b1}; // Initialize with single 1 at LSB
else if (en)
ring <= {ring[0], ring[WIDTH-1:1]}; // Rotate right
end
endmodule
Johnson Counter (Twisted-Ring Counter)
A Johnson counter is similar to a ring counter but feeds back the complement of the MSB to the LSB input. This doubles the number of states to 2N and produces a sequence where only one bit changes per clock cycle — a Gray-code-like property without requiring XOR conversion logic.
// 4-bit Johnson Counter
// Sequence: 0000 → 0001 → 0011 → 0111 → 1111 → 1110 → 1100 → 1000 → 0000 → ...
// 2N = 8 unique states for N = 4
module johnson_counter #(
parameter WIDTH = 4 // Number of stages; produces 2*WIDTH unique states
) (
input wire clk,
input wire rst_n,
input wire en,
output reg [WIDTH-1:0] johnson
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
johnson <= {WIDTH{1'b0}};
else if (en)
johnson <= {johnson[WIDTH-2:0], ~johnson[WIDTH-1]}; // Shift left; inverted MSB feeds the LSB
end
endmodule
Typical applications:
• Ring counter: LED chaser / running light pattern, one-hot FSM controller, phase-accurate timing signal generator
• Johnson counter: divide-by-2N frequency divider, phase generator producing 2N evenly spaced phases, stepper motor commutation sequence
10.8.8 Counter Type Selection Guide
The following table summarizes the key characteristics of each counter type to guide design decisions:
| Counter Type | Count Sequence | Wrap Behavior | Extra Logic Required | Primary Use Case |
|---|---|---|---|---|
| Up Counter | 0 → 2^N-1 | Natural overflow | None | General purpose, address generation |
| Down Counter | 2^N-1 → 0 | Natural underflow | None | Timeout, countdown timer |
| Up/Down Counter | Bidirectional | Natural both ways | Direction Mux | Position tracking, FIFO occupancy |
| Modulo-N Counter | 0 → N-1 | Forced reset at N-1 | Comparator | Baud rate, PWM period, display timing |
| Gray Code Counter | Gray sequence | Natural (binary internal) | XOR conversion | Async FIFO pointers, CDC-safe counters |
| BCD Counter | 0 → 9 per decade | Forced reset at 9 | Comparator per decade | Decimal display, digital clock |
| Ring Counter | One-hot rotation | Circular shift | None (shift register) | LED chaser, one-hot FSM |
| Johnson Counter | 2N Gray-like states | Circular shift + invert | None (shift register) | Phase generator, frequency divider |
10.9 Counter with Enable and Reset
The counter variants in Section 10.8 each demonstrated a specific counting behavior in its simplest form. In a real FPGA design, a counter module must also handle a complete set of control signals — reset, clock enable, load, and direction — with a well-defined priority ordering that guarantees correct behavior under every combination of inputs. This section builds a series of progressively complete counter modules, culminating in a production-ready parameterized counter that can serve as the standard counter component for all subsequent lab and project work in this course.
10.9.1 Control Signal Priority
When multiple control signals are asserted simultaneously, the counter must apply them in a strictly defined priority order. This ordering is not a matter of convention — it has a concrete hardware justification. The priority from highest to lowest is:
• Reset (rst_n) — highest priority: the counter must reach a known safe state regardless of the state of any other signal. If reset could be blocked by enable or load, the system could become unrecoverable in a fault condition.
• Load — second priority: loading a preset value is a deliberate override of the count sequence. It takes precedence over normal counting but must yield to reset.
• Enable — third priority: allows or suppresses counting. When enable is inactive and no load or reset is asserted, the counter holds its current value.
• Hold (implicit) — lowest priority: when no control signal is active, the counter retains its current value. This is the default behavior and requires no explicit logic.
In Verilog, this priority ordering is expressed directly as a cascaded if / else if chain inside the always block. The first branch checked has the highest priority:
// Priority template for a fully controlled counter
always @(posedge clk or negedge rst_n) begin
if (!rst_n) // Priority 1: reset (unconditional)
count <= RESET_VAL;
else if (load) // Priority 2: load preset value
count <= preset;
else if (en) // Priority 3: count
count <= count + 1'b1;
// Implicit else: hold current value (priority 4)
end
10.9.2 Counter with Synchronous Reset and Enable
The synchronous reset + enable counter is the standard choice for datapath counters where the reset event is always aligned with the system clock — such as baud rate generators, PWM period counters, and pipeline cycle counters.
// Parameterized Up Counter — synchronous reset + clock enable
// Suitable for datapath applications (baud rate, PWM, pipeline timing)
module counter_sync_rst #(
parameter WIDTH = 8 // Counter width in bits
) (
input wire clk,
input wire rst_n, // Active-low synchronous reset
input wire en, // Clock enable
output reg [WIDTH-1:0] count, // Current count value
output wire ovf // Overflow: count reached maximum
);
always @(posedge clk) begin // Synchronous: only clk in sensitivity list
if (!rst_n)
count <= {WIDTH{1'b0}};
else if (en)
count <= count + 1'b1;
end
assign ovf = en & (&count);
endmodule
Synthesized hardware structure:
• Reset is implemented as a mux (or the LE's synchronous-clear logic) on the flip-flop's D input, after the adder output. The flip-flop's asynchronous CLR pin is not used.
• Enable maps to the flip-flop's dedicated ENA pin — no LUT consumed.
• Increment logic uses the dedicated carry chain across adjacent LEs.
Typical applications:
• Baud rate generator base counter (always clocked, reset only at startup)
• PWM period counter (the count range sets the PWM frequency)
• Pipeline timing counter (synchronized to pipeline clock)
10.9.3 Counter with Asynchronous Reset and Enable
The asynchronous reset + enable counter is the standard choice for control logic counters where the reset must take effect immediately — independent of the clock — such as timeout detectors, watchdog counters, and protocol retry counters.
// Parameterized Up Counter — asynchronous reset + clock enable
// Suitable for control logic (timeout, watchdog, protocol retry counter)
module counter_async_rst #(
parameter WIDTH = 8 // Counter width in bits
) (
input wire clk,
input wire rst_n, // Active-low asynchronous reset
input wire en, // Clock enable
output reg [WIDTH-1:0] count, // Current count value
output wire ovf // Overflow: count reached maximum
);
always @(posedge clk or negedge rst_n) begin // rst_n in sensitivity list
if (!rst_n)
count <= {WIDTH{1'b0}};
else if (en)
count <= count + 1'b1;
end
assign ovf = en & (&count);
endmodule
Synthesized hardware structure:
• Reset connects directly to the flip-flop's hardware CLR pin — immediate effect, no LUT consumed.
• Enable maps to the flip-flop's dedicated ENA pin — no LUT consumed.
• This is the most area-efficient counter structure on the MAX-10: both control signals use dedicated LE hardware pins, leaving all LUT inputs available for the increment logic.
Typical applications:
• Timeout counter — must clear immediately when a watchdog fires
• Protocol retry counter — must reset on bus error regardless of clock
• Debounce counter — must reset immediately when button state changes
10.9.4 Loadable Counter (Preset Value)
A loadable counter adds a load control signal that forces the counter to a user-specified preset value on the next active clock edge. This makes the counter's starting point configurable at runtime — an essential feature for programmable timers, adjustable PWM duty cycles, and variable-rate event generators.
// Parameterized Loadable Up Counter — asynchronous reset + load + enable
// Priority: reset > load > enable > hold
module counter_loadable #(
parameter WIDTH = 8 // Counter width in bits
) (
input wire clk,
input wire rst_n, // Active-low asynchronous reset
input wire load, // Load preset value (priority 2)
input wire en, // Count enable (priority 3)
input wire [WIDTH-1:0] preset, // Value to load when load = 1
output reg [WIDTH-1:0] count, // Current count value
output wire tc // Terminal count: count = all ones
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= {WIDTH{1'b0}}; // Reset overrides everything
else if (load)
count <= preset; // Load overrides counting
else if (en)
count <= count + 1'b1; // Count when enabled
// Implicit else: hold current value
end
assign tc = en & (&count);
endmodule
Typical applications:
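• Programmable timer: load the starting count, then enable counting; the terminal count flag signals expiry. Because this is an up counter whose terminal count is all ones, a timeout of N counts requires preset = 2^WIDTH - 1 - N.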
• Variable PWM duty cycle: load a new compare value at the start of each PWM period to change the duty cycle dynamically
• Adjustable baud rate: load a new divider value to switch baud rates without resetting the system
10.9.5 Production-Ready Counter Module
The following module integrates all control signals into a single, fully parameterized counter suitable for direct use in course lab work and project designs. It combines asynchronous reset, clock enable, synchronous load, and bidirectional counting, with a complete set of status outputs.
// Production-Ready Parameterized Counter
// Supports: asynchronous reset, clock enable, synchronous load, up/down direction
// Priority: rst_n (async) > load > en > hold
// Status outputs: overflow, underflow, terminal count
module counter_full #(
parameter WIDTH = 8, // Counter width in bits
parameter RST_VALUE = {WIDTH{1'b0}} // Reset value (default: all zeros)
) (
input wire clk,
input wire rst_n, // Active-low asynchronous reset
input wire en, // Clock enable
input wire load, // Synchronous load (overrides counting)
input wire dir, // Count direction: 1 = up, 0 = down
input wire [WIDTH-1:0] preset, // Preset value loaded when load = 1
output reg [WIDTH-1:0] count, // Current count value
output wire ovf, // Overflow: counting up from all-ones
output wire unf, // Underflow: counting down from all-zeros
output wire tc_up, // Terminal count up: count = all-ones
output wire tc_down // Terminal count down: count = all-zeros
);
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= RST_VALUE; // Priority 1: async reset
else if (load)
count <= preset; // Priority 2: synchronous load
else if (en) begin
if (dir)
count <= count + 1'b1; // Priority 3a: count up
else
count <= count - 1'b1; // Priority 3b: count down
end
// Implicit priority 4: hold (en = 0, load = 0, no reset)
end
// Status outputs (combinational)
assign tc_up = (&count); // All bits = 1
assign tc_down = ~(|count); // All bits = 0
assign ovf = en & dir & tc_up; // Up overflow
assign unf = en & ~dir & tc_down; // Down underflow
endmodule
Port Description
| Port | Direction | Description |
|---|---|---|
| clk | Input | System clock; load, enable, and counting act on the rising edge (reset is asynchronous) |
| rst_n | Input | Active-low asynchronous reset — clears counter to RST_VALUE immediately |
| en | Input | Clock enable — counter advances only when en = 1 |
| load | Input | Synchronous load — loads preset on next rising edge, overrides counting |
| dir | Input | Direction — 1 = count up, 0 = count down |
| preset | Input | Preset value loaded into counter when load = 1 |
| count | Output | Current count value |
| ovf | Output | Overflow flag — pulses high when counting up from maximum value |
| unf | Output | Underflow flag — pulses high when counting down from zero |
| tc_up | Output | Terminal count up — high when count = all-ones |
| tc_down | Output | Terminal count down — high when count = all-zeros |
10.9.6 Practical Application: Programmable Hardware Timer
A programmable hardware timer is one of the most common peripheral modules in embedded FPGA designs. It allows software or higher-level logic to set a timeout period in clock cycles and receive a flag when that period has elapsed. The following implementation uses the same load-then-count structure as the counter_loadable module from Section 10.9.4, written inline as a down counter so that expiry is detected at zero.
// Programmable Hardware Timer
// Operation:
// 1. Set period_val to the desired timeout in clock cycles minus 1
// 2. Pulse start for one clock cycle to begin timing (loads period_val and sets the active flag); holding start high reloads period_val on every cycle
// 3. expired pulses high for one clock cycle when the countdown reaches zero
// 4. The timer automatically reloads and restarts if auto_reload = 1
// 5. Assert rst_n low at any time to immediately cancel and reset the timer
module hw_timer #(
parameter WIDTH = 16 // Timer width; period_val range is 0 .. 2^WIDTH - 1, so the longest period is 2^WIDTH clock cycles
) (
input wire clk,
input wire rst_n, // Active-low asynchronous reset
input wire start, // Start / restart the timer
input wire auto_reload, // 1 = restart automatically on expiry
input wire [WIDTH-1:0] period_val, // Timeout period in clock cycles (minus 1)
output wire running, // High while timer is active
output wire expired // Pulses high for one cycle on timeout
);
reg [WIDTH-1:0] count;
reg active;
wire tc;
// Active flag: set by start, cleared on expiry (unless auto_reload)
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
active <= 1'b0;
else if (start)
active <= 1'b1;
else if (tc && !auto_reload)
active <= 1'b0; // Stop after one period if auto_reload is off
end
// Countdown counter: loads period_val on start or auto-reload expiry
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
count <= {WIDTH{1'b0}};
else if (start || (tc && auto_reload))
count <= period_val; // Load period on start or auto-reload
else if (active)
count <= count - 1'b1; // Count down while active
end
// Terminal count: counter has reached zero
assign tc = active & ~(|count);
assign expired = tc;
assign running = active;
endmodule
Usage Example
// Instantiate a 1-second one-shot timer on a 50 MHz system clock
// period_val = 50,000,000 - 1 = 49,999,999 clock cycles
hw_timer #(
.WIDTH (26) // $clog2(50_000_000) = 26 bits required
) u_1sec_timer (
.clk (clk),
.rst_n (rst_n),
.start (timer_start),
.auto_reload (1'b0), // One-shot: stop after one period
.period_val (26'd49_999_999), // 1 second at 50 MHz
.running (timer_running),
.expired (timer_expired)
);
// Instantiate a 1 kHz auto-reload periodic timer (1 ms period)
// period_val = 50,000 - 1 = 49,999 clock cycles
hw_timer #(
.WIDTH (16) // $clog2(50_000) = 16 bits required
) u_1ms_timer (
.clk (clk),
.rst_n (rst_n),
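.start (~timer_1ms_running), // Self-start after reset; tying start to 1'b1 would reload every cycle and the timer would never expire
.auto_reload (1'b1), // Periodic: restart automatically
.period_val (16'd49_999), // 1 ms at 50 MHz
.running (timer_1ms_running), // Fed back to form the start pulse (illustrative wire name)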
.expired (tick_1ms) // 1 kHz tick output
);
10.9.7 Design Rules Summary
| Design Decision | Rule |
|---|---|
| Control signal priority | Always enforce reset > load > enable > hold in the if / else if chain. Never allow enable or load to block a reset. |
| Reset type | Use asynchronous reset for control logic counters (timeout, watchdog, retry). Use synchronous reset for datapath counters (baud rate, PWM, pipeline). |
| Load vs Reset | Use load to set a non-zero starting value at runtime. Use reset only to return to the fixed reset value. Never repurpose reset to load an arbitrary value. |
| Terminal count output | Derive terminal count flags combinationally from the count value using reduction operators (&count, \|count). Avoid registering the terminal count inside the counter unless timing requires it: a registered flag arrives one cycle late unless it is detected one cycle early, and the downstream logic must account for that. |
| Enable gating | Always use en inside the always block to map to the LE's dedicated ENA pin. Never gate the clock signal directly with en. |
| Parameterization | Use parameter for WIDTH and RST_VALUE. Use $clog2 to compute the minimum bit-width from any modulus or period value. |
| Simulation | Test all combinations of simultaneous control signals: reset during load, reset during count, load during count. Verify that priority ordering is correctly enforced in simulation before synthesis. |
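The Simulation rule above can be exercised with a small self-checking testbench. The sketch below is one possible arrangement, not the course's official testbench: counter_loadable_ref is a compact stand-in for the loadable counter of Section 10.9.4 (copied here only so the example is self-contained), and the names tb_priority, check_count, and the test IDs are illustrative assumptions.

```verilog
`timescale 1ns / 1ps

// Compact stand-in for the loadable counter of Section 10.9.4
module counter_loadable_ref #(parameter WIDTH = 8) (
    input  wire             clk,
    input  wire             rst_n,
    input  wire             load,
    input  wire             en,
    input  wire [WIDTH-1:0] preset,
    output reg  [WIDTH-1:0] count
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)    count <= {WIDTH{1'b0}};   // Priority 1: reset
        else if (load) count <= preset;          // Priority 2: load
        else if (en)   count <= count + 1'b1;    // Priority 3: count
    end
endmodule

module tb_priority;
    reg        clk = 1'b0;
    reg        rst_n, load, en;
    reg  [7:0] preset;
    wire [7:0] count;
    integer    errors = 0;

    counter_loadable_ref #(.WIDTH(8)) u_dut (
        .clk(clk), .rst_n(rst_n), .load(load), .en(en),
        .preset(preset), .count(count)
    );

    always #10 clk = ~clk;   // 20 ns period (50 MHz)

    // Compare count against the expected value; id identifies the test step
    task check_count(input [7:0] expected, input integer id);
        begin
            if (count !== expected) begin
                errors = errors + 1;
                $display("FAIL step %0d: count=%0d expected=%0d", id, count, expected);
            end
        end
    endtask

    initial begin
        rst_n = 1'b0; load = 1'b0; en = 1'b0; preset = 8'hA5;
        repeat (2) @(posedge clk);
        #1 rst_n = 1'b1;

        // Step 1: load and en asserted together, load must win
        load = 1'b1; en = 1'b1;
        @(posedge clk); #1 check_count(8'hA5, 1);

        // Step 2: en alone, counter advances by one
        load = 1'b0;
        @(posedge clk); #1 check_count(8'hA6, 2);

        // Step 3: no control asserted, counter holds
        en = 1'b0;
        @(posedge clk); #1 check_count(8'hA6, 3);

        // Step 4: reset with load and en asserted, checked before the next clock edge
        load = 1'b1; en = 1'b1;
        #2 rst_n = 1'b0;
        #1 check_count(8'h00, 4);

        if (errors == 0) $display("PASS");
        $finish;
    end
endmodule
```

Step 4 checks the count before the next rising edge, which is what distinguishes an asynchronous reset from a synchronous one in the waveform.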
10.10 Simulation and Waveform Verification
Every register and counter module developed in this chapter must be verified through simulation before it is submitted for synthesis or deployed on the DE10-Lite board. Simulation confirms that the RTL code produces the intended logical behavior under all relevant input conditions — including boundary values, simultaneous control signal assertions, and reset sequences. This section provides a complete simulation workflow using Icarus Verilog and GTKWave, the tools used throughout this course.
10.10.1 Why Simulate Before Synthesizing?
Synthesis and simulation serve different purposes and are not interchangeable verification methods. A design that compiles without errors in Quartus Prime is not necessarily correct — it only means the RTL is syntactically valid and structurally mappable to hardware. Simulation is the only way to verify that the logic behaves as intended before committing to hardware.
What Simulation Verifies
• Functional correctness — does the output match the expected sequence?
• Control signal priority — does reset override enable? Does load override counting?
• Boundary conditions — does the counter wrap correctly at the terminal value?
• Timing relationships — do outputs update on the correct clock edge?
• Reset behavior — does asynchronous reset take effect immediately, without waiting for a clock edge?
Simulation-Synthesis Mismatch
A simulation-synthesis mismatch occurs when the simulated behavior of a module differs from the behavior of the synthesized hardware. This is one of the most difficult categories of bug to diagnose because both the simulation and the synthesis appear to succeed without errors. The most common causes in register and counter design are:
• Blocking assignment ( = ) used in sequential logic: the variable is updated immediately, so another always block triggered by the same clock edge may read either the old or the new value, depending on the simulator's event ordering (a race condition). The synthesized flip-flops always pass the old value, so simulation and hardware can diverge.
• Non-blocking assignment ( <= ) used in combinational logic — simulation schedules all updates to the end of the time step, which may delay the combinational output by one simulation delta and cause incorrect results in downstream logic.
• initial block used for register initialization — simulation applies the initial value at time zero, but the synthesized hardware may not reproduce this behavior after a reset event, as discussed in Section 10.1.6.
• Incomplete sensitivity list — if always @(*) is replaced with a manually specified list that omits a signal, simulation will not re-evaluate the block when that signal changes, but synthesis will always produce combinational logic that responds to all inputs.
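The first cause in the list above can be illustrated with a two-stage shift register. The following is a minimal sketch, not code from this chapter: the module names shift2_blocking, shift2_nonblocking, and tb_shift2 are hypothetical. The blocking version is included only to show the race; the testbench exercises the non-blocking version, whose behavior is deterministic.

```verilog
`timescale 1ns / 1ps

// Hypothetical example: two flip-flops in separate always blocks using
// blocking assignments. The order in which a simulator runs the two blocks
// at a clock edge is not defined, so q2 may capture the old or new q1.
module shift2_blocking (
    input  wire clk,
    input  wire d,
    output reg  q1,
    output reg  q2
);
    always @(posedge clk) q1 = d;   // q1 changes immediately
    always @(posedge clk) q2 = q1;  // race: reads old or new q1
endmodule

// Same structure with non-blocking assignments. Both right-hand sides are
// sampled before either register updates, so q2 lags q1 by one cycle.
module shift2_nonblocking (
    input  wire clk,
    input  wire d,
    output reg  q1,
    output reg  q2
);
    always @(posedge clk) q1 <= d;
    always @(posedge clk) q2 <= q1;
endmodule

module tb_shift2;
    reg  clk, d;
    wire q1, q2;
    integer errors;

    shift2_nonblocking u_dut (.clk(clk), .d(d), .q1(q1), .q2(q2));

    initial clk = 1'b0;
    always #10 clk = ~clk;          // rising edges at 10, 30, 50 ns

    initial begin
        errors = 0;
        d = 1'b0;
        #25 d = 1'b1;               // change d away from any clock edge
        #10;                        // t = 35 ns: edge at 30 ns moved d into q1
        if (q1 !== 1'b1 || q2 !== 1'b0) errors = errors + 1;
        #20;                        // t = 55 ns: edge at 50 ns moved q1 into q2
        if (q2 !== 1'b1) errors = errors + 1;
        if (errors == 0) $display("PASS: q2 lags d by two clock edges");
        else             $display("FAIL: %0d check(s) failed", errors);
        $finish;
    end
endmodule
```

Pointing u_dut at shift2_blocking instead may or may not fail these checks, depending on the simulator's evaluation order. That unpredictability is the reason non-blocking assignments are used for clocked logic.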
10.10.2 Testbench Structure for Registers and Counters
A testbench is a non-synthesizable Verilog module that instantiates the design under test (DUT), drives its inputs, and optionally checks its outputs. For the register and counter modules in this chapter, the standard testbench structure consists of five sections: signal declarations, DUT instantiation, clock generation, waveform dump, and stimulus (a reset sequence followed by functional tests).
// Standard Testbench Template for Chapter 10 Register and Counter Modules
// Replace the DUT module name and port connections to adapt for any DUT
`timescale 1ns / 1ps // Time unit: 1 ns; time precision: 1 ps
module tb_template;
// -----------------------------------------------------------------------
// 1. Signal declarations (mirror the DUT port list)
// -----------------------------------------------------------------------
reg clk;
reg rst_n;
reg en;
// Add additional DUT input signals here
wire [7:0] count; // DUT output signals declared as wire
wire ovf;
// -----------------------------------------------------------------------
// 2. DUT instantiation
// -----------------------------------------------------------------------
// Replace counter_async_rst with the actual module name and correct port connections
counter_async_rst #(
.WIDTH (8)
) u_dut (
.clk (clk),
.rst_n (rst_n),
.en (en),
.count (count),
.ovf (ovf)
);
// -----------------------------------------------------------------------
// 3. Clock generation: 50 MHz (period = 20 ns)
// -----------------------------------------------------------------------
initial clk = 1'b0;
always #10 clk = ~clk; // Toggle every 10 ns -> 20 ns period -> 50 MHz
// -----------------------------------------------------------------------
// 4. Waveform dump for GTKWave
// -----------------------------------------------------------------------
initial begin
$dumpfile("tb_template.vcd"); // Output VCD file name
$dumpvars(0, tb_template); // Dump all signals in this testbench
end
// -----------------------------------------------------------------------
// 5. Stimulus: reset sequence followed by functional tests
// -----------------------------------------------------------------------
initial begin
// Apply asynchronous reset for 35 ns (covers the rising edges at 10 ns and 30 ns)
rst_n = 1'b0;
en = 1'b0;
#35; // Hold reset for 35 ns, then release 5 ns after the 30 ns rising edge
rst_n = 1'b1; // Release reset
#20; // Wait one full clock cycle before applying stimulus
// --- Functional test stimulus goes here ---
en = 1'b1;
#200; // Count for 10 clock cycles (10 x 20 ns)
en = 1'b0;
#40; // Hold for 2 cycles to verify hold behavior
en = 1'b1;
#400; // Continue counting
// End simulation
#20;
$finish;
end
endmodule
Key points in this template:
• The `timescale directive must appear at the top of the testbench file. Without it, Icarus Verilog uses a default time unit that may produce unexpected delay behavior.
• Reset is held for 35 ns, which spans the rising clock edges at 10 ns and 30 ns, so the DUT sees reset asserted on at least one rising edge. Releasing it at 35 ns, 5 ns after a rising edge, keeps the release from coinciding with a clock edge.
• $dumpfile and $dumpvars generate a VCD (Value Change Dump) file that GTKWave reads to display the waveform.
• All DUT inputs are declared as reg in the testbench (so they can be driven in initial and always blocks); all DUT outputs are declared as wire.
10.10.3 Simulating the Full-Featured Register (Section 10.4.5)
The following testbench targets the dff_async_rst_en module from Section 10.4.5. It verifies reset priority, enable behavior, and the hold state in sequence.
`timescale 1ns / 1ps
module tb_dff_async_rst_en;
reg clk, rst_n, en;
reg d;
wire q;
// DUT instantiation
dff_async_rst_en u_dut (
.clk (clk),
.rst_n (rst_n),
.en (en),
.d (d),
.q (q)
);
// 50 MHz clock
initial clk = 0;
always #10 clk = ~clk;
// Waveform dump
initial begin
$dumpfile("tb_dff_async_rst_en.vcd");
$dumpvars(0, tb_dff_async_rst_en);
end
initial begin
// --- Phase 1: Power-on reset ---
rst_n = 0; en = 0; d = 0;
#35;
rst_n = 1;
#20;
// --- Phase 2: Verify enable = 1, d = 1 -> q should capture 1 ---
en = 1; d = 1;
#20; // One clock cycle
// Expected: q = 1
// --- Phase 3: Verify enable = 0 -> q should hold ---
en = 0; d = 0;
#40; // Two clock cycles
// Expected: q remains 1 (hold, not capturing d = 0)
// --- Phase 4: Verify enable = 1, d = 0 -> q should capture 0 ---
en = 1;
#20;
// Expected: q = 0
// --- Phase 5: Verify async reset priority over enable ---
en = 1; d = 1;
#20; // Capture d = 1 -> q = 1
rst_n = 0; // Assert reset mid-cycle (not on clock edge)
#5; // 5 ns later: q should already be 0 (async, no clock needed)
rst_n = 1;
#15;
// Expected: q went to 0 immediately after rst_n asserted,
// without waiting for the next clock edge. Because en = 1 and d = 1,
// q returns to 1 at the first rising edge after reset is released.
// --- Phase 6: Normal operation after reset release ---
en = 1; d = 1;
#20;
// Expected: q = 1
#20;
$finish;
end
endmodule
Expected GTKWave Waveform Behavior
• Phase 1: q stays at 0 during reset, then holds 0 after release (en = 0).
• Phase 2: q transitions to 1 on the next rising clock edge after en is asserted.
• Phase 3: q remains 1 across two clock edges despite d = 0 — confirming hold behavior.
• Phase 5 (critical): q drops to 0 at the same simulation time that rst_n goes low (RTL simulation models no propagation delay), not on the next clock edge. This confirms that the reset is truly asynchronous.
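The waveform checks above can also be automated so that the simulation log reports failures directly. The following is a minimal self-checking sketch. The DUT shown here is a stand-in written to match the behavior described for dff_async_rst_en (an assumption; the actual Section 10.4.5 code may differ in detail), and the task check_q and the testbench name tb_dff_selfcheck are hypothetical helpers.

```verilog
`timescale 1ns / 1ps

// Stand-in DUT (assumption): async active-low reset, clock enable.
module dff_async_rst_en (
    input  wire clk,
    input  wire rst_n,
    input  wire en,
    input  wire d,
    output reg  q
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)  q <= 1'b0;
        else if (en) q <= d;
    end
endmodule

module tb_dff_selfcheck;
    reg  clk, rst_n, en, d;
    wire q;
    integer errors;

    dff_async_rst_en u_dut (.clk(clk), .rst_n(rst_n), .en(en), .d(d), .q(q));

    initial clk = 1'b0;
    always #10 clk = ~clk;            // rising edges at 10, 30, 50 ns, ...

    // Hypothetical helper: compare q with an expected value, count failures
    task check_q;
        input expected;
        begin
            if (q !== expected) begin
                errors = errors + 1;
                $display("FAIL t=%0t: q=%b, expected %b", $time, q, expected);
            end
        end
    endtask

    initial begin
        errors = 0;
        rst_n = 1'b0; en = 1'b0; d = 1'b0;
        #35 rst_n = 1'b1;             // release reset between clock edges
        #20 en = 1'b1; d = 1'b1;      // t = 55 ns
        #20 check_q(1'b1);            // t = 75 ns: edge at 70 ns captured 1
        en = 1'b0; d = 1'b0;
        #40 check_q(1'b1);            // t = 115 ns: held across two edges
        rst_n = 1'b0;                 // assert reset between clock edges
        #1  check_q(1'b0);            // t = 116 ns: cleared with no clock edge
        rst_n = 1'b1;
        if (errors == 0) $display("PASS: all checks passed");
        else             $display("FAILED with %0d error(s)", errors);
        $finish;
    end
endmodule
```

The comparisons use !== so that an unknown (X) value on q counts as a failure instead of being silently treated as false.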
10.10.4 Simulating the Production-Ready Counter (Section 10.9.5)
The following testbench targets the counter_full module from Section 10.9.5. It systematically exercises all control signal combinations and boundary conditions.
`timescale 1ns / 1ps
module tb_counter_full;
// Parameters matching the DUT
localparam WIDTH = 4; // Use 4-bit width for readable waveform
reg clk, rst_n, en, load, dir;
reg [WIDTH-1:0] preset;
wire [WIDTH-1:0] count;
wire ovf, unf, tc_up, tc_down;
// DUT instantiation
counter_full #(
.WIDTH (WIDTH),
.RST_VALUE ({WIDTH{1'b0}})
) u_dut (
.clk (clk),
.rst_n (rst_n),
.en (en),
.load (load),
.dir (dir),
.preset (preset),
.count (count),
.ovf (ovf),
.unf (unf),
.tc_up (tc_up),
.tc_down (tc_down)
);
// 50 MHz clock
initial clk = 0;
always #10 clk = ~clk;
// Waveform dump
initial begin
$dumpfile("tb_counter_full.vcd");
$dumpvars(0, tb_counter_full);
end
initial begin
// ---------------------------------------------------------------
// Phase 1: Power-on async reset
// ---------------------------------------------------------------
rst_n = 0; en = 0; load = 0; dir = 1; preset = 0;
#35;
rst_n = 1;
#20;
// Expected: count = 0 after reset
// ---------------------------------------------------------------
// Phase 2: Count up — verify basic increment and tc_up flag
// ---------------------------------------------------------------
en = 1; dir = 1;
#(20 * 16); // Count through all 16 values (0..15 for 4-bit)
// Expected: count cycles 0 -> 1 -> ... -> 15 -> 0
// ovf pulses high for one cycle when count was 15 and incremented
// ---------------------------------------------------------------
// Phase 3: Disable counting — verify hold behavior
// ---------------------------------------------------------------
en = 0;
#60; // 3 clock cycles
// Expected: count does not change
// ---------------------------------------------------------------
// Phase 4: Load preset value — verify load priority over enable
// ---------------------------------------------------------------
en = 1; load = 1; preset = 4'd10;
#20; // One clock cycle
load = 0;
// Expected: count = 10 on the cycle after load is asserted
// ---------------------------------------------------------------
// Phase 5: Count down from loaded value — verify unf flag
// ---------------------------------------------------------------
dir = 0; // Count down
#(20 * 12);
// Expected: count decrements 10 -> 9 -> ... -> 0
// unf pulses high when count was 0 and decremented (wraps to 15)
// ---------------------------------------------------------------
// Phase 6: Verify reset priority over load
// ---------------------------------------------------------------
load = 1; preset = 4'd7;
#5; // Mid-cycle: assert reset while load is active
rst_n = 0;
#15; // Hold reset across the next rising clock edge, where load = 1
// Expected: count goes to 0 immediately and is still 0 after that edge
// (reset wins over load)
rst_n = 1; load = 0;
#10;
// ---------------------------------------------------------------
// Phase 7: Verify reset priority over enable + counting
// ---------------------------------------------------------------
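en = 1; dir = 1;
#40; // Count up for 2 cycles
rst_n = 0; // Assert async reset mid-cycle
#10; // Hold reset past the next rising clock edge
// Expected: count immediately clears to 0 without waiting for clock
rst_n = 1; // Release 5 ns after the rising edge, not on it
#20; // Ends 5 ns after a rising edge so Phase 8 inputs do not change on an edge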
// ---------------------------------------------------------------
// Phase 8: Verify simultaneous load and enable (load has priority)
// ---------------------------------------------------------------
en = 1; load = 1; preset = 4'd5; dir = 1;
#20;
load = 0;
// Expected: count = 5 (load wins over enable; counter does not increment)
#40;
// Expected: count increments from 5
#20;
$finish;
end
endmodule
Simulation Test Coverage Summary
| Phase | Test Condition | Expected Observation |
|---|---|---|
| 1 | Power-on async reset | count = 0 immediately, no clock edge required |
| 2 | Count up, full range | 0 → 15 → 0 wrap, ovf pulses at wrap |
| 3 | Enable = 0 | Count holds for 3 cycles |
| 4 | Load while enabled | Count jumps to preset = 10, load wins over enable |
| 5 | Count down from 10 | 10 → 0 → 15 wrap, unf pulses at wrap |
| 6 | Reset while loading | Count = 0 immediately, reset wins over load |
| 7 | Async reset mid-cycle | Count = 0 as soon as rst_n goes low, no clock edge needed |
| 8 | Load and enable simultaneously | Count = preset (load priority); counting resumes next cycle |
10.10.5 Running Simulation with Icarus Verilog
The following commands compile and simulate the counter testbench using Icarus Verilog in the course Jupyter environment. The same pattern applies to any module in this chapter — replace the filenames as needed.
# Step 1: Compile the DUT and testbench together
# Usage: iverilog -o <output_file> <dut_file> <testbench_file> [additional_files]
iverilog -o sim_counter counter_full.v tb_counter_full.v
# Step 2: Run the simulation (generates the .vcd waveform file)
vvp sim_counter
# Step 3: Open the waveform in GTKWave
gtkwave tb_counter_full.vcd &
In the Jupyter notebook environment used in this course, wrap the commands in conda run to ensure the correct environment is active:
# Jupyter notebook cell: compile and simulate using the course conda environment
import subprocess
subprocess.run([
"conda", "run", "-n", "py312_vsc_base",
"iverilog", "-o", "sim_counter", "counter_full.v", "tb_counter_full.v"
], check=True)
subprocess.run([
"conda", "run", "-n", "py312_vsc_base",
"vvp", "sim_counter"
], check=True)
# GTKWave is launched separately from the desktop environment
10.10.6 GTKWave Waveform Interpretation Guide
After opening the VCD file in GTKWave, add all relevant signals to the waveform view and apply the following interpretation techniques specific to register and counter verification.
Identifying Clock Edge Alignment
• All synchronous signal transitions (count increment, enable capture, load) must occur on the rising edge of clk. In GTKWave, zoom into a transition and confirm that the output changes align with the rising clock edge — not before it and not significantly after it.
• If an output changes between clock edges (not on a rising edge), it indicates a combinational output (correct for assign statements), an asynchronous reset taking effect (correct when rst_n has just gone low), or an unintended latch (incorrect for sequential logic).
Verifying Asynchronous Reset
• After asserting rst_n = 0 mid-cycle (Phase 6 and Phase 7 of the counter testbench; Phase 5 of the register testbench shows the same behavior on q), zoom into the waveform and confirm that count transitions to 0 at the moment rst_n goes low, independent of the clock.
• If count only clears on the next rising clock edge, the reset is being modeled as synchronous. Confirm that the sensitivity list contains negedge rst_n and that the if (!rst_n) branch appears first in the always block.
Verifying Enable Hold Behavior
• When en = 0, the count waveform must appear as a flat horizontal line — no transitions across any number of clock edges. Any transition while en = 0 indicates that the enable logic has not been correctly implemented.
Measuring Counter Period and Terminal Count Pulse Width
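• Use GTKWave's markers to measure the time between two identical count values: place a marker on each occurrence (left-click sets the primary marker; the Markers menu provides baseline and named markers) and read the time difference between them. This gives the counter period in nanoseconds.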
• The terminal count flags (tc_up, ovf) must be exactly one clock period wide. A pulse wider than one cycle indicates the comparator logic is incorrect; a pulse that never appears indicates the terminal count value is not being reached.
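The pulse-width check can also be done inside the testbench instead of by eye. The following is a rough sketch under stated assumptions: the 4-bit counter and its combinational tc_up flag are stand-ins written for this example (not the counter_full module), and the monitor variables high_cycles, pulses, and errors are hypothetical names.

```verilog
`timescale 1ns / 1ps

module tb_pulse_width;
    reg        clk, rst_n;
    reg  [3:0] count;                     // stand-in counter (assumption)
    wire       tc_up = (count == 4'd15);  // combinational terminal-count flag

    integer high_cycles;  // consecutive falling edges on which tc_up was high
    integer pulses;       // completed tc_up pulses
    integer errors;       // pulses wider than one clock period

    initial clk = 1'b0;
    always #10 clk = ~clk;                // 50 MHz clock

    // Stand-in 4-bit up counter with asynchronous active-low reset
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) count <= 4'd0;
        else        count <= count + 4'd1;
    end

    // Monitor: sample tc_up on falling edges, where count is stable
    always @(negedge clk) begin
        if (tc_up === 1'b1) begin
            high_cycles = high_cycles + 1;
        end else begin
            if (high_cycles > 0) pulses = pulses + 1;
            if (high_cycles > 1) errors = errors + 1;
            high_cycles = 0;
        end
    end

    initial begin
        high_cycles = 0; pulses = 0; errors = 0;
        rst_n = 1'b0;
        #35 rst_n = 1'b1;
        #665;                             // long enough for the counter to wrap
        if (errors == 0 && pulses > 0)
            $display("PASS: %0d tc_up pulse(s), each one clock period wide", pulses);
        else
            $display("FAIL: pulses=%0d, errors=%0d", pulses, errors);
        $finish;
    end
endmodule
```

Sampling on the falling edge avoids reading tc_up at the instant count is updating. The same monitor can be attached to ovf or unf by changing the sampled signal.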
Common Waveform Anomalies and Their Causes
| Waveform Observation | Likely Cause | Fix |
|---|---|---|
| Output changes between clock edges | Unintended latch or combinational feedback | Check for incomplete if/else in always @(*) |
| Reset only takes effect on clock edge | Asynchronous reset coded as synchronous | Add negedge rst_n to sensitivity list |
| Count changes when en = 0 | Enable not checked or wrong polarity | Verify else if (en) branch in always block |
| Count never reaches terminal value | Width too small or modulo value incorrect | Check WIDTH and MODULO parameter values |
| ovf pulse lasts more than one clock cycle | Terminal count flag incorrectly registered | Use combinational assign for ovf |
| Load does not override enable | Wrong priority in if/else if chain | Ensure load is checked before en |
| All outputs show X (unknown) throughout | Reset never asserted, or rst_n left undriven in the testbench | Confirm rst_n is driven to 0 at time 0 and then to 1 after the reset period |
10.10.7 Simulation Verification Checklist
Apply the following checklist to every register and counter module before proceeding to synthesis or board testing. Each item must be confirmed in the GTKWave waveform.
| Check Item | What to Verify in GTKWave |
|---|---|
| Power-on reset | Output is at reset value from time 0; transitions correctly after rst_n release |
| Async reset immediacy | Output clears at the moment rst_n goes low, not on the next clock edge |
| Clock enable hold | Output is flat (no transitions) across multiple clock edges when en = 0 |
| Count sequence | Every count value in the expected sequence appears exactly once per clock cycle |
| Wrap-around / terminal count | Counter wraps at the correct value; ovf / unf pulses for exactly one clock cycle |
| Load priority over enable | When both load and en are asserted, output = preset (not count + 1) |
| Reset priority over load | When rst_n is asserted while load is active, output = reset value (not preset) |
| Direction control | Count increments when dir = 1 and decrements when dir = 0 |
| No X states after reset | No signals show unknown (X) or high-impedance (Z) values after rst_n is released |
| Simulation-synthesis match | Blocking vs non-blocking assignments are used correctly; no initial blocks in synthesizable RTL |