This may be useful for you guys, so I am just posting it.
After clock tree synthesis (CTS), many timing paths that end at clock gate/ICG enable
pins appear. Why didn't these paths get fixed in placement, and how can I handle them?
After clock tree synthesis, clock gates become critical because, by default, at placement level, they
have the same latency applied to their clock pin arrival times as the register
clock pins do. Once a clock tree is constructed, the clock gates sit in the
middle of the clock tree, not at the leaf. Therefore, their clock arrival
times are earlier than those at the clock leaf pins, and enable timing is impacted.
The following shows a simple example:
* Pre-CTS, the register clock pins and the clock pins of clock gates see a clock latency
of 0ns, which models the same arrival time for both.
* Post-CTS, the clock gates are now halfway through the tree and see a latency
of 800ps. However, all registers see a 1.5ns arrival time at their clock pins
since they are at the leaf level of the tree.
* Any path from a register to a clock gate now sees the difference in clock
arrival times, and the pre-CTS slack is degraded by 700ps (1.5ns - 800ps).
Since the clock gate is supposed to sit at an intermediate point so that it can
shut down portions of the clock tree, it is not correct to assume that the
clock pins of clock gates should be balanced with the registers.
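The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the function name and the 100ps pre-CTS slack are assumptions made up for the sketch, while the 1.5ns and 800ps latencies come from the example.

```python
def enable_slack_after_cts(pre_cts_slack_ps, leaf_latency_ps, icg_latency_ps):
    """Estimate post-CTS setup slack at an ICG enable pin (all values in ps).

    The launching register is clocked at the leaf of the tree, while the
    ICG captures earlier in the tree, so the latency difference is lost.
    """
    degradation_ps = leaf_latency_ps - icg_latency_ps
    return pre_cts_slack_ps - degradation_ps


# Latencies from the example: registers at 1.5ns, clock gate at 800ps.
# The 100ps pre-CTS slack is a hypothetical value.
post_cts_slack = enable_slack_after_cts(100, 1500, 800)
print(f"post-CTS enable slack: {post_cts_slack} ps")
```

A path that looked comfortable with ideal clocks can therefore turn negative once the real latencies are known.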
These paths can be addressed in the following ways:
First, examine how far down these ICGs lie in the clock tree post-CTS.
Whether they are near the root of the clock tree or near the clock pins of the
flops influences how you handle them.
* If the clock gates are roughly halfway down the clock tree, you might
benefit from splitting (replicating) the clock gate. Splitting the clock
gate creates parallel copies of the original driver, resulting in more clock gate
drivers with fewer loads per driver. If the splitting is done pre-CTS,
it effectively pushes the clock gate further down the clock tree, increasing
power but improving enable timing. See the split_clock_net command.
* If the clock gates are either at or near the bottom of the tree, splitting
clock gates is unlikely to offer any improvement. In this case, you
should add a pre-CTS clock latency value to the ICG clock pin so that
you model the clock arrival correctly before CTS. Using the above example, you
would apply a -700ps latency to the clock pin of the clock gate during
place_opt but before clock tree synthesis. Applying a latency allows
you to correctly model the slack before the actual clock gate clock
arrival time is known.
* If the ICG is a single "top-level clock gate", which is fed by a relatively
small cone of logic, you might apply a float pin constraint to the
flip-flops feeding the enable signal logic to get their clocks earlier
(useful skew). Sometimes this technique is the best solution for top-level
clock gates because it does not impact power; splitting top-level clock
gates can have a very large power impact.
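The three cases above can be summarized as a rough decision helper. This is a minimal sketch, not a tool flow: the function names, the 0.75 "near the leaf" threshold, and the returned labels are all assumptions chosen for illustration, and the latencies are the ones from the earlier example.

```python
def pre_cts_icg_latency_ps(icg_latency_ps, leaf_latency_ps):
    """Latency offset to model on the ICG clock pin before CTS (in ps)."""
    return icg_latency_ps - leaf_latency_ps


def suggest_fix(icg_latency_ps, leaf_latency_ps,
                is_single_top_level_gate=False, near_leaf_fraction=0.75):
    """Pick one of the three approaches described above (heuristic only)."""
    if is_single_top_level_gate:
        # Small enable cone feeding one top-level gate: clock the feeding
        # flops earlier (useful skew) instead of splitting the gate.
        return "useful_skew"
    depth = icg_latency_ps / leaf_latency_ps  # 0 = root, 1 = leaf
    if depth >= near_leaf_fraction:  # hypothetical threshold
        return "pre_cts_latency"
    return "split_clock_gate"


print(suggest_fix(800, 1500))             # gate roughly halfway down
print(pre_cts_icg_latency_ps(800, 1500))  # offset used in the example
```

With the example numbers, the offset comes out as -700, matching the -700ps latency mentioned above.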
A flip-flop is made up of two latches, hence using an FF instead of a latch increases the area and power consumption of the ICG.
By far the most important: with an FF, the enable input is now captured on the falling edge of the clock in the previous cycle. Therefore, whatever logic is generating it has to do so within half a clock cycle. Using a latch, the logic was able to take the whole clock cycle (minus the setup time) to calculate the enable (because “next cycle’s enable” is allowed to change all the way up to the rising clock edge), but now it’s only got half a clock cycle.
Ans for 3rd question
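To put numbers on the half-cycle point, here is a small sketch. It assumes a 50% duty cycle, an enable launched on the rising edge, and no clock skew; the function name, the 1000ps period, and the 50ps setup time are hypothetical values for illustration.

```python
def enable_budget_ps(period_ps, setup_ps, capture):
    """Time available to the enable logic, in ps, for a given capture element."""
    if capture == "latch":
        # Enable may change right up to the next rising edge (minus setup).
        return period_ps - setup_ps
    if capture == "negedge_ff":
        # Enable is captured at the falling edge, half a period after launch.
        return period_ps / 2 - setup_ps
    raise ValueError(f"unknown capture element: {capture}")


for kind in ("latch", "negedge_ff"):
    print(kind, enable_budget_ps(1000, 50, kind), "ps")
```

Under these assumptions the latch leaves 950ps for the enable logic, while the negative-edge FF leaves only 450ps.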
Are ICG cells part of the netlist? And during which stage do we add ICG cells? Can we add additional ICG cells during the placement stage?
Hi Kshitij,
Yes, ICG cells are part of the netlist. We add ICG cells in RTL as well as during the synthesis stage.
Good one... Sir, would you post some Tcl scripting videos, taking some examples based on timing reports?
Yes, sure Habiba.
Sir, when will you post IR drop prevention techniques? I am waiting for that.
I need some time pls.
@TeamVLSI OK sir, thank you for posting these videos. They are very helpful for all. Thank you so much.
Thank you, sir.
In which stage are clock gating violations observed, and why in that stage?
Hi sir, what is the difference between clock gating and ICG?
Sir, can you post about CRPR?
Hi Radha,
Sure, that will be done soon.
@TeamVLSI Thank you, sir.
ICG timing is characterized in the library, and the cell is used as an IP just like any standard cell. Hence, instead of using a separate latch and an AND gate, the ICG is used directly. A similar example is a buffer. Please correct me if my answer needs correction.
Second question answer: The ICG should be placed near the source.
Third question answer: No sir, by using a flop, we have chances of a glitch after the AND gate. Hence we use a negative latch with an AND.
Hi Mahantesh,
ICG placement near the source is good from a power saving point of view, but it has some issues too. Placing the ICG near the sink has its own benefits, so people use both options. Kindly try to list the pros and cons in both cases.
Yes, that's right, but why can't we use a latch + AND?
@TeamVLSI Sir, placing the ICG near the source works well if the ICG does not drive two branches that switch ON and OFF differently. But if there are two or more branches which need not switch ON or OFF at the same time, then placing one ICG cell near each destination domain is more feasible.
Sir, can you also post Tcl-related videos,
like scripting in Tcl?
Sure. Will do soon.
@TeamVLSI Thank you, sir.
I guess near the sink pin...
It actually depends on the real situation.
Okay sir