How does pipelining improve CPU efficiency?

On a non-pipelined CPU, an instruction might complete in, say, 3 clock cycles, while on a pipelined CPU the same instruction could take 4 cycles because of the overhead added by the separate pipeline stages.

A single instruction can therefore require more clock cycles on a pipelined CPU, but the time to complete a sequence of many instructions drops, so a balance has to be struck. One of the major complications with deep pipelining (such as the long pipelines used in some Intel Pentium 4 processors) arises when a conditional branch instruction is executed: the processor cannot determine the location of the next instruction, so it has to wait for the branch to resolve, and the whole pipeline may need to be flushed as a result.
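The trade-off can be made concrete with a little arithmetic. Here is a minimal sketch, assuming an idealized pipeline that issues one instruction per cycle with no stalls (the stage and cycle counts are illustrative, not taken from any particular processor):

```python
def cycles_nonpipelined(n_instructions, cycles_per_instruction):
    # Each instruction runs start to finish before the next begins.
    return n_instructions * cycles_per_instruction

def cycles_pipelined(n_instructions, n_stages):
    # The first instruction takes n_stages cycles to travel through the
    # pipeline; after that, one instruction completes every cycle.
    return n_stages + (n_instructions - 1)

# A single instruction: 3 cycles unpipelined vs 4 in a 4-stage pipeline.
print(cycles_nonpipelined(1, 3))    # 3
print(cycles_pipelined(1, 4))       # 4

# Over 100 instructions, though, the pipeline wins decisively.
print(cycles_nonpipelined(100, 3))  # 300
print(cycles_pipelined(100, 4))     # 103
```

The pipelined total grows by one cycle per extra instruction, while the non-pipelined total grows by the full per-instruction cost.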

If a program has many conditional branch instructions, pipelining can have a negative effect on overall performance. To alleviate this problem, branch prediction can be used, but this too hurts performance whenever branches are predicted wrongly.
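A simple cost model shows how branch behavior feeds into average performance. This is a rough sketch; the branch frequency, misprediction rate, and flush penalty below are made-up illustrative numbers, not measurements of any real CPU:

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    # Average cycles per instruction once mispredicted branches are
    # charged the cost of flushing and refilling the pipeline.
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty

# With 20% branches, a 10% misprediction rate, and a 15-cycle flush
# penalty, an ideal CPI of 1.0 degrades noticeably.
print(round(effective_cpi(1.0, 0.20, 0.10, 15), 3))  # 1.3

# Perfect prediction recovers the ideal CPI.
print(effective_cpi(1.0, 0.20, 0.0, 15))             # 1.0
```

Either fewer branches, better prediction, or a shorter flush penalty brings the effective CPI back toward the ideal.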

So what does all this have to do with pipelining? As I explained earlier, the various parts of an instruction use different components within the CPU. Pipelining makes the CPU more efficient by ensuring that most of the CPU's components are in use simultaneously.

Pretend for a moment that four instructions have been placed into a CPU's pipeline. The CPU begins working on those instructions by performing the fetch portion of the first instruction. Once the fetch is complete, the CPU can move on to the decode phase of the first instruction. Keep in mind though that the portion of the CPU that handles the fetch function is now freed up.

Therefore, the CPU can begin working on the fetch portion of the second instruction at the same time that it is working on the decode portion of the first instruction. When the CPU is ready to perform the execute portion of the first instruction, the fetch portion of the second instruction is done. Therefore, the CPU can begin working on the decode portion of the second instruction and the fetch portion of the third instruction.

If you're a little confused by what's going on, then take a look at Table A. As you can see in the table, the second, third, and fourth instructions have begun before the first instruction has completed. For this reason, the length of the pipeline can have a tremendous effect on the CPU's performance. Obviously, if the pipeline is too short, then CPU resources are wasted.
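The overlap described above can be printed as a small cycle-by-cycle chart, in the spirit of Table A. This sketch assumes a four-stage Fetch/Decode/Execute/Write pipeline with no stalls:

```python
STAGES = ["Fetch", "Decode", "Execute", "Write"]

def stage_of(instruction, cycle):
    # In cycle c (0-based), instruction i occupies stage c - i,
    # if that stage exists; otherwise it is not in the pipeline.
    s = cycle - instruction
    return STAGES[s] if 0 <= s < len(STAGES) else "-"

n_instructions = 4
n_cycles = n_instructions + len(STAGES) - 1  # 7 cycles to finish all four
for cycle in range(n_cycles):
    row = [stage_of(i, cycle) for i in range(n_instructions)]
    print(f"cycle {cycle + 1}: " + "  ".join(f"{s:7}" for s in row))
```

Reading down any column shows one instruction moving through the four stages; reading across any row shows up to four instructions in flight at once.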

Let me show you what happens if the pipeline is too long, though. Remember the last component of a general instruction that I mentioned? The Write component's job is to write the results of the instruction to a register within the CPU.

The Write function is not always used, but it is used when mathematical operations or comparisons are performed. To see why this is important, consider two equations in which the second depends on the first, for example X = 2 + 3 followed by Y = X + 1. Granted, these are simple arithmetic rather than machine language code, but they will do fine to illustrate my point. The second equation cannot be solved until a value for X has been established. The problem is that if these were CPU instructions, the second instruction would have already begun before the first equation's result could be written to the CPU's register.

This results in a CPU stall. A stall is a condition in which instructions cannot be processed until an earlier instruction completes.
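The number of stall cycles a read-after-write dependency costs, and how forwarding shrinks it, can be captured in a tiny model. This is a sketch under simplifying assumptions (one instruction issued per cycle, stages numbered from 0); the stage numbers are illustrative:

```python
def raw_stalls(result_ready_stage, operand_needed_stage, distance):
    # distance = 1 means the dependent instruction immediately follows
    # the producer. The consumer reaches operand_needed_stage `distance`
    # cycles after the producer reached it, and must wait until the
    # producer's result_ready_stage has completed.
    return max(0, result_ready_stage - operand_needed_stage - distance + 1)

# Four-stage pipeline, stages 0..3 (Fetch, Decode, Execute, Write).
# Without forwarding: the result is available only after Write (stage 3),
# but the dependent instruction needs it entering Execute (stage 2).
print(raw_stalls(3, 2, 1))  # 1 stall cycle

# With forwarding: the ALU result is routed straight back into Execute.
print(raw_stalls(2, 2, 1))  # 0 stall cycles

# One independent instruction between the pair also hides the latency.
print(raw_stalls(3, 2, 2))  # 0 stall cycles
```

This is why forwarding hardware and instruction scheduling (spacing dependent instructions apart) both reduce stalls.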

Newer processors contain special forwarding hardware designed to minimize the impact of dependency-based sequences such as the one I just showed you. This hardware greatly reduces the need for stalls. Another problem with pipelining is branching. To see why branching is an issue, consider the lines of pseudo code below:

    IF Y = 6 THEN
        perform one action
    ELSE
        perform a different action

As you can see in the pseudo code above, we are telling the CPU to perform an action if Y is equal to six and to perform a different action if Y is equal to anything else. The problem with this type of instruction is that it is conditional: there are two possible instructions that could follow it. When a pipeline is in use, this presents a problem, because multiple instructions are being processed simultaneously in a staged manner.

Because branches are very common in programs, a CPU that uses a pipeline must predict the outcome of the branch instruction. A subsequent instruction would then be placed into the pipeline based on the predicted outcome of the branch. What if the prediction is wrong though? If the branch instruction's outcome is predicted incorrectly, then subsequent instructions within the pipeline are also incorrect.

When this occurs, the pipeline must be flushed. This means that all remaining instructions must be removed from the pipeline, and new instructions must be fed into the pipeline based on the outcome of the branch instruction. The impact of a pipeline flush greatly depends on the length of the pipeline.

If the pipeline is short, then it will only contain a few instructions, so it's no big deal to remove those. If the pipeline is long though, then many instructions will have to be removed, resulting in a significant CPU stall. Although there are issues with using pipelines, a pipeline can make a CPU more efficient so long as the pipeline is of an appropriate length.

If the pipeline is too short, then CPU resources may be wasted. If the pipeline is too long, then a pipeline flush can cause a major delay. There are two kinds of RAW dependency, namely define-use dependency and load-use dependency, with two corresponding kinds of latency, known as define-use latency and load-use latency. The define-use latency of an instruction is the time delay, after decode and issue, until the result of the instruction becomes available in the pipeline for subsequent RAW-dependent instructions.

If the value of the define-use latency is one cycle, an immediately following RAW-dependent instruction can be processed without any delay in the pipeline. The define-use delay of an instruction is the time a subsequent RAW-dependent instruction has to be held up in the pipeline; the define-use delay is one cycle less than the define-use latency. The term load-use latency is interpreted in connection with load instructions, such as in a sequence like:

    load r1, (addr)
    add  r2, r1, r3

In this example, the result of the load instruction is needed as a source operand in the subsequent add.

The notions of load-use latency and load-use delay are interpreted in the same way as define-use latency and define-use delay. For an instruction executed in the pipeline, the latency is determined by its execute phase.
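The relationship between the two terms can be stated as one line of arithmetic. A sketch, using illustrative latency values (a one-cycle define-use latency and a two-cycle load-use latency, a common textbook assumption rather than a property of any specific CPU):

```python
def use_delay(latency):
    # The delay seen by an immediately following RAW-dependent
    # instruction is one cycle less than the latency, because the
    # consumer is already one stage behind the producer.
    return latency - 1

# Define-use latency of 1 cycle: the dependent instruction proceeds
# with no delay at all.
print(use_delay(1))  # 0

# Load-use latency of 2 cycles: the loaded value arrives a cycle later
# than an ALU result, so the dependent add waits one bubble.
print(use_delay(2))  # 1
```

This is why compilers try to schedule an independent instruction into the slot right after a load: it hides the one-cycle load-use delay entirely.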


