
Notes in the margins. Which is faster, CPU or GPU?

Posted by on Feb 10, 2012 in Video


Time goes by, and processors are becoming more powerful and gaining more cores. Graphics cards are also increasing their number of computing units, and besides rendering 3D images they are trying to take on tasks that are still handled by CPUs. The developers promise a significant speedup from computing on the GPU, which, in general, is supported by the figures. But the question remains: is graphics architecture really better suited to well-parallelizable tasks and streaming of large data sets? If so, why do we need multi-core processors at all, when the burden can simply be "shifted" onto the graphics card?

Today we will try to answer the question "who would win, a whale or an elephant?" as applied to the competition between CPU and GPU in physics calculations. This material is not intended to be comprehensive; in fact, the case discussed here is not the only example of CPU and GPU "competition" in computing. These notes actually came about as the result of discussions with colleagues about "which is stronger, the CPU or the GPU". Without thinking twice, we decided to check who really wins. You may not believe it, but the outcome was not so obvious, and the results surprised both sides. Why this happened, we will see below.

As the test application, we chose 3DMark Vantage, and specifically one of the tests in the suite: CPU Physics. There was nothing special behind this choice; it was simply what came to hand. We usually test graphics cards in 3DMark Vantage anyway, and it includes a "physics" test that can run either on the CPU or on NVIDIA adapters. So let's see which one computes "physics" faster.

Test Equipment

For comparison, we have three processors. One of them is fairly old: the Intel Core 2 Quad QX6850. The second, the AMD Phenom II X4 965, is more modern. The third, the AMD Athlon II X4 620, is more recent still. Of course, we should also have taken a Core i7 or Core i5, but this time they were busy with other tests. Still, the three available representatives of the "CPU" camp are enough to obtain qualitative and quantitative estimates.

For graphics cards, we used the following three NVIDIA models:

We do not list the card clock frequencies, since they were varied throughout testing.

Testing

As the measure of "power density" for a CPU or GPU, we take the result of the 3DMark Vantage CPU Physics Test (measured in frames per second) divided by the number of cores or shader units and by the clock frequency in megahertz. In other words, we measure "power density" in FPS / (MHz * number of threads). Since the number of CPU cores and the number of stream processors in a graphics card are fixed, all that remains is to measure FPS in the test at different CPU and GPU frequencies. So, let's start.
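The metric defined above can be written as a one-line function. This is a minimal sketch; the function name `power_density` is our own, and the usage example plugs in a single data point mentioned later in the text (the QX6850 at 3600 MHz with four active cores scoring about 18 FPS), so it is a rough single-point estimate rather than the slope-based value used in the tables.

```python
def power_density(fps: float, mhz: float, units: int) -> float:
    """'Power density' as defined in the text: FPS / (MHz * number of cores or shader units)."""
    if mhz <= 0 or units <= 0:
        raise ValueError("frequency and unit count must be positive")
    return fps / (mhz * units)


# Single-point estimate: Core 2 Quad QX6850, 4 active cores, 3600 MHz, ~18 FPS.
qx6850_estimate = power_density(18.0, 3600.0, 4)
print(qx6850_estimate)  # 0.00125
```

For a result that scales linearly from the origin, this single-point value equals the slope divided by the number of units, which is how the article computes it from several frequencies.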

Since the CPU is still the "heart" of the computer, we start there. We made the task a little harder and also looked at how CPU performance in this test scales not only with frequency but also with the number of cores. Cores were "disabled" by setting the affinity of 3DMark Vantage to the required number of CPU cores in Task Manager. This method is not ideal, but it is sufficient for our purposes. Incidentally, although the Intel Core 2 Quad QX6850 is essentially two dual-core dies in one package, this had no effect in this test: the case where two active cores share one 4 MB cache and the case where each core uses its own 4 MB cache gave results identical within the margin of error. Frequency was scaled by lowering the CPU multiplier, with all other system parameters unchanged. Let's see what happens.
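Setting affinity in Task Manager amounts to choosing a bitmask in which bit i allows the process to run on logical CPU i. The sketch below builds such masks; the helper names are hypothetical, and the assumption that CPUs 0 and 1 sit on the same QX6850 die (so `[0, 1]` shares a cache while `[0, 2]` does not) is ours, not something the article states. On Linux, the standard library's `os.sched_setaffinity(pid, cpus)` accepts a set of CPU indices for the same purpose.

```python
def mask_for(cores) -> int:
    """Affinity bitmask with bit i set for every logical CPU index i in `cores`."""
    mask = 0
    for c in cores:
        mask |= 1 << c
    return mask


def first_n_mask(n: int) -> int:
    """Mask that leaves only the first n logical CPUs enabled."""
    if n < 1:
        raise ValueError("need at least one core")
    return mask_for(range(n))


for n in range(1, 5):
    print(n, format(first_n_mask(n), "04b"), hex(first_n_mask(n)))

# Assumed die layout: CPUs 0,1 on one die (shared cache), CPUs 2,3 on the other.
same_die = mask_for([0, 1])    # two cores sharing one 4 MB cache
split_dies = mask_for([0, 2])  # one core per die, each with its own 4 MB cache
print(bin(same_die), bin(split_dies))
```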

As you can see, performance in the test increases almost linearly with frequency. In theory, the lines should pass through the origin: at zero CPU frequency we get no result at all, that is, zero FPS. Let's draw straight lines from the origin and see how well they match the experimental curves.
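Drawing "a straight line from the origin" through measured points can be done with a least-squares fit constrained to pass through (0, 0). Minimizing the sum of (FPS_i - k * f_i)^2 gives k = sum(f_i * FPS_i) / sum(f_i^2). This is a sketch with synthetic, illustrative data only; it is not the article's measurements, and the article may simply have drawn its lines by eye.

```python
def slope_through_origin(freqs_mhz, fps):
    """Least-squares slope k of fps = k * f, with the line forced through (0, 0)."""
    num = sum(f * y for f, y in zip(freqs_mhz, fps))
    den = sum(f * f for f in freqs_mhz)
    return num / den


# Synthetic points lying exactly on a line through the origin (not measured data).
freqs = [2000.0, 2400.0, 2800.0, 3200.0]
fps = [0.0025 * f for f in freqs]

k = slope_through_origin(freqs, fps)
residuals = [y - k * f for f, y in zip(freqs, fps)]
print(f"slope = {k:.6f} FPS/MHz, max residual = {max(abs(r) for r in residuals):.2e}")
```

With real data, large or systematic residuals (all of one sign at high frequency, say) are the signal that the result does not scale from the origin.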

The results are quite amusing. The Intel Core 2 Quad QX6850 results fit the straight lines almost perfectly (except for the case of three active cores, which may be due to the uneven distribution of cache memory between them that follows from the architecture). The AMD Athlon II X4 620 results also fit well on lines passing through the origin. With the AMD Phenom II X4 965, things are more complicated. If we draw a line from the origin through the point at the lowest frequency, the subsequent points fall below it (for one and two active cores). If we draw the line through the points at higher CPU frequencies instead, the result at 2000 MHz lies above it. This behavior can probably be explained by the Phenom's L3 cache (the Athlon II X4 620 has no L3 cache and shows no such effect). At a CPU frequency of 2000 MHz, the cores and the L3 cache run synchronously, so the result is at its best. As the core frequency rises, the L3 cache frequency stays the same and can introduce delays, so the results fall on a line with a lower slope.
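The L3 explanation can be illustrated with a simple frame-time model: part of each frame's work scales with the core clock, while the part spent in the L3 cache is tied to a fixed 2000 MHz L3 clock. This is a hypothetical sketch; the split of work (`core_mcycles`, `l3_mcycles`) is invented for illustration, and real cache behavior is more complex. It shows why FPS per MHz is highest at 2000 MHz and declines as the core clock rises.

```python
L3_MHZ = 2000.0  # assumed fixed L3 clock for this sketch


def model_fps(core_mhz: float, core_mcycles: float = 80.0,
              l3_mcycles: float = 20.0, l3_mhz: float = L3_MHZ) -> float:
    """Hypothetical model: frame time = core work / core clock + L3 work / fixed L3 clock.

    Work is given in millions of cycles, clocks in MHz, so each ratio is in seconds.
    """
    frame_time_s = core_mcycles / core_mhz + l3_mcycles / l3_mhz
    return 1.0 / frame_time_s


for f in (2000.0, 2600.0, 3400.0):
    fps = model_fps(f)
    print(f"{f:.0f} MHz: {fps:.2f} FPS, {fps / f:.5f} FPS/MHz")
```

If the L3 clock scaled together with the core clock (`l3_mhz=core_mhz`), the model becomes exactly linear through the origin, which matches the behavior of the processors without this effect.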

Now let's calculate the "power density" of each processor in this test. Recall that this is essentially the slope of the line, divided by the number of active CPU cores. The results are shown in the table below.

Surprisingly, in the 3DMark Vantage "physics" calculations, the AMD processors tested show slightly better results than the representative of the Intel Core 2 Quad architecture.

Now let's see what "specific power" NVIDIA GPUs demonstrate. Since a graphics processor is a fairly complex device, the question arises of how to calculate this "power density". Because the calculations are mainly performed by the shader units, we decided to plot the results against the shader clock. The ROP clock was set as high as possible for a given shader clock. As it turned out, the minimum ratio of shader clock to ROP clock is two, and this ratio was maintained throughout all tests.

For this part of the tests, we used a test bed based on the Core 2 Quad QX6850 running at 3600 MHz with all four cores active. The results are shown in the chart below.

As you can see, in terms of absolute performance in this test the graphics cards significantly outperform the CPUs. Even the weakest of the cards, at its minimum frequency, is faster than a quad-core CPU running at 3600 MHz. However, the results behave somewhat differently from what we saw for the CPUs. More details can be seen in the chart below.

In this chart, we drew straight lines through the points at the lowest operating frequencies of the graphics cards. As it turned out, they do not converge at the origin but cross the vertical axis at about 20 FPS. Strange, isn't it? In fact, there is nothing strange here, and this behavior is quite natural. It is enough to look at the CPU load during the test run: it reached 100% on every core. Going back to the first chart, it is easy to see that the Intel Core 2 Quad QX6850 @ 3600 MHz scores right around 18 FPS in this test. We tried lowering the processor frequency and reducing the number of active cores, and each time the vertical offset of the GPU lines matched the CPU's own result in this test with good accuracy.
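Unlike the CPU case, the GPU lines are fitted with a free intercept, and that intercept is what lands near the CPU's own score. A minimal sketch using an ordinary least-squares fit y = a + b*x; the data points are synthetic (an 18 FPS "CPU floor" plus a term linear in shader clock), and the stream processor count is a hypothetical value for illustration, not one of the tested cards.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic, illustrative points only (not measured data).
shader_mhz = [1000.0, 1100.0, 1200.0, 1300.0]
fps = [18.0 + 0.04 * f for f in shader_mhz]

a, b = fit_line(shader_mhz, fps)
stream_processors = 240  # hypothetical SP count
print(f"intercept ~ {a:.2f} FPS (compare with the CPU-only score)")
print(f"slope ~ {b:.4f} FPS/MHz, power density ~ {b / stream_processors:.2e} FPS/(MHz*SP)")
```

Only the low-frequency points should go into such a fit when the higher points bend away from the line, since including them would pull the slope down.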

As for why the results deviate from the constructed lines, this is easily explained: beyond a certain point the shader units are apparently no longer fully loaded. Perhaps the test does not offer enough parallelizable work, and limitations of the GPU architecture may also play a role. Be that as it may, let's calculate the GPU "power density", using, as before, the slope of the constructed lines divided by the number of stream processors. The results are shown in the table below, together with the "specific power" of the Intel Core 2 Quad QX6850.

It's hard to believe, but in the 3DMark Vantage CPU Physics Test the "specific power" of a CPU that is quite old by today's standards is at least one and a half times higher than the "power density" of modern NVIDIA graphics processors. Such is the paradoxical result. However, we do not suggest abandoning GPU computing in favor of the CPU. The GPU has another trump card: high performance per watt. Such estimates are hard to make, so we leave that exercise to our readers.

Well, if we compare the absolute CPU and GPU results obtained in this test, processors will not catch up with graphics cards any time soon. Still, the achievements of CPU design should not be dismissed. Results were recently published for an overclocked six-core Intel Core i9 Gulftown processor. Overclocked to 5892 MHz, it scored 63.01 FPS in the 3DMark Vantage CPU Physics Test. Calculating the "power density" of this newcomer gives 0.00178 FPS / (MHz * number of cores), which is 1.44 times the "power density" of the Core 2 Quad QX6850. This 44% gain comes from the advantages of the Core i9 architecture and Hyper-Threading technology.
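The Gulftown arithmetic above can be checked directly, dividing by the six physical cores as the text does (not by the twelve Hyper-Threading threads). The QX6850 value printed here is only implied by the stated 1.44 ratio, since the table with the measured value is not reproduced in this text.

```python
gulftown_fps = 63.01
gulftown_mhz = 5892.0
gulftown_cores = 6  # physical cores, as in the text

gulftown_density = gulftown_fps / (gulftown_mhz * gulftown_cores)
stated_ratio = 1.44
implied_qx6850 = gulftown_density / stated_ratio  # back-calculated, not measured

print(f"Gulftown: {gulftown_density:.5f} FPS/(MHz*core)")
print(f"Implied QX6850: {implied_qx6850:.5f} FPS/(MHz*core)")
```

The implied QX6850 figure (about 0.00124) is close to the rough 18 / (3600 * 4) = 0.00125 one gets from the 18 FPS at 3600 MHz quoted earlier, so the numbers are consistent.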

Although CPUs and GPUs have not yet confronted each other directly across the whole range of tasks, who knows where the fierce competition between the two will unfold. It is worth mentioning that the AMD Radeon HD 5870 has a computing power of 2.7 TFLOPS, and that Microsoft DirectX 11 supports Compute Shader technology, which allows computations to be offloaded to the GPU. What comes next, we will see …
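The 2.7 TFLOPS figure is a theoretical peak, obtained by multiplying the number of stream processors by the clock and by the floating-point operations each unit completes per clock. The sketch assumes the commonly quoted HD 5870 specifications (1600 stream processors at 850 MHz, one multiply-add, i.e. 2 FLOPs, per unit per clock); the function name is our own.

```python
def peak_flops(units: int, clock_hz: float, flops_per_unit_per_clock: int = 2) -> float:
    """Theoretical peak: units * clock * FLOPs per unit per clock (multiply-add = 2)."""
    return units * clock_hz * flops_per_unit_per_clock


hd5870 = peak_flops(1600, 850e6)
print(f"{hd5870 / 1e12:.2f} TFLOPS")
```

Such peak numbers assume every unit is busy every clock; as the shader-saturation results above suggest, real workloads may reach only part of it.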