CPU dependence of video cards. Part II: Effect of CPU cache size and RAM speed

Posted on Feb 13, 2012 in Video

Preface to the second part

Since we published the article "CPU dependence of video cards. Part I: Analysis", we have received a lot of feedback from you, dear readers. Along with questions about when the second part would appear, there were many comments on the graphs and doubts about their reliability in certain specific cases.

Today we will explain a few nuances that attracted great interest but were not described in detail in the first part. We will consider how CPU cache size and memory speed affect performance in 3D games, and we will come close to the question of comparing platforms as a whole. So, let's begin.

Discrepancy No. 1, or where did the "zero" go?

Pay attention to the chart below.

This chart is taken from the first part of the article. The lines that show video card performance in different modes converge to a single sloping line as the CPU frequency decreases. The "discrepancy" is that if we extend this line until it crosses the "FPS" axis, it does not pass through the origin but slightly above it. It turns out that at zero CPU frequency we could still play, and at as much as 15 frames per second!

How can this be? If we set aside the test conditions under which the results were obtained, such a situation is theoretically possible. For example, we could load some data into the video card along with shader programs for the graphics chip to execute, and let it all run on its own, with no need for the CPU. Examples of using graphics processors for mathematical calculations are known. But under our test conditions such a result is physically impossible. Someone has to calculate the positions of objects in the game scene, and who if not the CPU? So how should the CPU-dependence graph actually behave as the CPU frequency tends to zero?

An attempt to run the experiment on real hardware is complicated by the fact that lower CPU multiplier values are not available, and if we simply take some "very weak processor", we change the platform, that is, the test conditions, so the results can no longer be compared correctly. What to do?

Let's try to predict the behavior of the CPU-dependence curve from logic and from the real behavior of a typical personal computer. To do this we need to delve into some principles of operating systems with preemptive multitasking. Do not be afraid of the long term. Most likely you use exactly this kind of operating system for work or play. We are talking about an operating system known to everyone: Windows XP. Besides Windows XP, Windows 2000 and all Linux distributions are also operating systems with preemptive multitasking.

The feature of these operating systems that matters for our discussion is how they manage hardware resources, namely how they distribute CPU time among several tasks that run simultaneously. Sitting at a PC, it seems to us that everything happens at the same time: files download from the Internet, music plays, a CD is being burned. The reality is somewhat different. All the applications you run on your computer are executed in strict sequence! There is no contradiction here. On a single processor, applications run one at a time, piece by piece. But the pieces are so small, and the operating system switches between them so quickly, that a person cannot notice it, and the illusion arises that everything is done at once. Put briefly and simply, all CPU time is divided into short periods, or "quanta". These quanta are then handed out to applications, as if to say: do some work, here is the processor for a couple of milliseconds. The kernel of a multitasking operating system uses some of these quanta itself, both to run system services and simply because the OS needs to "think" about which application gets the next quantum. That is, some CPU time is lost "unproductively" (from the point of view of the user application) on maintaining the operating system itself.
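The idea of handing out quanta in turn can be illustrated with a minimal sketch. This is a toy round-robin loop, not the actual Windows or Linux scheduler: the task names, the 2 ms quantum and the 1 ms of OS "thinking" per switch are all assumptions chosen only for illustration.

```python
from collections import deque


def run_round_robin(tasks, quantum_ms=2, os_overhead_ms=1):
    """Toy model of one CPU handing out time quanta to tasks in turn.

    tasks: dict mapping a task name to the CPU time it needs (ms).
    quantum_ms and os_overhead_ms are hypothetical values.
    Returns (timeline, app_ms, os_ms).
    """
    queue = deque(tasks.items())
    timeline = []   # which task held the CPU, and for how long
    app_ms = 0      # time spent on user applications
    os_ms = 0       # "unproductive" time spent by the OS itself
    while queue:
        name, remaining = queue.popleft()
        os_ms += os_overhead_ms              # the OS decides who runs next
        slice_ms = min(quantum_ms, remaining)
        app_ms += slice_ms
        timeline.append((name, slice_ms))
        remaining -= slice_ms
        if remaining > 0:                    # not finished: back of the queue
            queue.append((name, remaining))
    return timeline, app_ms, os_ms


timeline, app_ms, os_ms = run_round_robin(
    {"download": 4, "music": 2, "cd_burn": 5}
)
print(timeline)
print("application time:", app_ms, "ms; OS time:", os_ms, "ms")
```

Only one task holds the CPU at any moment, yet all three make progress, and part of the total time goes to the OS rather than to the applications.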

All of the above is directly related to our "discrepancy".

And here's why. If the operating system needs a fixed amount of CPU work to keep itself running, then obviously, as the CPU clock speed falls, the free "quanta" that can be given to the application (in this case, a 3D game) shrink faster than the CPU frequency. To put it differently: suppose that at 100 MHz the processor has only enough performance to serve the operating system. Then, to obtain the equivalent CPU frequency, that is, the number of "megahertz" available to the application, we must subtract from the real processor frequency those same 100 MHz allocated to the operating system. It turns out that at a CPU frequency of 1000 MHz the "correction for the OS" is 10%, at 200 MHz it is already 50%, and at 100 MHz we get 0 FPS. The following chart illustrates all of the above.

The red dashed line shows the expected behavior of the CPU-dependence curve as the CPU frequency tends to zero. Warning: this line is drawn arbitrarily and does not represent any experimental data!

It may seem strange that we devote so much attention and time to this question. After all, processors with such low frequencies are almost never used in personal computers, and the experiment, even if it could be carried out, would at first glance be of no use. All true, but not quite.

Let us ask ourselves: how can we eliminate or minimize the influence of the operating system on the speed of the application? That is, is it possible at all to obtain a CPU-dependence graph that passes through the origin? Looking ahead: yes, for example, if the operating system runs … on a different processor. But we shall return to this question later.
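The arithmetic behind the "correction for the OS" can be reproduced with a short sketch. It uses the 100 MHz supposition from the text; the function names are hypothetical and the model is deliberately simplistic.

```python
def effective_mhz(cpu_mhz, os_mhz=100.0):
    """Megahertz left for the application after the OS takes its share.

    os_mhz is the supposed cost of the OS from the text (100 MHz),
    not a measured value.
    """
    return max(cpu_mhz - os_mhz, 0.0)


def os_share_percent(cpu_mhz, os_mhz=100.0):
    """Share of the real CPU frequency consumed by the OS, in percent."""
    return min(os_mhz * 100.0 / cpu_mhz, 100.0)


for mhz in (1000, 200, 100):
    print(mhz, "MHz:", effective_mhz(mhz), "MHz for the game,",
          os_share_percent(mhz), "% for the OS")
```

At 1000 MHz the OS takes 10%, at 200 MHz it takes 50%, and at 100 MHz nothing is left for the game, which matches the figures above.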

Discrepancy No. 2. The nonlinearity of the "line of best possible results"

We have just looked at how the "line of best possible results" behaves as the CPU frequency decreases. Now let's see what happens if we go the other way and increase the CPU frequency.

The essence of this "discrepancy" is clearly visible on the same graph we discussed above. Wherever on the "line of best possible results" we draw a tangent, the results fall below the tangent as the CPU frequency increases further. Why does the graph not follow a linear law but begin to bend down toward the X axis? There are several reasons for this.

The first reason is the CPU performance consumed by the needs of the operating system. This question has already been discussed above.

The second reason is the influence of the CPU multiplier. What does the CPU multiplier have to do with it? The point is that when we raise the processor frequency only through the multiplier, we do increase the CPU's power to process data, but the data still has to be delivered to the CPU core, and the CPU bus speed stays unchanged. For tasks with large amounts of data that do not fit in the processor's cache, there may come a moment when the CPU core has processed all the available data and waits for the next portion to arrive. That is, the processor starts to stall, which can be viewed as a reduction of the "effective" CPU frequency.

The third reason is the distribution of CPU time between the graphics driver (which runs on the CPU) and the game's own calculations (which also run on the CPU). The situation is somewhat confusing, because both tasks use the CPU, and the graphics driver can be regarded both as a component of the operating system (architecturally) and as an important link in the execution of 3D applications.

Other possible causes include the latency and bandwidth of memory, the processor bus, and so on.

This list of reasons is neither final nor exhaustive; if you wish, you can find several more factors that make the "line of best possible results" deviate from a straight line. Determining how much each of these causes contributes and searching for the "bottleneck" is a rather broad topic for research.
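The second reason (raising the multiplier while the bus speed stays the same) can be sketched with a very simple model. It assumes each frame costs a fixed number of core cycles plus a fixed waiting time for data that does not depend on the core frequency. The 20 million cycles and 5 ms are invented numbers for illustration, not measurements.

```python
def frame_time_ms(core_mhz, compute_mcycles=20.0, memory_wait_ms=5.0):
    """Time to produce one frame in a toy "compute + wait for data" model.

    compute_mcycles: millions of core cycles per frame (hypothetical).
    memory_wait_ms: time spent waiting for data over the bus, which does
    not shrink when only the multiplier is raised (hypothetical).
    """
    compute_ms = compute_mcycles * 1000.0 / core_mhz
    return compute_ms + memory_wait_ms


def fps(core_mhz, **model):
    """Frames per second for the toy model above."""
    return 1000.0 / frame_time_ms(core_mhz, **model)


for mhz in (1000, 2000, 4000):
    print(mhz, "MHz ->", round(fps(mhz), 1), "FPS")
```

In this model, doubling the core frequency less than doubles the FPS, because the waiting time stays fixed. The curve bends toward the X axis in the same way as the graph described above.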

Before proceeding to the consideration of particular issues, we formulate a general postulate:

In a system with many parameters, a linear dependence on one particular parameter can only be obtained if none of the other parameters imposes a limit.

Or, in other words, the one parameter against which the graph is plotted must itself be the most limiting factor.

For our study of CPU dependence, this means that the performance of every component other than the CPU must be sufficient and must not create any restrictions. That is, the video card should be powerful and run in the lightest modes (e.g., 640×480 instead of 1600×1200 with otherwise identical settings), the memory must operate at maximum speed, the influence of the operating system must be minimized, and so on.

Be that as it may, in practice we still see the "line of best possible results" grow as the CPU frequency increases. Although this growth is not strictly linear, it is quite suitable for estimating the "ceiling" of a platform's possible performance in 3D applications.

Next, we consider several factors that affect computer performance in 3D applications. We will limit ourselves to those that we can control to some extent by choosing the type of CPU and RAM, that is, by choosing the "platform" for running 3D applications.
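The postulate can be illustrated with a minimal sketch in which the achievable frame rate is simply the smallest of the limits imposed by each component. The coefficient of 20 MHz per frame per second and the video card limits are hypothetical values, chosen only to show the shape of the dependence.

```python
def achievable_fps(cpu_mhz, cpu_mhz_per_fps=20.0, gpu_limit_fps=120.0):
    """Frame rate as the minimum of a CPU limit and a video card limit.

    cpu_mhz_per_fps: hypothetical CPU cost of one frame per second.
    gpu_limit_fps: hypothetical ceiling set by the video card in a given
    mode (lower for heavy modes such as 1600x1200).
    """
    cpu_limit_fps = cpu_mhz / cpu_mhz_per_fps   # grows linearly with frequency
    return min(cpu_limit_fps, gpu_limit_fps)


for mhz in (1000, 2000, 3000):
    light = achievable_fps(mhz)                      # light mode, fast card
    heavy = achievable_fps(mhz, gpu_limit_fps=60.0)  # heavy mode
    print(mhz, "MHz: light mode", light, "FPS, heavy mode", heavy, "FPS")
```

The result grows linearly with CPU frequency only while the CPU is the smallest limit. As soon as the video card becomes the limiting factor, the line goes flat, which is why the tests need a powerful card in the lightest modes.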