Monday, February 16, 2015

What is a hardware cache line?

When I was learning the SLAB allocator in the Linux kernel, there was always a question in my mind:
WHAT IS A "CACHE LINE"?!
I knew the CPU has caches, like L1, L2, etc. But why is there a "line" in the name? Moreover, when I searched for the answer, I encountered more unfamiliar names like "2-way set associative cache" or "fully associative cache", and those terms confused me, too.

In short, the reason it is called a "line" is that a cache needs to be organized in groups, or blocks, and each of them is called "a cache line". What's more important is the way it's organized. The first method that comes to mind is to simply divide a memory address (32 bits, for example) into two fields: the field on the most-significant-bits side indicates the cache line index, and the other field, on the least-significant-bits side, represents the byte offset within a cache line. Although this sounds pretty straightforward, and there does exist a kind of cache organization similar to the above description, it still needs some adjustment before it can be put into practice.

The first kind we are going to introduce is the Fully Associative cache.
This approach simply splits a 32-bit address into 28 + 4 bits. The upper 28-bit field is the memory block id, so there are 2^28 memory blocks in this case, and the lower 4 bits are the byte offset within a 16-byte cache line. As mentioned above, this sort of organization is straightforward; however, it is expensive to implement: a memory block may reside in any of the cache lines, so every time we look for a specific block, the hardware has to compare the large 28-bit tag against every single line in the cache.
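To make the split concrete, here is a minimal C sketch of how an address decomposes under this scheme. The geometry (32-bit addresses, 16-byte lines, so a 28-bit tag) comes straight from the description above; the address value itself is just an arbitrary example:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed geometry: 32-bit addresses, 16-byte cache lines, so an
 * address splits into a 28-bit block id (tag) + a 4-bit byte offset. */
#define OFFSET_BITS 4u
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1u)

int main(void)
{
    uint32_t addr   = 0x12345678u;          /* arbitrary example address */
    uint32_t tag    = addr >> OFFSET_BITS;  /* upper 28 bits: block id   */
    uint32_t offset = addr & OFFSET_MASK;   /* lower 4 bits: byte offset */

    /* In a fully associative cache, a lookup has to compare this tag
     * against the tag stored in EVERY cache line. */
    printf("tag=0x%07" PRIx32 " offset=0x%" PRIx32 "\n", tag, offset);
    return 0;
}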

Before we introduce how most modern processors fix the above problem, let's look at the other extreme compared to the Fully Associative cache: the Direct-Mapped cache.
There are some assumptions, shown at the top-left corner of the picture above: the 32-bit address is split into a 19-bit tag, a 9-bit index, and a 4-bit offset, which gives a cache of 512 lines of 16 bytes each (8 KB in total). This organization approach is just like its name: it directly maps each memory block to a fixed cache line. The 19-bit field is the id of a memory block, and the 9-bit field acts like a "cache line offset within a memory block", which is also the index of the one cache line that block can occupy. What is the downside of this approach? Well, let's take a look at a vivid example I saw on a lecture web page from the University of Colorado's ECEE department: if we add two arrays and store the result into a third one, and their sizes all happen to be 16 bytes x 512, then each array perfectly covers the entire cache and each element perfectly fits a cache line, so a[i], b[i], and c[i] all map to the same cache line and keep evicting one another. We get a cache miss on EVERY array access during the calculation!!
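To see how bad this gets, here is a small C sketch of that pathological loop. The geometry (an 8 KB direct-mapped cache, 16-byte lines, three 16-byte x 512 arrays) follows the lecture example; the single buffer and the GCC-specific aligned attribute are my own additions to force the three arrays to sit exactly 8 KB apart in memory:

/* Assumed geometry from the lecture example: a direct-mapped cache of
 * 512 lines x 16 bytes = 8 KB, and three 8 KB arrays of 16-byte elements.
 * One 24 KB buffer (GCC aligned attribute) keeps the arrays exactly
 * 8 KB apart, so a[i], b[i], and c[i] share the same cache line index. */
#define LINES     512
#define LINE_SIZE 16

typedef struct { double v[2]; } elem_t;   /* 16 bytes: one full cache line */

static elem_t buf[3 * LINES] __attribute__((aligned(LINES * LINE_SIZE)));
static elem_t *const a = &buf[0];
static elem_t *const b = &buf[LINES];
static elem_t *const c = &buf[2 * LINES];

void add_arrays(void)
{
    for (int i = 0; i < LINES; i++) {
        /* Reading a[i] loads its line; reading b[i] evicts it (same
         * index); writing c[i] evicts b's line in turn. Every single
         * access in every iteration is therefore a cache miss. */
        c[i].v[0] = a[i].v[0] + b[i].v[0];
        c[i].v[1] = a[i].v[1] + b[i].v[1];
    }
}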

Now let's look at the usual approach most modern processors adopt, that is, the N-Way Set Associative cache. Here is the 2-way version:
The N-way set associative approach is really similar to the direct-mapped approach, which is why I introduced the latter first. If you split the direct-mapped cache in half and "merge" the two parts into one by putting them side by side, you almost get a 2-way set associative cache! That is, each set can now hold two "slices" (16 bytes each, in this case) of memory blocks that share the same set index, and the tag field is responsible for identifying which memory block each "slice" belongs to. Keeping the 8 KB cache from before, the index shrinks from 9 bits to 8 (256 sets instead of 512 lines), and the bit it gave up moves into the tag, which grows from 19 to 20 bits.
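Here is a minimal C sketch of a 2-way set associative lookup, keeping that same 8 KB cache with 16-byte lines: 256 sets of 2 ways each. The structure and function names are my own, purely for illustration:

#include <stdbool.h>
#include <stdint.h>

/* Same total size as the direct-mapped example (8 KB, 16-byte lines),
 * but organized as 256 sets x 2 ways: one index bit from the
 * direct-mapped layout has moved into the tag (19 -> 20 bits). */
#define OFFSET_BITS 4u
#define SET_BITS    8u
#define NUM_SETS    (1u << SET_BITS)
#define WAYS        2u

struct line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[1u << OFFSET_BITS];
};

static struct line cache[NUM_SETS][WAYS];

/* Returns true on a hit. Only the two lines of one set are searched,
 * instead of every line (fully associative) or exactly one line
 * (direct-mapped); a conflicting block can land in the other way. */
bool lookup(uint32_t addr)
{
    uint32_t set = (addr >> OFFSET_BITS) & (NUM_SETS - 1u);
    uint32_t tag =  addr >> (OFFSET_BITS + SET_BITS);

    for (uint32_t way = 0; way < WAYS; way++)
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;
    return false;
}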

