Skip to content

2....a)Hardware:④HTA

Tingyuan LIANG edited this page Aug 28, 2018 · 12 revisions

Motivation of HTA

Although FBTA and PATA provide high-performance solution to DMM, when the number of MAUs is raised to more than 512, their consumption of area will be increased notably because of the large width of BVs array in HLS and high parallel computation between them. Therefore, FBTA and PATA perform quite well for coarse-grained allocation, such as the allocation of arrays, for which designers can define MAU with large size, for example, 1 KB, so that there would not be too many MAUs. However, the solution based on large MAU is not suitable for fine-grained allocation because it will lead to serious memory fragmentation issues resulting in wasted memory. To overcome this limitation, Hybrid Tree Allocator (HTA), provided by Hi-DMM, adopts the basic hierarchical idea based on group tree from SysAlloc to store the information of buddy tree, which can save a lot of area. However, HTA does not rely on the search based on AND-gate tree so that it takes less than 20 cycles to accomplish the allocation request for thousands of MAUs, instead of suffering the long allocation latency (hundreds of cycles) of SysAlloc.


Structure of Hybrid Tree:

The hybrid tree consists of two different kinds of layers, trunk layers and branch layers, as shown in the figure below. The depths of trunk and branch parts of a hybrid tree are and respectively. Based on the hybrid tree structure, HTA can manage MAUs. { The trunk part of hybrid tree is described in the same way as FBTA but leaves in the trunk part, e.g. the node 4,5,6 and 7 in Fig.\ref{hta}, are mapped to a corresponding group tree in branch layer, indicating whether a large memory block is available. As for the branch layers, group trees are used to manage fine-grained allocation and each of their leaves is mapped to a MAU. To locate an available memory node with small size efficiently, these \emph{group trees} are described by two different kinds of BVs, layer BV and \emph{group trees} BV. Each branch layer has its own layer BV, and each bit of layer BV accounts for the status of a specific layer of a \emph{group tree}, indicating whether the \emph{group tree} has available nodes in the specified branch layer. {For example, in Fig.\ref{hta}, both of the first two layer BVs (01...) indicate that the first \emph{group tree} has available node in the layers while the second \emph{group tree} does not. In contrast, the third layer BV (00...) indicate that both two \emph{group trees} have available nodes in the third layer. Each \emph{group tree} in the branch part can be described within a \emph{group tree} BV, where each bit indicates the status of a node of the \emph{group tree}.}} Therefore, if the tree in trunk layer has leaves, there will be \emph{group trees} in branch part and the width of each layer BV will be . Similarly, the width of \emph{group tree} BV should be .

Allocation Stage of HTA:

The requests of allocation are classified into coarse-grained and fine-grained according to their sizes. A request, size of which is greater than ${2^{D_B-1}}$, is coarse-grained. The allocation for this request will only involve the trunk part of hybrid tree. The process of coarse-grained allocation will be the same as FBTA. In contrast, fine-grained allocation will involve \emph{group trees} in branch part of hybrid tree. According to the request size, HTA locates the target layer for allocation in branch part. Based on the corresponding layer BV, HTA can find which \emph{group tree} has space for the request by searching the first zero bit in the layer BV. Finally, techniques of FBTA will be applied on the located \emph{group tree} and the address of allocable block will be obtained. Since HTA conducts the search of first zero bit twice for fine-grained allocation, the latency of allocation stage will be about 2 times more than FBTA.

\textbf{Maintenance Stage of HTA:} The maintenance for trunk part and the \emph{group tree} BV of branch part can be implemented according to the definition of OR-gate tree. Note that the maintenance after allocation of a node in the buddy tree is known during compilation. Therefore, Hi-DMM maintains the \emph{group tree} BV by applying an OR operation with a mask BV to it. These mask BVs can be grouped into a look-up table so that the maintenance of \emph{group tree} BVs requires lower latency. For example, if node 14 is allocated, then nodes 7, 28 and 29 should be marked as 1 and the corresponding mask BV is 0011011. As for the layer BVs, when the request is coarse-grained, all of them are only involved in marking downwards and the operation applied on the BV of leaves in trunk layer can be passed to these layer BV directly. However, when the request is fine-grained, each layer BV for branch layer needs to be handled according to the status of the \emph{group tree}. Since only the bits for the target \emph{group tree} in layer BVs are involved in maintenance, the computation overhead is low. Moreover, since the layer BVs are independent of each other, HTA can update them concurrently.