-
Notifications
You must be signed in to change notification settings - Fork 9
2....a)Hardware:④HTA
Although FBTA and PATA provide high-performance solution to DMM, when the number of MAUs is raised to more than 512, their consumption of area will be increased notably because of the large width of BVs array in HLS and high parallel computation between them. Therefore, FBTA and PATA perform quite well for coarse-grained allocation, such as the allocation of arrays, for which designers can define MAU with large size, for example, 1 KB, so that there would not be too many MAUs. However, the solution based on large MAU is not suitable for fine-grained allocation because it will lead to serious memory fragmentation issues resulting in wasted memory. To overcome this limitation, Hybrid Tree Allocator (HTA), provided by Hi-DMM, adopts the basic hierarchical idea based on group tree from SysAlloc to store the information of buddy tree, which can save a lot of area. However, HTA does not rely on the search based on AND-gate tree so that it takes less than 20 cycles to accomplish the allocation request for thousands of MAUs, instead of suffering the long allocation latency (hundreds of cycles) of SysAlloc.
The hybrid tree consists of two different kinds of layers, trunk layers and branch layers, as shown in the figure below. The depths of trunk and branch parts of a hybrid tree are and respectively. Based on the hybrid tree structure, HTA can manage MAUs. { The trunk part of hybrid tree is described in the same way as FBTA but leaves in the trunk part, e.g. the node 4,5,6 and 7 in Fig.\ref{hta}, are mapped to a corresponding group tree in branch layer, indicating whether a large memory block is available. As for the branch layers, group trees are used to manage fine-grained allocation and each of their leaves is mapped to a MAU. To locate an available memory node with small size efficiently, these \emph{group trees} are described by two different kinds of BVs, layer BV and \emph{group trees} BV. Each branch layer has its own layer BV, and each bit of layer BV accounts for the status of a specific layer of a \emph{group tree}, indicating whether the \emph{group tree} has available nodes in the specified branch layer. {For example, in Fig.\ref{hta}, both of the first two layer BVs (01...) indicate that the first \emph{group tree} has available node in the layers while the second \emph{group tree} does not. In contrast, the third layer BV (00...) indicate that both two \emph{group trees} have available nodes in the third layer. Each \emph{group tree} in the branch part can be described within a \emph{group tree} BV, where each bit indicates the status of a node of the \emph{group tree}.}} Therefore, if the tree in trunk layer has leaves, there will be \emph{group trees} in branch part and the width of each layer BV will be . Similarly, the width of \emph{group tree} BV should be .
The requests of allocation are classified into coarse-grained and fine-grained according to their sizes. A request, size of which is greater than
\textbf{Maintenance Stage of HTA:} The maintenance for trunk part and the \emph{group tree} BV of branch part can be implemented according to the definition of OR-gate tree. Note that the maintenance after allocation of a node in the buddy tree is known during compilation. Therefore, Hi-DMM maintains the \emph{group tree} BV by applying an OR operation with a mask BV to it. These mask BVs can be grouped into a look-up table so that the maintenance of \emph{group tree} BVs requires lower latency. For example, if node 14 is allocated, then nodes 7, 28 and 29 should be marked as 1 and the corresponding mask BV is 0011011. As for the layer BVs, when the request is coarse-grained, all of them are only involved in marking downwards and the operation applied on the BV of leaves in trunk layer can be passed to these layer BV directly. However, when the request is fine-grained, each layer BV for branch layer needs to be handled according to the status of the \emph{group tree}. Since only the bits for the target \emph{group tree} in layer BVs are involved in maintenance, the computation overhead is low. Moreover, since the layer BVs are independent of each other, HTA can update them concurrently.