grub-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH 6/6] [RFC] docs: document mm


From: Daniel Axtens
Subject: [PATCH 6/6] [RFC] docs: document mm
Date: Thu, 25 Nov 2021 02:22:50 +1100

This is a sketch of much more detailed descriptions of how grub mm
works. I expect we'll want to bikeshed it a bit so I will roll up
change requests into subsequent series I send that touch mm.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 docs/grub-dev.texi | 405 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 399 insertions(+), 6 deletions(-)

diff --git a/docs/grub-dev.texi b/docs/grub-dev.texi
index fb2cc965ed80..2b3b21afe020 100644
--- a/docs/grub-dev.texi
+++ b/docs/grub-dev.texi
@@ -80,8 +80,7 @@ This edition documents version @value{VERSION}.
 * Updating External Code::
 * Porting::
 * Error Handling::
-* Stack and heap size::
-* BIOS port memory map::
+* Memory Management::
 * Video Subsystem::
 * PFF2 Font File Format::
 * Graphical Menu Software Design::
@@ -1025,8 +1024,403 @@ if (grub_errno != GRUB_ERR_NONE)
 grub_error_pop ();
 @end example
 
-@node Stack and heap size
-@chapter Stack and heap size
+@node Memory Management
+@chapter Memory Management
+
+Grub supports both stack and heap memory.
+
+@section Heap management overview
+
+Grub memory management boils down to cells, blocks and regions.
+
+The most fundamental concept in grub mm data structures is that of the
+`cell'. The cell is the atomic unit of memory in grub's memory management. A
+cell is @code{GRUB_MM_ALIGN} bytes in size, which is 16 bytes on a 32-bit
+platform and 32 bytes on a 64-bit platform.
+
+All metadata structures are exactly one cell in size and are cell-aligned.
+
+A block represents a block of memory that is either:
+@itemize
+
+@item an allocation: memory that has been allocated by another part of
+      grub and is being used, or
+
+@item free: memory from which an allocation can be made.
+
+@end itemize
+
+All blocks are cell-aligned and all blocks are a whole number of cells in size.
+
+A region is an area of memory from firmware which contains one or more blocks.
+
+For more details, see the comment at the top of @code{kern/mm.c}.
+
+@subsection Where does heap memory come from?
+
+Heap memory comes from platform-specific code: the platform's memory map must 
be
+consulted to find free memory regions. Depending on the platform, these regions
+may need to be `claimed' in some way to protect them from being used by
+firmware. Firmware may provide grub with a range of differently sized and
+dis-contiguous regions.
+
+A region is defined by a @code{grub_mm_region_t} structure. This metadata
+structure is stored in the first cell of each region.
+
+Region metadata consists of a pointer to a free block to speed up allocations
+(@code{first}), a pointer to the next region (@code{next}) if one exists, the
+number of bytes lost at the beginning to retain alignment (@code{pre_size}) and
+the total size of the allocatable space in bytes (@code{size}).
+
+Regions are tracked as a linked list, with the head being @code{grub_mm_base}.
+
+Regions are created by platform-specific code. Once platform-specific code has
+performed any necessary initialisation, it calls @code{grub_mm_init_region ()}
+on the memory to make it available as a new region. Then it can used by the 
grub
+memory management code to service allocation requests.
+
+If firmware passes a non-cell-aligned address to @code{grub_mm_init_region},
+then the address will be rounded up to cell-alignment, and the number of
+`wasted' bytes stored in @code{pre_size}. Likewise, if the size after aligning
+is not a multiple of the cell size, those bytes beyond the end of the last cell
+cannot be used. (These bytes are not recorded.)
+
+When a region is added, grub will prefer to extend an existing region, if it 
can
+do so. This allows for larger individual allocations to be satisfied. Currently
+the code only supports merging when the new allocation is directly below an
+existing region, that is, @code{new address + new size + previous region
+pre_size = previous region base}.
+
+If a region cannot be merged into an existing region, a new region is created
+and marked as containing a single free block. Regions are added in sorted 
order,
+smallest first, but the region merging process does not respect the sorted 
order
+and so the linked list may not remain in sorted order.
+
+@subsection Blocks: Free blocks and allocations
+
+Metadata for a block (either an allocation or a free block) is stored in a
+@code{grub_mm_header_t} structure.
+
+The type of a block is set by the @code{magic} value. Its @code{size} is
+measured in cells and includes the header cell. If it is a free block, the
+@code{next} pointer points to the next free block in the circular linked
+list. If the block is allocated, the @code{next} pointer is undefined.
+
+Metadata for a block is always stored in the cell immediately preceding the
+block data.
+
+A region starts as one big free block.
+
+@subsection Example
+
+Say firmware provides you with one region running from 0xffff0 (1MB - 16 bytes)
+to 0x10ffff0 (17MB - 16 bytes). Say you're on a 64-bit platform, so cell size 
is
+32 bytes (0x20). The resultant layout is:
+
+@example
+0x00ffff0 +- Start of region as seen by firmware
+          |   ("pre-space", considered in region merging)
+0x0100000 +- grub_mm_region_t
+          |  first free block: 0x100020
+          |  next region: NULL
+          |  pre_size (wasted bytes to ensure alignment): 0x10
+          |  size: 0xffffa0 (bytes, rounded to cells, not including our cell)
+          |   (The last 16 bytes are wasted as there is not enough
+          |    to form a cell. These are not recorded.)
+          |
+0x0100020 +- grub_mm_header_t
+          |  next free block: 0x100020 (circular linked list)
+          |  size: 0x7ffff cells (region size in cells)
+          |  magic: GRUB_MM_FREE_MAGIC
+          |  <padding>
+0x0100040 |  --
+          |    free space belonging to the block
+          |
+0x10fffe0 +- End of region as seen by grub
+          |   (lost space)
+0x10ffff0 +- End of region as seen by firmware
+@end example
+
+Now say you want to allocate 1MB of memory with 1MB alignment. You would call
+something like @code{grub_memalign (1024 * 1024, 1024 * 1024)}. This would then
+call @code{grub_real_malloc ()} with our region. That function is well
+documented: refer to the comments there for precisely how it operates.
+
+In this case, @code{grub_real_malloc ()} will satisfy the allocation, splitting
+the free block into 3 and allocating as high an address as possible:
+
+@example
+0x0100000 +- grub_mm_region_t
+          |  first free block: 0x0100020
+          |  next region: NULL
+          |  pre_size (wasted bytes to ensure alignment): 0x10
+          |  size: 0xffffa0 (bytes, rounded to cells, not including our cell)
+          |
+0x0100020 +- grub_mm_header_t
+          |  next free block: 0x1000000
+          |  size: 0x6ffff cells
+          |  magic: GRUB_MM_FREE_MAGIC
+          |  <padding>
+0x0100040 |  --
+          |    free space belonging to the block
+          |
+0x0efffe0 +- grub_mm_header_t
+          |  next free block: undefined
+          |  size: 0x8001 cells (1MB + header cell)
+          |  magic: GRUB_MM_ALLOC_MAGIC
+          |  <padding>
+0x0f00000 |  --
+          |    allocated space
+          |
+0x1000000 +- grub_mm_header_t
+          |  next free block: 0x0100020
+          |  size: 0x7fff cells
+          |  magic: GRUB_MM_FREE_MAGIC
+          |  <padding>
+0x1000020 |  --
+          |    free space belonging to block
+          |
+0x10fffe0 +- End of region as seen by grub
+@end example
+
+In this case the free pointer in the region will not be updated. In some cases
+it is updated on allocations: for example when the region it points to is no
+longer free, or for small allocations.
+
+Now we make a small allocation of a few bytes. This will be rounded up to two
+cells - one for the data and one for the header. @code{grub_real_malloc ()} 
will
+iterate over the linked list of free block using a pair of pointers 
representing
+the previous and current free block, @code{prev} and @code{cur}
+respectively. @code{prev} starts at the region's @code{*first} pointer, and
+@code{cur} starts as @code{*first->next}.
+
+This will be the result:
+
+@example
+0x0100000 +- grub_mm_region_t
+          |  first free block: 0x0100020
+          |  next region: NULL
+          |  pre_size (wasted bytes to ensure alignment): 0x10
+          |  size: 0xffffa0 (bytes)
+          |
+0x0100020 +- grub_mm_header_t
+          |  next free block: 0x1000000
+          |  size: 0x6ffff cells
+          |  magic: GRUB_MM_FREE_MAGIC
+0x0100040 |  --
+          |    free space belonging to the block
+          |
+0x0efffe0 +- grub_mm_header_t
+          |  size: 0x8001 cells (1MB + header cell)
+          |  magic: GRUB_MM_ALLOC_MAGIC
+0x0f00000 |  --
+          |    allocated space
+          |
+0x1000000 +- grub_mm_header_t
+          |  next free block: 0x0100020
+          |  size: 0x7ffd cells
+          |  magic: GRUB_MM_FREE_MAGIC
+0x1000020 |  --
+          |    free space belonging to block
+          |
+0x10fffa0 +- grub_mm_header_t
+          |  size: 0x2 cells (32 bytes + header cell)
+          |  magic: GRUB_MM_ALLOC_MAGIC
+0x10fffc0 |  --
+          |    allocated space
+          |
+0x10fffe0 +- End of region as seen by grub
+@end example
+
+In this case, we do hit the code block at the end of @code{grub_real_malloc}
+that would move @code{first}, but because we never advanced @code{prev},
+@code{*first} remains unchanged.
+
+So a subsequent allocation will again be made from the second free block if
+possible - for example if we were to allocate 64kB, the result would be the
+following:
+
+@example
+0x0100000 +- grub_mm_region_t
+          |  first free block: 0x0100020
+          |  next region: NULL
+          |  pre_size (wasted bytes to ensure alignment): 0x10
+          |  size: 0xffffa0 (bytes)
+          |
+0x0100020 +- grub_mm_header_t
+          |  next free block: 0x1000000
+          |  size: 0x6ffff cells
+          |  magic: GRUB_MM_FREE_MAGIC
+0x0100040 |  --
+          |    free space belonging to the block
+          |
+0x0efffe0 +- grub_mm_header_t
+          |  size: 0x8001 cells (1MB + header cell)
+          |  magic: GRUB_MM_ALLOC_MAGIC
+0x0f00000 |  --
+          |    allocated space
+          |
+0x1000000 +- grub_mm_header_t
+          |  next free block: 0x0100020
+          |  size: 0x77fc cells
+          |  magic: GRUB_MM_FREE_MAGIC
+0x1000020 |  --
+          |    free space belonging to block
+          |
+0x10eff80 +- grub_mm_header_t
+          |  size: 0x801 cells (64kB + header cell)
+          |  magic: GRUB_MM_ALLOC_MAGIC
+0x10effa0 |  --
+          |    allocated space
+          |
+0x10fffa0 +- grub_mm_header_t
+          |  size: 0x2 cells (32 bytes + header cell)
+          |  magic: GRUB_MM_ALLOC_MAGIC
+0x10fffc0 |  --
+          |    allocated space
+          |
+0x10fffe0 +- End of region as seen by grub
+@end example
+
+@subsection Freeing an allocation
+
+To free an allocation, the block's magic is flipped to
+@code{GRUB_MM_FREE_MAGIC}, it's placed into the free list, and merged with any
+adjacent free blocks.
+
+There are two things to be aware of.
+
+Firstly, the free-list ordering goes `backwards', from high addresses to low
+addresses. That is, except in the case where the list wraps around,
+@code{free_block > free_block->next}.
+
+Secondly, the region's @code{first} pointer is updated to the block preceding
+the block that has just been freed. This means (because the first pointer 
points
+to the region @emph{before} the region that will be used to service an
+allocation) that the next allocation will try to use the newly freed block.
+
+To illustrate, say we free our 32 byte allocation:
+
+@example
+0x0100000 +- grub_mm_region_t
+          |  first free block: 0x0100020
+          |  next region: NULL
+          |  pre_size (wasted bytes to ensure alignment): 0x10
+          |  size: 0xffffa0 (bytes)
+          |
+0x0100020 +- grub_mm_header_t
+          |  next free block: 0x10fffa0
+          |  size: 0x6ffff cells
+          |  magic: GRUB_MM_FREE_MAGIC
+0x0100040 |  --
+          |    free space belonging to the block
+          |
+0x0efffe0 +- grub_mm_header_t
+          |  size: 0x8001 cells (1MB + header cell)
+          |  magic: GRUB_MM_ALLOC_MAGIC
+0x0f00000 |  --
+          |    allocated space
+          |
+0x1000000 +- grub_mm_header_t
+          |  next free block: 0x0100020
+          |  size: 0x77fc cells
+          |  magic: GRUB_MM_FREE_MAGIC
+0x1000020 |  --
+          |    free space belonging to block
+          |
+0x10eff80 +- grub_mm_header_t
+          |  size: 0x801 cells (64kB + header cell)
+          |  magic: GRUB_MM_ALLOC_MAGIC
+0x10effa0 |  --
+          |    allocated space
+          |
+0x10fffa0 +- grub_mm_header_t
+          |  next free block: 0x1000000
+          |  size: 0x2 cells (32 bytes + header cell)
+          |  magic: GRUB_MM_FREE_MAGIC
+0x10fffc0 |  --
+          |    free space belonging to block
+          |
+0x10fffe0 +- End of region as seen by grub
+@end example
+
+The final block is now freed; it is inserted in the ring such that each 
region's
+next is at a lower address that the previous free region, modulo wrapping. The
+region's @code{first} is set such that @code{*first->next} is the block we just
+freed.
+
+@subsection Memory pressure and large allocations
+
+The largest allocation that can be serviced depends on the platform and heap
+fragmentation.
+
+If an allocation cannot be serviced, grub will attempt to invalidate disk 
caches
+to free up memory, and try again.
+
+Grub does not implement virtual memory or paging: there is no way for it to
+stitch together various dis-contiguous pages from firmware into a contiguous
+address space.
+
+Grub always searches regions in order: if you are trying to allocate, for
+example, a number of 64kB objects and your region list starts with small 
regions
+with numerous free blocks that are all too small to satisfy the allocation,
+@code{grub_memalign ()} will have to check each of those regions every time
+before finally getting to a region that can satisfy the allocation.
+
+@subsection Summary: the life-cycle of an allocation from the heap
+
+@itemize @bullet
+
+@item A heap region is added in grub initialisation by calling
+      @code{grub_mm_init_region ()}. The region is marked as a single free
+      block. It is inserted into the list of regions.
+
+@item A call to @code{grub_malloc ()} is made, seeking a certain amount of
+      memory.
+
+@item This is translated into a call to @code{grub_memalign ()} with no
+      alignment requirement.
+
+@item For each region from first to last:
+
+  @itemize
+
+  @item Call @code{grub_real_malloc ()} with that region to try to satisfy the
+        allocation.
+
+  @item Starting from the region's @code{*first->next}, which is usually the
+        most recently freed block, try each block in the free ring.
+
+  @item If it is large enough to satisfy the size and alignment requirements,
+        use it. Carve out an appropriate block, mark it as allocated, and 
adjust
+        the size and shape of the free space as required.
+
+  @item If we found a block, and we allocated less than 32kB, or we allocated
+        the entire free block, mark the previous block as the first free block.
+
+  @item Return the block we found. If we cannot find an appropriate block,
+        return NULL.
+
+@end itemize
+
+@item The caller uses their block of memory.
+
+@item A call to @code{grub_free ()} is made to release the memory back to
+      grub. The block is reinserted into the free list, merging with adjacent
+      free blocks if possible. The region's @code{first} pointer is set such
+      that @code{*first->next} is the most recently freed region.
+
+@end itemize
+
+@section Another memory user: the relocator
+
+The grub mm code is not the only piece of code that uses memory from
+firmware. Grub also a `relocator' infrastructure that can be used to find
+appropriate places in memory to lay out boot material. This code talks to the
+platform specific code to iterate over available memory from firmware, but
+rather than adding it to the heap as a region, the relocator uses it directly.
+
+@section Stack and heap sizes
 
 On emu stack and heap are just normal host OS stack and heap. Stack is 
typically
 8 MiB although it's OS-dependent.
@@ -1088,8 +1482,7 @@ In short:
 @end multitable
 
 
-@node BIOS port memory map
-@chapter BIOS port memory map
+@section BIOS port memory map
 @c By Yoshinori K Okuji
 
 @multitable @columnfractions .15 .25 .5
-- 
2.32.0




reply via email to

[Prev in Thread] Current Thread [Next in Thread]