数据结构对齐

Data structure alignment is the way data is arranged and accessed in computer memory. It consists of three separate but related issues: data alignment, data structure padding, and packing.

数据结构对齐是数据在计算机内存中排布和访问的方式。其由三个独立但相关的问题组成：数据对齐、数据结构填充和打包。

The CPU in modern computer hardware performs reads and writes to memory most efficiently when the data is naturally aligned, which generally means that the data’s memory address is a multiple of the data size. For instance, in a 32-bit architecture, the data may be aligned if the data is stored in four consecutive bytes and the first byte lies on a 4-byte boundary.

当内存自然对齐时，现代计算机硬件中的CPU对内存执行读写操作时效率最高，这通常意味着数据的内存地址是数据大小的倍数。例如，在32位架构中，如果数据存储于4个连续的字节中，而且第一个字节位于4字节边界上，则数据应该是对齐的。

Data alignment is the aligning of elements according to their natural alignment. To ensure natural alignment, it may be necessary to insert some padding between structure elements or after the last element of a structure. For example, on a 32-bit machine, a data structure containing a 16-bit value followed by a 32-bit value could have 16 bits of padding between the 16-bit value and the 32-bit value to align the 32-bit value on a 32-bit boundary. Alternatively, one can pack the structure, omitting the padding, which may lead to slower access, but uses three quarters as much memory.

数据对齐是根据元素的自然对齐来对齐元素。为了确保自然对齐，可能需要在结构体中元素间或最后一个元素后插入填充。例如，在32位机器上，一个数据结构包含一个16位值跟一个32位值则可以在16位值和32位值之间填充16位从而对齐32位值到32位边界上。或者，打包结构，省略填充，这可能导致访问速度变慢，但使用4分之3的内存。

Although data structure alignment is a fundamental issue for all modern computers, many computer languages and computer language implementations handle data alignment automatically. Fortran, Ada,[1][2] PL/I,[3] Pascal,[4] certain C and C++ implementations, D,[5] Rust,[6] C#,[7] and assembly language allow at least partial control of data structure padding, which may be useful in certain special circumstances.

尽管数据结构对齐是所有现代计算机的基础问题，许多计算机语言和计算机语言实现会自动处理数据对齐。 Fortran、Ada、PL/I、Pascal、某些 C 和 C++ 实现，D、Rust、C#、及汇编语言允许对数据结构填充的部分控制，这在一些特殊情况下可能非常有用。

定义

A memory address a is said to be n-byte aligned when a is a multiple of n bytes (where n is a power of 2). In this context, a byte is the smallest unit of memory access, i.e. each memory address specifies a different byte. An n-byte aligned address would have a minimum of log2(n) least-significant zeros when expressed in binary.

当 A 是 N 字节的倍数（其中 N 是2的幂）时，内存地址 A 被称为 N 字节对齐。在这种情况下，字节是内存访问的最小单元，即：每个内存地址指定一个不同的字节。当以二进制表示时，一个 N字节对齐的地址最少有 log2(N) 个最低有效零。例如32位时，最少有4个最低有效零（10进制:32 → 2进制:10000，32前边有0-31共32位）

The alternate wording b-bit aligned designates a b/8 byte aligned address (ex. 64-bit aligned is 8 bytes aligned).

或者换个说法 B 位对齐表示一个 B/8 字节对齐地址（例如：64位对齐是8字节对齐）。

A memory access is said to be aligned when the data being accessed is n bytes long and the datum address is n-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that by definition byte memory accesses are always aligned.

当被访问的数据是 N 字节且其数据地址为n字节对齐时，内存访问被称为是对齐的。当一个内存访问不是对齐的，则称其位未对齐。请注意，根据定义字节内存访问始终是对齐的。

A memory pointer that refers to primitive data that is n bytes long is said to be aligned if it is only allowed to contain addresses that are n-byte aligned, otherwise it is said to be unaligned. A memory pointer that refers to a data aggregate (a data structure or array) is aligned if (and only if) each primitive datum in the aggregate is aligned.

指向 N 字节长原始数据的内存指针，如果只允许包含 N 字节对齐的地址则称其为对齐的，否则称其为未对齐。当且仅当聚合中的每个原始数据对齐时，指向数据聚合（数据结构或数组）的内存指针才对齐。

Note that the definitions above assume that each primitive datum is a power of two bytes long. When this is not the case (as with 80-bit floating-point on x86) the context influences the conditions where the datum is considered aligned or not.

注意上面的定义假设每个原始数据都是两字节长的幂。如果不是这种情况（如80位浮点型在x86平台上）上下文会影响数据是否对齐的条件。

Data structures can be stored in memory on the stack with a static size known as bounded or on the heap with a dynamic size known as unbounded.

数据结构可以以静态的大小存储在内存中的栈上称之为有界或以动态大小存储在堆中称之为无界。

问题

The CPU accesses memory by a single memory word at a time. As long as the memory word size is at least as large as the largest primitive data type supported by the computer, aligned accesses will always access a single memory word. This may not be true for misaligned data accesses.

CPU一次访问一个字的内存。只要内存字大小至少与最大原始数据类型一样大，对齐访问将始终访问单个字节。对于未对齐的数据访问，情况可能并非如此。

If the highest and lowest bytes in a datum are not within the same memory word the computer must split the datum access into multiple memory accesses. This requires a lot of complex circuitry to generate the memory accesses and coordinate them. To handle the case where the memory words are in different memory pages the processor must either verify that both pages are present before executing the instruction or be able to handle a TLB miss or a page fault on any memory access during the instruction execution.

如果数据中的最高和最低位字节不再同一个内存字中，计算机必须将数据访问拆分多个内存访问。这需要大量复杂的电路来生成内存访问并协调它们。为了处理内存字位于不同内存页的情况，处理器必须要么在执行指令前验证两个页面都存在，要么能够在指令执行期间处理任何TLB未命中或页错误。

Some processor designs deliberately avoid introducing such complexity, and instead yield alternative behavior in the event of a misaligned memory access. For example, implementations of the ARM architecture prior to the ARMv6 ISA require mandatory aligned memory access for all multi-byte load and store instructions. Depending on which specific instruction was issued, the result of attempted misaligned access might be to round down the least significant bits of the offending address turning it into an aligned access (sometimes with additional caveats), or to throw an MMU exception (if MMU hardware is present), or to silently yield other potentially unpredictable results. The ARMv6 and later architectures support unaligned access in many circumstances, but not necessarily all.

一些处理器设计有意避免引入这种复杂性，而是在内存访问未对齐的情况下产生替代行为。例如，在 ARMv6 ISA 之前的 ARM 体系结构的实现要求对所有的多字节加载和存储指令进行强制的内存访问。根据发出的特定指令，尝试未对齐访问的结果可能是向下舍入违规地址的最低有效位，将其转化为对齐访问（有时还有额外的警告），或者抛出一个 MMU 异常（如果 MMU 硬件存在），或者默默的产生其他潜在的不可预测的结果。 ARMv6和更高版本的体系结构在许多情况下支持未对齐访问，但不一定在所有情况下都支持。

When a single memory word is accessed the operation is atomic, i.e. the whole memory word is read or written at once and other devices must wait until the read or write operation completes before they can access it. This may not be true for unaligned accesses to multiple memory words, e.g. the first word might be read by one device, both words written by another device and then the second word read by the first device so that the value read is neither the original value nor the updated value. Although such failures are rare, they can be very difficult to identify.

当访问单个内存字时操作是原子的，比如：一次读取或写入整个内存字，其他设备必须等到读取或写入操作完成才能访问它。对于多个内存字的未对齐访问，这就不一定了，比如：第一个字可能由一个设备读取，第二个字由另一个设备写入，然后第二个字再由第一个设备读取，因此第一个设备读取的值既不是原始值也不是更新值（一半一半）。尽管此类故障很少见，但很难鉴别。

数据结构填充

Although the compiler (or interpreter) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition, the data structure as a whole may be padded with a final unnamed member. This allows each member of an array of structures to be properly aligned.

尽管编译器（或解释器）通常在对齐边界上分配单个数据项，但数据结构通常包含具有不同对齐要求的成员。为了保持恰当的对齐，翻译器通常会插入额外的未命名数据成员以便每个成员正确对齐。此外，整个数据结构最终可能会只添加一个未命名成员。这允许正确对齐结果数组的每个成员。

Padding is only inserted when a structure member is followed by a member with a larger alignment requirement or at the end of the structure. By changing the ordering of members in a structure, it is possible to change the amount of padding required to maintain alignment. For example, if members are sorted by descending alignment requirements a minimal amount of padding is required. The minimal amount of padding required is always less than the largest alignment in the structure. Computing the maximum amount of padding required is more complicated, but is always less than the sum of the alignment requirements for all members minus twice the sum of the alignment requirements for the least aligned half of the structure members.

填充仅在一个结构体成员后跟一个具有较大对齐要求的成员或在结构体末尾时才插入填充。

一个结构体成员后跟一个具有较大对齐要求的成员

struct User {
	age int8
	// padding 8bit
	id  int16
}

在结构体末尾时

struct User {
	id  int16
	age int8
	// padding 8bit
}

通过改变结构体内成员顺序，可以改变保持对齐所需的填充量。例如，如果成员按降序排序对齐需求则需要最少的填充量。

进行填充，对齐边界

type User struct {
	age   int16       //<----|
	// padding 16 bit        | 64 bit boundary
	id    int32       //<----|

	point int16       //<----|
	// padding 16 bit        | 64 bit boundary
}

调整结构后甚至不需要进行填充（如果成员按降序排序对齐需求则需要最少的填充量）

type User struct {
	age   int16
	point int16
	id    int32
}

Although C and C do not allow the compiler to reorder structure members to save space, other languages might. It is also possible to tell most C and C compilers to "pack" the members of a structure to a certain level of alignment, e.g. "pack(2)" means align data members larger than a byte to a two-byte boundary so that any padding members are at most one byte long.

尽管 C 和 C++ 不允许编译器重排序结构体成员来节省内存，其他语言可能会这样做。但可以告诉大多数 C 和 C++ 将结构体成员 “打包” 到一定对齐级别。例如：“pack(2)” 表示将大于1字节的数据成员对齐到2字节边界，以便任何填充成员最多只有一字节长。

type User struct {
	age   int24
	// padding 8 bit
	point int16
	id    int32
}

One use for such "packed" structures is to conserve memory. For example, a structure containing a single byte and a four-byte integer would require three additional bytes of padding. A large array of such structures would use 37.5% less memory if they are packed, although accessing each structure might take longer. This compromise may be considered a form of space–time tradeoff.

这种打包结构的一种用途是节约内存。例如：一个包含1字节和4字节整数的机构体将需要3个额外的填充字节。如果这些结构被打包，一个包含此结构的大数组所用内存将减少37.5%，尽管访问每个结构体可能需要更长时间。这种折中，可以被认为是一种时间和空间权衡的形式。

Although use of "packed" structures is most frequently used to conserve memory space, it may also be used to format a data structure for transmission using a standard protocol. However, in this usage, care must also be taken to ensure that the values of the struct members are stored with the endianness required by the protocol (often network byte order), which may be different from the endianness used natively by the host machine.

尽管使用“packed”结构体最长用于节省内存空间，但它也可以用于格式化数据结构用来使用标准协议进行传输。但是在这种用法中，还必须注意确保结构成员的值以协议要求的字节序（通常是网络字节顺序）存储，这可能与主机本地使用的字节序不同。

计算填充量

The following formulas provide the number of padding bytes required to align the start of a data structure (where mod is the modulo operator):

以下公式提供对齐数据结构开头所需的填充字节数（其中 mod 是模运算符）：

padding = (align - (offset mod align)) mod align
aligned = offset + padding
        = offset + ((align - (offset mod align)) mod align)

For example, the padding to add to offset 0x59d for a 4-byte aligned structure is 3. The structure will then start at 0x5a0, which is a multiple of 4. However, when the alignment of offset is already equal to that of align, the second modulo in (align - (offset mod align)) mod align will return zero, therefore the original value is left unchanged.

例如：对于4字节对齐结构，要添加到偏移量 0x59d 的填充是3。

1437 % 32    # -> 29
             # or
0x59d % 0x20 # -> 29

当结构从 0x5a0 开始时，它是4的倍数。但是，当 offset 的对齐已经等于 align 的时候， (align - (offset mod align)) mod align 中的第二个模将返回零，因此原始值保持不变。

Since the alignment is by definition a power of two,[a] the modulo operation can be reduced to a bitwise boolean AND operation.

由于对齐定义为2的幂，模运算可以简化为按位布尔与运算。

The following formulas produce the aligned offset (where & is a bitwise AND and ~ a bitwise NOT):

以下公式产生对齐的偏移量（其中 & 是按位与和 ~ 按位非）：

padding = (align - (offset & (align - 1))) & (align - 1)
        = (-offset & (align - 1))
aligned = (offset + (align - 1)) & ~(align - 1)
        = (offset + (align - 1)) & -align

表示32的二进制位之前的位都是32的倍数，所以取模运算终究会除的一干二净，所以直接和32位二进制位后的位做与运算

0101 1001 1101 # 0x59d or 1437
&
0000 0001 1111 # 0x20 - 1 or 32 - 1
=
0000 0001 1101 # 0x1d or 29

参考

[原文] Data structure alignment