[The Art of Linux Kernel Design] From Power-On to Just Before main() Executes

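These notes are based on 《Linux 内核设计的艺术》 (The Art of Linux Kernel Design, by the New Design Team, China Machine Press) and on notes I have accumulated over time. The kernel version discussed is Linux 0.11.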

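Contents

1.1 Starting the BIOS and preparing the real-mode interrupt vector table and interrupt service routines

1.1.1 How the BIOS starts
1.1.2 The BIOS sets up the interrupt vector table and interrupt service routines in memory

1.2 Loading the operating system kernel and preparing for protected mode

1.2.1 Loading the first part of the code: the boot program (bootsect)
1.2.2 Loading the second part of the code: setup
1.2.3 Loading the third part of the code: the system module

1.3 Beginning the transition to 32-bit mode and preparing to call main

1.3.1 Disabling interrupts and moving system to the start of memory at 0x00000
1.3.2 Setting up the interrupt descriptor table and the global descriptor table
1.3.3 Enabling A20 for 32-bit addressing
1.3.4 Preparing to run head.s in protected mode
1.3.5 head.s begins executing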
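From the moment the power button is pressed until main begins executing, what does the computer do¹?

BIOS stage. After power-on, the first machine instruction is fetched from physical address 0xFFFF0, which lies inside the physical range occupied by the BIOS. The BIOS initializes the hardware and sets up the real-mode interrupt vector table and interrupt service routines (the vector table is initialized in the first 1 KB of the 1 MB physical address space, and the interrupt handlers are kept in the last 256 KB of it). Finally it loads bootsect.bin, the executable the CPU runs in stage 2, into physical memory starting at 0x7C00 and jumps to it.
Boot stage. The operating system is loaded from the boot disk into memory. Because the boot executable bootsect.bin cannot exceed 512 B, the main work of this stage is to load kernel.bin, the executable the CPU runs in stage 3, into physical memory, enable segmentation (enter protected mode), and jump to kernel.bin.
Kernel stage.

Real mode: an operating mode of the Intel 80286 and later 80x86-compatible CPUs. Real mode is characterized by a 20-bit memory address space (that is, 1 MB of addressable memory), direct software access to the BIOS and peripheral hardware, and no hardware support for paging or real-time multitasking. (From 《Linux 内核设计的艺术》)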

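1.1 Starting the BIOS and preparing the real-mode interrupt vector table and interrupt service routines

A computer needs programs in order to run, programs normally live in memory, and the CPU can only execute what is in memory. When the machine first starts, however, memory is empty: the program we want to run (the operating system) is not there yet and sits quietly on a floppy or hard disk. To run it we must first load it into memory. So here is the problem: the CPU, the control core, can only stare at empty memory and do nothing. Who loads the operating system into memory so that the CPU has work to do? The BIOS.

1.1.1 How the BIOS starts

At the instant of power-on the CPU is idle (it is waiting for an operating system in memory) and memory is empty (it is waiting for the BIOS to load the operating system). With no software yet directing the CPU, who starts the BIOS program? The hardware does it automatically.

As we know, the BIOS² is a program burned into a ROM chip that is mapped into the address space, so its contents are fixed. Execution starts from one specific location inside the BIOS program (the BIOS sits at the top of the 1 MB address space and includes, among other things, the interrupt handlers), and that location is 0xFFFF0.

Explanation from 《Linux 内核设计的艺术》: Intel 80x86 CPUs can run in 16-bit real mode and in 32-bit protected mode. The CPU hardware is designed to start in 16-bit real mode at power-on.
The CPU's hardware logic forces CS to 0xFFFF and IP to 0x0000 at power-on, so CS:IP points at 0xFFFF0, which falls inside the BIOS address range. (So although RAM is empty and the CPU has nothing to run from it, the ROM does hold code, and that is what gets things going.)

What if there is no code at that location? Then the computer cannot start.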
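The address arithmetic behind CS:IP can be checked with a few lines of Python. This is an illustrative sketch, not code from Linux 0.11; the helper name `phys` is made up for the example, and the only rule it relies on is that a real-mode physical address is the segment shifted left by 4 bits plus the offset.

```python
def phys(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 16 + offset."""
    return (segment << 4) + offset


# CS:IP at power-on, using the values quoted in the text.
reset_vector = phys(0xFFFF, 0x0000)
print(hex(reset_vector))          # 0xffff0

# The same rule explains the segment constants used later in bootsect.s.
print(hex(phys(0x07C0, 0x0000)))  # 0x7c00
print(hex(phys(0x9000, 0x0000)))  # 0x90000
print(hex(phys(0x9020, 0x0000)))  # 0x90200
```

Because the segment is scaled by 16, many segment:offset pairs name the same byte; 0xF000:0xFFF0 and 0xFFFF:0x0000 both resolve to 0xFFFF0.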

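1.1.2 The BIOS sets up the interrupt vector table and interrupt service routines in memory

The BIOS displays graphics card information, memory information and so on, and it also sets up the interrupt vector table and the interrupt service routines in memory. The vector table is built at the very start of memory (256 entries × 4 B = 1 KB); toward the end of the 1 MB space are the service routines for the 256 vectors, and each vector points to one specific routine.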
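To make the table layout concrete, here is a small Python sketch of how a vector number maps to a table entry. It assumes the standard real-mode layout in which each 4-byte entry stores a 2-byte offset followed by a 2-byte segment; the function names and the sample handler address are hypothetical and chosen only for illustration.

```python
IVT_BASE = 0x00000
ENTRY_SIZE = 4      # 2-byte offset, then 2-byte segment
NUM_VECTORS = 256


def ivt_entry_address(vector: int) -> int:
    """Physical address of the table entry for a given interrupt number."""
    if not 0 <= vector < NUM_VECTORS:
        raise ValueError("vector must be in 0..255")
    return IVT_BASE + vector * ENTRY_SIZE


def read_vector(memory, vector: int):
    """Return (segment, offset) of the handler stored for `vector`."""
    addr = ivt_entry_address(vector)
    offset = int.from_bytes(memory[addr:addr + 2], "little")
    segment = int.from_bytes(memory[addr + 2:addr + 4], "little")
    return segment, offset


# A made-up table: point int 0x19 at the invented handler F000:1234.
memory = bytearray(NUM_VECTORS * ENTRY_SIZE)
entry = ivt_entry_address(0x19)
memory[entry:entry + 2] = (0x1234).to_bytes(2, "little")
memory[entry + 2:entry + 4] = (0xF000).to_bytes(2, "little")

print(len(memory))                   # 1024 bytes in total
print(hex(ivt_entry_address(0x13)))  # 0x4c
print([hex(v) for v in read_vector(memory, 0x19)])
```

The same 4-bytes-per-entry rule is what setup.s relies on later when it reads the disk parameter table pointers with `lds si,[4*0x41]` and `lds si,[4*0x46]`.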

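Figure: location of the BIOS in memory

In other words, the CPU starts executing the code at a special location in the BIOS ROM (0xFFFF0). That code is the spark that brings the operating system, and the whole computer, to life. From here on the machine is up and running.

1.2 Loading the operating system kernel and preparing for protected mode

From this point on we load the operating system step by step. The computer loads the operating system's code in three batches.

1.2.1 Loading the first part of the code: the boot program (bootsect)

Full source of bootsect.s (Linux 0.11)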

!
! SYS_SIZE is the number of clicks (16 bytes) to be loaded.
! 0x3000 is 0x30000 bytes = 196kB, more than enough for current
! versions of linux
!
SYSSIZE = 0x3000
!
! bootsect.s    (C) 1991 Linus Torvalds
!
! bootsect.s is loaded at 0x7c00 by the bios-startup routines, and moves
! iself out of the way to address 0x90000, and jumps there.
!
! It then loads 'setup' directly after itself (0x90200), and the system
! at 0x10000, using BIOS interrupts. 
!
! NOTE! currently system is at most 8*65536 bytes long. This should be no
! problem, even in the future. I want to keep it simple. This 512 kB
! kernel size should be enough, especially as this doesn't contain the
! buffer cache as in minix
!
! The loader has been made as simple as possible, and continuos
! read errors will result in a unbreakable loop. Reboot by hand. It
! loads pretty fast by getting whole sectors at a time whenever possible.

.globl begtext, begdata, begbss, endtext, enddata, endbss
.text
begtext:
.data
begdata:
.bss
begbss:
.text

SETUPLEN = 4        ! nr of setup-sectors
BOOTSEG  = 0x07c0     ! original address of boot-sector
INITSEG  = 0x9000     ! we move boot here - out of the way
SETUPSEG = 0x9020     ! setup starts here
SYSSEG   = 0x1000     ! system loaded at 0x10000 (65536).
ENDSEG   = SYSSEG + SYSSIZE   ! where to stop loading

! ROOT_DEV: 0x000 - same type of floppy as boot.
!   0x301 - first partition on first drive etc
ROOT_DEV = 0x306

entry start
start:
  mov ax,#BOOTSEG
  mov ds,ax
  mov ax,#INITSEG
  mov es,ax
  mov cx,#256
  sub si,si
  sub di,di
  rep
  movw
  jmpi  go,INITSEG
go: mov ax,cs
  mov ds,ax
  mov es,ax
! put stack at 0x9ff00.
  mov ss,ax
  mov sp,#0xFF00    ! arbitrary value >>512

! load the setup-sectors directly after the bootblock.
! Note that 'es' is already set up.

load_setup:
  mov dx,#0x0000    ! drive 0, head 0
  mov cx,#0x0002    ! sector 2, track 0
  mov bx,#0x0200    ! address = 512, in INITSEG
  mov ax,#0x0200+SETUPLEN ! service 2, nr of sectors
  int 0x13      ! read it
  jnc ok_load_setup   ! ok - continue
  mov dx,#0x0000
  mov ax,#0x0000    ! reset the diskette
  int 0x13
  j load_setup

ok_load_setup:

! Get disk drive parameters, specifically nr of sectors/track

  mov dl,#0x00
  mov ax,#0x0800    ! AH=8 is get drive parameters
  int 0x13
  mov ch,#0x00
  seg cs
  mov sectors,cx
  mov ax,#INITSEG
  mov es,ax

! Print some inane message

  mov ah,#0x03    ! read cursor pos
  xor bh,bh
  int 0x10
  
  mov cx,#24
  mov bx,#0x0007    ! page 0, attribute 7 (normal)
  mov bp,#msg1
  mov ax,#0x1301    ! write string, move cursor
  int 0x10

! ok, we've written the message, now
! we want to load the system (at 0x10000)

  mov ax,#SYSSEG
  mov es,ax   ! segment of 0x010000
  call  read_it
  call  kill_motor

! After that we check which root-device to use. If the device is
! defined (!= 0), nothing is done and the given device is used.
! Otherwise, either /dev/PS0 (2,28) or /dev/at0 (2,8), depending
! on the number of sectors that the BIOS reports currently.

  seg cs
  mov ax,root_dev
  cmp ax,#0
  jne root_defined
  seg cs
  mov bx,sectors
  mov ax,#0x0208    ! /dev/ps0 - 1.2Mb
  cmp bx,#15
  je  root_defined
  mov ax,#0x021c    ! /dev/PS0 - 1.44Mb
  cmp bx,#18
  je  root_defined
undef_root:
  jmp undef_root
root_defined:
  seg cs
  mov root_dev,ax

! after that (everyting loaded), we jump to
! the setup-routine loaded directly after
! the bootblock:

  jmpi  0,SETUPSEG

! This routine loads the system at address 0x10000, making sure
! no 64kB boundaries are crossed. We try to load it as fast as
! possible, loading whole tracks whenever we can.
!
! in: es - starting address segment (normally 0x1000)
!
sread:  .word 1+SETUPLEN  ! sectors read of current track
head: .word 0     ! current head
track:  .word 0     ! current track

read_it:
  mov ax,es
  test ax,#0x0fff
die:  jne die     ! es must be at 64kB boundary
  xor bx,bx   ! bx is starting address within segment
rp_read:
  mov ax,es
  cmp ax,#ENDSEG    ! have we loaded all yet?
  jb ok1_read
  ret
ok1_read:
  seg cs
  mov ax,sectors
  sub ax,sread
  mov cx,ax
  shl cx,#9
  add cx,bx
  jnc ok2_read
  je ok2_read
  xor ax,ax
  sub ax,bx
  shr ax,#9
ok2_read:
  call read_track
  mov cx,ax
  add ax,sread
  seg cs
  cmp ax,sectors
  jne ok3_read
  mov ax,#1
  sub ax,head
  jne ok4_read
  inc track
ok4_read:
  mov head,ax
  xor ax,ax
ok3_read:
  mov sread,ax
  shl cx,#9
  add bx,cx
  jnc rp_read
  mov ax,es
  add ax,#0x1000
  mov es,ax
  xor bx,bx
  jmp rp_read

read_track:
  push ax
  push bx
  push cx
  push dx
  mov dx,track
  mov cx,sread
  inc cx
  mov ch,dl
  mov dx,head
  mov dh,dl
  mov dl,#0
  and dx,#0x0100
  mov ah,#2
  int 0x13
  jc bad_rt
  pop dx
  pop cx
  pop bx
  pop ax
  ret
bad_rt: mov ax,#0
  mov dx,#0
  int 0x13
  pop dx
  pop cx
  pop bx
  pop ax
  jmp read_track

/*
 * This procedure turns off the floppy drive motor, so
 * that we enter the kernel in a known state, and
 * don't have to worry about it later.
 */
kill_motor:
  push dx
  mov dx,#0x3f2
  mov al,#0
  outb
  pop dx
  ret

sectors:
  .word 0

msg1:
  .byte 13,10
  .ascii "Loading system ..."
  .byte 13,10,13,10

.org 508
root_dev:
  .word ROOT_DEV
boot_flag:
  .word 0xAA55

.text
endtext:
.data
enddata:
.bss
endbss:

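During power-up you can enter the BIOS setup screen and choose the boot disk; on most machines this is the hard disk.
After a series of BIOS routines the computer finishes its self-test (how many disks are there, and so on). Through cooperation between the hardware design and the BIOS, the CPU then receives an int 0x19 interrupt. It looks up the interrupt vector table in memory, fetches the entry address of the corresponding service routine (the bootstrap loader), and executes it. This routine loads the program in the first sector (512 B) of the boot disk, bootsect, to a fixed location in memory. Its behavior is designed into the BIOS in advance and its code is fixed.

Interrupt vector table: a key part of the real-mode interrupt mechanism; it records the memory address of the service routine for every interrupt number.
Interrupt service routine: a program with a specific function that responds to an interrupt and is reached through its index in the interrupt vector table.

So where in memory is the boot program bootsect loaded? At 0x7C00.
To sum up, the contents of the first sector (bootsect) are loaded to a fixed location in memory; the first sector is also called the boot sector. Does loading this program mean the operating system is ready to show what it can do? No. The rest of the operating system still has to be brought into memory, and that job is left to bootsect.

Two conventions are at work. For the operating system: the designer must place the first program to run in the boot sector; the rest can be placed in subsequent sectors in whatever order the design calls for. For the BIOS: the code reached from 0xFFFF0 must be a startup program whose goal is to load the boot program from the boot sector of the disk, and it must load it at 0x7C00. The BIOS ignores what the boot sector contains; whatever is in it and whichever operating system it belongs to, the BIOS only loads it.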
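The end of the bootsect.s listing (`.org 508`, `root_dev`, `boot_flag`) pins down the last four bytes of the sector. The Python sketch below reads them back from a synthetic 512-byte sector. It is an illustration under assumptions: `parse_boot_sector` is a hypothetical helper, the sector is fabricated in memory instead of being read from a disk image, and the remark that a BIOS looks for the 0xAA55 marker reflects common PC convention, not anything stated in the book.

```python
SECTOR_SIZE = 512


def parse_boot_sector(sector: bytes):
    """Return (root_dev, boot_flag) stored at offsets 508 and 510."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("a boot sector is exactly 512 bytes")
    root_dev = int.from_bytes(sector[508:510], "little")
    boot_flag = int.from_bytes(sector[510:512], "little")
    return root_dev, boot_flag


# Fabricate a sector with the values from the listing (ROOT_DEV = 0x306).
sector = bytearray(SECTOR_SIZE)
sector[508:510] = (0x306).to_bytes(2, "little")
sector[510:512] = (0xAA55).to_bytes(2, "little")

root_dev, boot_flag = parse_boot_sector(bytes(sector))
print(hex(root_dev), hex(boot_flag))
print(hex(sector[510]), hex(sector[511]))  # little-endian: 0x55 then 0xaa
```

Placing `root_dev` at a fixed offset lets the value be changed in the built image without reassembling bootsect.s.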

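1.2.2 Loading the second part of the code: setup

Plan the memory layout
Move bootsect
Load setup

Before this stage, the BIOS has already used int 0x19 to place the boot program (bootsect) in memory. The job of bootsect is to load the second and third batches of code into memory.
First, the memory layout has to be planned.
In real mode the maximum addressable range is 1 MB. To plan the layout, bootsect.s defines the following constants:

boot/bootsect.s: memory layout constants

SETUPLEN = 4        ! number of sectors occupied by setup
BOOTSEG  = 0x07c0     ! segment where the BIOS loads the boot sector
INITSEG  = 0x9000     ! segment the boot sector relocates itself to
SETUPSEG = 0x9020     ! segment where setup is loaded
SYSSEG   = 0x1000     ! segment where the kernel (system) is loaded
ENDSEG   = SYSSEG + SYSSIZE   ! segment where loading of the kernel stops

Next, bootsect moves itself in memory.
It copies itself from 0x07C00 (BOOTSEG) to 0x90000 (INITSEG).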

boot/bootsect.s

mov ax,#BOOTSEG
mov ds,ax
mov ax,#INITSEG
mov es,ax
mov cx,#256
sub si,si
sub di,di
rep
movw

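In this copy, ds(0x07C0):si(0x0000) forms the source address 0x07C00 and es(0x9000):di(0x0000) forms the destination address 0x90000. The line mov cx,#256 sets the loop count, the number of words to copy (a word is two bytes); 256 words are exactly 512 bytes, the size of the first sector.

By convention bootsect was loaded at 0x07C00, and it has now copied itself to 0x90000, which shows the operating system beginning to arrange memory to suit its own needs.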
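The effect of the `rep` / `movw` pair can be imitated on a 1 MB byte array. This is a rough Python model, not the CPU's exact behavior: `rep_movsw` is a hypothetical helper, it assumes the direction flag is clear (addresses increase), and the sector contents are dummy bytes.

```python
def rep_movsw(mem, ds, si, es, di, cx):
    """Copy cx 16-bit words from ds:si to es:di, addresses increasing."""
    while cx:
        src = (ds << 4) + si
        dst = (es << 4) + di
        mem[dst:dst + 2] = mem[src:src + 2]
        si = (si + 2) & 0xFFFF   # 16-bit registers wrap at 64 KB
        di = (di + 2) & 0xFFFF
        cx -= 1
    return si, di


mem = bytearray(1 << 20)                          # 1 MB of real-mode memory
mem[0x7C00:0x7C00 + 512] = bytes(range(256)) * 2  # dummy boot sector

# Same register values as bootsect.s: BOOTSEG, INITSEG, cx = 256.
si, di = rep_movsw(mem, ds=0x07C0, si=0, es=0x9000, di=0, cx=256)

print(mem[0x90000:0x90200] == mem[0x7C00:0x7E00])  # True
print(si, di)                                      # 512 512
```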

boot/bootsect.s

rep
  movw
  jmpi  go,INITSEG
go: mov ax,cs
  mov ds,ax

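CS:IP now points at the line go: mov ax,cs, and execution continues from there. From this point on the operating system no longer depends entirely on the BIOS and can place code wherever it chooses in memory.

bootsect.s

jmpi  go,INITSEG ! CS = INITSEG, IP = go
go: mov ax,cs

Executing instructions is a matter of CS and IP changing continuously. Before jmpi go,INITSEG, the code copies itself; after it, execution continues in the copy at 0x90000, at the offset of go. These two lines neatly achieve "arrive at the new location and carry on with the original flow of execution".

Since bootsect now runs from a new location, the other segment registers must be updated too. jmpi has already changed CS; DS (data segment), ES (extra segment), SS (stack segment), and SP (stack pointer) still need new values.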

bootsect.s

go: mov ax,cs
  mov ds,ax
  mov es,ax
! put stack at 0x9ff00.
  mov ss,ax
  mov sp,#0xFF00    ! arbitrary value >>512


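The stack grows from high addresses toward low addresses. SS:SP points at 0x9FF00, which is a long way above 0x90200 where setup starts, so even after setup has been loaded there is plenty of room for pushing data.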
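How much room is "plenty" can be worked out from the constants in bootsect.s. The short Python calculation below is only a sanity check of that arithmetic; the variable names are invented for the example.

```python
SECTOR = 512
SETUPLEN = 4                               # setup sectors, from bootsect.s

setup_start = 0x9020 << 4                  # SETUPSEG:0 = 0x90200
setup_end = setup_start + SETUPLEN * SECTOR
stack_top = (0x9000 << 4) + 0xFF00         # SS:SP = 0x9000:0xFF00

headroom = stack_top - setup_end
print(hex(setup_end))   # 0x90a00
print(hex(stack_top))   # 0x9ff00
print(headroom)         # 62720 bytes between the end of setup and the stack top
```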

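Here "stack" means the memory area that operates last-in, first-out in the runtime structure of a C program; "heap" means the dynamic memory allocated with the C library function malloc and released with free.

Finally comes the last step of loading the second part of the code: loading the setup program into memory.

To load setup from disk we rely on the interrupt service routine that the BIOS provides behind interrupt vector 0x13.

The book describes two differences between int 0x19 and int 0x13:
Who issues them differs. int 0x19 is issued by the BIOS; int 0x13 is issued by the Linux boot code, bootsect.
Both load code from disk, but to different places. int 0x19 only loads the first sector to 0x07C00; int 0x13 loads the sectors the designer specifies to the memory location the designer specifies.

Before issuing int 0x13, the sector to read, the destination address, and the other parameters must be placed in the registers the service routine expects.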

bootsect.s

load_setup:
  mov dx,#0x0000    ! drive 0, head 0
  mov cx,#0x0002    ! sector 2, track 0
  mov bx,#0x0200    ! address = 512, in INITSEG
  mov ax,#0x0200+SETUPLEN ! service 2, nr of sectors
  int 0x13      ! read it
  jnc ok_load_setup   ! ok - continue
  mov dx,#0x0000
  mov ax,#0x0000    ! reset the diskette
  int 0x13
  j load_setup
ok_load_setup:

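The system passes parameters to the BIOS service routine through several general-purpose registers.

Once the parameters are in place, the int 0x13 instruction raises interrupt 0x13; the service routine is found through the interrupt vector table and executed. It places setup directly after bootsect. At this point 5 sectors of operating system code have been loaded (1 for bootsect plus 4 for setup).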
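To see what the register values in load_setup actually request, the Python sketch below unpacks them according to the conventional register layout of the BIOS "read sectors" service (int 0x13, AH = 2). `decode_int13_read` is a hypothetical helper written for this note, and the field layout is taken from general BIOS documentation, not from the book.

```python
def decode_int13_read(ax, bx, cx, dx, es):
    """Unpack the registers passed to int 0x13 with AH = 2 (read sectors)."""
    ah, al = ax >> 8, ax & 0xFF
    ch, cl = cx >> 8, cx & 0xFF
    dh, dl = dx >> 8, dx & 0xFF
    return {
        "function": ah,                    # 2 = read sectors
        "sector_count": al,
        "track": ch | ((cl & 0xC0) << 2),  # top 2 bits of CL extend the track
        "sector": cl & 0x3F,               # sectors are numbered from 1
        "head": dh,
        "drive": dl,                       # 0 = first floppy drive
        "buffer": (es << 4) + bx,          # ES:BX destination address
    }


SETUPLEN = 4
request = decode_int13_read(ax=0x0200 + SETUPLEN, bx=0x0200,
                            cx=0x0002, dx=0x0000, es=0x9000)
for name, value in request.items():
    print(f"{name:>12}: {value:#x}")
```

Read this way, the call asks for 4 sectors starting at sector 2 of track 0, head 0, drive 0, stored at 0x90200, which is exactly "setup directly after bootsect".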

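1.2.3 Loading the third part of the code: the system module

The second batch of code is now in memory, and the third batch comes next. It is loaded with the same real-mode int 0x13 interrupt, and the process is no different from loading setup, except that this time 240 sectors (120 KB) are loaded into the 120 KB starting at SYSSEG (0x10000). Because this takes a while, Linus prints "Loading system ..." on the screen at this stage so that the user does not think the machine has hung. The system is still booting and main has not started, so the message is produced entirely in assembly. The excerpt below shows how: it uses the BIOS video service int 0x10 (a BIOS interrupt, not a system call).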

bootsect.s

mov ah,#0x03    ! read cursor pos
xor bh,bh
int 0x10
mov cx,#24
mov bx,#0x0007    ! page 0, attribute 7 (normal)
mov bp,#msg1
mov ax,#0x1301    ! write string, move cursor
int 0x10
msg1:
  .byte 13,10
  .ascii "Loading system ..."
  .byte 13,10,13,10
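Two numbers in this step are easy to verify: the `cx,#24` passed to int 0x10 is the length of msg1 in bytes, and 240 sectors of 512 bytes are 120 KB. The Python lines below rebuild the message from the `.byte` and `.ascii` directives above; the variable names are made up for the check.

```python
# msg1 as laid out in bootsect.s: CR LF, the text, then CR LF CR LF.
msg1 = bytes([13, 10]) + b"Loading system ..." + bytes([13, 10, 13, 10])
print(len(msg1))            # 24, matching "mov cx,#24"

SECTOR = 512
system_sectors = 240        # figure quoted in the text
system_bytes = system_sectors * SECTOR
print(system_bytes // 1024) # 120 (KB)
```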

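Once the third part is loaded, all of the operating system's code is in memory and bootsect's main job is done. bootsect.s does one more thing: it determines the root device number and saves it in root_dev as one of the machine's system parameters.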
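The root-device decision in bootsect.s (the code between `seg cs` / `mov ax,root_dev` and `root_defined`) is short enough to restate in Python. This is a paraphrase for reading purposes: `pick_root_dev` and `major_minor` are hypothetical names, and the exception stands in for the endless `undef_root` loop.

```python
def pick_root_dev(root_dev: int, sectors_per_track: int) -> int:
    """Mirror the root-device selection logic of bootsect.s."""
    if root_dev != 0:
        return root_dev          # explicitly configured, use as is
    if sectors_per_track == 15:
        return 0x0208            # 1.2 MB floppy, device (2,8)
    if sectors_per_track == 18:
        return 0x021C            # 1.44 MB floppy, device (2,28)
    raise RuntimeError("undef_root: bootsect.s would loop forever here")


def major_minor(dev: int):
    """Split a device number into (major, minor)."""
    return dev >> 8, dev & 0xFF


print(hex(pick_root_dev(0x306, 18)))      # ROOT_DEV from the listing is kept
print(major_minor(pick_root_dev(0, 15)))  # (2, 8)
print(major_minor(pick_root_dev(0, 18)))  # (2, 28)
print(major_minor(0x306))                 # (3, 6)
```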

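Linux 0.11 uses the Minix file system. It provides no tool for creating a file system on a device, so a file system has to be built with tools on an already running system and then loaded onto this machine. Booting Linux 0.11 therefore needs two pieces of data: the kernel image and a root file system (a device formatted with the matching file system).

In bootsect.s, the statement jmpi 0,SETUPSEG jumps to 0x90200, where setup is. With this instruction CS:IP points at the first instruction of setup, so setup takes over from bootsect.

setup begins executing.

Full source of setup.s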

!
! setup.s   (C) 1991 Linus Torvalds
!
! setup.s is responsible for getting the system data from the BIOS,
! and putting them into the appropriate places in system memory.
! both setup.s and system has been loaded by the bootblock.
!
! This code asks the bios for memory/disk/other parameters, and
! puts them in a "safe" place: 0x90000-0x901FF, ie where the
! boot-block used to be. It is then up to the protected mode
! system to read them from there before the area is overwritten
! for buffer-blocks.
!

! NOTE! These had better be the same as in bootsect.s!

INITSEG  = 0x9000 ! we move boot here - out of the way
SYSSEG   = 0x1000 ! system loaded at 0x10000 (65536).
SETUPSEG = 0x9020 ! this is the current segment

.globl begtext, begdata, begbss, endtext, enddata, endbss
.text
begtext:
.data
begdata:
.bss
begbss:
.text

entry start
start:

! ok, the read went well so we get current cursor position and save it for
! posterity.

  mov ax,#INITSEG ! this is done in bootsect already, but...
  mov ds,ax
  mov ah,#0x03  ! read cursor pos
  xor bh,bh
  int 0x10    ! save it in known place, con_init fetches
  mov [0],dx    ! it from 0x90000.

! Get memory size (extended mem, kB)

  mov ah,#0x88
  int 0x15
  mov [2],ax

! Get video-card data:

  mov ah,#0x0f
  int 0x10
  mov [4],bx    ! bh = display page
  mov [6],ax    ! al = video mode, ah = window width

! check for EGA/VGA and some config parameters

  mov ah,#0x12
  mov bl,#0x10
  int 0x10
  mov [8],ax
  mov [10],bx
  mov [12],cx

! Get hd0 data

  mov ax,#0x0000
  mov ds,ax
  lds si,[4*0x41]
  mov ax,#INITSEG
  mov es,ax
  mov di,#0x0080
  mov cx,#0x10
  rep
  movsb

! Get hd1 data

  mov ax,#0x0000
  mov ds,ax
  lds si,[4*0x46]
  mov ax,#INITSEG
  mov es,ax
  mov di,#0x0090
  mov cx,#0x10
  rep
  movsb

! Check that there IS a hd1 :-)

  mov ax,#0x01500
  mov dl,#0x81
  int 0x13
  jc  no_disk1
  cmp ah,#3
  je  is_disk1
no_disk1:
  mov ax,#INITSEG
  mov es,ax
  mov di,#0x0090
  mov cx,#0x10
  mov ax,#0x00
  rep
  stosb
is_disk1:

! now we want to move to protected mode ...

  cli     ! no interrupts allowed !

! first we move the system to it's rightful place

  mov ax,#0x0000
  cld     ! 'direction'=0, movs moves forward
do_move:
  mov es,ax   ! destination segment
  add ax,#0x1000
  cmp ax,#0x9000
  jz  end_move
  mov ds,ax   ! source segment
  sub di,di
  sub si,si
  mov   cx,#0x8000
  rep
  movsw
  jmp do_move

! then we load the segment descriptors

end_move:
  mov ax,#SETUPSEG  ! right, forgot this at first. didn't work :-)
  mov ds,ax
  lidt  idt_48    ! load idt with 0,0
  lgdt  gdt_48    ! load gdt with whatever appropriate

! that was painless, now we enable A20

  call  empty_8042
  mov al,#0xD1    ! command write
  out #0x64,al
  call  empty_8042
  mov al,#0xDF    ! A20 on
  out #0x60,al
  call  empty_8042

! well, that went ok, I hope. Now we have to reprogram the interrupts :-(
! we put them right after the intel-reserved hardware interrupts, at
! int 0x20-0x2F. There they won't mess up anything. Sadly IBM really
! messed this up with the original PC, and they haven't been able to
! rectify it afterwards. Thus the bios puts interrupts at 0x08-0x0f,
! which is used for the internal hardware interrupts as well. We just
! have to reprogram the 8259's, and it isn't fun.

  mov al,#0x11    ! initialization sequence
  out #0x20,al    ! send it to 8259A-1
  .word 0x00eb,0x00eb   ! jmp $+2, jmp $+2
  out #0xA0,al    ! and to 8259A-2
  .word 0x00eb,0x00eb
  mov al,#0x20    ! start of hardware int's (0x20)
  out #0x21,al
  .word 0x00eb,0x00eb
  mov al,#0x28    ! start of hardware int's 2 (0x28)
  out #0xA1,al
  .word 0x00eb,0x00eb
  mov al,#0x04    ! 8259-1 is master
  out #0x21,al
  .word 0x00eb,0x00eb
  mov al,#0x02    ! 8259-2 is slave
  out #0xA1,al
  .word 0x00eb,0x00eb
  mov al,#0x01    ! 8086 mode for both
  out #0x21,al
  .word 0x00eb,0x00eb
  out #0xA1,al
  .word 0x00eb,0x00eb
  mov al,#0xFF    ! mask off all interrupts for now
  out #0x21,al
  .word 0x00eb,0x00eb
  out #0xA1,al

! well, that certainly wasn't fun :-(. Hopefully it works, and we don't
! need no steenking BIOS anyway (except for the initial loading :-).
! The BIOS-routine wants lots of unnecessary data, and it's less
! "interesting" anyway. This is how REAL programmers do it.
!
! Well, now's the time to actually move into protected mode. To make
! things as simple as possible, we do no register set-up or anything,
! we let the gnu-compiled 32-bit programs do that. We just jump to
! absolute address 0x00000, in 32-bit protected mode.

  mov ax,#0x0001  ! protected mode (PE) bit
  lmsw  ax    ! This is it!
  jmpi  0,8   ! jmp offset 0 of segment 8 (cs)

! This routine checks that the keyboard command queue is empty
! No timeout is used - if this hangs there is something wrong with
! the machine, and we probably couldn't proceed anyway.
empty_8042:
  .word 0x00eb,0x00eb
  in  al,#0x64  ! 8042 status port
  test  al,#2   ! is input buffer full?
  jnz empty_8042  ! yes - loop
  ret

gdt:
  .word 0,0,0,0   ! dummy

  .word 0x07FF    ! 8Mb - limit=2047 (2048*4096=8Mb)
  .word 0x0000    ! base address=0
  .word 0x9A00    ! code read/exec
  .word 0x00C0    ! granularity=4096, 386

  .word 0x07FF    ! 8Mb - limit=2047 (2048*4096=8Mb)
  .word 0x0000    ! base address=0
  .word 0x9200    ! data read/write
  .word 0x00C0    ! granularity=4096, 386

idt_48:
  .word 0     ! idt limit=0
  .word 0,0     ! idt base=0L

gdt_48:
  .word 0x800   ! gdt limit=2048, 256 GDT entries
  .word 512+gdt,0x9 ! gdt base = 0X9xxxx
  
.text
endtext:
.data
enddata:
.bss
endbss:

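Using the interrupt service routines provided by the BIOS, setup collects the machine data the kernel needs, including the cursor position and display page, and obtains hard disk parameter tables 1 and 2. It stores them at fixed locations in memory starting at 0x90000, overwriting part of the area where bootsect was.
Why only part? The data takes 510 bytes and bootsect takes 512 bytes, so the data fits within the one sector that bootsect occupied. The boot sector has only just finished its job, and its memory is immediately reused for the data setup collects, which is a very economical use of memory.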
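The offsets used by setup.s (`mov [0],dx`, `mov [2],ax`, `mov di,#0x0080` and so on) define a small parameter block at 0x90000. The Python sketch below decodes such a block. It is a reading aid under assumptions: `parse_setup_params` is a hypothetical helper, only some of the fields are decoded, and the sample values are invented rather than captured from a real machine.

```python
import struct


def parse_setup_params(block):
    """Decode a few fields of the 512-byte block setup.s fills at 0x90000."""
    cursor, ext_mem_kb, bx_page, ax_mode = struct.unpack_from("<4H", block, 0)
    return {
        "cursor_row": cursor >> 8,        # DH returned by int 0x10, AH=3
        "cursor_col": cursor & 0xFF,      # DL
        "ext_mem_kb": ext_mem_kb,         # int 0x15, AH=0x88
        "display_page": bx_page >> 8,     # BH
        "video_mode": ax_mode & 0xFF,     # AL
        "window_width": ax_mode >> 8,     # AH
        "hd0_table": bytes(block[0x80:0x90]),
        "hd1_table": bytes(block[0x90:0xA0]),
        "root_dev": struct.unpack_from("<H", block, 508)[0],
    }


# Invented sample: cursor at row 24, col 0; 15 MB extended memory;
# page 0; 80-column text mode 3; root device 0x306.
block = bytearray(512)
struct.pack_into("<4H", block, 0, (24 << 8) | 0, 15 * 1024, 0, (80 << 8) | 3)
struct.pack_into("<H", block, 508, 0x306)

params = parse_setup_params(block)
print(params["cursor_row"], params["ext_mem_kb"], params["window_width"])
print(hex(params["root_dev"]))
```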

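1.3 Beginning the transition to 32-bit mode and preparing to call main

Preparation:

Enable the 32-bit address space.
Enter protected mode.
Set up the protected-mode interrupt mechanism and other supporting pieces.
Set up paging.
Get ready to call main.

1.3.1 Disabling interrupts and moving system to the start of memory at 0x00000

Figure: execution flow. start -> ROM BIOS (0xFFFF0) -> bootsect.s (0x7C00; step 1: 0x90000) -> setup.s (step 2: 0x90200) -> system/head.s (step 3: 0x10000, then 0x00000)

Disable interrupts.

Disabling interrupts means clearing the interrupt enable flag (IF) in the CPU flags register (EFLAGS), so that from now on the system cannot be interrupted.
cli disables interrupts; sti enables them.
Used as a pair, the two instructions bracket a protected region whose code is not disturbed by external events.

Move system to the start of memory.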

setup.s

do_move:
  mov es,ax   ! destination segment
  add ax,#0x1000
  cmp ax,#0x9000
  jz  end_move
  mov ds,ax   ! source segment
  sub di,di
  sub si,si
  mov   cx,#0x8000
  rep
  movsw
  jmp do_move

! then we load the segment descriptors

end_move:

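The start of memory originally holds the interrupt vector table built by the BIOS and the BIOS data area. Moving the system module there overwrites that data, so until a new interrupt service framework has been built the operating system cannot handle interrupts.

This discards the BIOS interrupt vector table, and with it the real-mode interrupt service routines the BIOS provided.
It reclaims the space occupied by programs whose job is finished.
It puts the kernel code in a favorable position, which makes setting up paging later more convenient.

The 16-bit interrupt mechanism is gone, and a new 32-bit one is about to be built.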
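The do_move loop is easier to follow as a list of block copies. The Python sketch below replays its control flow and records each copy as (source, destination, length); `do_move_plan` is a hypothetical name, and the function models only the segment arithmetic, not the data movement itself.

```python
def do_move_plan():
    """Replay the do_move loop of setup.s; return (src, dst, nbytes) copies."""
    moves = []
    ax = 0x0000
    while True:
        es = ax                      # destination segment
        ax = (ax + 0x1000) & 0xFFFF
        if ax == 0x9000:             # cmp ax,#0x9000 / jz end_move
            break
        ds = ax                      # source segment
        moves.append((ds << 4, es << 4, 0x8000 * 2))  # cx = 0x8000 words
    return moves


moves = do_move_plan()
for src, dst, nbytes in moves:
    print(f"{src:#07x} -> {dst:#07x}  ({nbytes // 1024} KB)")
print(len(moves), sum(n for _, _, n in moves) // 1024)  # 8 copies, 512 KB
```

Each pass moves one 64 KB segment down by 64 KB, so the whole range 0x10000 to 0x8FFFF ends up at 0x00000 to 0x7FFFF, covering the 512 kB maximum kernel size mentioned in the bootsect.s comments.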

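1.3.2 Setting up the interrupt descriptor table and the global descriptor table

GDT (global descriptor table): the one system-wide array of segment descriptors, used together with the segment registers for segment addressing in protected mode. It matters a great deal for process switching and can be thought of as the master directory of all processes: it holds the address of each task's local descriptor table (LDT) and task state segment (TSS), which are used to address a process's segments and to save and restore its context.

GDTR (GDT base register): the GDT can be placed anywhere in memory. When a program refers to a segment descriptor through a segment register, the CPU needs to find the GDT, and GDTR identifies where it is. After the operating system has initialized the GDT, the LGDT instruction loads the GDT base address into GDTR.

IDT (interrupt descriptor table): holds the entry addresses of all interrupt service routines in protected mode, much like the interrupt vector table in real mode.

IDTR (IDT base register): holds the start address of the IDT.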

lidt  idt_48    ! load idt with 0,0
lgdt  gdt_48    ! load gdt with whatever appropriate
gdt:
  .word 0,0,0,0   ! dummy
  .word 0x07FF    ! 8Mb - limit=2047 (2048*4096=8Mb)
  .word 0x0000    ! base address=0
  .word 0x9A00    ! code read/exec
  .word 0x00C0    ! granularity=4096, 386
  .word 0x07FF    ! 8Mb - limit=2047 (2048*4096=8Mb)
  .word 0x0000    ! base address=0
  .word 0x9200    ! data read/write
  .word 0x00C0    ! granularity=4096, 386
idt_48:
  .word 0     ! idt limit=0
  .word 0,0     ! idt base=0L
gdt_48:
  .word 0x800   ! gdt limit=2048, 256 GDT entries
  .word 512+gdt,0x9 ! gdt base = 0X9xxxx

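Notes:

The 16-bit and 32-bit interrupt mechanisms differ: the 16-bit mechanism uses the interrupt vector table, which sits at a fixed place (the start of memory), whereas the 32-bit mechanism uses the interrupt descriptor table (IDT), whose location is not fixed.
The kernel is not running yet and there are no processes, so the GDT contains only the kernel's segments.
The IDT has been set up but is empty. It does not need any entries yet, because interrupts are disabled.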
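The four `.word` values of each GDT entry in setup.s pack a base, a limit, and access bits in the x86 segment-descriptor layout. The Python sketch below unpacks them so that the "8Mb" in the comments can be checked. `decode_descriptor` is a hypothetical helper, and the field layout comes from the standard descriptor format, not from the book.

```python
def decode_descriptor(words):
    """Decode an 8-byte segment descriptor given as four 16-bit words."""
    w0, w1, w2, w3 = words
    limit = w0 | ((w3 & 0x000F) << 16)
    base = w1 | ((w2 & 0x00FF) << 16) | ((w3 & 0xFF00) << 16)
    access = (w2 >> 8) & 0xFF
    flags = (w3 >> 4) & 0xF
    granular = bool(flags & 0x8)           # G bit: limit counted in 4 KB pages
    size = (limit + 1) * (4096 if granular else 1)
    return {
        "base": base,
        "limit": limit,
        "size_bytes": size,
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 0x3,
        "executable": bool(access & 0x08),
        "is_32bit": bool(flags & 0x4),
    }


code_seg = decode_descriptor((0x07FF, 0x0000, 0x9A00, 0x00C0))
data_seg = decode_descriptor((0x07FF, 0x0000, 0x9200, 0x00C0))
print(code_seg["size_bytes"] // (1024 * 1024), "MB, base", hex(code_seg["base"]))
print(code_seg["executable"], data_seg["executable"])
```

Both entries describe a flat 8 MB segment starting at address 0 with privilege level 0; they differ only in that the first is executable code and the second is writable data.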

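1.3.3 Enabling A20 for 32-bit addressing

Why enable A20? (Address lines: 20 -> 32)

The CPU can then use 32-bit addressing, and the address space grows from 1 MB to 4 GB, even though Linux 0.11 supports at most 16 MB of physical memory.
In real mode, an access above 0xFFFFF wraps around to the start of memory. CS can be at most 0xFFFF and IP at most 0xFFFF, so the highest address a real-mode program can form is 0xFFFF0 + 0xFFFF = 0x10FFEF, almost 64 KB beyond the 0xFFFFF limit of 20 address lines. Enabling A20 turns off this wrap-around, so memory above 1 MB becomes addressable.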
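The wrap-around can be reproduced numerically. In the Python sketch below, `real_mode_address` is a hypothetical helper that forms segment × 16 + offset and, when A20 is treated as disabled, keeps only the low 20 bits of the result.

```python
def real_mode_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Physical address seen on the bus for a real-mode segment:offset."""
    linear = (segment << 4) + offset
    if not a20_enabled:
        linear &= 0xFFFFF        # only 20 address lines: wrap at 1 MB
    return linear


top = real_mode_address(0xFFFF, 0xFFFF, a20_enabled=True)
wrapped = real_mode_address(0xFFFF, 0xFFFF, a20_enabled=False)
print(hex(top))          # 0x10ffef
print(hex(wrapped))      # 0xffef
print(top - 0xFFFFF)     # 65520 bytes reachable above 1 MB (64 KB minus 16)
```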

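1.3.4 Preparing to run head.s in protected mode

Set up the protected-mode interrupt mechanism so that later code can use interrupts.

8259A interrupt controller: a programmable interrupt controller chip designed for the 8085A and 8086/8088. A single 8259A manages 8 levels of vectored priority interrupts; without additional circuitry, up to 64 levels can be handled by cascading.

In protected mode, int 0x00 to int 0x1F are reserved by Intel for internal (CPU) interrupts, which collides with the vectors the BIOS assigned to hardware interrupts. The 8259A therefore has to be reprogrammed so that IRQ 0x00 to IRQ 0x0F are mapped to int 0x20 to int 0x2F.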
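The remapping amounts to changing the base vector of each 8259A. The Python sketch below compares the two mappings. The Linux bases 0x20 and 0x28 come from the setup.s code; the BIOS bases 0x08 (master) and 0x70 (slave) are the usual PC/AT defaults and are an assumption here, since the source only mentions 0x08-0x0f. `irq_to_vector` is a hypothetical helper.

```python
BIOS_BASES = (0x08, 0x70)    # assumed PC/AT BIOS defaults (master, slave)
LINUX_BASES = (0x20, 0x28)   # values written by setup.s


def irq_to_vector(irq: int, bases) -> int:
    """Interrupt vector raised for an IRQ line, given (master, slave) bases."""
    if not 0 <= irq <= 15:
        raise ValueError("IRQ must be in 0..15")
    master_base, slave_base = bases
    return master_base + irq if irq < 8 else slave_base + (irq - 8)


RESERVED = range(0x00, 0x20)   # vectors Intel reserves for the CPU

for irq in (0, 1, 8, 15):
    old = irq_to_vector(irq, BIOS_BASES)
    new = irq_to_vector(irq, LINUX_BASES)
    clash = "clashes with a reserved vector" if old in RESERVED else "ok"
    print(f"IRQ{irq:<2} BIOS {old:#04x} ({clash}) -> Linux {new:#04x}")
```

After remapping, none of the sixteen hardware interrupt vectors falls in the reserved range 0x00 to 0x1F.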

boot/setup.s

! well, that went ok, I hope. Now we have to reprogram the interrupts :-(
! we put them right after the intel-reserved hardware interrupts, at
! int 0x20-0x2F. There they won't mess up anything. Sadly IBM really
! messed this up with the original PC, and they haven't been able to
! rectify it afterwards. Thus the bios puts interrupts at 0x08-0x0f,
! which is used for the internal hardware interrupts as well. We just
! have to reprogram the 8259's, and it isn't fun.

mov al,#0x11    ! initialization sequence
out #0x20,al    ! send it to 8259A-1
.word 0x00eb,0x00eb   ! jmp $+2, jmp $+2
out #0xA0,al    ! and to 8259A-2
.word 0x00eb,0x00eb
mov al,#0x20    ! start of hardware int's (0x20)
out #0x21,al
.word 0x00eb,0x00eb
mov al,#0x28    ! start of hardware int's 2 (0x28)
out #0xA1,al
.word 0x00eb,0x00eb
mov al,#0x04    ! 8259-1 is master
out #0x21,al
.word 0x00eb,0x00eb
mov al,#0x02    ! 8259-2 is slave
out #0xA1,al
.word 0x00eb,0x00eb
mov al,#0x01    ! 8086 mode for both
out #0x21,al
.word 0x00eb,0x00eb
out #0xA1,al
.word 0x00eb,0x00eb
mov al,#0xFF    ! mask off all interrupts for now
out #0x21,al
.word 0x00eb,0x00eb
out #0xA1,al

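The code that enters protected mode comes next. Once the CPU is in protected mode, its most important characteristic is that the GDT is consulted to decide which code executes next.

mov ax,#0x0001  ! protected mode (PE) bit
lmsw  ax    ! This is it!
jmpi  0,8   ! jmp offset 0 of segment 8 (cs): 0 is the offset within the segment; 8 (binary 1000) is a protected-mode segment selector: GDT entry 1, kernel privilege level

At this point setup has finished. It has done part of the preparation for running in protected mode; the rest is done in head.s.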
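The value 8 in `jmpi 0,8` is a segment selector, and its bit fields can be pulled apart as follows. The Python sketch uses the standard selector layout (index in bits 3 and up, table indicator in bit 2, requested privilege level in bits 0 and 1); `decode_selector` is a hypothetical helper.

```python
def decode_selector(selector: int):
    """Split a protected-mode segment selector into its three fields."""
    return {
        "index": selector >> 3,                       # descriptor number
        "table": "LDT" if selector & 0x4 else "GDT",  # TI bit
        "rpl": selector & 0x3,                        # requested privilege level
    }


cs = decode_selector(0x08)   # used by jmpi 0,8 in setup.s
ds = decode_selector(0x10)   # loaded into ds/es/fs/gs by head.s
print(cs)
print(ds)
print(cs["index"] * 8)       # byte offset of the code descriptor in the GDT
```

Selector 8 therefore picks entry 1 of the GDT (the code segment that follows the dummy entry 0) at privilege level 0, and 0x10 picks entry 2, the data segment.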

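1.3.5 head.s begins executing

system module = kernel program + head.s. head.s is assembled into object code, the kernel written in C is compiled into object code, and the two are linked into the system module. The head program sits at the front of the system module, hence its name (it occupies 25 KB + 184 B). setup copied the system module to the start of memory, so head.s is now at 0x00000.
The head program lays out the kernel in memory. Using its own code, in the very space it occupies, it sets up paging: starting at 0x00000 it creates the page directory, the page tables, a buffer, the GDT, and the IDT, overwriting the memory of head code that has already run. In other words, head discards part of the space it occupies itself.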

head.s

/*
 *  linux/boot/head.s
 *
 *  (C) 1991  Linus Torvalds
 */

/*
 *  head.s contains the 32-bit startup code.
 *
 * NOTE!!! Startup happens at absolute address 0x00000000, which is also where
 * the page directory will exist. The startup code will be overwritten by
 * the page directory.
 */
.text
.globl _idt,_gdt,_pg_dir,_tmp_floppy_area
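_pg_dir: # Start of the kernel once paging is set up, which is also the start of physical memory; head builds the page directory here in preparation for paging.
startup_32:
  movl $0x10,%eax
  mov %ax,%ds # switch the following segment registers from real-mode values to protected-mode selectors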
  mov %ax,%es
  mov %ax,%fs
  mov %ax,%gs
  lss _stack_start,%esp
  call setup_idt
  call setup_gdt
  movl $0x10,%eax   # reload all the segment registers
  mov %ax,%ds   # after changing gdt. CS was already
  mov %ax,%es   # reloaded in 'setup_gdt'
  mov %ax,%fs
  mov %ax,%gs
  lss _stack_start,%esp
  xorl %eax,%eax
1:  incl %eax   # check that A20 really IS enabled
  movl %eax,0x000000  # loop forever if it isn't
  cmpl %eax,0x100000
  je 1b
/*
 * NOTE! 486 should set bit 16, to check for write-protect in supervisor
 * mode. Then it would be unnecessary with the "verify_area()"-calls.
 * 486 users probably want to set the NE (#5) bit also, so as to use
 * int 16 for math errors.
 */
  movl %cr0,%eax    # check math chip
  andl $0x80000011,%eax # Save PG,PE,ET
/* "orl $0x10020,%eax" here for 486 might be good */
  orl $2,%eax   # set MP
  movl %eax,%cr0
  call check_x87
  jmp after_page_tables

/*
 * We depend on ET to be correct. This checks for 287/387.
 */
check_x87:
  fninit
  fstsw %ax
  cmpb $0,%al
  je 1f     /* no coprocessor: have to set bits */
  movl %cr0,%eax
  xorl $6,%eax    /* reset MP, set EM */
  movl %eax,%cr0
  ret
.align 2
1:  .byte 0xDB,0xE4   /* fsetpm for 287, ignored by 387 */
  ret

/*
 *  setup_idt
 *
 *  sets up a idt with 256 entries pointing to
 *  ignore_int, interrupt gates. It then loads
 *  idt. Everything that wants to install itself
 *  in the idt-table may do so themselves. Interrupts
 *  are enabled elsewhere, when we can be relatively
 *  sure everything is ok. This routine will be over-
 *  written by the page tables.
 */
setup_idt:
  lea ignore_int,%edx
  movl $0x00080000,%eax
  movw %dx,%ax    /* selector = 0x0008 = cs */
  movw $0x8E00,%dx  /* interrupt gate - dpl=0, present */

  lea _idt,%edi
  mov $256,%ecx
rp_sidt:
  movl %eax,(%edi)
  movl %edx,4(%edi)
  addl $8,%edi
  dec %ecx
  jne rp_sidt
  lidt idt_descr
  ret

/*
 *  setup_gdt
 *
 *  This routines sets up a new gdt and loads it.
 *  Only two entries are currently built, the same
 *  ones that were built in init.s. The routine
 *  is VERY complicated at two whole lines, so this
 *  rather long comment is certainly needed :-).
 *  This routine will beoverwritten by the page tables.
 */
setup_gdt:
  lgdt gdt_descr
  ret

/*
 * I put the kernel page tables right after the page directory,
 * using 4 of them to span 16 Mb of physical memory. People with
 * more than 16MB will have to expand this.
 */
.org 0x1000
pg0:

.org 0x2000
pg1:

.org 0x3000
pg2:

.org 0x4000
pg3:

.org 0x5000
/*
 * tmp_floppy_area is used by the floppy-driver when DMA cannot
 * reach to a buffer-block. It needs to be aligned, so that it isn't
 * on a 64kB border.
 */
_tmp_floppy_area:
  .fill 1024,1,0

after_page_tables:
  pushl $0    # These are the parameters to main :-)
  pushl $0
  pushl $0
  pushl $L6   # return address for main, if it decides to.
  pushl $_main
  jmp setup_paging
L6:
  jmp L6      # main should never return here, but
        # just in case, we know what happens.

/* This is the default interrupt "handler" :-) */
int_msg:
  .asciz "Unknown interrupt\n\r"
.align 2
ignore_int:
  pushl %eax
  pushl %ecx
  pushl %edx
  push %ds
  push %es
  push %fs
  movl $0x10,%eax
  mov %ax,%ds
  mov %ax,%es
  mov %ax,%fs
  pushl $int_msg
  call _printk
  popl %eax
  pop %fs
  pop %es
  pop %ds
  popl %edx
  popl %ecx
  popl %eax
  iret


/*
 * Setup_paging
 *
 * This routine sets up paging by setting the page bit
 * in cr0. The page tables are set up, identity-mapping
 * the first 16MB. The pager assumes that no illegal
 * addresses are produced (ie >4Mb on a 4Mb machine).
 *
 * NOTE! Although all physical memory should be identity
 * mapped by this routine, only the kernel page functions
 * use the >1Mb addresses directly. All "normal" functions
 * use just the lower 1Mb, or the local data space, which
 * will be mapped to some other place - mm keeps track of
 * that.
 *
 * For those with more memory than 16 Mb - tough luck. I've
 * not got it, why should you :-) The source is here. Change
 * it. (Seriously - it shouldn't be too difficult. Mostly
 * change some constants etc. I left it at 16Mb, as my machine
 * even cannot be extended past that (ok, but it was cheap :-)
 * I've tried to show which constants to change by having
 * some kind of marker at them (search for "16Mb"), but I
 * won't guarantee that's all :-( )
 */
.align 2
setup_paging:
  movl $1024*5,%ecx   /* 5 pages - pg_dir+4 page tables */
  xorl %eax,%eax
  xorl %edi,%edi      /* pg_dir is at 0x000 */
  cld;rep;stosl
  movl $pg0+7,_pg_dir   /* set present bit/user r/w */
  movl $pg1+7,_pg_dir+4   /*  --------- " " --------- */
  movl $pg2+7,_pg_dir+8   /*  --------- " " --------- */
  movl $pg3+7,_pg_dir+12    /*  --------- " " --------- */
  movl $pg3+4092,%edi
  movl $0xfff007,%eax   /*  16Mb - 4096 + 7 (r/w user,p) */
  std
1:  stosl     /* fill pages backwards - more efficient :-) */
  subl $0x1000,%eax
  jge 1b
  xorl %eax,%eax    /* pg_dir is at 0x0000 */
  movl %eax,%cr3    /* cr3 - page directory start */
  movl %cr0,%eax
  orl $0x80000000,%eax
  movl %eax,%cr0    /* set paging (PG) bit */
  ret     /* this also flushes prefetch-queue */

.align 2
.word 0
idt_descr:
  .word 256*8-1   # idt contains 256 entries
  .long _idt
.align 2
.word 0
gdt_descr:
  .word 256*8-1   # so does gdt (not that that's any
  .long _gdt    # magic number, but it works for me :^)

  .align 3
_idt: .fill 256,8,0   # idt is uninitialized

_gdt: .quad 0x0000000000000000  /* NULL descriptor */
  .quad 0x00c09a0000000fff  /* 16Mb */
  .quad 0x00c0920000000fff  /* 16Mb */
  .quad 0x0000000000000000  /* TEMPORARY - don't use */
  .fill 252,8,0     /* space for LDT's and TSS's etc */

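Why discard the original GDT and build a new one?

Where is the original GDT? In setup. bootsect itself contains no GDT, and its memory has in any case been overwritten by the parameter table that setup collected.
Why can the GDT in setup no longer be used? It lies in memory that will later be used as the buffer area, so it will be overwritten.
Why does setup not copy the GDT into head directly? 1. If setup copied the GDT first and then moved the system module, the GDT (if placed at the start of memory) would be overwritten by the system module. 2. If setup moved the system module first and then copied the GDT, the copy would overwrite part of head.s and break the head program.

Next, head checks whether the A20 address line is enabled. If it is not, bit 20 of every address is forced to 0, so 0x100000 aliases 0x000000 and memory above 1 MB cannot be reached reliably. head writes a value at 0x000000 and compares it with the value at 0x100000 (1 MB): if the two are always equal, A20 is off.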

xorl %eax,%eax
1:  incl %eax   # check that A20 really IS enabled
  movl %eax,0x000000  # loop forever if it isn't
  cmpl %eax,0x100000
  je 1b
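The test loop can be tried out on a toy memory model. In the Python sketch below, `Memory` and `a20_check_passes` are hypothetical names; the model forces address bit 20 to zero when A20 is off, and the loop gives up after a fixed number of tries where head.s would spin forever.

```python
class Memory:
    """Sparse memory whose address bit 20 is masked when A20 is off."""

    def __init__(self, a20_enabled: bool):
        self.a20_enabled = a20_enabled
        self.cells = {}

    def _bus_address(self, addr: int) -> int:
        return addr if self.a20_enabled else addr & ~(1 << 20)

    def write(self, addr: int, value: int) -> None:
        self.cells[self._bus_address(addr)] = value

    def read(self, addr: int) -> int:
        return self.cells.get(self._bus_address(addr), 0)


def a20_check_passes(mem: Memory, max_tries: int = 16) -> bool:
    """Same idea as the loop in head.s, but bounded instead of endless."""
    eax = 0
    for _ in range(max_tries):
        eax += 1                          # incl %eax
        mem.write(0x000000, eax)          # movl %eax,0x000000
        if mem.read(0x100000) != eax:     # cmpl %eax,0x100000
            return True                   # values differ: A20 is on
    return False                          # always equal: addresses alias


print(a20_check_passes(Memory(a20_enabled=True)))   # True
print(a20_check_passes(Memory(a20_enabled=False)))  # False
```

Incrementing the value on every pass guards against a coincidence: if 0x100000 happened to hold the same number once, it would not match on the next try.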

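Once A20 is confirmed to be enabled, head checks whether a math coprocessor is present and sets it up for protected-mode operation.

x87 coprocessor: an external, optional chip that makes up for the weak floating-point performance of the x86 family.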

movl %cr0,%eax    # check math chip
  andl $0x80000011,%eax # Save PG,PE,ET
/* "orl $0x10020,%eax" here for 486 might be good */
  orl $2,%eax   # set MP
  movl %eax,%cr0
  call check_x87
  jmp after_page_tables
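The constants in this fragment are bit masks over CR0. The Python sketch below names the bits and replays the two updates, including the `xorl $6` from check_x87 in the full listing. The bit positions are the standard CR0 layout; the function names are invented for the example.

```python
CR0_PE = 1 << 0    # protection enable
CR0_MP = 1 << 1    # monitor coprocessor
CR0_EM = 1 << 2    # emulate coprocessor
CR0_ET = 1 << 4    # extension type (287 vs 387)
CR0_PG = 1 << 31   # paging


def prepare_cr0(cr0: int) -> int:
    """andl $0x80000011,%eax then orl $2,%eax."""
    cr0 &= CR0_PG | CR0_PE | CR0_ET   # keep PG, PE, ET; clear the rest
    cr0 |= CR0_MP                     # set MP
    return cr0


def toggle_mp_em(cr0: int) -> int:
    """xorl $6,%eax in check_x87: with MP set, this resets MP and sets EM."""
    return cr0 ^ (CR0_MP | CR0_EM)


print(hex(CR0_PG | CR0_PE | CR0_ET))        # 0x80000011
print(hex(prepare_cr0(0x00000011)))         # 0x13
print(hex(toggle_mp_em(prepare_cr0(0x11)))) # 0x15
```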

We then reach the final stage of the head program.

after_page_tables:
  pushl $0    # These are the parameters to main :-)
  pushl $0
  pushl $0
  pushl $L6   # return address for main, if it decides to.
  pushl $_main
  jmp setup_paging
L6:
  jmp L6      # main should never return here, but
        # just in case, we know what happens.

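The L6 label and the entry address of main are pushed on the stack, leaving the address of main on top. The aim is that when head finishes, a ret instruction transfers control straight to main.

After the pushes, head jumps to setup_paging, which sets up paging.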
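The stack trick can be followed with an ordinary Python list standing in for the stack. This is a toy model: addresses are represented by label strings, and `push` and `ret` are hypothetical helpers that mimic only the stack effect of the instructions.

```python
stack = []


def push(value):
    """pushl: place a value on top of the stack."""
    stack.append(value)


def ret():
    """ret: pop the top of the stack; the CPU would load it into EIP."""
    return stack.pop()


# after_page_tables, in program order.
push(0)         # three parameters for main
push(0)
push(0)
push("L6")      # return address main would use if it ever returned
push("_main")   # where the ret at the end of setup_paging will land

next_eip = ret()       # executed as the last instruction of setup_paging
print(next_eip)        # _main
print(stack)           # [0, 0, 0, 'L6']
```

When main starts, the stack looks exactly as if `call _main` had been executed from just before L6: a return address on top, followed by three arguments.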

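The core paging code follows.

Clear 5 pages: the page directory plus 4 page tables (5 × 4 KB = 20 KB)
Fill in the page directory (4 entries, one per page table)
Fill in the entries of each page table (each page table has 1K entries)
Point CR3 at the page directory
Set the PG bit to turn paging on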

setup_paging:
  movl $1024*5,%ecx   /* 5 pages - pg_dir+4 page tables */
  xorl %eax,%eax
  xorl %edi,%edi      /* pg_dir is at 0x000 */
  cld;rep;stosl
  movl $pg0+7,_pg_dir   /* set present bit/user r/w */
  movl $pg1+7,_pg_dir+4   /*  --------- " " --------- */
  movl $pg2+7,_pg_dir+8   /*  --------- " " --------- */
  movl $pg3+7,_pg_dir+12    /*  --------- " " --------- */
  movl $pg3+4092,%edi
  movl $0xfff007,%eax   /*  16Mb - 4096 + 7 (r/w user,p) */
  std
1:  stosl     /* fill pages backwards - more efficient :-) */
  subl $0x1000,%eax
  jge 1b
  xorl %eax,%eax    /* pg_dir is at 0x0000 */
  movl %eax,%cr3    /* cr3 - page directory start */
  movl %cr0,%eax
  orl $0x80000000,%eax
  movl %eax,%cr0    /* set paging (PG) bit */
  ret     /* this also flushes prefetch-queue */
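What setup_paging writes into memory can be rebuilt in Python and checked for the identity mapping that the comments in head.s promise. The sketch is a model under assumptions: the table addresses come from the `.org 0x1000` to `.org 0x4000` directives, the four page tables are kept as one flat list, and `build_paging` and `translate` are hypothetical helpers, not kernel functions.

```python
ENTRIES = 1024
FLAGS = 7                                      # present | read/write | user
PG_TABLES = [0x1000, 0x2000, 0x3000, 0x4000]   # pg0..pg3


def build_paging():
    """Return (page directory, flat list of all page-table entries)."""
    pg_dir = [0] * ENTRIES
    for i, table_addr in enumerate(PG_TABLES):
        pg_dir[i] = table_addr + FLAGS         # movl $pgN+7,_pg_dir+4*N
    entries = [0] * (len(PG_TABLES) * ENTRIES)
    value = 0xFFF007                           # last page below 16 MB, flags 7
    index = len(entries) - 1                   # pg3+4092: last entry of pg3
    while value >= 0:                          # stosl / subl $0x1000 / jge
        entries[index] = value
        index -= 1                             # std: fill backwards
        value -= 0x1000
    return pg_dir, entries


def translate(pg_dir, entries, vaddr: int) -> int:
    """Walk the two-level tables the way the MMU would (present bit ignored)."""
    table_addr = pg_dir[vaddr >> 22] & ~0xFFF
    table_no = PG_TABLES.index(table_addr)
    pte = entries[table_no * ENTRIES + ((vaddr >> 12) & 0x3FF)]
    return (pte & ~0xFFF) | (vaddr & 0xFFF)


pg_dir, entries = build_paging()
print([hex(e) for e in pg_dir[:4]])
print(hex(entries[0]), hex(entries[-1]))
print(hex(translate(pg_dir, entries, 0x123456)))
```

Every address below 16 MB translates to itself, which is why the kernel can keep running at the same addresses once the PG bit is set.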

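setup_paging ends with a ret instruction. ret pops the value on top of the stack into EIP; that value is the address of main pushed earlier, so execution continues in main. Using ret this way neatly imitates a call to main.

While the operating system is being loaded, only the BIOS is running at first, and the computer is in 16-bit real mode, with the 16-bit interrupt vector table and 16-bit service routines set up by the BIOS's own code. Linux is a 32-bit multitasking operating system, and main consists of 32-bit code. Only after the three assembly programs (bootsect, setup, and head) have built a 32-bit environment can main run.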

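¹ "Linux kernel: the segmentation mechanism", section 2, "Loading the executable file".
² BIOS stands for "Basic Input Output System". It is the first software loaded when a personal computer starts. It is a set of programs burned into a ROM chip on the motherboard, containing the computer's most important basic input/output routines, the power-on self-test, and the system bootstrap program, and it can read and write system settings held in CMOS. Its main function is to provide the lowest-level, most direct hardware setup and control. The BIOS also supplies some system parameters to the operating system. Changes in system hardware are hidden by the BIOS: programs use BIOS services rather than controlling the hardware directly. Source: 百度百科 (Baidu Baike), https://baike.baidu.com/item/bios/91424?fr=aladdin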