This section first covers the mechanisms provided by the 386 for handling system calls, and then shows how Linux uses those mechanisms. This is not a reference to the individual system calls: there are very many of them, new ones are added occasionally, and they are documented in man pages that should be on your Linux system.
The 386 recognizes two classes of events: exceptions and interrupts. Both cause a forced context switch to a new procedure or task. Interrupts can occur at unexpected times during the execution of a program and are used to respond to signals from hardware. Exceptions are caused by the execution of instructions.
The 386 recognizes two sources of interrupts, maskable interrupts and nonmaskable interrupts (NMI), and two sources of exceptions, processor-detected exceptions and programmed exceptions.
Each interrupt or exception has a number, which the 386 literature refers to as the vector. The NMI interrupt and the processor-detected exceptions have been assigned vectors in the range 0 through 31, inclusive. The vectors for maskable interrupts are determined by the hardware: external interrupt controllers put the vector on the bus during the interrupt-acknowledge cycle. Any vector in the range 32 through 255, inclusive, can be used for maskable interrupts or programmed exceptions. Here is a listing of all the possible interrupts and exceptions:
Vector | Interrupt or exception |
---|---|
0 | Divide error |
1 | Debug exception |
2 | NMI interrupt |
3 | Breakpoint |
4 | INTO-detected overflow |
5 | BOUND range exceeded |
6 | Invalid opcode |
7 | Coprocessor not available |
8 | Double fault |
9 | Coprocessor segment overrun |
10 | Invalid task state segment |
11 | Segment not present |
12 | Stack fault |
13 | General protection |
14 | Page fault |
15 | Reserved |
16 | Coprocessor error |
17-31 | Reserved |
32-255 | Maskable interrupts or programmed exceptions |
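The vector ranges in the table above can be captured in a small C lookup, which makes the split between the fixed range (0-31) and the assignable range (32-255) explicit. This is only an illustrative sketch; the names `vector_class`, `classify_vector`, and `vector_name` are hypothetical and do not come from the kernel.

```c
#include <stddef.h>

/* Hypothetical classes of 386 vectors, following the ranges above. */
enum vector_class {
    VEC_FIXED,       /* 0-31: NMI, processor-detected exceptions, or reserved */
    VEC_ASSIGNABLE,  /* 32-255: maskable interrupts or programmed exceptions */
    VEC_INVALID      /* outside the 256-entry vector space */
};

/* Names of vectors 0-16 as listed in the table; vector 15 is reserved. */
static const char *const vector_names[] = {
    "divide error", "debug exception", "NMI interrupt", "breakpoint",
    "INTO-detected overflow", "BOUND range exceeded", "invalid opcode",
    "coprocessor not available", "double fault",
    "coprocessor segment overrun", "invalid task state segment",
    "segment not present", "stack fault", "general protection",
    "page fault", "reserved", "coprocessor error",
};

#define NAMED_VECTORS (sizeof vector_names / sizeof vector_names[0])

/* Place a vector number in one of the two 386 ranges. */
enum vector_class classify_vector(unsigned vector)
{
    if (vector < 32)
        return VEC_FIXED;
    if (vector < 256)
        return VEC_ASSIGNABLE;
    return VEC_INVALID;
}

/* Return the table's name for a vector, or NULL for vectors the table
 * does not name individually (17-31 reserved, 32-255 assignable). */
const char *vector_name(unsigned vector)
{
    return vector < NAMED_VECTORS ? vector_names[vector] : NULL;
}
```

For example, `classify_vector(0x80)` falls in the assignable range, which is what allows an operating system to claim that vector for its own use.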
The priority of simultaneous interrupts and exceptions, from highest to lowest, is:
Priority | Event class |
---|---|
1 (highest) | Faults except debug faults |
2 | Trap instructions INTO, INT n, INT 3 |
3 | Debug traps for this instruction |
4 | Debug traps for next instruction |
5 | NMI interrupt |
6 (lowest) | INTR interrupt |
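The priority table above amounts to "service the first pending class in a fixed order". The toy model below encodes that order in an enum and picks the winner from a bitmask of pending events. It is a sketch under stated assumptions: the bitmask and all names are hypothetical (the processor exposes no such mask), and it ignores the fact that INTR is only recognized when interrupts are enabled.

```c
/* Event classes from the 386 priority table, highest priority first.
 * The enum order encodes priority: a lower value is serviced first. */
enum pending_event {
    EV_FAULT = 0,         /* faults except debug faults */
    EV_TRAP_INSTRUCTION,  /* trap instructions INTO, INT n, INT 3 */
    EV_DEBUG_TRAP_THIS,   /* debug traps for this instruction */
    EV_DEBUG_TRAP_NEXT,   /* debug traps for next instruction */
    EV_NMI,               /* NMI interrupt */
    EV_INTR,              /* INTR (maskable) interrupt */
    EV_COUNT,             /* number of event classes */
    EV_NONE = -1          /* nothing pending */
};

/* Given a hypothetical bitmask with bit e set for each pending event e,
 * return the event that would be serviced first, or EV_NONE. */
enum pending_event highest_priority(unsigned pending)
{
    for (int e = EV_FAULT; e < EV_COUNT; e++)
        if (pending & (1u << e))
            return (enum pending_event)e;
    return EV_NONE;
}
```

With both an NMI and an INTR pending, the model returns `EV_NMI`, matching the table.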
Under Linux, a system call is invoked by a programmed exception: the instruction int 0x80 transfers control to the kernel through vector 0x80, which lies in the 32-255 range. This interrupt vector is initialized during system startup, along with other important vectors such as the system clock vector.
iBCS2 requires an lcall 0,7 instruction, which Linux can pass to the appropriate iBCS2 compatibility module if an iBCS2-compliant binary is being executed. In fact, Linux assumes that an iBCS2-compliant binary is being executed whenever an lcall 0,7 instruction is executed, and automatically switches modes.
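As of version 0.99.2 of Linux, there are 116 system calls, documented in section 2 of the man pages. When a user invokes a system call, control passes through a library stub, generated by one of the _syscallX() macros, which loads the system call number and arguments into registers and executes int $0x80 to enter the kernel.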
For example, the setuid system call is coded as
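```c
_syscall1(int,setuid,uid_t,uid);
```
which will expand to:
```asm
_setuid:
	subl $4,%esp            # reserve one local stack slot
	pushl %ebx              # preserve %ebx
	movzwl 12(%esp),%eax    # zero-extend the 16-bit uid argument
	movl %eax,4(%esp)       # store it in the local slot
	movl $23,%eax           # system call number of setuid
	movl 4(%esp),%ebx       # first argument goes in %ebx
	int $0x80               # trap into the kernel
	movl %eax,%edx          # kernel's result
	testl %edx,%edx
	jge L2                  # non-negative result: success
	negl %edx               # negative result is -errno
	movl %edx,_errno
	movl $-1,%eax           # return -1 to the caller
	popl %ebx
	addl $4,%esp
	ret
L2:
	movl %edx,%eax          # return the result unchanged
	popl %ebx
	addl $4,%esp
	ret
```
The macro definitions for the _syscallX() macros can be found in /usr/include/linux/unistd.h, and the user-space system call library code can be found in /usr/src/libc/syscall/.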
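The error handling in the setuid stub is the part worth dwelling on: a negative value in %eax after int $0x80 means failure, so the stub negates it into _errno and returns -1. The C sketch below models that same convention. `fake_int80`, `stub_errno`, `SYSCALL1_MODEL`, `model_uid_t`, and `model_setuid` are all hypothetical names; `fake_int80` merely simulates a kernel and never traps, and unlike the real _syscall1 macro this one takes the call number explicitly.

```c
#include <errno.h>

/* Hypothetical stand-in for _errno in the assembly; a separate name
 * avoids clashing with the C library's errno. */
int stub_errno;

/* Hypothetical stand-in for "int $0x80". It pretends to implement call
 * 23 (setuid), allowing only uid 0, and reports failure the way the
 * kernel does: by returning a negative error number. */
static long fake_int80(long nr, long arg1)
{
    if (nr == 23)
        return arg1 == 0 ? 0 : -EPERM;
    return -ENOSYS;
}

/* Simplified model of a one-argument stub: trap, then turn a negative
 * result into stub_errno = -result and a return value of -1. */
#define SYSCALL1_MODEL(type, name, nr, atype, a)      \
    type name(atype a)                                \
    {                                                 \
        long res = fake_int80((nr), (long)(a));       \
        if (res >= 0)                                 \
            return (type)res;                         \
        stub_errno = (int)-res;                       \
        return (type)-1;                              \
    }

/* The movzwl in the stub above implies a 16-bit uid_t. */
typedef unsigned short model_uid_t;

SYSCALL1_MODEL(int, model_setuid, 23, model_uid_t, uid)
```

Calling `model_setuid(100)` returns -1 and leaves `EPERM` in `stub_errno`, just as the assembly stub leaves the error number in _errno.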
The actual code for the system_call entry point is in /usr/src/linux/kernel/sys_call.S. The code for many of the system calls is in /usr/src/linux/kernel/sys.c, and the rest are found elsewhere; find is your friend.
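Once int $0x80 arrives at the system_call entry point, the kernel must route the call number in %eax to the right handler. A minimal C model of that step, assuming a table indexed by call number (the kernel keeps such a table, sys_call_table; everything prefixed `demo_` or `sys_demo_` here, and `dispatch` itself, is hypothetical). Arguments are taken from %ebx, %ecx, and %edx, consistent with the setuid stub loading its argument into %ebx; out-of-range or empty slots return -ENOSYS.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type: three register arguments in, result out. */
typedef long (*syscall_fn)(long, long, long);

static long sys_demo_getvalue(long a, long b, long c)
{
    (void)a; (void)b; (void)c;
    return 42;
}

static long sys_demo_add(long a, long b, long c)
{
    (void)c;
    return a + b;
}

/* Hypothetical table indexed by system call number; NULL marks an
 * unimplemented slot. */
static const syscall_fn demo_call_table[] = {
    sys_demo_getvalue,  /* 0 */
    NULL,               /* 1: unimplemented */
    sys_demo_add,       /* 2 */
};

#define DEMO_NR_SYSCALLS (sizeof demo_call_table / sizeof demo_call_table[0])

/* Model of the dispatch step: validate the number from %eax, then call
 * the handler with the values from %ebx, %ecx and %edx. */
long dispatch(unsigned long nr, long ebx, long ecx, long edx)
{
    if (nr >= DEMO_NR_SYSCALLS || demo_call_table[nr] == NULL)
        return -ENOSYS;
    return demo_call_table[nr](ebx, ecx, edx);
}
```

Because every call enters through the same vector, the range check is the only thing standing between a user-supplied number and an arbitrary table index, which is why it comes first.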
The startup_32() code in /usr/src/linux/boot/head.S starts everything off by calling setup_idt(). This routine sets up an IDT (Interrupt Descriptor Table) with 256 entries, each 8 bytes long, for a total of 2048 bytes. No interrupt entry points are actually loaded by this routine; that is done only after paging has been enabled and the kernel has been moved to 0xC0000000. When start_kernel() (in /usr/src/linux/init/main.c) is called, it invokes trap_init() (in /usr/src/linux/kernel/traps.c), which sets up the IDT via the macro set_trap_gate() (in /usr/include/asm/system.h). trap_init() initializes the interrupt descriptor table as shown here:
Vector | Handler |
---|---|
0 | divide_error |
1 | debug |
2 | nmi |
3 | int3 |
4 | overflow |
5 | bounds |
6 | invalid_op |
7 | device_not_available |
8 | double_fault |
9 | coprocessor_segment_overrun |
10 | invalid_TSS |
11 | segment_not_present |
12 | stack_segment |
13 | general_protection |
14 | page_fault |
15 | reserved |
16 | coprocessor_error |
17 | alignment_check |
18-48 | reserved |
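Each IDT slot filled in by trap_init() is an 8-byte 386 gate descriptor. The sketch below shows that layout and a helper in the spirit of set_trap_gate(), which installs a trap gate with DPL 0. A gate for vector 0x80 needs DPL 3 instead, because a software INT n from user mode through a gate whose DPL is lower than the current privilege level raises a general protection fault. `struct gate_desc`, `set_gate`, `GATE_TRAP_386`, and `KERNEL_CS_DEMO` are hypothetical names, and the selector value is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* One 386 interrupt/trap gate descriptor: exactly 8 bytes. */
struct gate_desc {
    uint16_t offset_low;   /* handler offset, bits 0-15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  zero;         /* unused, must be 0 */
    uint8_t  type_attr;    /* P (bit 7), DPL (bits 5-6), type (bits 0-3) */
    uint16_t offset_high;  /* handler offset, bits 16-31 */
};

#define IDT_ENTRIES     256
#define GATE_TRAP_386   0xF    /* type field of a 32-bit trap gate */
#define KERNEL_CS_DEMO  0x10   /* hypothetical kernel code selector */

static struct gate_desc idt[IDT_ENTRIES];

/* Fill one IDT slot with a present 32-bit trap gate at privilege dpl:
 * 0 for kernel-only vectors, 3 for vectors user code may INT into. */
static void set_gate(unsigned vector, uint32_t handler, unsigned dpl)
{
    assert(vector < IDT_ENTRIES && dpl <= 3);
    idt[vector].offset_low  = (uint16_t)(handler & 0xFFFF);
    idt[vector].selector    = KERNEL_CS_DEMO;
    idt[vector].zero        = 0;
    idt[vector].type_attr   = (uint8_t)(0x80 | (dpl << 5) | GATE_TRAP_386);
    idt[vector].offset_high = (uint16_t)(handler >> 16);
}
```

The whole table is 256 × 8 = 2048 bytes, and a DPL 0 trap gate encodes its attribute byte as 0x8F while a DPL 3 gate encodes it as 0xEF.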
Copyright (C) 1993, 1996 Michael K. Johnson.
Copyright (C) 1993 Stanley Scalsky