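The Interrupt Descriptor Table (IDT) holds two kinds of descriptors (well, three in total, but ignore the third, the task gate): interrupt gates and trap gates. A descriptor is a mapping: vector number (0-31 for exceptions, IRQ number + 32 for interrupts) => segment and offset of the interrupt/exception handler. The difference between the two kinds: with an interrupt gate, the CPU (the hardware) automatically clears the IF flag when it jumps to the target segment, i.e. it automatically masks maskable interrupts (the CPU stops responding to requests from the PIC until IF is set again). Note that this is hardware behavior. A trap gate leaves IF untouched. Note: Linux uses interrupt gates to handle interrupts and trap gates to handle exceptions.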
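To make the gate difference concrete, here is a small userspace model (not kernel code). The `Gate` class, the selector 0x10 and the offset 0x1000 are invented for illustration; the type values 0xE and 0xF are the 32-bit interrupt-gate and trap-gate type fields.

```python
from dataclasses import dataclass
from enum import Enum


class GateType(Enum):
    INTERRUPT = 0xE  # 32-bit interrupt gate type field
    TRAP = 0xF       # 32-bit trap gate type field


@dataclass
class Gate:
    gate_type: GateType
    segment: int  # code segment selector of the handler (hypothetical value)
    offset: int   # entry point inside that segment (hypothetical value)


def irq_to_vector(irq: int) -> int:
    """Vectors 0-31 are exceptions, so external IRQ n lands on vector n + 32."""
    return irq + 32


def if_flag_on_entry(gate: Gate, if_flag: bool) -> bool:
    """Return the IF flag as the handler sees it right after the jump."""
    if gate.gate_type is GateType.INTERRUPT:
        return False   # hardware clears IF: maskable interrupts are off
    return if_flag     # trap gate: IF is left as it was


timer_gate = Gate(GateType.INTERRUPT, segment=0x10, offset=0x1000)
fault_gate = Gate(GateType.TRAP, segment=0x10, offset=0x2000)
```

With IF set before the event, `if_flag_on_entry(timer_gate, True)` comes back cleared, while `if_flag_on_entry(fault_gate, True)` keeps it set.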
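When we talk about interrupts (as opposed to exceptions), the gate is always an interrupt gate, so interrupts are always masked on entry to the handler. Does every handler need that? No. It depends: a handler that must run with interrupts masked is registered with the SA_INTERRUPT flag; otherwise the flag is left out. When the interrupt fires and the flag turns out to be absent, interrupts are unmasked again (the sti instruction). According to Understanding the Linux Kernel, 3rd Edition, the handler for an IRQ has to execute several ISRs (multiple devices may share one IRQ and one handler, with one ISR per device, hence several of them), and it does so by calling handle_IRQ_event(), which is where the SA_INTERRUPT check happens before the ISRs are run one after another.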
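A rough userspace model of what handle_IRQ_event() does, as I recall it from the book: enable interrupts unless SA_INTERRUPT is set, run every ISR on the shared line, then disable interrupts again. The `Cpu` class, the two ISRs and the bit value chosen for `SA_INTERRUPT` are made up for this sketch.

```python
SA_INTERRUPT = 1 << 0  # arbitrary bit for this sketch, not the kernel's value


class Cpu:
    def __init__(self):
        # We arrive through an interrupt gate, so IF starts out cleared.
        self.interrupts_enabled = False

    def sti(self):
        self.interrupts_enabled = True

    def cli(self):
        self.interrupts_enabled = False


def handle_irq_event(cpu, flags, isrs):
    """Run every ISR on a shared IRQ line; return 1 if any of them claimed it."""
    if not flags & SA_INTERRUPT:
        cpu.sti()              # handler did not ask for masked interrupts
    handled = 0
    for isr in isrs:
        handled |= isr(cpu)    # each ISR returns 1 if its device raised the IRQ
    cpu.cli()                  # back to masked before returning
    return handled


def demo(flags):
    """Two devices share the line; only the second one raised the interrupt."""
    cpu = Cpu()
    seen = []  # IF state observed inside each ISR

    def disk_isr(c):
        seen.append(c.interrupts_enabled)
        return 0  # not my device

    def nic_isr(c):
        seen.append(c.interrupts_enabled)
        return 1  # my device, handled

    handled = handle_irq_event(cpu, flags, [disk_isr, nic_isr])
    return handled, seen, cpu.interrupts_enabled
```

`demo(0)` shows both ISRs running with interrupts enabled, while `demo(SA_INTERRUPT)` shows them running with interrupts still masked.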
The other restrictions, however, are the same for the bottom half as for the top half: it cannot sleep, cannot access user space, and cannot invoke the scheduler.
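Linux does not dictate how driver developers divide work between the two, but developers should put into the top half only what is time-critical, hardware-related, and must not be interrupted by another interrupt; everything else goes into the bottom half. The typical pattern: the top half copies data from/to the hardware, and the bottom half processes the data. Network card interrupts work exactly this way.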
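The split can be pictured with a toy model of a network driver. `NicDriver`, its queue, and the upper-casing that stands in for "processing" are all invented here; a real driver would schedule its bottom half through a kernel mechanism instead of calling it directly.

```python
from collections import deque


class NicDriver:
    def __init__(self):
        self.rx_queue = deque()  # frames handed from the top half to the bottom half
        self.delivered = []      # payloads after bottom-half processing

    def top_half(self, hw_buffer):
        """Time-critical part: drain the device buffer and return quickly."""
        while hw_buffer:
            self.rx_queue.append(hw_buffer.pop(0))

    def bottom_half(self):
        """Deferred part: the slower per-frame processing."""
        while self.rx_queue:
            frame = self.rx_queue.popleft()
            self.delivered.append(frame.decode("ascii").upper())


def demo():
    nic = NicDriver()
    hw_buffer = [b"ping", b"pong"]   # pretend these sit in device memory
    nic.top_half(hw_buffer)          # runs in the interrupt handler
    queued = len(nic.rx_queue)
    nic.bottom_half()                # runs later, outside the critical window
    return hw_buffer, queued, nic.delivered
```

The point of the shape is that `top_half` does nothing but move bytes, so the window in which further interrupts are held off stays short.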
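A clarification: the term bottom-half is ambiguous. One meaning is the one used above, the “second half of interrupt handling”; the other is a particular mechanism for implementing that second half. Linux Device Drivers, Second Edition uses bottom-half for the “second half of interrupt handling” and the abbreviation BH for the mechanism that implements it. There are two mechanisms for implementing bottom halves: besides BH there is the tasklet, and tasklets are now the more popular of the two.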
Tasklets of the same type are always serialized: in other words, the same type of tasklet cannot be executed by two CPUs at the same time. However, tasklets of different types can be executed concurrently on several CPUs. Serializing the tasklet simplifies the life of device driver developers, because the tasklet function need not be reentrant.
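The serialization rule can be modelled with a per-tasklet "running" flag, which plays the role of the kernel's TASKLET_STATE_RUN bit. This is a single-threaded simulation with invented names (`Tasklet`, `try_run`, `finish`), not the kernel's implementation; it only shows the rule, and a deferred tasklet would be retried later.

```python
class Tasklet:
    def __init__(self, name):
        self.name = name
        self.running = False  # stands in for the TASKLET_STATE_RUN bit


def try_run(tasklet, cpu, log):
    """Try to start `tasklet` on `cpu`; refuse if it is already running elsewhere."""
    if tasklet.running:
        log.append((cpu, tasklet.name, "deferred"))
        return False
    tasklet.running = True
    log.append((cpu, tasklet.name, "started"))
    return True


def finish(tasklet):
    """The tasklet function returned; another CPU may now pick it up."""
    tasklet.running = False


def demo():
    log = []
    rx, tx = Tasklet("rx"), Tasklet("tx")
    try_run(rx, 0, log)  # CPU 0 starts rx
    try_run(rx, 1, log)  # CPU 1 may not run the same tasklet concurrently
    try_run(tx, 1, log)  # a different tasklet may run in parallel
    finish(rx)
    try_run(rx, 1, log)  # once CPU 0 is done, rx can run again
    return log
```

Because the second attempt on `rx` is turned away, the function behind `rx` never has two activations at once, which is why it does not have to be reentrant.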
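In the early days there were 32 software interrupt vectors (not to be confused with the 32 exceptions), one assigned to each device driver or related task. Drivers have since been detached from softirqs, but they still use them: access goes through intermediate APIs (such as tasklets and timers) rather than direct calls. In current kernels ten softirq vectors are defined: two for tasklet processing, two for networking (the source of the softirq mechanism and its most important application), two for the block layer, two for timers, one for the scheduler, and one for read-copy-update (RCU) processing. They correspond to the ten columns of mpstat -I SCPU: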
HI/s(HI_SOFTIRQ): high priority tasklets;
TIMER/s(TIMER_SOFTIRQ): regular kernel timers;
NET_TX/s(NET_TX_SOFTIRQ): send operations in networks;
NET_RX/s(NET_RX_SOFTIRQ): receive operations in networks;
BLOCK/s(BLOCK_SOFTIRQ): used by the block layer to implement asynchronous request completions (libaio completions?);
IRQ_POLL/s(IRQ_POLL_SOFTIRQ): the second block-layer vector, used for polled I/O completions;
TASKLET/s(TASKLET_SOFTIRQ): regular tasklets;
SCHED/s(SCHED_SOFTIRQ): used by the scheduler to implement periodic load balancing on SMP systems;
HRTIMER/s(HRTIMER_SOFTIRQ): required when high-resolution timers are enabled;
RCU/s(RCU_SOFTIRQ): read-copy-update (RCU) processing.
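The same ten names show up as row labels in /proc/softirqs, which holds cumulative per-CPU counts (the mpstat columns are, as far as I understand, per-second rates derived from such counters). Below is a small parser for text in that layout. The sample numbers are invented, and `parse_softirqs` is just a helper written for this note.

```python
# Invented sample in the layout of /proc/softirqs on a 2-CPU machine.
SAMPLE = """\
                    CPU0       CPU1
          HI:          3          0
       TIMER:     120394     118222
      NET_TX:         12          7
      NET_RX:      90311       1204
       BLOCK:       5120       4980
    IRQ_POLL:          0          0
     TASKLET:        341         15
       SCHED:      88012      87990
     HRTIMER:          9          4
         RCU:      64100      63888
"""


def parse_softirqs(text):
    """Return {softirq_name: [count_on_cpu0, count_on_cpu1, ...]}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        counts = [int(field) for field in rest.split()]
        table[name.strip()] = counts[:ncpus]
    return table


counts = parse_softirqs(SAMPLE)
```

On a real machine the same function could be fed the contents of /proc/softirqs; sampling it twice and dividing the difference by the interval gives rates comparable to the mpstat columns.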
Implementation: The kernel maintains a per-CPU bitmask indicating which softirqs need processing at any given time. So, for example, when a kernel subsystem calls tasklet_schedule(), the TASKLET_SOFTIRQ bit is set on the corresponding CPU and, when softirqs are processed, the tasklet will be run. There are two places where software interrupts can “fire” and preempt the current thread. One of them is at the end of the processing for a hardware interrupt; it is common for interrupt handlers to raise softirqs, so it makes sense (for latency and optimal cache use) to process them as soon as hardware interrupts can be re-enabled (hardware interrupt handlers often schedule tasklets, which run on top of a softirq, so softirqs are processed as soon as the interrupt handler finishes). The other possibility is anytime that kernel code re-enables softirq processing (via a call to functions like local_bh_enable() or spin_unlock_bh()).
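The per-CPU bitmask can be sketched in a few lines. The vector numbers follow the order of the ten softirqs listed earlier; `CpuSoftirqState` and its methods are names made up for this model, and the real kernel code has more to it (for example, it restarts if new bits were raised during processing).

```python
from enum import IntEnum


class Softirq(IntEnum):
    HI = 0
    TIMER = 1
    NET_TX = 2
    NET_RX = 3
    BLOCK = 4
    IRQ_POLL = 5
    TASKLET = 6
    SCHED = 7
    HRTIMER = 8
    RCU = 9


class CpuSoftirqState:
    def __init__(self):
        self.pending = 0    # one bit per softirq vector
        self.handlers = {}  # vector -> callable, filled in by subsystems

    def raise_softirq(self, nr):
        """Mark vector `nr` as needing processing on this CPU."""
        self.pending |= 1 << nr

    def do_softirq(self):
        """Run the handlers of all pending vectors, lowest number first."""
        snapshot, self.pending = self.pending, 0
        ran = []
        for nr in Softirq:
            if snapshot & (1 << nr):
                handler = self.handlers.get(nr)
                if handler is not None:
                    handler()
                ran.append(nr.name)
        return ran


cpu0 = CpuSoftirqState()
cpu0.raise_softirq(Softirq.TASKLET)  # what tasklet_schedule() boils down to here
cpu0.raise_softirq(Softirq.NET_RX)
pending_before = cpu0.pending
ran = cpu0.do_softirq()
```

Scanning from bit 0 upward is what makes HI the highest-priority vector in this model: a lower number is simply looked at first.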
Note: this does not mean that all softirqs are handled by these kernel threads (ksoftirqd); only the overflow that the normal path cannot finish is handed to them: these processes exist to offload softirq processing when the load gets too heavy. If the regular, inline softirq processing code loops ten times and still finds more softirqs to process (because they continue to be raised), it will wake the appropriate ksoftirqd process (there is one per CPU) and exit; that process will eventually be scheduled and pick up running softirq handlers. Ksoftirqd will also be poked if a softirq is raised outside of (hardware or software) interrupt context; that is necessary because, otherwise, an arbitrary amount of time might pass before softirqs are processed again.
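The "loop ten times, then hand over" rule looks roughly like this as a model. `inline_softirq_processing` and the `handle_once` callback are invented for the sketch; newer kernels, as far as I know, also apply a time limit, which is ignored here.

```python
MAX_RESTART = 10  # the "loops ten times" limit from the text


def inline_softirq_processing(pending, handle_once):
    """Process softirqs inline; report whether ksoftirqd has to take over.

    `pending` is the pending bitmask at entry. `handle_once(mask)` runs the
    handlers for `mask` and returns the bitmask raised in the meantime.
    Returns (passes_done, wake_ksoftirqd).
    """
    passes = 0
    while pending and passes < MAX_RESTART:
        pending = handle_once(pending)
        passes += 1
    # Anything still pending after the limit is left for the per-CPU thread.
    return passes, bool(pending)


# A quiet system: one pass clears everything, no thread is woken.
quiet = inline_softirq_processing(0b1000, lambda mask: 0)

# A flood: every pass raises the same softirq again, so the loop gives up.
flood = inline_softirq_processing(0b1000, lambda mask: mask)
```

The cap is what keeps a softirq storm from starving the interrupted thread: past ten passes the remaining work competes for the CPU as an ordinary schedulable thread.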