linux-insides-zh

x86_64 related clock sources

This is the sixth part of the chapter which describes timers and time management related stuff in the Linux kernel. In the previous part we saw the clockevents framework, and now we will continue to dive into time management related stuff in the Linux kernel. This part describes the implementation of x86 architecture related clock sources (you can read more about the clocksource concept in the second part of this chapter).

First of all, we must know which clock sources may be used on the x86 architecture. It is easy to find out from sysfs, or from the contents of the /sys/devices/system/clocksource/clocksource0/available_clocksource file. The /sys/devices/system/clocksource/clocksourceN directory provides two special files for this:

  • available_clocksource - provides information about the available clock sources in the system;

  • current_clocksource - provides information about the clock source currently used in the system.

So, let's look:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource 
tsc hpet acpi_pm 

We can see that there are three registered clock sources in my system:

  • tsc - Time Stamp Counter;

  • hpet - High Precision Event Timer;

  • acpi_pm - ACPI Power Management Timer.

Now let's look at the second file, which shows the best clock source (the clock source with the best rating in the system):

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
tsc

For me it is the Time Stamp Counter. As we may know from the second part of this chapter, which describes the internals of the clocksource framework in the Linux kernel, the best clock source in a system is the clock source with the best (highest) rating, or in other words with the highest frequency.

The frequency of the ACPI power management timer is 3.579545 MHz. The frequency of the High Precision Event Timer is at least 10 MHz. The frequency of the Time Stamp Counter depends on the processor. For example, on older processors the Time Stamp Counter counted internal processor clock cycles, so its frequency changed whenever the processor's frequency scaling changed. The situation is different for newer processors: they have an invariant Time Stamp Counter that increments at a constant rate in all operational states of the processor. We can get an idea of its frequency from the output of /proc/cpuinfo, for example for the first processor in the system:

$ cat /proc/cpuinfo
...
model name	: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
...

The Intel manual says that the frequency of the Time Stamp Counter, while constant, is not necessarily the maximum qualified frequency of the processor or the frequency given in the brand string. Even so, the TSC frequency is still much higher than the frequency of the ACPI PM timer or the High Precision Event Timer. We can also see that the clock source with the best rating, i.e. the highest frequency, is the current one in the system.

As I already wrote above, we will consider all three of these clock sources in this part, in the order of their initialization:

  • hpet;

  • acpi_pm;

  • tsc.

We can confirm this order in the output of the dmesg util:

$ dmesg | grep clocksource
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
[    0.094369] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    0.186498] clocksource: Switched to clocksource hpet
[    0.196827] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    1.413685] tsc: Refined TSC clocksource calibration: 3999.981 MHz
[    1.413688] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x73509721780, max_idle_ns: 881591102108 ns
[    2.413748] clocksource: Switched to clocksource tsc

Besides these three clock sources, the output of /sys/devices/system/clocksource/clocksource0/available_clocksource does not show two other clock sources that are already familiar to us: jiffies and refined_jiffies. Both of them do appear in the dmesg output above. We don't see them in available_clocksource because this file lists only high resolution clock sources, in other words clock sources with the CLOCK_SOURCE_VALID_FOR_HRES flag.

The first clock source is the High Precision Event Timer, so let's start with it.

High Precision Event Timer

The implementation of the High Precision Event Timer for the x86 architecture is located in the arch/x86/kernel/hpet.c source code file. Its initialization starts with a call to the hpet_enable function during Linux kernel initialization. Let's look at the start_kernel function from the init/main.c source code file. After all architecture-specific stuff is initialized, the early console is disabled and the time management subsystem is ready, it calls the following function:

if (late_time_init)
	late_time_init();

This function initializes the late architecture-specific timers once the early jiffies counter has been initialized. The definition of the late_time_init function for the x86 architecture is located in the arch/x86/kernel/time.c source code file. It looks pretty easy:

static __init void x86_late_time_init(void)
{
	x86_init.timers.timer_init();
	tsc_init();
}

As we can see, it initializes the x86 related timer and the Time Stamp Counter. We will look at the Time Stamp Counter later in this part. For now, let's consider the call of the x86_init.timers.timer_init function. The timer_init field points to the hpet_time_init function from the same source code file. We can verify this by looking at the definition of the x86_init structure from arch/x86/kernel/x86_init.c:

struct x86_init_ops x86_init __initdata = {
   ...
   ...
   ...
   .timers = {
		.setup_percpu_clockev	= setup_boot_APIC_clock,
		.timer_init		= hpet_time_init,
		.wallclock_init		= x86_init_noop,
   },
   ...
   ...
   ...

If we cannot enable the High Precision Event Timer, the hpet_time_init function sets up the programmable interval timer instead. In either case, it then sets up the default timer IRQ:

void __init hpet_time_init(void)
{
	if (!hpet_enable())
		setup_pit_timer();
	setup_default_timer_irq();
}

First of all, the hpet_enable function checks whether we can enable the High Precision Event Timer in the system by calling the is_hpet_capable function. If we can, it maps a virtual address space for it:

int __init hpet_enable(void)
{
	if (!is_hpet_capable())
		return 0;

    hpet_set_mapping();
    ...
}

The is_hpet_capable function checks that we didn't pass hpet=disable on the kernel command line and that hpet_address was received from the ACPI HPET table. The hpet_set_mapping function just maps the virtual address space for the timer registers:

hpet_virt_address = ioremap_nocache(hpet_address, HPET_MMAP_SIZE);

As we can read in the IA-PC HPET (High Precision Event Timers) Specification:

"The timer register space is 1024 bytes"

So, HPET_MMAP_SIZE is 1024 bytes too:

#define HPET_MMAP_SIZE		1024

After we have mapped the virtual space for the High Precision Event Timer, we read the HPET_ID register to get the number of timers:

id = hpet_readl(HPET_ID);

last = (id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT;

Here last is the index of the last timer. We need it to allocate the correct amount of space (hpet_boot_cfg) for saving the boot-time value of the General Configuration Register together with the configuration register of each timer, which is why last + 2 entries are allocated:

cfg = hpet_readl(HPET_CFG);

hpet_boot_cfg = kmalloc((last + 2) * sizeof(*hpet_boot_cfg), GFP_KERNEL);

Once this space is allocated, we set the HPET_CFG_ENABLE bit in the General Configuration Register. This allows the main counter to run and allows timer interrupts, if they are enabled. Finally, we register the new clock source by calling the hpet_clocksource_register function:

if (hpet_clocksource_register())
	goto out_nohpet;

which just calls the already familiar

clocksource_register_hz(&clocksource_hpet, (u32)hpet_freq);

function. Here clocksource_hpet is the clocksource structure with the rating 250 (remember that the rating of the previous refined_jiffies clock source was 2), the name hpet, and the read_hpet callback for reading the atomic counter provided by the High Precision Event Timer:

static struct clocksource clocksource_hpet = {
	.name		= "hpet",
	.rating		= 250,
	.read		= read_hpet,
	.mask		= HPET_MASK,
	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
	.resume		= hpet_resume_counter,
	.archdata	= { .vclock_mode = VCLOCK_HPET },
};

After clocksource_hpet is registered, we can return to the hpet_time_init() function from the arch/x86/kernel/time.c source code file. Its last step is the call of the

setup_default_timer_irq();

function. The setup_default_timer_irq function checks whether legacy IRQs exist, in other words whether the i8259 is supported, and sets up IRQ0 accordingly.

That's all. From this moment the High Precision Event Timer clock source is registered in the Linux kernel clock source framework and may be used from generic kernel code via the read_hpet function:

static cycle_t read_hpet(struct clocksource *cs)
{
	return (cycle_t)hpet_readl(HPET_COUNTER);
}

This function just reads and returns the atomic counter from the Main Counter Register.

ACPI PM timer

The second clock source is the ACPI Power Management Timer. The implementation of this clock source is located in the drivers/clocksource/acpi_pm.c source code file. Its initialization starts with a call to the init_acpi_pm_clocksource function, which is registered as an fs initcall.

The init_acpi_pm_clocksource function starts by checking the value of the pmtmr_ioport variable:

static int __init init_acpi_pm_clocksource(void)
{
    ...
    ...
    ...
	if (!pmtmr_ioport)
		return -ENODEV;
    ...
    ...
    ...

The pmtmr_ioport variable contains the extended address of the Power Management Timer Control Register Block. It gets its value in the acpi_parse_fadt function, which is defined in the arch/x86/kernel/acpi/boot.c source code file. This function parses the FADT (Fixed ACPI Description Table) ACPI table. It tries to get the value of the X_PM_TMR_BLK field, which contains the extended address of the Power Management Timer Control Register Block in Generic Address Structure format:

static int __init acpi_parse_fadt(struct acpi_table_header *table)
{
#ifdef CONFIG_X86_PM_TIMER
        ...
        ...
        ...
		pmtmr_ioport = acpi_gbl_FADT.xpm_timer_block.address;
        ...
        ...
        ...
#endif
	return 0;
}

So, if the CONFIG_X86_PM_TIMER Linux kernel configuration option is disabled, or if something goes wrong in the acpi_parse_fadt function, we can't access the Power Management Timer register and we return from init_acpi_pm_clocksource. Otherwise, if the value of the pmtmr_ioport variable is not zero, we check the rate of this timer and register this clock source by calling the

clocksource_register_hz(&clocksource_acpi_pm, PMTMR_TICKS_PER_SEC);

function. After the call of clocksource_register_hz, the acpi_pm clock source is registered in the clocksource framework of the Linux kernel:

static struct clocksource clocksource_acpi_pm = {
	.name		= "acpi_pm",
	.rating		= 200,
	.read		= acpi_pm_read,
	.mask		= (cycle_t)ACPI_PM_MASK,
	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
};

This clock source has the rating 200 and the acpi_pm_read callback, which reads the atomic counter provided by the acpi_pm clock source. The acpi_pm_read function just executes the read_pmtmr function:

static cycle_t acpi_pm_read(struct clocksource *cs)
{
	return (cycle_t)read_pmtmr();
}

The read_pmtmr function reads the value of the Power Management Timer register. This register has the following structure:

+-------------------------------+----------------------------------+
|                               |                                  |
|  upper eight bits of a        |      running count of the        |
| 32-bit power management timer |     power management timer       |
|                               |                                  |
+-------------------------------+----------------------------------+
31          E_TMR_VAL           24               TMR_VAL           0

The address of this register is stored in the Fixed ACPI Description Table ACPI table, and we already have it in pmtmr_ioport. So, the implementation of the read_pmtmr function is pretty easy:

static inline u32 read_pmtmr(void)
{
	return inl(pmtmr_ioport) & ACPI_PM_MASK;
}

We just read the value of the Power Management Timer register and keep its lower 24 bits.

That's all. Now we move to the last clock source in this part, the Time Stamp Counter.

Time Stamp Counter

The third and last clock source in this part is the Time Stamp Counter, and its implementation is located in the arch/x86/kernel/tsc.c source code file. We already saw the x86_late_time_init function in this part, and the initialization of the Time Stamp Counter starts there: it calls the tsc_init() function from arch/x86/kernel/tsc.c.

The tsc_init function starts with a check that the processor supports the Time Stamp Counter:

void __init tsc_init(void)
{
	u64 lpj;
	int cpu;

	if (!cpu_has_tsc) {
		setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
		return;
	}
    ...
    ...
    ...

The cpu_has_tsc macro expands to the call of the cpu_has macro:

#define cpu_has_tsc		boot_cpu_has(X86_FEATURE_TSC)

#define boot_cpu_has(bit)	cpu_has(&boot_cpu_data, bit)

#define cpu_has(c, bit)							\
	(__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 :	\
	 test_cpu_cap(c, bit))

This macro checks the given bit (X86_FEATURE_TSC in our case) in the capability bits of boot_cpu_data, which are filled during early Linux kernel initialization. If the processor supports the Time Stamp Counter, we get its frequency from the x86_platform.calibrate_tsc callback. The calibration code in the same source code file tries to get the frequency from different sources, such as a Model Specific Register or a calibration against the programmable interval timer. After this we initialize the frequency and the scale factor for all processors in the system:

tsc_khz = x86_platform.calibrate_tsc();
cpu_khz = tsc_khz;

for_each_possible_cpu(cpu) {
	cyc2ns_init(cpu);
	set_cyc2ns_scale(cpu_khz, cpu);
}

We do this for every possible processor because only the first, bootstrap, processor calls tsc_init. After this we check that the Time Stamp Counter is not disabled:

if (tsc_disabled > 0)
	return;
...
...
...
check_system_tsc_reliable();

Then we call the check_system_tsc_reliable function, which sets tsc_clocksource_reliable if the bootstrap processor has the X86_FEATURE_TSC_RELIABLE feature. Note that we went through the whole tsc_init function but did not register our clock source. The actual registration of the Time Stamp Counter clock source happens in the

static int __init init_tsc_clocksource(void)
{
	if (!cpu_has_tsc || tsc_disabled > 0 || !tsc_khz)
		return 0;
    ...
    ...
    ...
    if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE)) {
		clocksource_register_khz(&clocksource_tsc, tsc_khz);
		return 0;
	}

function. This function is registered as a device initcall, which guarantees that the Time Stamp Counter clock source is registered after the High Precision Event Timer clock source.

At this point, all three clock sources are registered in the clocksource framework. The Time Stamp Counter clock source is selected as active because it has the highest rating among them:

static struct clocksource clocksource_tsc = {
	.name                   = "tsc",
	.rating                 = 300,
	.read                   = read_tsc,
	.mask                   = CLOCKSOURCE_MASK(64),
	.flags                  = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_MUST_VERIFY,
	.archdata               = { .vclock_mode = VCLOCK_TSC },
};

That's all.

Conclusion

This is the end of the sixth part of the chapter that describes timers and timer management related stuff in the Linux kernel. In the previous part we got acquainted with the clockevents framework. In this part we continued to learn time management related stuff in the Linux kernel and saw a little about three different clock sources used on the x86 architecture. The next part will be the last part of this chapter. In it we will see some user space related stuff, i.e. how some time related system calls are implemented in the Linux kernel.

If you have questions or suggestions, feel free to ping me on twitter 0xAX, drop me an email or just create an issue.

Please note that English is not my first language and I am really sorry for any inconvenience. If you found any mistakes please send me a PR to linux-insides.

Links
  • x86

  • sysfs

  • Time Stamp Counter

  • High Precision Event Timer

  • ACPI Power Management Timer (PDF)

  • frequency

  • dmesg

  • programmable interval timer

  • IRQ

  • IA-PC HPET (High Precision Event Timers) Specification

  • IRQ0

  • i8259

  • initcall

  • previous part