Generative Metaprogramming, Chapter 5: A Few Template Tales from Years Past

Education, 2024-06-30 21:14, USA

Generative Metaprogramming with C++:

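Preface

This series is divided into three parts. The first four chapters form Part One and cover in detail the principles and applications of macros in generative programming. This chapter opens Part Two and turns to generative metaprogramming in C++ proper.

Macros are a product of the C era. They are rather crude and not Turing complete; even with every clever trick applied, many limitations remain. They are also hard to debug, so they serve only as an auxiliary tool for simple code generation. With C++ the number of generative metaprogramming tools grew sharply, with templates as the root from which everything else branches, and the language is now moving toward static reflection.

This chapter covers a large number of template techniques, focusing on the core theory and usage of templates. The concepts are intricate and the techniques numerous, so the difficulty is far from low, and it is one of the most important chapters of the series.

Even if you already know some of the concepts and techniques, do not skip any section: the focus of this series is generative programming rather than template programming, so the angle of discussion differs.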

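Generics ... Metaprogramming

Traditional procedural programming treats logic as functions and data as the inputs and outputs of those functions. A complete set of requirements is organized out of countless functions, much like building objects from blocks.

Object-oriented programming goes a step further and merges logic with data into classes. A class contains data and functions, and the outside world can manipulate that data only through its public functions. Representing real-world things as objects is closer to how people think, which makes abstraction easier.

Generic programming, by contrast, separates logic from data and provides a way to abstract over data, so that different data can share the same logic, greatly reducing code duplication. A single function that accepts different parameter types is a function template; a single class that accepts different data types is a class template.

C++ is a multi-paradigm language. It supports generic programming through templates, which allow writing generic functions and data whose behavior does not depend on any concrete data type, and it uses SFINAE and Concepts to constrain the abstracted types. This gives C++ Parametric Polymorphism, so the choice of polymorphism is not limited to traditional Subtype Polymorphism: compile-time polymorphism can replace run-time polymorphism.

In C++, metaprogramming means programming that happens at compile time. Along with generic programming, templates also brought metaprogramming to C++, and metaprogramming implemented with templates is called template metaprogramming. Generative metaprogramming means programming about programs, carried out at compile time. Since templates inherently generate code at compile time, they can also be used for generative metaprogramming.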

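Abstraction, Concretization ... Templates

In the world of templates, a function is no longer a concrete function and a data type is no longer a concrete data type. A template is like a mold from which all kinds of objects can be produced. The objects may differ in color and material, but they share the same shape and size. In other words, a family of objects must have something in common before a mold can be abstracted from them, and the purpose of the mold is to reuse what they have in common and reduce repetition.

Programming is a tool whose essence is solving problems, and solving problems tests one's ability to abstract and to concretize. Abstraction deals with what stays constant in the real world, while concretization deals with what changes. Once it is clear what varies and what does not, the problem largely solves itself. So let us first pin down the two concepts.

To abstract means to strip away the non-essential attributes of many things and extract the essential ones. Abstraction is the process of exposing the common essence of concrete things. Simplifying a complex reality into a plain model is called modeling; modeling a problem means exactly that, and it is a process of abstraction. The concrete is the simpler notion: it refers to tangible things. Every concrete thing is unique, which is why variation abounds. Concretization is the process of making an abstract thing clear, and it helps in understanding unfamiliar things. Data types are concrete things. Expressing the logic shared by different data types clearly and concisely is precisely the benefit of replacing concrete types with template types.

A template is itself a tool of abstraction. What is written with it is abstract logic that captures the common essence of a family of types, the part that does not change. By itself it is of no use; what is actually needed is concrete logic, so there must be a concretization step that turns abstract logic into concrete logic. In C++ templates, this step is called instantiation. Instantiation replaces the abstract types with concrete types at compile time and generates concrete logic from the template. This mechanism is exactly the code generation capability we need.

Unlike with macros, the code generated by instantiating a template cannot be seen directly in the source, but it can be inspected with tools such as C++ Insights.

For example: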

#include <iostream>

template <typename T>
inline constexpr T Integer = 42;

int main() {
    std::cout << Integer<int> << Integer<long> << "\n";
}

The following code is generated at compile time:

template<typename T>
inline constexpr const T Integer = 42;

template<>
inline constexpr const int Integer<int> = 42;
template<>
inline constexpr const long Integer<long> = 42;

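Thus, through templates, we gain the ability to generate data and functions.

Variable, Function, and Class Templates

In a programming language, the smallest units that can be manipulated are data and functions. Data makes up variables and the inputs and outputs of functions, and data and functions together make up classes. Applying abstraction to these yields variable templates, function templates, and class templates.

Variable Templates

A variable template is an abstraction over variables. Unlike the other kinds of templates, it did not enter the standard until C++14.

Before C++14, a template could not be applied to a variable directly, so workarounds emerged. One is a constexpr static data member in a class template; the other is a constexpr function template that returns the constant value.

The first approach is a classic; see, for example, the implementation of std::numeric_limits.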

template<typename T>
struct numeric_limits {
    static constexpr bool is_modulo = ...;
};

// ...
template<typename T>
constexpr bool numeric_limits<T>::is_modulo;

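This approach has two drawbacks. The first is duplication: the static data member must also be defined outside the class so that a definition exists when it is ODR-used. (This is no longer an issue since C++17, where a constexpr static data member is implicitly inline, and an inline variable may legitimately be defined in more than one translation unit.) The second is verbosity: every use needs the redundant syntax numeric_limits<X>::is_modulo.

For the second approach, we can again look at the implementation of std::numeric_limits.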

template<>
struct numeric_limits<int>
{
    static constexpr int
    min() noexcept { return -__INT_MAX__ - 1; }

    static constexpr int
    max() noexcept { return __INT_MAX__; }

    ...
};

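This approach does not suffer from the duplication of the first one, but the definition has to decide up front how the constant is delivered: either by const reference, in which case the function must return a constant kept in static storage, or by value, in which case a copy is made on each return. Using a const(expr) variable directly avoids the problem, because whether the constant actually needs storage depends only on how it is used, not on how it is defined.

In fact, compilers optimize far better than they used to, and optimizations such as RVO can mitigate these drawbacks to some degree. Still, variable templates undoubtedly solve the problem more directly, simply, and efficiently.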
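
To make the comparison concrete, here is a minimal sketch that expresses the same constant in all three styles. The names is_small, is_small_fn, and is_small_v are hypothetical and chosen only for illustration; the inline specifier on the variable template assumes C++17.

```cpp
// Workaround 1: a static data member of a class template.
// Every use has to spell out is_small<T>::value.
template<class T>
struct is_small {
    static constexpr bool value = sizeof(T) <= sizeof(int);
};

// Workaround 2: a constexpr function template returning the value.
// Every use needs a call: is_small_fn<T>().
template<class T>
constexpr bool is_small_fn() {
    return sizeof(T) <= sizeof(int);
}

// Variable template (C++14; inline needs C++17): used like a plain constant.
template<class T>
inline constexpr bool is_small_v = sizeof(T) <= sizeof(int);

// All three agree; only the syntax at the point of use differs.
static_assert(is_small<char>::value, "class-template style");
static_assert(is_small_fn<char>(), "function-template style");
static_assert(is_small_v<char>, "variable-template style");
```

The variable template needs neither an out-of-class definition nor a decision about returning by value or by reference.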

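Later implementations avoid these workarounds and use variable templates directly. That is why xxx::value has been superseded by xxx_v throughout, and why the mathematical constants are uniformly defined as inline constexpr variable templates.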

namespace std::numbers {
    template<class T> inline constexpr T e_v          = /* unspecified */;
    template<class T> inline constexpr T log2e_v      = /* unspecified */;
    template<class T> inline constexpr T log10e_v     = /* unspecified */;
    template<class T> inline constexpr T pi_v         = /* unspecified */;
    template<class T> inline constexpr T inv_pi_v     = /* unspecified */;
    template<class T> inline constexpr T inv_sqrtpi_v = /* unspecified */;
    template<class T> inline constexpr T ln2_v        = /* unspecified */;
    template<class T> inline constexpr T ln10_v       = /* unspecified */;
    template<class T> inline constexpr T sqrt2_v      = /* unspecified */;
    template<class T> inline constexpr T sqrt3_v      = /* unspecified */;
    template<class T> inline constexpr T inv_sqrt3_v  = /* unspecified */;
    template<class T> inline constexpr T egamma_v     = /* unspecified */;
    template<class T> inline constexpr T phi_v        = /* unspecified */;

    inline constexpr double e          = e_v<double>;
    inline constexpr double log2e      = log2e_v<double>;
    inline constexpr double log10e     = log10e_v<double>;
    inline constexpr double pi         = pi_v<double>;
    inline constexpr double inv_pi     = inv_pi_v<double>;
    inline constexpr double inv_sqrtpi = inv_sqrtpi_v<double>;
    inline constexpr double ln2        = ln2_v<double>;
    inline constexpr double ln10       = ln10_v<double>;
    inline constexpr double sqrt2      = sqrt2_v<double>;
    inline constexpr double sqrt3      = sqrt3_v<double>;
    inline constexpr double inv_sqrt3  = inv_sqrt3_v<double>;
    inline constexpr double egamma     = egamma_v<double>;
    inline constexpr double phi        = phi_v<double>;
}

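Today, to use the constant PI you can write std::numbers::pi to get it in the default type (double), or pass a template argument explicitly, as in std::numbers::pi_v<float>, to get the constant in another type.

Note that variable templates are usually used at file scope. There, constexpr implies const, and a non-inline const variable at namespace scope has internal linkage by default. If a variable template is qualified with constexpr alone, each translation unit may end up with its own copy of the data, which is needless overhead. Adding inline yields a single entity with external linkage shared across translation units, which is why variable templates are generally declared inline constexpr.

These are all examples from the standard. Where do variable templates show up in practice?

First example. Suppose you are developing a simulator and need to specify the start and end time of the simulation. The simulator is globally available, and every simulated device uses that simulation time. One option is to put the times into the global configuration as inline constexpr/constinit variable templates, so that the simulation time can be specified and read with milliseconds/seconds/minutes/hours... precision. A better design, of course, is to do what asio does: abstract a sim_context, put all configuration into that class, and pass the context object to each simulated device, thereby avoiding global objects.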
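
The global-configuration variant of the first example might look roughly like the sketch below. The namespace sim, the names start_time and end_time, and the concrete values are assumptions made purely for illustration; only std::chrono is real library API.

```cpp
#include <chrono>

namespace sim {

// Hypothetical base values, stored once with second precision.
inline constexpr std::chrono::seconds start_time_base{30};
inline constexpr std::chrono::seconds end_time_base{7200};

// Variable templates: the same instants, expressed in the requested unit.
template<class Duration>
inline constexpr Duration start_time =
    std::chrono::duration_cast<Duration>(start_time_base);

template<class Duration>
inline constexpr Duration end_time =
    std::chrono::duration_cast<Duration>(end_time_base);

} // namespace sim

// Each use picks its own precision through the template argument.
static_assert(sim::start_time<std::chrono::milliseconds>.count() == 30000);
static_assert(sim::end_time<std::chrono::minutes>.count() == 120);
static_assert(sim::end_time<std::chrono::hours>.count() == 2);
```

Note that duration_cast truncates, so sim::start_time<std::chrono::minutes> would be zero in this sketch.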

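Second example: variable templates can be combined with overloaded lambdas. Although a generic lambda has some characteristics of function templates and class templates, its data can only come from captures, so it cannot really be used like a class. Combined with a variable template, a generic lambda can take a type argument the way a class template does. As a concrete example, here is the abstract factory shown in the article "泛型 Lambda，如此强大！":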

template<class... Ts> struct AbstractAIFactory : Ts... { using Ts::operator()...; };
template<class... Ts> AbstractAIFactory(Ts...) -> AbstractAIFactory<Ts...>;

template<class T, class U>
concept IsAbstractAI = std::same_as<T, U>;

template<class T>
static constexpr auto AIFactory = AbstractAIFactory {
    []() requires IsAbstractAI<T, Lux> { return new LuxEasy; },
    []() requires IsAbstractAI<T, Ziggs> { return new ZiggsEasy; },
    []() requires IsAbstractAI<T, Teemo> { return new TeemoEasy; }
};

auto ziggs = AIFactory<Ziggs>();
ziggs->print();

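Here AIFactory is not at file scope, so static constexpr can be used to strongly guarantee compile-time evaluation. Templates cannot remove the repetition here, but with the code generation capabilities the GMP library has so far, it is easy to deal with. The code can be simplified to: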

#define CONCRETE_AI_FACTORY(x) \
    []() requires IsAbstractAI<T, x> { return new GMP_CONCAT(x, Easy); },

template<class T>
static constexpr auto AIFactory = AbstractAIFactory {
    GMP_FOR_EACH(CONCRETE_AI_FACTORY, Lux, Ziggs, Teemo)
};

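This is also an example of combining different kinds of generative tools.

That concludes the discussion of variable templates. Let us move on to the next kind of template.

Function Templates

A function template is an abstraction over functions, from which a family of functions sharing the same logic can be generated. It has been in the standard since C++98/03.

A function template is an abstract description of a function. An abstraction does not exist as an entity: it is neither a type nor a function. Only the concretized code is a real entity with actual parameter types, and only that is a function. The process of generating usable code is called function template instantiation, and the functions produced by instantiation are called template functions.

Function templates allow writing generic functions, for example: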

template<class T>
const T& min(const T& a, const T& b) {
    return b < a ? b : a;
}

template<class T>
const T& max(const T& a, const T& b) {
    return b < a ? a : b;
}

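min() and max() accept any operands that support operator<, abstracting the minimum/maximum logic for all such types, so the same functionality need not be rewritten for each type.

Note that this max() differs from the standard one, which uses a < b ? b : a. When the two operands are equivalent but not equal, the standard implementation returns a, while the implementation here returns b. The purpose is stronger consistency. Take {min(a, b), max(a, b)}: when a and b are equivalent but not equal, the standard implementation yields {a, a}, while this one yields {a, b}. One goes to the low slot and the other to the high slot, which is friendlier to sorting.

In C++, equivalence is defined as !(a < b) && !(b < a), while equality is defined as a == b. The former looks only at the part of the value that takes part in the ordering, whereas the latter typically compares the whole value. Here is a complete example: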

// Check if the two operands are equivalent.
template<class T>
constexpr bool equivalent(T const& a, T const& b) {
    return !(a < b) && !(b < a);
}

// Check if the two operands are equal.
template<class T>
constexpr bool equal(const T& a, const T& b) {
    return a == b;
}

// max() in standard
template<class T>
constexpr const T& max1(T const& a, T const& b) {
    return a < b ? b : a;
}

// max() in example
template<class T>
constexpr const T& max2(T const& a, T const& b) {
    return b < a ? a : b;
}

struct X { int a, b; };
constexpr bool operator==(X const& lhs, X const& rhs) {
    return lhs.a == rhs.a && lhs.b == rhs.b;
}

constexpr bool operator<(X const& lhs, X const& rhs) {
    return lhs.a < rhs.a;
}

int main() {
    constexpr X x1{0, 1};
    constexpr X x2{0, 2};

    static_assert(!equal(x1, x2), "x1 and x2 are equal!");
    static_assert(equivalent(x1, x2), "x1 and x2 are not equivalent!");

    static_assert(x1 == max1(x1, x2), "x1 was not returned by max1");
    static_assert(x2 == max2(x1, x2), "x2 was not returned by max2");
}

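Function templates can also be overloaded, although in overload resolution a non-template function is preferred over a function template specialization when both match equally well. This topic is not elaborated here; see [The Book of Modern C++]/p63.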
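
As a small illustration of that preference, the sketch below overloads a hypothetical function describe (the name and the return codes are invented for this example) and checks which overload is picked.

```cpp
// Generic fallback: returns 0 to signal that the template was chosen.
template<class T>
constexpr int describe(T) { return 0; }

// Non-template overload: returns 1 to signal that it was chosen.
constexpr int describe(int) { return 1; }

// Both overloads are exact matches for an int argument,
// so the non-template function wins the tie.
static_assert(describe(42) == 1);

// For a long argument the template is an exact match (T = long), while the
// non-template overload would need a conversion, so the template wins.
static_assert(describe(42L) == 0);

// Explicit template arguments restrict the candidates to templates.
static_assert(describe<int>(42) == 0);
```

The non-template overload is preferred only as a tie-breaker; a better conversion sequence still decides first.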

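Class Templates

A class template is an abstraction over classes, from which data and functions of different types can be generated. It supports partial specialization and is the most widely used kind. It already existed in C++98, and by C++03 all sorts of clever tricks had blossomed everywhere; it was a heyday that drew no end of admiration.

The biggest difference between class templates and the other two kinds is that classes can inherit, and the most ingenious technique here is recursive inheritance. The classic application is the implementation of std::tuple. A condensed version of one such implementation follows.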

template<typename... Types> class Tuple;
template<> class Tuple<> {};

template<typename Head, typename... Tail>
class Tuple<Head, Tail...> : public Tuple<Tail...>
{
    using TailType = Tuple<Tail...>;
protected:
    Head head_;

public:
    Tuple() {}
    // The base class is always initialized before members, so list it first.
    Tuple(Head v, Tail... vtails) : TailType(vtails...), head_(v) {}

    Head head() { return head_; }
    TailType& tail() { return *this; }
};

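Recursive inheritance inherits once per variadic template parameter until it reaches Tuple<>. Given Tuple<int, float, std::string> t(1, 2.0f, "blah blah..."), the expansion looks like this: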

------------------------------
Tuple<>                      |
------------------------------
↑
------------------------------
Tuple<string>                |
string head_("blah blah...");|
------------------------------
↑
------------------------------
Tuple<float, string>         |
float head_(2.0);            |
------------------------------
↑
------------------------------
Tuple<int, float, string>    |
int head_(1);                |
------------------------------

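Recursive inheritance is the most powerful code generation technique in templates. An implementation of just a few lines can automatically produce thousands upon thousands of lines of code. Correspondingly, it is extremely complex to apply.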
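
To show how the inheritance chain can be walked, here is a sketch of an index-based accessor for a constexpr-friendly variant of the Tuple above. The accessor name tuple_get and the variable sample are hypothetical; std::tuple implementations differ in detail.

```cpp
#include <cstddef>

template<typename... Types> class Tuple;
template<> class Tuple<> {};

template<typename Head, typename... Tail>
class Tuple<Head, Tail...> : public Tuple<Tail...> {
public:
    constexpr Tuple(Head v, Tail... vtails)
        : Tuple<Tail...>(vtails...), head_(v) {}

    constexpr const Head& head() const { return head_; }
    // Converting *this to a base-class reference yields the rest of the tuple.
    constexpr const Tuple<Tail...>& tail() const { return *this; }

private:
    Head head_;
};

// Hypothetical accessor: step down one inheritance layer per index.
template<std::size_t I, typename Head, typename... Tail>
constexpr const auto& tuple_get(const Tuple<Head, Tail...>& t) {
    if constexpr (I == 0)
        return t.head();
    else
        return tuple_get<I - 1>(t.tail());
}

constexpr Tuple<int, double, char> sample(1, 2.5, 'c');
static_assert(tuple_get<0>(sample) == 1);
static_assert(tuple_get<1>(sample) == 2.5);
static_assert(tuple_get<2>(sample) == 'c');
```

Each tuple_get<I> instantiation looks only at the head of one layer, mirroring the layer-by-layer structure of the diagram.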

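Understanding Template Instantiation

Template instantiation takes two forms: explicit instantiation and implicit instantiation. The entities produced by instantiating a template (variables, functions, classes, and so on) are called specializations.

Explicit instantiation syntax has two forms, the explicit instantiation definition and the explicit instantiation declaration: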

template declaration        // explicit instantiation definition
extern template declaration // explicit instantiation declaration

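An explicit instantiation may not use the inline/constexpr/consteval specifiers, nor may it carry attributes.

C++ developers who rarely touch templates may never have used explicit instantiation. In generative metaprogramming, however, templates are used very heavily, which inevitably increases compile time and output size, and explicit instantiation can reduce that cost.

To discuss this in depth we first need to understand implicit instantiation; otherwise it is hard to see where the cost comes from.

Implicit instantiation is how the compiler turns an abstract template into concrete code. The compiler and linker must ensure that each template instance exists exactly once in the executable. The C++ standard does not prescribe how this is to be achieved. In practice there are two basic solutions, the Borland model and the Cfront model.

In the Borland model, the compiler emits template instances in every translation unit that uses them, and the linker then merges the translation units and discards the duplicates, keeping one copy. This model only needs to consider the object file itself and does not have to worry about external complexity. However, because instances are generated repeatedly, template code is compiled over and over, which lengthens compile time. Under this model, template definitions usually have to live in header files so that they can be found at instantiation time.

The Cfront model adds the notion of a template repository, which maintains template instances automatically. When building an individual object file, the compiler places the template definitions and instantiations it encounters into the repository. If an instantiation is already there, the compiler reuses it instead of generating it again. At link time, a link wrapper adds the objects from the repository and compiles any required instances that have not been generated yet. This model optimizes compile speed further, and, unlike the Borland model, it does not require replacing the linker; the system linker can be used directly. The complexity rises sharply, though, and so does the chance of errors. In practice it can be very difficult to build several programs in one directory or one program across several directories. Under this model, code usually separates the definitions of non-inline member templates into a file that is compiled on its own.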

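GNU C++ uses the Borland model. When compiling template code, every translation unit gets its own copy of the template instances it uses, and the linker later removes the duplicates. This is the culprit behind the increased compile time and output size.

How do we get rid of this culprit?

We will not discuss compiler-specific solutions here, only the general approach the C++ standard provides, namely the explicit instantiation introduced at the start of this section.

The explicit instantiation definition has been supported since C++98/03 and is mainly used to force the compiler to generate an instantiation. The explicit instantiation declaration was added in C++11 and is mainly used to stop the compiler from instantiating the template in a translation unit. In other words, the latter suppresses implicit instantiation, which avoids the compilation overhead of the Borland model and reduces compile time. How is the template instance then obtained? Through an explicit instantiation definition, which forces the compiler to generate it in one chosen place.

Talk is cheap, so let us first verify that the compiler really does generate multiple template instances.

First prepare the template header foo.h with the following content: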

// foo.h
#ifndef FOO_H_
#define FOO_H_

#include <iostream>


template<class T>
class Foo {
public:
    Foo(T x) : data_(x) {}

    void print() const {
        std::cout << "x:" << data_ << "\n";
    }

private:
    T data_;
};


#endif // FOO_H_

Then prepare two source files that use the template:

// source_one.cpp
#include "foo.h"

void do_something1() {
    Foo<int> foo(1);
    foo.print();
}

// source_two.cpp
#include "foo.h"

void do_something2() {
    Foo<int> foo(2);
    foo.print();
}

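Compile each source file to an object file (for example with g++ -c), then inspect the symbol tables of the two translation units with the following commands: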

nm -C -S source_one.o | grep Foo
nm -C -S source_two.o | grep Foo

The symbol tables contain:

// source_one.o
0000000000000000 000000000000001b W Foo<int>::Foo(int)
0000000000000000 000000000000001b W Foo<int>::Foo(int)
0000000000000000 n Foo<int>::Foo(int)
0000000000000000 0000000000000048 W Foo<int>::print() const

// source_two.o
0000000000000000 000000000000001b W Foo<int>::Foo(int)
0000000000000000 000000000000001b W Foo<int>::Foo(int)
0000000000000000 n Foo<int>::Foo(int)
0000000000000000 0000000000000048 W Foo<int>::print() const

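The first column is the symbol's address in the object file; 0000000000000000 means that no final address has been assigned at compile time, and the linker will assign and resolve it during linking. The second column is the size of the generated code. The third column is the symbol type: W denotes a weak symbol, which allows multiple definitions to coexist at link time, and n denotes a local symbol that the compiler generates for internal use and that is not exported to the final link. The last column is the symbol name.

The same constructor symbol appears more than once within a single object file. Why? Let us print the mangled names: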

0000000000000000 W _ZN3FooIiEC1Ei
0000000000000000 W _ZN3FooIiEC2Ei
0000000000000000 n _ZN3FooIiEC5Ei
0000000000000000 W _ZNK3FooIiE5printEv

As you can see, although the demangled names are identical, the mangled names differ. The only difference is C1/C2, whose meaning is:

<ctor-dtor-name> ::= C1    # complete object constructor
                 ::= C2    # base object constructor
                 ::= C3    # complete object allocating constructor
                 ::= D0    # deleting destructor
                 ::= D1    # complete object destructor
                 ::= D2    # base object destructor

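C2 denotes the base object constructor. The example involves no inheritance, so this symbol should not be needed, yet the compiler emits it anyway. The GCC bug documentation notes that although this is reported frequently, it is not actually a bug: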

G++ emits two copies of constructors and destructors.
    In general there are three types of constructors (and destructors).
        1. The complete object constructor/destructor.
        2. The base object constructor/destructor.
        3. The allocating constructor/deallocating destructor.
    The first two are different, when virtual base classes are involved.

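The example has no virtual base classes, so C1 and C2 are identical, and at a sufficient optimization level GCC actually maps both symbols to the same code. So there is no need to worry about the duplicate constructor symbols; looking at one of them is enough.

In the two translation units, source_one.o and source_two.o, a separate section is generated for each template function, and each takes up space. If a great many translation units use the template, the compiler instantiates it in every one of them, which increases compile time and repeatedly bloats the object files.

How does explicit instantiation avoid this cost? There are three ways, shown in turn below.

First, move the template definitions from the header into a CPP source file, leave only the declarations in the header, and use an explicit instantiation definition in the source file to force the compiler to generate the instance.

That is, change the code to: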

// foo.h
#ifndef FOO_H_
#define FOO_H_

template<class T>
class Foo {
public:
    Foo(T x);

    void print();

private:
    T data_;
};

#endif // FOO_H_

// foo.cpp
#include "foo.h"
#include <iostream>


template<class T>
Foo<T>::Foo(T x)
    : data_(x)
{}

template<class T>
void Foo<T>::print()
{
    std::cout << "x:" << data_ << "\n";
}

// Explicit instantiation definition
template class Foo<int>;

Now look at the symbol tables of the object files generated from source_one.cpp and source_two.cpp again:

// source_one.o
                 U Foo<int>::print()
                 U Foo<int>::Foo(int)

// source_two.o
                 U Foo<int>::print()
                 U Foo<int>::Foo(int)

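U denotes an undefined symbol. The template definitions now live in a CPP source file that these two translation units cannot see, so nothing is compiled for them and no object file space is used. The only template instance is in foo.cpp; both units can use it, and linking works without problems.

Add a main.cpp that provides the main function: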

extern void do_something1();
extern void do_something2();

int main() {
    do_something1();
    do_something2();
}

Linking and running gives:

$> g++ -o main foo.o source_one.o source_two.o main.o
$> ./main
x:1
x:2

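The drawback of this approach is that outside projects cannot instantiate the template themselves; they can only use explicitly instantiated instances such as Foo<int>. Of course, if you actually want to prevent users from instantiating the template and let them use only a few built-in instances while avoiding the compilation cost, this drawback becomes a feature.

Second, keep the template definition in the header, but add an explicit instantiation declaration there to stop the compiler from instantiating the template implicitly, then use an explicit instantiation definition by hand to force the compiler to generate the desired instance.

Concretely, change the code to: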

// foo.h
#ifndef FOO_H_
#define FOO_H_

#include <iostream>


template<class T>
class Foo {
public:
    Foo(T x) : data_(x) {}

    void print() const {
        std::cout << "x:" << data_ << "\n";
    }

private:
    T data_;
};


// Explicit instantiation declaration
extern template class Foo<int>;

#endif // FOO_H_

Then explicitly instantiate the template in source_one.cpp and do nothing in source_two.cpp. The code becomes:

// source_one.cpp
#include "foo.h"

template class Foo<int>;

void do_something1() {
    Foo<int> foo(1);
    foo.print();
}


// source_two.cpp
#include "foo.h"

void do_something2() {
    Foo<int> foo(2);
    foo.print();
}

Now only source_one.cpp actually generates the template instance; source_two.cpp has none. The object file symbol tables are:

// source_one.o
0000000000000000 000000000000001b W Foo<int>::Foo(int)
0000000000000000 000000000000001b W Foo<int>::Foo(int)
0000000000000000 n Foo<int>::Foo(int)
0000000000000000 0000000000000048 W Foo<int>::print() const

// source_two.o
                 U Foo<int>::Foo(int)
                 U Foo<int>::print() const

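That confirms it. If the two object files are now linked, only one instance exists (and nothing has to be discarded, since there was only one instance to begin with):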

$> ld -r -o source.o source_one.o source_two.o
$> nm -C source.o
                 U _GLOBAL_OFFSET_TABLE_
000000000000009c t _GLOBAL__sub_I__Z13do_something1v
0000000000000151 t _GLOBAL__sub_I__Z13do_something2v
0000000000000000 T do_something1()
00000000000000b5 T do_something2()
000000000000004f t __static_initialization_and_destruction_0(int, int)
0000000000000104 t __static_initialization_and_destruction_0(int, int)
0000000000000000 W Foo<int>::Foo(int)
0000000000000000 W Foo<int>::Foo(int)
0000000000000000 n Foo<int>::Foo(int)
0000000000000000 W Foo<int>::print() const
                 U std::ostream::operator<<(int)
                 U std::ios_base::Init::Init()
                 U std::ios_base::Init::~Init()
                 U std::cout
0000000000000000 r std::piecewise_construct
0000000000000006 r std::piecewise_construct
0000000000000000 b std::__ioinit
0000000000000001 b std::__ioinit
                 U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
                 U __cxa_atexit
                 U __dso_handle
                 U __stack_chk_fail

Next, link these two files with main.cpp into an executable and check that the program runs correctly.

$> g++ -o main source_two.o source_one.o main.o
$> ./main
x:1
x:2

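Everything works, so this approach is indeed effective. It has drawbacks too. With all template definitions in a single header, the slightest change to that file forces every dependent file to be recompiled. Moreover, this header-only style forces users to instantiate the template themselves; dropping header-only and putting the instantiation into a CPP source file avoids that. In addition, if the template argument is a user-defined type, users are forced to include that type's definition; a forward declaration will not do, so header dependencies increase.

Third, keep the template definition in the header, and have each includer add an explicit instantiation declaration to suppress implicit instantiation.

Here the template header is the original implementation from the beginning of this section, unchanged, and the explicit instantiation definition goes into foo.cpp. The code is: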

// foo.cpp
#include "foo.h"

template class Foo<int>;

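Then, in each source file that includes the template header, use an explicit instantiation declaration to stop the compiler from generating the instance. The implementation becomes: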

// source_one.cpp
#include "foo.h"

// Explicit instantiation declaration
extern template class Foo<int>;

void do_something1() {
    Foo<int> foo(1);
    foo.print();
}


// source_two.cpp
#include "foo.h"

// Explicit instantiation declaration
extern template class Foo<int>;

void do_something2() {
    Foo<int> foo(2);
    foo.print();
}

Compile to object files and look at the symbol tables again:

// source_one.o
                 U Foo<int>::Foo(int)
                 U Foo<int>::print() const
// source_two.o
                 U Foo<int>::Foo(int)
                 U Foo<int>::print() const

As you can see, neither translation unit instantiated the template.

Finally, link all the files and check that the result is correct.

$> g++ -o main foo.o source_one.o source_two.o main.o
$> ./main
x:1
x:2

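The result is again correct. The drawback of this approach is obvious: every user must add the explicit instantiation declaration in their source files to stop the compiler from generating the instance, and they may well forget.

That concludes this section. Implicit instantiation itself, that is, the process of template argument deduction and template argument substitution, is well-trodden ground; see [The Book of Modern C++]/p63.

Template Parameters

This section covers template parameters, which are the inputs of a template, that is, the abstract data types.

Kinds of Template Parameters

Template parameters can be divided into the following six kinds: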

Type
NonType
Type template
Variable template
Concept
Universal

In code:

template<
  typename Type,                                // Type
  auto NonType,                                 // Non-type
  template<template auto...> class Temp,        // Type template
  template<template auto...> auto Var,          // Variable template
  template<template auto...> concept Concept,   // Concept
  template auto Universal                       // Universal
>

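Of course, at the time of writing the C++ standard does not yet support all six kinds; this section looks only at the first three.

Type Parameters (Type)

The argument for this kind of template parameter must be a type-id. This is the kind of parameter used in everyday template code.

For example: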

// Example from ISO C++
template<class T> class X {};
template<class T> void f(T t) {}

struct {} unnamed_obj;

void f() {
    struct A {};
    enum { e1 };
    typedef struct {} B;
    B b;
    X<A> x1;        // OK
    X<A*> x2;       // OK
    X<B> x3;        // OK
    f(e1);          // OK
    f(unnamed_obj); // OK
    f(b);           // OK
}

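If a template is overloaded on a type parameter and a non-type parameter, a template argument can be ambiguous between a type-id and an expression. In that case it is resolved as a type-id, that is, as a type argument.

For example: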

// Example from ISO C++
template<class T> void f();
template<int I> void f();

void g() {
    f<int()>(); // int() is a type-id: call the first f()
}

Non-Type Parameters (Non-Type)

This kind of template parameter is commonly called an NTTP (Non-Type Template Parameter). Before C++20, the type of an NTTP was restricted to:

lvalue reference type (to object or to function);
an integral type;
a pointer type (to object or to function);
a pointer to member type (to member object or to member function);
an enumeration type;
std::nullptr_t (since c++11)

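Since C++20, NTTPs have been extended to also allow:

a floating-point type;
a literal class type with the following properties:
    all base classes and non-static data members are public and non-mutable, and
    the types of all base classes and non-static data members are structural types or (possibly multi-dimensional) arrays thereof.
from https://en.cppreference.com/w/cpp/language/template_parameters

When an NTTP is used, the template argument is converted to a constant expression of the parameter's type. The type of an NTTP is not an abstract type; the parameter stands for a constant expression of a concrete type, hence the name non-type template parameter.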
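
A short sketch of NTTPs in use, restricted to integral parameters so that it works before C++20. The names array_length, power, and primes are hypothetical and serve only as illustration.

```cpp
#include <cstddef>

// N is an NTTP deduced from the array type; inside the body it is a
// compile-time constant of type std::size_t.
template<class T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) {
    return N;
}

// NTTPs passed explicitly: the arguments are values, not types.
template<int Base, unsigned Exp>
struct power {
    static constexpr long long value = Base * power<Base, Exp - 1>::value;
};

// Partial specialization on the value 0 ends the recursion.
template<int Base>
struct power<Base, 0> {
    static constexpr long long value = 1;
};

constexpr int primes[] = {2, 3, 5, 7};
static_assert(array_length(primes) == 4);
static_assert(power<2, 10>::value == 1024);
```

Because the arguments are constant expressions, power<2, 10> and power<2, 11> are distinct types, just as Foo<int> and Foo<long> are.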

If an NTTP is a reference or a pointer, the value it refers or points to must not be:

a temporary object;
a string literal object;
the result of a typeid expression;
a predefined __func__ variable;
a subobject of one of the above.
from ISO/IEC C++ 2020 13.4.2/4

For example:

// Example from ISO C++
template<const int* pci> struct X {};
int ai[10];
X<ai> xi; // array to pointer and qualification conversions

struct Y {};
template<const Y& b> struct Z {};
Y y;
Z<y> z;               // no conversion, but note extra cv-qualification

template<int (&pa)[5]> struct W {};
int b[5];
W<b> w;               // no conversion

void f(char);
void f(int);

template<void (*pf)(int)> struct A {};

A<&f> a;              // selects f(int)

template<auto n> struct B {};
B<5> b1;              // OK, template parameter type is int
B<'a'> b2;            // OK, template parameter type is char
B<2.5> b3;            // OK since C++20, template parameter type is double
B<void(0)> b4;        // error, template parameter type cannot be void

Now an example involving temporary objects:

// Example from ISO C++
template<const int& CRI> struct B {};

B<1> b1;        // error: temporary would be required for template argument

int c = 1;
B<c> b2;        // OK

struct X { int n; };
struct Y { const int &r; };
template<Y y> struct C {};
C<Y{X{1}.n}> c; // error: subobject of temporary object used to initialize
                // reference member of template parameter

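Finally, note that a string-literal cannot be used directly as an NTTP argument, but it can be used indirectly, either through a pointer to a named constant array or as a constructor argument of a class.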

// Example from ISO C++
template<class T, T p> class X {};

X<const char*, "Studebaker"> x; // error: string literal object as template-argument
X<const char*, "Knope" + 1> x2; // error: subobject of string literal object as template-argument

const char p[] = "Vivisectionist";
X<const char*, p> y;            // OK

struct A {
    constexpr A(const char*) {}
};

X<A, "Pyrophoricity"> z;       // OK, string-literal is a constructor argument to A

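Type Template Parameters (Type template)

This is the more complex form of a type parameter, better known as a template template parameter.

In this case the template argument can only be a class template or an alias template. When the argument is a class template, only the primary template is considered when matching the parameter; partial specializations are considered only when a specialization based on this template template parameter is instantiated.

For example: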

/// Example from ISO C++

// primary template
template<class T> class A {
    int x;
};

// partial specialization
template<class T> class A<T*> {
    long x;
};

template<template<class U> class V> class C {
    V<int> y;
    V<int*> z;
};

// V<int> within C<A> uses the primary template, so c.y.x has type int
// V<int*> within C<A> uses the partial specialization, so c.z.x has type long
C<A> c;

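When a template argument is matched against a template template parameter, the parameter must be at least as specialized as the argument. Put simply, the parameter P must not be more general than the argument A. This rule was only fixed in C++17; before that, P and A had to match exactly, so some reasonable arguments could not be matched. For example: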

template <template <typename> class> void FD();
template <typename, typename = int> struct SD { /* ... */ };
FD<SD>();  // OK; error before C++17

Since C++17, P and A match as long as the at-least-as-specialized rule is satisfied, for example:

template <template <int> class> void FI();
template <template <auto> class> void FA();
template <auto> struct SA {};
template <int> struct SI {};
FI<SA>();  // OK; error before C++17
FA<SI>();  // error

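For FI<SA>(), the template template parameter P takes an int and the template argument A takes auto. P is more specialized than A, which satisfies "at least as specialized". As long as P is no less specialized than A, the match succeeds; otherwise it fails, as with FA<SI>(), where P (auto) is less specialized than A (int).

More examples follow: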

template<class T> class A {};
template<class T, class U = T> class B {};
template<class... Types> class C {};
template<auto n> class D {};

template<template<class> class P> class X {};
template<template<class...> class Q> class Y {};
template<template<int> class R> class Z {};

X<A> xa; // OK
X<B> xb; // OK since C++17
X<C> xc; // OK since C++17
Y<A> ya; // OK
Y<B> yb; // OK
Y<C> yc; // OK
Z<D> zd; // OK since C++17

Now a special example:

// Example from https://stackoverflow.com/questions/72911910/exact-rules-for-matching-variadic-template-template-parameters-in-partial-templa
#include <concepts>
#include <iostream>
#include <variant>

template<typename T>
struct TypeChecker {
    void operator()() {
        std::cout << "I am other type\n";
    }
};

template<typename ... Ts, template<typename> typename V>
requires std::same_as<V<Ts...>, std::variant<Ts...>>
struct TypeChecker<V<Ts...>>
{
    void operator()()
    {
        std::cout << "I am std::variant\n";
    }
};

int main()
{
    TypeChecker<std::variant<int, float>>{}();
    TypeChecker<std::variant<float>>{}();
    TypeChecker<int>{}();
}

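Do the template template parameter and the template argument in this example satisfy the at-least-as-specialized rule? P is template<typename> typename V and A is std::variant, which takes a parameter pack. They do not look like an exact match, but P is in fact at least as specialized as A. So the output should be: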

I am std::variant
I am std::variant
I am other type

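However, only GCC and MSVC produce this result by default; Clang prints I am other type in every case. At the time of writing, Clang does not apply this C++17 matching fix by default; adding the -frelaxed-template-template-args flag yields the expected output. In addition, before C++17, GCC can be given the -fnew-ttp-matching flag to get the new matching behavior.

Note also that the argument need not be std::variant as in the example. Other class templates with variadic template parameters work equally well, such as std::tuple or a user-defined template<class...> struct S.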
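
The same matching can be seen in a small sketch that moves the argument list of one variadic class template into another. The trait name rebind and the struct S are hypothetical; the parameters here are declared as template<class...>, so no relaxed matching is needed for the cases shown.

```cpp
#include <tuple>
#include <type_traits>
#include <variant>

// Primary template: declared only, so unsupported inputs fail to compile.
template<class From, template<class...> class To>
struct rebind;

// Matches any specialization From<Ts...> whose template takes only types,
// and re-applies the same arguments to To.
template<template<class...> class From, class... Ts,
         template<class...> class To>
struct rebind<From<Ts...>, To> {
    using type = To<Ts...>;
};

// A user-defined variadic class template works just like the standard ones.
template<class... Ts> struct S {};

static_assert(std::is_same_v<
    rebind<std::tuple<int, char>, std::variant>::type,
    std::variant<int, char>>);
static_assert(std::is_same_v<
    rebind<S<int, char>, std::tuple>::type,
    std::tuple<int, char>>);
```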

Finally, two more examples to close the section.

Example 1:

template<class T> struct eval;

template<template<class, class...> class TT, class T1, class... Rest>
struct eval<TT<T1, Rest...>> {};

template<class T1> struct A;
template<class T1, class T2> struct B;
template<int N> struct C;
template<class T1, int N> struct D;
template<class T1, class T2, int N = 17> struct E;


eval<A<int>> eA;        // OK: matches partial specialization of eval
eval<B<int, float>> eB; // OK: matches partial specialization of eval
eval<C<17>> eC;         // error: C does not match TT in partial specialization
eval<D<int, 17>> eD;    // error: D does not match TT in partial specialization
eval<E<int, float>> eE; // error: E does not match TT in partial specialization

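The template parameter and the arguments do not match: C, D, and E have non-type parameters, which the all-type parameter list of TT cannot match, hence the errors.

Example 2: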

template<typename T> concept C = requires (T t) { t.f(); };
template<typename T> concept D = C<T> && requires (T t) { t.g(); };

template<template<C> class P> struct S {};

template<C> struct X {};
template<D> struct Y {};
template<typename T> struct Z {};

S<X> s1; // OK, X and P have equivalent constraints
S<Y> s2; // error: P is not at least as specialized as Y
S<Z> s3; // OK, P is at least as specialized as Z

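With Concept constraints, the at-least-as-specialized rule is easier to see. If the template template parameter and the template argument have the same constraints, they are equivalent (S<X>). If the template template parameter is more constrained than the template argument, the rule is satisfied (S<Z>). If it is less constrained than the argument, the rule is not satisfied (S<Y>).

Template Techniques on Display in Implementing make_index_sequence

This section uses the implementation of make_index_sequence as the problem to showcase a number of classic template techniques.

Note that this does not follow the implementation of std::make_index_sequence. Standard library implementations are generally more thorough; here the requirement merely serves to demonstrate template techniques, so treat it as a standalone example.

Also, templates and macros generate code differently. A template cannot directly generate bare indices. In the previous chapter, for example, GMP_MAKE_INDEX_SEQUENCE(2) generates 0, 1, whereas template arguments must be attached to a variable, function, or class. The standard names this class index_sequence, so the corresponding result is index_sequence<0, 1>.

Before implementing make_index_sequence, define index_sequence first. The code is: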

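#include <cstddef>
#include <iostream>

using std::size_t;

template<size_t... Is> struct index_sequence {};

template<size_t... Is>
void print(index_sequence<Is...>) {
    // A binary expression used as a fold operand must be parenthesized.
    ((std::cout << Is << " "), ...);
    std::cout << "\n";
}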

int main() {
    // Output: 0 1 2
    print(index_sequence<0, 1, 2>{});
}

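This only uses variadic template parameters and needs no further comment. Note, however, that depending on how make_index_sequence is implemented, index_sequence may need slight adjustments and some extra abstraction. In the following subsections any difference is spelled out explicitly; otherwise this default implementation is assumed.

Implementing make_index_sequence with Partial Specialization

The first approach uses partial specialization, the most classic template feature, and exposes the core usage.

The full implementation: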

template<size_t N, bool, size_t...> struct make_index_sequence_helper {};

template<size_t N, size_t... Is>
struct make_index_sequence_helper<N, false, Is...> {
    using type = index_sequence<Is...>;
};

template<size_t N, size_t... Is>
struct make_index_sequence_helper<N, true, Is...> {
    using type = typename make_index_sequence_helper<N - 1, 0 < N - 1, Is..., sizeof...(Is)>::type;
};

template<size_t N>
using make_index_sequence = typename make_index_sequence_helper<N, 0 < N>::type;

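This implementation raises a key template technique: how can variadic template arguments be produced dynamically?

Normally variadic template arguments are passed in by the user, but the whole point of make_index_sequence<N> is to spare the user from passing them and to take a compile-time constant instead. Producing variadic template arguments out of nothing is a core technique, directly tied to compile-time iteration.

Templates currently do not support bottom-up iteration, only top-down recursion, so recursion is the fundamental way of thinking about dynamic generation problems. Recursion has two key points: the termination condition and the recursive step. The logic splits into two branches, false for finished and true for not finished, which is exactly why partial specialization is needed. The termination condition is simple: produce the final index_sequence. The main work is in the recursive branch. There, each step decrements N and adds one variadic template argument. How is it added? With sizeof...(Is), which yields the number of variadic arguments, and that number coincides with the next index. Initially there are no variadic arguments, so sizeof...(Is) yields 0; each recursion adds one argument until N reaches 0 and the termination branch is taken.

The test code: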

int main() {
    // Output: 0 1 2 3 4
    print(make_index_sequence<5>{});
}
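
The recursion can also be checked entirely at compile time. The sketch below restates the partial-specialization implementation inside a hypothetical namespace sketch (to keep it self-contained and away from the std names) and traces the instantiation chain for N = 3 in a comment.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {

template<std::size_t... Is> struct index_sequence {};

template<std::size_t N, bool, std::size_t... Is>
struct helper {};

// Termination branch: N has reached 0, emit the collected indices.
template<std::size_t N, std::size_t... Is>
struct helper<N, false, Is...> {
    using type = index_sequence<Is...>;
};

// Recursive branch: decrement N and append the next index, sizeof...(Is).
template<std::size_t N, std::size_t... Is>
struct helper<N, true, Is...> {
    using type =
        typename helper<N - 1, (0 < N - 1), Is..., sizeof...(Is)>::type;
};

template<std::size_t N>
using make_index_sequence = typename helper<N, (0 < N)>::type;

} // namespace sketch

// Instantiation chain for N = 3:
//   helper<3, true> -> helper<2, true, 0> -> helper<1, true, 0, 1>
//   -> helper<0, false, 0, 1, 2> -> index_sequence<0, 1, 2>
static_assert(std::is_same_v<sketch::make_index_sequence<3>,
                             sketch::index_sequence<0, 1, 2>>);
static_assert(std::is_same_v<sketch::make_index_sequence<0>,
                             sketch::index_sequence<>>);
```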

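Implementing make_index_sequence with constexpr if

The second approach uses C++17 constexpr if. The two branches no longer need partial specializations, which greatly reduces duplication. The flow resembles an ordinary run-time conditional, and the code is more concise and easier to understand.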

template<size_t N, size_t... Is>
constexpr auto make_index_sequence() {
    if constexpr (sizeof...(Is) < N)
        return make_index_sequence<N, Is..., sizeof...(Is)>();
    else
        return index_sequence<Is...>{};
}

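The idea behind the solution is the same and is not repeated. As for the difference between the two approaches: partial specialization uses a class template, while constexpr if uses a function template, so one yields a type and the other returns a compile-time constant value.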
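
When a type is needed from the function-based version, decltype can recover it from the returned value. A minimal sketch follows; the alias name make_index_sequence_t is an assumption introduced here, not part of the original implementation.

```cpp
#include <cstddef>
#include <type_traits>

template<std::size_t... Is> struct index_sequence {};

// The constexpr-if implementation: returns a value of the sequence type.
template<std::size_t N, std::size_t... Is>
constexpr auto make_index_sequence() {
    if constexpr (sizeof...(Is) < N)
        return make_index_sequence<N, Is..., sizeof...(Is)>();
    else
        return index_sequence<Is...>{};
}

// Hypothetical alias: turn the returned value back into a type.
template<std::size_t N>
using make_index_sequence_t = decltype(make_index_sequence<N>());

static_assert(std::is_same_v<make_index_sequence_t<3>,
                             index_sequence<0, 1, 2>>);
```

With this alias the function-based version can be used wherever the class-based version's type would be expected.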

Implementing make_index_sequence with a Recursive Lambda

The third approach uses the C++23 recursive lambda (deducing this); it is just another form of the second approach.

constexpr auto make_index_sequence = []<size_t N, size_t... Is>(this auto self) {
    if constexpr (sizeof...(Is) < N)
        return self.template operator()<N, Is..., sizeof...(Is)>();
    else
        return index_sequence<Is...>{};
};

print(make_index_sequence.template operator()<5>());

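Template lambdas were added in C++20. What gets called is really template<size_t N, size_t... Is> operator(), which is why this odd syntax is needed to pass the template arguments.

To avoid this syntax, the lambda can be given an int_constant parameter, and the usage becomes: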

template<size_t...> struct int_constant {};

constexpr auto make_index_sequence = []<size_t N, size_t... Is>
    (this auto self, int_constant<N, Is...>) {
        if constexpr (sizeof...(Is) < N)
            return self(int_constant<N, Is..., sizeof...(Is)>{});
        else
            return index_sequence<Is...>{};
};

print(make_index_sequence(int_constant<5>{}));

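Compared with the former, this simplifies usage a little but is still cumbersome. A more elegant way is to combine a generic lambda with a variable template, giving the lambda a type-level parameter. It can then be used just like the previous two implementations: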

template<size_t N>
inline constexpr auto make_index_sequence = []<size_t... Is>(this auto self) {
    if constexpr (sizeof...(Is) < N)
        return self.template operator()<Is..., sizeof...(Is)>();
    else
        return index_sequence<Is...>{};
};

print(make_index_sequence<5>());

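This approach has the same origin as the second one, constexpr if, and still returns a compile-time constant value.

Implementing make_index_sequence with Tag Dispatch

The fourth approach uses tag dispatch, which in essence uses tags to distinguish function overloads and thereby select a branch. It is one of the early template techniques.

The full implementation: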

template<size_t N, size_t... Is>
constexpr auto make_index_sequence_impl(std::false_type) {
    return index_sequence<Is...>{};
}

template<size_t N, size_t... Is>
constexpr auto make_index_sequence_impl(std::true_type) {
    return make_index_sequence_impl<N, Is..., sizeof...(Is)>(
        std::bool_constant<sizeof...(Is) + 1 < N>{});
}

template<size_t N>
constexpr auto make_index_sequence() {
    return make_index_sequence_impl<N>(std::bool_constant<0 < N>{});
}

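This uses the standard's existing tags std::false_type and std::true_type for the dispatch. They correspond to the two logical branches and drive the recursion. std::bool_constant produces these two tags and thus performs the actual dispatch.

This approach resembles partial specialization, but in essence it is a different form of polymorphism.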
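
Outside of recursion, tag dispatch is commonly driven by a type trait. Here is a minimal sketch with hypothetical names classify and classify_impl, where the return codes 1 and 0 are arbitrary markers for the chosen branch.

```cpp
#include <type_traits>

// Overload selected when the tag converts to std::true_type.
template<class T>
constexpr int classify_impl(T, std::true_type) { return 1; }

// Overload selected when the tag converts to std::false_type.
template<class T>
constexpr int classify_impl(T, std::false_type) { return 0; }

// Entry point: compute the tag from a trait and let overload
// resolution pick the branch at compile time.
template<class T>
constexpr int classify(T value) {
    return classify_impl(value, std::is_integral<T>{});
}

static_assert(classify(42) == 1);    // int is integral
static_assert(classify(3.14) == 0);  // double is not
```

std::is_integral<T> derives from either std::true_type or std::false_type, so exactly one overload is viable for each T.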

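A Faster make_index_sequence Implementation

This section presents an implementation with O(logN) instantiation depth, from https://stackoverflow.com/a/17426611/19868918 . It compiles faster but is correspondingly more complex.

The full implementation: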

template<size_t... Is> struct index_sequence {
    using type = index_sequence;
};

template<class, class> struct merge_index_sequence {};

template<size_t... LIs, size_t... RIs>
struct merge_index_sequence<index_sequence<LIs...>, index_sequence<RIs...>>
    : index_sequence<LIs..., (sizeof...(LIs) + RIs)...>
{};

template<size_t N>
struct make_index_sequence : merge_index_sequence<
        typename make_index_sequence<N / 2>::type,
        typename make_index_sequence<N - N / 2>::type
    >
{};

template<> struct make_index_sequence<0> : index_sequence<> {};
template<> struct make_index_sequence<1> : index_sequence<0> {};

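The idea is similar to recursive inheritance. The earlier implementations all grow linearly, whereas this one halves the problem at each instantiation, reducing the instantiation depth from O(N) to O(logN). The core idea is unchanged: it is still recursion, and sizeof... is still used to extend the template arguments, here as the offset sizeof...(LIs) added to every index of the right half. What changes is the halving and merging, which break the problem into smaller pieces.

Take N = 5. It is split into 2 and 3, which in code is: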

merge_index_sequence<make_index_sequence<2>, make_index_sequence<3>>
↑
make_index_sequence<5>

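This is the first level of depth; the problem becomes computing make_index_sequence<2> and make_index_sequence<3>.

Look at make_index_sequence<2> first. Since 2 / 2 is 1 and 2 - 2 / 2 is also 1, it is split into 1 and 1, which in code is:

merge_index_sequence<make_index_sequence<1>, make_index_sequence<1>>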
↑
make_index_sequence<2>

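At this point no further splitting is possible, because 1 - 1/2 is still 1 and would recurse forever. That is why make_index_sequence<1> and make_index_sequence<0> are specialized as termination conditions. make_index_sequence<2> finally produces: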

index_sequence<0, 1>
↑
make_index_sequence<2>

Taking type gives index_sequence<0, 1>.

Now make_index_sequence<3>. It is split into 1 and 2, which in code is:

merge_index_sequence<make_index_sequence<1>, make_index_sequence<2>>
↑
make_index_sequence<3>

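By the same reasoning, this produces index_sequence<0, 1, 2>.

That completes the second level of depth; the splitting is done and only the merge remains. merge_index_sequence merges index_sequence<0, 1> and index_sequence<0, 1, 2> into index_sequence<0, 1, 2, 3, 4>, and the problem is solved.

Note that although this idea was described above as similar to recursive inheritance, it is not recursive inheritance, only a form of recursive instantiation, and it is far less complex.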
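
For context on why an index sequence is worth generating at all, the sketch below uses the standard std::index_sequence and std::make_index_sequence to visit every element of a std::tuple in one fold expression. The function names tuple_sum and tuple_sum_impl are hypothetical.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// The deduced pack Is... holds 0, 1, ..., N-1; expanding std::get<Is>(t)
// touches every element without writing any recursion by hand.
template<class Tuple, std::size_t... Is>
constexpr auto tuple_sum_impl(const Tuple& t, std::index_sequence<Is...>) {
    return (std::get<Is>(t) + ... + 0);
}

// Entry point: generate the indices from the tuple size.
template<class... Ts>
constexpr auto tuple_sum(const std::tuple<Ts...>& t) {
    return tuple_sum_impl(t, std::make_index_sequence<sizeof...(Ts)>{});
}

static_assert(tuple_sum(std::make_tuple(1, 2, 3)) == 6);
static_assert(tuple_sum(std::make_tuple()) == 0);
```

Any of the make_index_sequence variants in this chapter could stand in for the standard one here, provided the matching index_sequence type is used in the helper's parameter.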

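Type Traits

Type traits are a way of extracting information from existing template types and are one of the core template techniques.

Take std::function as an example: how can the basic information about its template arguments be extracted? The overall idea is to add a level of indirection and restate the desired parameters in the template header of a partial specialization.

Implement a function_traits to extract the information: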

template<typename T>
struct function_traits;

template<typename R, typename... Args>
struct function_traits<std::function<R(Args...)>>
{
    static constexpr std::size_t size = sizeof...(Args);
    using result_type = R;

    template<size_t I>
    struct get {
        using type = typename std::tuple_element<I, std::tuple<Args...>>::type;
    };
};

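This utility extracts the number of parameters, the return type, and the type of the parameter at a given position from a std::function. Usage example:

template<class F, size_t I>
void print_ith_type() {
    // get is a dependent member template, so the template keyword is required.
    std::cout << typeid(typename function_traits<F>::template get<I>::type).name() << "\n";
}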

int main() {
    using FuncType = std::function<void(int, double, std::string)>;

    std::cout << "size of template parameter list: " << function_traits<FuncType>::size << "\n";
    std::cout << typeid(function_traits<FuncType>::result_type).name() << "\n";

    print_ith_type<FuncType, 0>();
    print_ith_type<FuncType, 1>();
    print_ith_type<FuncType, 2>();
}

It can certainly be simplified and refined further; only the approach is shown here, without going into detail.
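
One possible refinement, sketched under the assumption that plain function types and function pointers should be supported as well. The name fn_traits and its members arity and arg_t are hypothetical; an alias template replaces the nested get<I>::type.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Primary template: left undefined so unsupported types fail to compile.
template<class T>
struct fn_traits;

// Plain function type R(Args...): restate R and Args in the template header.
template<class R, class... Args>
struct fn_traits<R(Args...)> {
    static constexpr std::size_t arity = sizeof...(Args);
    using result_type = R;

    // Alias template: shorter to use than a nested struct with ::type.
    template<std::size_t I>
    using arg_t = std::tuple_element_t<I, std::tuple<Args...>>;
};

// Function pointers reuse the function-type specialization.
template<class R, class... Args>
struct fn_traits<R (*)(Args...)> : fn_traits<R(Args...)> {};

using Fn = int(double, char);
static_assert(fn_traits<Fn>::arity == 2);
static_assert(std::is_same_v<fn_traits<Fn>::result_type, int>);
static_assert(std::is_same_v<fn_traits<Fn>::arg_t<1>, char>);
static_assert(std::is_same_v<fn_traits<Fn*>::arg_t<0>, double>);
```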

There are some special cases, though. Take one implementation of is_member_function_pointer:

template< class T >
struct is_member_function_pointer_helper : std::false_type {};

template< class T, class U>
struct is_member_function_pointer_helper<T U::*> : std::is_function<T> {};

template< class T >
struct is_member_function_pointer : is_member_function_pointer_helper< std::remove_cv_t<T> > {};

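Note the <T U::*> here. Which parts of the type do T and U correspond to? If the member function is void A::foo(), its pointer type is void (A::*)(), so T corresponds to the function type void() and U corresponds to the class A.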
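
The decomposition can be verified with a small sketch. The helper member_pointer_parts and the struct A with its members are hypothetical and exist only to make T and U visible.

```cpp
#include <type_traits>

// Primary template: declared only.
template<class T>
struct member_pointer_parts;

// The pattern T U::* splits a pointer-to-member type into two parts.
template<class T, class U>
struct member_pointer_parts<T U::*> {
    using member_type = T;  // what the pointer points at
    using class_type  = U;  // the class the member belongs to
};

struct A {
    void foo();
    int data;
};

using FooPtr  = decltype(&A::foo);   // void (A::*)()
using DataPtr = decltype(&A::data);  // int A::*

// For a member function, T is a function type, which is exactly what
// the std::is_function<T> check in the helper above relies on.
static_assert(std::is_same_v<member_pointer_parts<FooPtr>::member_type, void()>);
static_assert(std::is_same_v<member_pointer_parts<FooPtr>::class_type, A>);
static_assert(std::is_same_v<member_pointer_parts<DataPtr>::member_type, int>);
```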

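Type traits are not particularly hard within template metaprogramming, so we leave it there.

Summary

This chapter reviewed in depth the core theory and techniques of templates, which are the foundation of generative template metaprogramming.

Generic thinking, template parameters, the kinds of templates, template instantiation, and type traits are the core; even the most advanced template techniques rest on them.

Some concepts were discussed in considerable depth, beyond what even template books cover. They are somewhat difficult, but necessary for generative metaprogramming.

Later chapters will use these techniques to implement code generation and build powerful tools.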

http://mp.weixin.qq.com/s?__biz=MzUxOTQ4NjIzNw==&mid=2247488517&idx=1&sn=81ec9ec31f8ff6381abbea3bb2dbfd9e

CppMore

Dive deep into the C++ core, and discover more!
