LLVM Study Notes (14), Supplement 1


3.3.6.4.3. Defining the composeSubRegIndicesImpl() method

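Earlier we emitted the data for the sub-register indices, which includes the composite indices. As we have seen, a composite sub-register index is the result of applying two sub-register indices in succession to one register, and it is equivalent to that pair of indices. For example, R:a:b and R:c may both denote the same part of register R, and we need a way to determine that R:a:b and R:c refer to the same thing. For this purpose LLVM provides the method TargetRegisterInfo::composeSubRegIndices(); in the R:a:b and R:c example, composeSubRegIndices(a, b) returns c.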

535     unsigned composeSubRegIndices (unsigned a, unsigned b) const {

536     if (!a) return b;

537     if (!b) return a;

538     return composeSubRegIndicesImpl(a, b);

539     }
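
To make the contract concrete, here is a minimal sketch of the same wrapper-plus-table shape for a toy target. The index names (sub_lo32, sub_lo16, sub_lo8), the table contents and the function names toyCompose/toyComposeImpl are all hypothetical and are not taken from any LLVM target; only the "index 0 is the identity" logic mirrors the listing above.

```cpp
#include <cassert>

// Hypothetical sub-register indices for a toy target; 0 means "no index".
enum ToyIdx : unsigned {
  NoSubReg = 0,
  sub_lo32 = 1,
  sub_lo16 = 2,
  sub_lo8 = 3,
  NumToyIdx = 4
};

// Stand-in for a TableGen-generated composeSubRegIndicesImpl(): a dense table
// holding compose(a, b), where 0 marks pairs that cannot be composed.
inline unsigned toyComposeImpl(unsigned a, unsigned b) {
  static const unsigned Table[3][3] = {
      /* a = sub_lo32 */ {0, sub_lo16, sub_lo8},
      /* a = sub_lo16 */ {0, 0, sub_lo8},
      /* a = sub_lo8  */ {0, 0, 0},
  };
  assert(a >= 1 && a < NumToyIdx && b >= 1 && b < NumToyIdx);
  return Table[a - 1][b - 1];
}

// Same shape as TargetRegisterInfo::composeSubRegIndices(): index 0 acts as
// the identity, everything else is delegated to the table lookup.
inline unsigned toyCompose(unsigned a, unsigned b) {
  if (!a) return b;
  if (!b) return a;
  return toyComposeImpl(a, b);
}
```

With this table, toyCompose(sub_lo32, sub_lo16) yields sub_lo16, which plays the role of "R:a:b is the same as R:c" in the text.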

The actual work is done by composeSubRegIndicesImpl(), which is emitted by the following code in RegisterInfoEmitter::runTargetDesc().

RegisterInfoEmitter::runTargetDesc (continued)

1331  std::string ClassName = Target.getName() + "GenRegisterInfo";

1332 

1333  auto SubRegIndicesSize =

1334  std::distance(SubRegIndices.begin(), SubRegIndices.end());

1335 

1336  if (!SubRegIndices.empty()) {

1337  emitComposeSubRegIndices (OS, RegBank, ClassName);

1338  emitComposeSubRegIndexLaneMask (OS, RegBank, ClassName);

1339  }

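The emitted composeSubRegIndicesImpl() is a method of class X86GenRegisterInfo (we take the X86 target as our example). The version of this method in the base class TargetRegisterInfo is a virtual function that is not meant to be called.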

629     void

630       RegisterInfoEmitter::emitComposeSubRegIndices (raw_ostream &OS,

631     CodeGenRegBank &RegBank,

632     const std::string &ClName) {

633     const auto &SubRegIndices = RegBank.getSubRegIndices();

634     OS << "unsigned " << ClName

635     << "::composeSubRegIndicesImpl(unsigned IdxA, unsigned IdxB) const {\n";

636    

637     // Many sub-register indexes are composition-compatible, meaning that

638       //

639       //   compose(IdxA, IdxB) == compose(IdxA', IdxB)

640       //

641       // for many IdxA, IdxA' pairs. Not all sub-register indexes can be composed.

642       // The illegal entries can be use as wildcards to compress the table further.

643    

644       // Map each Sub-register index to a compatible table row.

645     SmallVector<unsigned, 4> RowMap;

646     SmallVector<SmallVector<CodeGenSubRegIndex*, 4>, 4> Rows;

647    

648     auto SubRegIndicesSize =

649     std::distance(SubRegIndices.begin(), SubRegIndices.end());

650     for ( const auto &Idx : SubRegIndices) {

651     unsigned Found = ~0u;

652     for (unsigned r = 0, re = Rows.size(); r != re; ++r) {

653     if (combine(&Idx, Rows[r])) {

654     Found = r;

655     break ;

656     }

657     }

658     if (Found == ~0u) {

659     Found = Rows.size();

660     Rows.resize(Found + 1);

661     Rows.back().resize(SubRegIndicesSize);

662     combine(&Idx, Rows.back());

663     }

664     RowMap.push_back(Found);

665     }

666    

667     // Output the row map if there is multiple rows.

668     if (Rows.size() > 1) {

669     OS << "  static const " << getMinimalTypeForRange(Rows.size()) << " RowMap["

670     << SubRegIndicesSize << "] = {\n    ";

671     for (unsigned i = 0, e = SubRegIndicesSize; i != e; ++i)

672     OS << RowMap[i] << ", ";

673     OS << "\n  };\n";

674     }

675

676     // Output the rows.

677     OS << "  static const " << getMinimalTypeForRange(SubRegIndicesSize + 1)

678     << " Rows[" << Rows.size() << "][" << SubRegIndicesSize << "] = {\n";

679     for (unsigned r = 0, re = Rows.size(); r != re; ++r) {

680     OS << "    { ";

681     for (unsigned i = 0, e = SubRegIndicesSize; i != e; ++i)

682     if (Rows[r][i])

683     OS << Rows[r][i]->EnumValue << ", ";

684     else

685     OS << "0, ";

686     OS << "},\n";

687     }

688     OS << "  };\n\n";

689    

690     OS << "  --IdxA; assert(IdxA < " << SubRegIndicesSize << ");\n"

691     << "  --IdxB; assert(IdxB < " << SubRegIndicesSize << ");\n";

692     if (Rows.size() > 1)

693     OS << "  return Rows[RowMap[IdxA]][IdxB];\n";

694     else

695     OS << "  return Rows[0][IdxB];\n";

696     OS << "}\n\n";

697     }

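Line 650 iterates over all sub-register indices. The container Rows holds the composition table: it is a two-dimensional array in which each row is indexed by the EnumValue-1 of the second index (which is also that index's position in the container SubRegIndices), and each element is the resulting composite index. Indices whose composition maps do not conflict share a row, so a new row appears only when an index conflicts with every existing row. The loop at line 652 tries to fit the current index Idx into one of the existing rows: combine() compares Idx's composition map with the row, and if no entry of the row conflicts, the mappings are written into that row at line 624.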

610     static bool combine ( const CodeGenSubRegIndex *Idx,

611     SmallVectorImpl<CodeGenSubRegIndex*> &Vec) {

612     const CodeGenSubRegIndex::CompMap &Map = Idx->getComposites();

613     for ( const auto &I : Map) {

614     CodeGenSubRegIndex *&Entry = Vec[I.first->EnumValue - 1];

615     if (Entry && Entry != I.second)

616     return false;

617     }

618    

619     // All entries are compatible. Make it so.

620     for ( const auto &I : Map) {

621     auto *&Entry = Vec[I.first->EnumValue - 1];

622     assert ((!Entry || Entry == I.second) &&

623     "Expected EnumValue to be unique");

624     Entry = I.second;

625     }

626     return true;

627     }

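If instead some entry of the row is already occupied by a different composite index, combine() returns false at line 616 and the loop at line 652 moves on to the next row. If the loop ends without finding a compatible row, lines 658~663 extend Rows with a new row and record the mappings there. The container RowMap records, in order, the row of Rows assigned to each sub-register index (line 664). RowMap only needs to be emitted when Rows has more than one row. The code above then emits a fragment like this (taking ARM as the example):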

unsigned ARMGenRegisterInfo::composeSubRegIndicesImpl(unsigned IdxA, unsigned IdxB) const {

static const uint8_t RowMap[56] = {

0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 4, 0, 2, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 2,

};

static const uint8_t Rows[8][56] = {

{ 1, 2, 3, 4, 5, 0, 7, 0, 0, 0, 0, 0, 13, 14, 0, 0, 17, 18, 19, 20, 21, 22, 23, 24, 0, 0, 27, 28, 0, 0, 31, 32, 33, 34, 35, 36, 37, 38, 0, 0, 0, 0, 43, 0, 45, 0, 0, 0, 0, 0, 51, 0, 0, 0, 0, 0, },

{ 2, 3, 4, 5, 6, 0, 8, 0, 0, 0, 0, 0, 37, 49, 0, 0, 19, 20, 21, 22, 23, 24, 31, 32, 0, 0, 25, 26, 0, 0, 29, 30, 35, 36, 43, 44, 14, 40, 0, 0, 0, 0, 46, 0, 48, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, },

{ 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 14, 15, 0, 0, 21, 22, 23, 24, 31, 32, 29, 30, 0, 0, 0, 0, 0, 0, 27, 28, 43, 44, 46, 47, 49, 0, 0, 0, 0, 0, 51, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

{ 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 49, 55, 0, 0, 23, 24, 31, 32, 29, 30, 27, 28, 0, 0, 0, 0, 0, 0, 25, 26, 46, 47, 51, 52, 15, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

{ 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 15, 16, 0, 0, 31, 32, 29, 30, 27, 28, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 51, 52, 53, 54, 55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

{ 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 0, 0, 0, 29, 30, 27, 28, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

{ 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 28, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },

};

--IdxA; assert (IdxA < 56);

--IdxB; assert (IdxB < 56);

return Rows[RowMap[IdxA]][IdxB];

}

A 0 in Rows means that no composition exists (line 685).
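
The row-sharing scheme can be reproduced outside TableGen. The following is a simplified sketch, not the emitter's code: it uses plain unsigned values instead of CodeGenSubRegIndex pointers, and the names CompMap, CompressedTable, tryMerge, compress and lookup are made up for illustration. tryMerge plays the role of combine(), and lookup mirrors the emitted `Rows[RowMap[IdxA]][IdxB]` access.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// One composition map per first index: second index (1-based) -> result (1-based).
using CompMap = std::map<unsigned, unsigned>;

struct CompressedTable {
  std::vector<unsigned> RowMap;             // first index (0-based) -> row number
  std::vector<std::vector<unsigned>> Rows;  // Rows[row][second index - 1], 0 = none
};

// Counterpart of combine(): merge Map into Row unless a slot already holds a
// different value. Nothing is written when a conflict is found.
inline bool tryMerge(const CompMap &Map, std::vector<unsigned> &Row) {
  for (const auto &KV : Map) {
    unsigned Entry = Row[KV.first - 1];
    if (Entry && Entry != KV.second)
      return false;
  }
  for (const auto &KV : Map)
    Row[KV.first - 1] = KV.second;
  return true;
}

// Maps[i] is the composition map of index i+1; all indices are in [1, Maps.size()].
inline CompressedTable compress(const std::vector<CompMap> &Maps) {
  CompressedTable T;
  const std::size_t N = Maps.size();
  for (const CompMap &Map : Maps) {
    std::size_t Found = T.Rows.size();  // "not found" until a row accepts Map
    for (std::size_t r = 0; r != T.Rows.size(); ++r) {
      if (tryMerge(Map, T.Rows[r])) {
        Found = r;
        break;
      }
    }
    if (Found == T.Rows.size()) {       // no compatible row: open a new one
      T.Rows.emplace_back(N, 0u);
      tryMerge(Map, T.Rows.back());
    }
    T.RowMap.push_back(static_cast<unsigned>(Found));
  }
  return T;
}

// Same access pattern as the emitted composeSubRegIndicesImpl().
inline unsigned lookup(const CompressedTable &T, unsigned IdxA, unsigned IdxB) {
  --IdxA;
  --IdxB;
  assert(IdxA < T.RowMap.size() && IdxB < T.RowMap.size());
  return T.Rows[T.RowMap[IdxA]][IdxB];
}
```

One consequence of sharing rows is worth noting: an index with an empty composition map fits into any row, so a lookup for a pair that has no legal composition can return a value that belongs to another index in the same row. This is the "wildcard" use of illegal entries mentioned in the source comment at lines 641~642.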

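LLVM also provides a composeSubRegIndexLaneMask method: given a sub-register index and the lane mask of another sub-register index, it returns the lane mask of the corresponding composite index, if one exists.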

544     unsigned composeSubRegIndexLaneMask (unsigned IdxA, unsigned LaneMask) const {

545     if (!IdxA)

546     return LaneMask;

547     return composeSubRegIndexLaneMaskImpl(IdxA, LaneMask);

548     }

Likewise, the real work is done by the TableGen-generated composeSubRegIndexLaneMaskImpl, which is emitted by the method below.

629     void

630 RegisterInfoEmitter::emitComposeSubRegIndexLaneMask (raw_ostream &OS,

631     CodeGenRegBank &RegBank,

632     const std::string &ClName) {

633     // See the comments in computeSubRegLaneMasks() for our goal here.

634     const auto &SubRegIndices = RegBank.getSubRegIndices();

635    

636     // Create a list of Mask+Rotate operations, with equivalent entries merged.

637     SmallVector<unsigned, 4> SubReg2SequenceIndexMap;

638     SmallVector<SmallVector<MaskRolPair, 1>, 4> Sequences;

639     for ( const auto &Idx : SubRegIndices) {

640     const SmallVector<MaskRolPair, 1> &IdxSequence

641     = Idx.CompositionLaneMaskTransform;

642    

643     unsigned Found = ~0u;

644     unsigned SIdx = 0;

645     unsigned NextSIdx;

646     for ( size_t s = 0, se = Sequences.size(); s != se; ++s, SIdx = NextSIdx) {

647     SmallVectorImpl<MaskRolPair> &Sequence = Sequences[s];

648     NextSIdx = SIdx + Sequence.size() + 1;

649     if (Sequence == IdxSequence) {

650     Found = SIdx;

651     break ;

652     }

653     }

654     if (Found == ~0u) {

655     Sequences.push_back(IdxSequence);

656     Found = SIdx;

657     }

658     SubReg2SequenceIndexMap.push_back(Found);

659     }

660    

661     OS << "unsigned " << ClName

662     << "::composeSubRegIndexLaneMaskImpl(unsigned IdxA, unsigned LaneMask)"

663     " const {\n";

664    

665     OS << "  struct MaskRolOp {\n"

666     "    unsigned Mask;\n"

667     "    uint8_t  RotateLeft;\n"

668     "  };\n"

669     "  static const MaskRolOp Seqs[] = {\n";

670     unsigned Idx = 0;

671     for ( size_t s = 0, se = Sequences.size(); s != se; ++s) {

672     OS << "    ";

673     const SmallVectorImpl<MaskRolPair> &Sequence = Sequences[s];

674     for ( size_t p = 0, pe = Sequence.size(); p != pe; ++p) {

675     const MaskRolPair &P = Sequence[p];

676     OS << format("{ 0x%08X, %2u }, ", P.Mask, P.RotateLeft);

677     }

678     OS << "{ 0, 0 }";

679     if (s+1 != se)

680     OS << ", ";

681     OS << "  // Sequence " << Idx << "\n";

682     Idx += Sequence.size() + 1;

683     }

684     OS << "  };\n"

685     "  static const MaskRolOp *const CompositeSequences[] = {\n";

686     for ( size_t i = 0, e = SubRegIndices.size(); i != e; ++i) {

687     OS << "    ";

688     unsigned Idx = SubReg2SequenceIndexMap[i];

689     OS << format("&Seqs[%u]", Idx);

690     if (i+1 != e)

691     OS << ",";

692     OS << " // to " << SubRegIndices[i].getName() << "\n";

693     }

694     OS << "  };\n\n";

695    

696     OS << "  --IdxA; assert(IdxA < " << SubRegIndices.size()

697     << " && \"Subregister index out of bounds\");\n"

698     "  unsigned Result = 0;\n"

699     "  for (const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask != 0; ++Ops)"

700     " {\n"

701     "    unsigned Masked = LaneMask & Ops->Mask;\n"

702     "    Result |= (Masked << Ops->RotateLeft) & 0xFFFFFFFF;\n"

703     "    Result |= (Masked >> ((32 - Ops->RotateLeft) & 0x1F));\n"

704     "  }\n"

705     "  return Result;\n"

706     "}\n";

707     }

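At line 641, the CompositionLaneMaskTransform container of CodeGenSubRegIndex holds a number of MaskRolPair objects. Each one records that, starting from the lane mask of the partner index, applying the mask and then rotating left by the given number of bits yields the lane mask of the composite index (see the earlier section). The loop at line 639 iterates over all sub-register indices, and the temporary container Sequences collects the distinct MaskRolPair sequences. Note that IdxSequence can never be empty, because CodeGenRegBank::computeSubRegLaneMasks sets the MaskRolPair of an index that has no composite index to {~0u, 0}.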
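
The bookkeeping with Found, SIdx and NextSIdx amounts to assigning each index an offset into a flat array in which every distinct sequence is followed by one terminator element. Below is a small sketch of that idea; MaskRol, Sequence and assignOffsets are illustrative names, and the struct is only a stand-in for LLVM's MaskRolPair.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for LLVM's MaskRolPair (names here are illustrative).
struct MaskRol {
  std::uint32_t Mask;
  std::uint8_t RotateLeft;
  bool operator==(const MaskRol &O) const {
    return Mask == O.Mask && RotateLeft == O.RotateLeft;
  }
};
using Sequence = std::vector<MaskRol>;

// For each input sequence, return its offset into a flat array in which every
// distinct sequence is stored once and followed by a single {0, 0} terminator.
// Unique receives the distinct sequences in order of first appearance.
inline std::vector<unsigned> assignOffsets(const std::vector<Sequence> &PerIndex,
                                           std::vector<Sequence> &Unique) {
  std::vector<unsigned> Offsets;
  for (const Sequence &Seq : PerIndex) {
    unsigned Offset = 0;
    bool Found = false;
    for (const Sequence &U : Unique) {
      if (U == Seq) {
        Found = true;
        break;
      }
      Offset += static_cast<unsigned>(U.size()) + 1;  // +1 for the terminator
    }
    if (!Found)
      Unique.push_back(Seq);  // Offset now points just past the old contents
    Offsets.push_back(Offset);
  }
  return Offsets;
}
```

Feeding it four sequences of which the first and third are identical gives the offsets 0, 2, 0, 4, the same numbering pattern that shows up in the "Sequence N" comments and the &Seqs[N] entries of the generated ARM code.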

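The loops at lines 671 and 674 emit the contents of the MaskRolPair sequences as a MaskRolOp array named Seqs, in which every sequence ends with the element {0, 0} (an empty sequence would emit only {0, 0}). The loop at line 686 then declares the CompositeSequences array, which gives the Seqs element that corresponds to each index. In the end we get a method like this (taking ARM as the example; the X86 version is too simple to be interesting).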

unsigned ARMGenRegisterInfo::composeSubRegIndexLaneMaskImpl(unsigned IdxA, unsigned LaneMask) const {

struct MaskRolOp {

unsigned Mask;

uint8_t  RotateLeft;

};

static const MaskRolOp Seqs[] = {

{ 0xFFFFFFFF,  0 }, { 0, 0 }, // Sequence 0

{ 0xFFFFFFFF,  2 }, { 0, 0 }, // Sequence 2

{ 0xFFFFFFFF,  4 }, { 0, 0 }, // Sequence 4

{ 0xFFFFFFFF,  6 }, { 0, 0 }, // Sequence 6

{ 0xFFFFFFFF, 14 }, { 0, 0 }, // Sequence 8

{ 0xFFFFFFFF, 12 }, { 0, 0 }, // Sequence 10

{ 0xFFFFFFFF, 10 }, { 0, 0 }, // Sequence 12

{ 0xFFFFFFFF,  8 }, { 0, 0 }, // Sequence 14

{ 0x0000000C, 14 }, { 0x00000030, 10 }, { 0x000000C0,  6 }, { 0x00000300,  2 }, { 0, 0 }, // Sequence 16

{ 0x0000000C, 14 }, { 0x00000030, 10 }, { 0, 0 }, // Sequence 21

{ 0x0000000C, 10 }, { 0x00000030,  6 }, { 0, 0 }, // Sequence 24

{ 0x000000CC,  2 }, { 0x00030000, 30 }, { 0, 0 }, // Sequence 27

{ 0x000000CC,  2 }, { 0x00033000, 30 }, { 0, 0 }, // Sequence 30

{ 0x000000FC,  2 }, { 0x00000300,  8 }, { 0, 0 }, // Sequence 33

{ 0x0000000C,  4 }, { 0x000000C0, 10 }, { 0, 0 }, // Sequence 36

{ 0x0000003C,  4 }, { 0x000000C0, 10 }, { 0, 0 }, // Sequence 39

{ 0x0000000C,  4 }, { 0x000000C0, 10 }, { 0x00030000, 28 }, { 0, 0 }, // Sequence 42

{ 0x0000000C,  6 }, { 0x000000C0,  8 }, { 0, 0 }, // Sequence 46

{ 0x0000000C,  6 }, { 0x00000030, 12 }, { 0x000000C0,  8 }, { 0, 0 }, // Sequence 49

{ 0x0000000C,  6 }, { 0x000000C0,  8 }, { 0x00030000, 26 }, { 0, 0 }, // Sequence 53

{ 0x0000000C,  6 }, { 0x00000030, 12 }, { 0, 0 }, // Sequence 57

{ 0x0000000C,  6 }, { 0x00000030, 12 }, { 0x000000C0,  8 }, { 0x00000300,  4 }, { 0, 0 }, // Sequence 60

{ 0x0000000C, 14 }, { 0x000000C0,  6 }, { 0, 0 }, // Sequence 65

{ 0x0000000C, 14 }, { 0x00000030, 10 }, { 0x000000C0,  6 }, { 0, 0 }, // Sequence 68

{ 0x0000000C, 12 }, { 0x000000C0,  4 }, { 0, 0 }, // Sequence 72

{ 0x0000000C, 12 }, { 0x00000030,  8 }, { 0x000000C0,  4 }, { 0, 0 }, // Sequence 75

{ 0x0000000C, 12 }, { 0x00000030,  8 }, { 0, 0 }, // Sequence 79

{ 0x0000003C,  4 }, { 0x000000C0, 10 }, { 0x00000300,  6 }, { 0, 0 } // Sequence 82

};

static const MaskRolOp * const CompositeSequences[] = {

&Seqs[0], // to dsub_0

&Seqs[2], // to dsub_1

&Seqs[4], // to dsub_2

&Seqs[6], // to dsub_3

&Seqs[8], // to dsub_4

&Seqs[10], // to dsub_5

&Seqs[12], // to dsub_6

&Seqs[14], // to dsub_7

&Seqs[0], // to gsub_0

&Seqs[0], // to gsub_1

&Seqs[0], // to qqsub_0

&Seqs[16], // to qqsub_1

&Seqs[0], // to qsub_0

&Seqs[4], // to qsub_1

&Seqs[21], // to qsub_2

&Seqs[24], // to qsub_3

&Seqs[0], // to ssub_0

&Seqs[0], // to ssub_1

&Seqs[0], // to ssub_2

&Seqs[0], // to ssub_3

&Seqs[0], // to dsub_2_then_ssub_0

&Seqs[0], // to dsub_2_then_ssub_1

&Seqs[0], // to dsub_3_then_ssub_0

&Seqs[0], // to dsub_3_then_ssub_1

&Seqs[0], // to dsub_7_then_ssub_0

&Seqs[0], // to dsub_7_then_ssub_1

&Seqs[0], // to dsub_6_then_ssub_0

&Seqs[0], // to dsub_6_then_ssub_1

&Seqs[0], // to dsub_5_then_ssub_0

&Seqs[0], // to dsub_5_then_ssub_1

&Seqs[0], // to dsub_4_then_ssub_0

&Seqs[0], // to dsub_4_then_ssub_1

&Seqs[0], // to dsub_0_dsub_2

&Seqs[0], // to dsub_0_dsub_1_dsub_2

&Seqs[2], // to dsub_1_dsub_3

&Seqs[2], // to dsub_1_dsub_2_dsub_3

&Seqs[2], // to dsub_1_dsub_2

&Seqs[0], // to dsub_0_dsub_2_dsub_4

&Seqs[0], // to dsub_0_dsub_2_dsub_4_dsub_6

&Seqs[27], // to dsub_1_dsub_3_dsub_5

&Seqs[30], // to dsub_1_dsub_3_dsub_5_dsub_7

&Seqs[33], // to dsub_1_dsub_2_dsub_3_dsub_4

&Seqs[36], // to dsub_2_dsub_4

&Seqs[39], // to dsub_2_dsub_3_dsub_4

&Seqs[42], // to dsub_2_dsub_4_dsub_6

&Seqs[46], // to dsub_3_dsub_5

&Seqs[49], // to dsub_3_dsub_4_dsub_5

&Seqs[53], // to dsub_3_dsub_5_dsub_7

&Seqs[57], // to dsub_3_dsub_4

&Seqs[60], // to dsub_3_dsub_4_dsub_5_dsub_6

&Seqs[65], // to dsub_4_dsub_6

&Seqs[68], // to dsub_4_dsub_5_dsub_6

&Seqs[72], // to dsub_5_dsub_7

&Seqs[75], // to dsub_5_dsub_6_dsub_7

&Seqs[79], // to dsub_5_dsub_6

&Seqs[82] // to qsub_1_qsub_2

};

--IdxA; assert (IdxA < 56 && "Subregister index out of bounds");

unsigned Result = 0;

for ( const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask != 0; ++Ops) {

unsigned Masked = LaneMask & Ops->Mask;

Result |= (Masked << Ops->RotateLeft) & 0xFFFFFFFF;

Result |= (Masked >> ((32 - Ops->RotateLeft) & 0x1F));

}

return Result;

}

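Seqs[0] corresponds to the case where no composite index exists. If CompositeSequences yields Seqs[0], then on the fifth line from the end Result receives the value of Masked, and on the fourth line from the end the right-hand side is again Masked, because (32 - 0) & 0x1F is 0 and the shift is therefore by zero bits; OR-ing it in changes nothing. Since the mask is 0xFFFFFFFF, Masked equals LaneMask, so the lane mask is returned unchanged.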

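Some sequences have more than two entries (counting the terminator). These correspond to an index that can be composed from several indices (a two-entry sequence can also correspond to that case; what matters is how many 1 bits the Mask has). In the for loop above, only one of the entries actually takes effect; the others produce 0.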
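
The evaluation loop is easy to try in isolation. The sketch below copies the mask-and-rotate arithmetic of the generated code into a free function; applySequence and rotl32 are names chosen here, and the sample sequences in the usage note are copied from the ARM listing above.

```cpp
#include <cassert>
#include <cstdint>

struct MaskRolOp {
  std::uint32_t Mask;
  std::uint8_t RotateLeft;
};

// Rotate a 32-bit value left by S bits, S in [0, 31]. The "& 0x1F" keeps the
// right shift at 0 (instead of an invalid 32) when S is 0.
inline std::uint32_t rotl32(std::uint32_t V, unsigned S) {
  return (V << S) | (V >> ((32 - S) & 0x1F));
}

// Walk a {0, 0}-terminated sequence: mask the input, rotate, OR into Result.
inline std::uint32_t applySequence(const MaskRolOp *Ops, std::uint32_t LaneMask) {
  std::uint32_t Result = 0;
  for (; Ops->Mask != 0; ++Ops)
    Result |= rotl32(LaneMask & Ops->Mask, Ops->RotateLeft);
  return Result;
}
```

For the identity sequence `{ 0xFFFFFFFF, 0 }, { 0, 0 }` the input comes back unchanged. For Sequence 21, `{ 0x0000000C, 14 }, { 0x00000030, 10 }, { 0, 0 }`, an input of 0x0C is moved to 0x30000 by the first entry while the second entry contributes 0, which illustrates the "only one entry takes effect" remark.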

By v7.0 there is also a reverseComposeSubRegIndexLaneMask method, which is the inverse of composeSubRegIndexLaneMask. Assuming Mask is a valid lane mask, the following holds:

X0 = composeSubRegIndexLaneMask(Idx, Mask)

X1 = reverseComposeSubRegIndexLaneMask(Idx, X0)

from which it follows that X1 == Mask.

611     LaneBitmask reverseComposeSubRegIndexLaneMask(unsigned IdxA,

612     LaneBitmask LaneMask) const {

613     if (!IdxA)

614     return LaneMask;

615    return reverseComposeSubRegIndexLaneMaskImpl(IdxA, LaneMask);

616     }

The X86 functions generated by v7.0 look like this:

struct MaskRolOp {

LaneBitmask Mask;

uint8_t  RotateLeft;

};

static const MaskRolOp LaneMaskComposeSequences[] = {

{ LaneBitmask(0xFFFFFFFF),  0 }, { LaneBitmask::getNone(), 0 }, // Sequence 0

{ LaneBitmask(0xFFFFFFFF),  1 }, { LaneBitmask::getNone(), 0 }, // Sequence 2

{ LaneBitmask(0xFFFFFFFF),  2 }, { LaneBitmask::getNone(), 0 }, // Sequence 4

{ LaneBitmask(0xFFFFFFFF),  3 }, { LaneBitmask::getNone(), 0 }, // Sequence 6

{ LaneBitmask(0xFFFFFFFF),  4 }, { LaneBitmask::getNone(), 0 } // Sequence 8

};

static const MaskRolOp * const CompositeSequences[] = {

&LaneMaskComposeSequences[0], // to sub_8bit

&LaneMaskComposeSequences[2], // to sub_8bit_hi

&LaneMaskComposeSequences[4], // to sub_8bit_hi_phony

&LaneMaskComposeSequences[0], // to sub_16bit

&LaneMaskComposeSequences[6], // to sub_16bit_hi

&LaneMaskComposeSequences[0], // to sub_32bit

&LaneMaskComposeSequences[8], // to sub_xmm

&LaneMaskComposeSequences[0] // to sub_ymm

};

LaneBitmask X86GenRegisterInfo::composeSubRegIndexLaneMaskImpl(unsigned IdxA, LaneBitmask LaneMask) const {

--IdxA; assert (IdxA < 8 && "Subregister index out of bounds");

LaneBitmask Result;

for ( const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask.any(); ++Ops) {

LaneBitmask::Type M = LaneMask.getAsInteger() & Ops->Mask.getAsInteger();

if (unsigned S = Ops->RotateLeft)

Result |= LaneBitmask((M << S) | (M >> (LaneBitmask::BitWidth - S)));

else

Result |= LaneBitmask(M);

}

return Result;

}

LaneBitmask X86GenRegisterInfo::reverseComposeSubRegIndexLaneMaskImpl(unsigned IdxA,  LaneBitmask LaneMask) const {

LaneMask &= getSubRegIndexLaneMask(IdxA);

--IdxA; assert (IdxA < 8 && "Subregister index out of bounds");

LaneBitmask Result;

for ( const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask.any(); ++Ops) {

LaneBitmask::Type M = LaneMask.getAsInteger();

if (unsigned S = Ops->RotateLeft)

Result |= LaneBitmask((M >> S) | (M << (LaneBitmask::BitWidth - S)));

else

Result |= LaneBitmask(M);

}

return Result;

}
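
The round-trip property X1 == Mask can be checked for the simple case that appears in the X86 table, where every sequence has a single entry with a full mask. The sketch below makes several assumptions: lane masks are plain 32-bit integers rather than LaneBitmask objects, the function names are invented, and the initial intersection with getSubRegIndexLaneMask(IdxA) performed by the generated reverse function is left out.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: lane masks are 32 bits wide, modelled here as std::uint32_t.
constexpr unsigned BitWidth = 32;

// S must be in [0, 31]; S == 0 is handled separately to avoid a 32-bit shift.
inline std::uint32_t rotateLeft(std::uint32_t M, unsigned S) {
  return S ? (M << S) | (M >> (BitWidth - S)) : M;
}

inline std::uint32_t rotateRight(std::uint32_t M, unsigned S) {
  return S ? (M >> S) | (M << (BitWidth - S)) : M;
}

// Single-entry, full-mask sequence: composing is a left rotation by S ...
inline std::uint32_t composeLanes(std::uint32_t LaneMask, unsigned S) {
  return rotateLeft(LaneMask, S);
}

// ... and the reverse direction undoes it with a right rotation by S.
inline std::uint32_t reverseComposeLanes(std::uint32_t LaneMask, unsigned S) {
  return rotateRight(LaneMask, S);
}
```

For any 32-bit Mask and any S from 0 to 31, reverseComposeLanes(composeLanes(Mask, S), S) gives back Mask, since a rotation loses no bits.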

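The v7.0 emitComposeSubRegIndexLaneMask method differs considerably from the v3.6.1 one, but the code itself is not complicated and is rather long, so it is not listed here.

3.3.6.4.4. Defining the getSubClassWithSubReg() method

RegisterInfoEmitter::runTargetDesc() next emits the X86GenRegisterInfo method getSubClassWithSubReg(). Given a register class and a sub-register index, this method returns the largest sub-class of that register class that supports the index.

RegisterInfoEmitter::runTargetDesc (continued)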

1341  // Emit getSubClassWithSubReg.

1342  if (!SubRegIndices.empty()) {

1343  OS << "const TargetRegisterClass *" << ClassName

1344  << "::getSubClassWithSubReg(const TargetRegisterClass *RC, unsigned Idx)"

1345  << " const {\n";

1346  // Use the smallest type that can hold a regclass ID with room for a

1347      // sentinel.

1348  if (RegisterClasses.size() < UINT8_MAX)

1349  OS << "  static const uint8_t Table[";

1350  else if (RegisterClasses.size() < UINT16_MAX)

1351  OS << "  static const uint16_t Table[";

1352  else

1353  PrintFatalError("Too many register classes.");

1354  OS << RegisterClasses.size() << "][" << SubRegIndicesSize << "] = {\n";

1355  for ( const auto &RC : RegisterClasses) {

1356  OS << "    {\t// " << RC.getName() << "\n";

1357  for ( auto &Idx : SubRegIndices) {

1358  if (CodeGenRegisterClass *SRC = RC.getSubClassWithSubReg(&Idx))

1359  OS << "      " << SRC->EnumValue + 1 << ",\t// " << Idx.getName()

1360  << " -> " << SRC->getName() << "\n";

1361  else

1362  OS << "      0,\t// " << Idx.getName() << "\n";

1363  }

1364  OS << "    },\n";

1365  }

1366  OS << "  };\n  assert(RC && \"Missing regclass\");\n"

1367  << "  if (!Idx) return RC;\n  --Idx;\n"

1368  << "  assert(Idx < " << SubRegIndicesSize << " && \"Bad subreg\");\n"

1369  << "  unsigned TV = Table[RC->getID()][Idx];\n"

1370  << "  return TV ? getRegClass(TV - 1) : nullptr;\n}\n\n";

1371  }

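The SubClassWithSubReg container of CodeGenRegisterClass already records the information we need (see the method CodeGenRegBank::inferSubClassWithSubReg()), so while iterating over the CodeGenRegisterClass objects at line 1355 it is enough to fetch the corresponding record from that container for each sub-register index. For the X86 target, the emitted function is: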

const TargetRegisterClass *X86GenRegisterInfo::getSubClassWithSubReg( const TargetRegisterClass *RC, unsigned Idx) const {

static const uint8_t Table[80][6] = {

{ // GR8

0, // sub_8bit

0, // sub_8bit_hi

0, // sub_16bit

0, // sub_32bit

0, // sub_xmm

0, // sub_ymm

},

{ // GR16_ABCD

18, // sub_8bit -> GR16_ABCD

18, // sub_8bit_hi -> GR16_ABCD

0, // sub_16bit

0, // sub_32bit

0, // sub_xmm

0, // sub_ymm

},

{ // VR512_with_sub_xmm_in_FR32

0, // sub_8bit

0, // sub_8bit_hi

0, // sub_16bit

0, // sub_32bit

80, // sub_xmm -> VR512_with_sub_xmm_in_FR32

80, // sub_ymm -> VR512_with_sub_xmm_in_FR32

},

};

assert (RC && "Missing regclass");

if (!Idx) return RC;

--Idx;

assert (Idx < 6 && "Bad subreg");

unsigned TV = Table[RC->getID()][Idx];

return TV ? getRegClass(TV - 1) : nullptr;

}

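The method is too large to show in full, so part of the Table array definition is omitted. In the array definition, each comment names the sub-register index of that entry. An entry of 0 means the register class has no sub-class that supports the index; otherwise the comment also gives the name of the sub-class. For example, in the GR16_ABCD part the first two entries belong to the indices sub_8bit and sub_8bit_hi, and both are 18: 18 - 1 = 17 is the index of GR16_ABCD in the RegisterClasses container (the class itself is obtained through getRegClass).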
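
The "ID + 1, with 0 as sentinel" encoding can be shown with a much smaller table. This is a sketch for a made-up target: the class IDs RC_A, RC_B, RC_C, the table contents and the function toySubClassWithSubReg are hypothetical, and the function returns a class ID (or NoClass) instead of a TargetRegisterClass pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register-class IDs and sub-register index count for a toy target.
enum ToyRC : unsigned { RC_A = 0, RC_B = 1, RC_C = 2, NumToyRCs = 3 };
constexpr unsigned NumToySubRegIdx = 2;
constexpr int NoClass = -1;  // plays the role of nullptr in the generated code

// Returns the ID of the sub-class of RC that supports Idx, or NoClass.
inline int toySubClassWithSubReg(unsigned RC, unsigned Idx) {
  // Entries store (class ID + 1) so that 0 can mean "no such sub-class".
  static const std::uint8_t Table[NumToyRCs][NumToySubRegIdx] = {
      /* RC_A */ {0, 0},
      /* RC_B */ {RC_B + 1, RC_C + 1},
      /* RC_C */ {RC_C + 1, RC_C + 1},
  };
  assert(RC < NumToyRCs && "Missing regclass");
  if (!Idx)
    return static_cast<int>(RC);  // index 0: the class itself
  --Idx;
  assert(Idx < NumToySubRegIdx && "Bad subreg");
  unsigned TV = Table[RC][Idx];
  return TV ? static_cast<int>(TV - 1) : NoClass;
}
```

Storing ID + 1 is what lets the smallest unsigned type hold both every class ID and the sentinel, which matches the "room for a sentinel" comment at lines 1346~1347 of the emitter.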

