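3.3.6.4.3. Defining the composeSubRegIndicesImpl() method
Earlier we emitted the data for the sub-register indices, which includes the composite sub-register indices. As we have seen, a composite index is the result of applying two sub-register indices in succession to one register, and it is equivalent to that pair of indices. For example, R:a:b and R:c may both denote the same part of register R, and we need a way to establish that R:a:b and R:c refer to the same thing. For this purpose LLVM provides the method TargetRegisterInfo::composeSubRegIndices(); for the R:a:b and R:c example, composeSubRegIndices(a, b) returns c.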
535 unsigned composeSubRegIndices (unsigned a, unsigned b) const {
536 if (!a) return b;
537 if (!b) return a;
538 return composeSubRegIndicesImpl(a, b);
539 }
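The wrapper above treats index 0 as "no sub-register" and forwards everything else to the target-specific implementation. A minimal sketch of that pattern follows; the classes ToyRegisterInfo and ToyGenRegisterInfo and the numbering a = 1, b = 2, c = 3 are hypothetical, and the pure virtual function is a simplification rather than the actual LLVM declaration.

```cpp
#include <cassert>

// Hypothetical stand-in for TargetRegisterInfo: only the composition wrapper.
struct ToyRegisterInfo {
  virtual ~ToyRegisterInfo() = default;

  // Index 0 means "no sub-register", so it acts as the identity element.
  unsigned composeSubRegIndices(unsigned a, unsigned b) const {
    if (!a) return b;
    if (!b) return a;
    return composeSubRegIndicesImpl(a, b);
  }

protected:
  // Assumption: modelled as pure virtual here; a real target supplies
  // a table-driven override generated by TableGen.
  virtual unsigned composeSubRegIndicesImpl(unsigned a, unsigned b) const = 0;
};

// Hypothetical target with three indices a = 1, b = 2, c = 3, where R:a:b == R:c.
struct ToyGenRegisterInfo : ToyRegisterInfo {
protected:
  unsigned composeSubRegIndicesImpl(unsigned a, unsigned b) const override {
    return (a == 1 && b == 2) ? 3 : 0;  // 0: this pair cannot be composed
  }
};
```

With this toy target, composeSubRegIndices(1, 2) yields 3, while a call with 0 as either argument returns the other argument without ever reaching the Impl method.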
The actual work is done by composeSubRegIndicesImpl(), which is emitted by the following code in RegisterInfoEmitter::runTargetDesc().
RegisterInfoEmitter::runTargetDesc (continued)
1331 std::string ClassName = Target.getName() + "GenRegisterInfo";
1332
1333 auto SubRegIndicesSize =
1334 std::distance(SubRegIndices.begin(), SubRegIndices.end());
1335
1336 if (!SubRegIndices.empty()) {
1337 emitComposeSubRegIndices (OS, RegBank, ClassName);
1338 emitComposeSubRegIndexLaneMask (OS, RegBank, ClassName);
1339 }
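The emitted composeSubRegIndicesImpl() is a method of class X86GenRegisterInfo (we take the X86 target as our example). In the base class TargetRegisterInfo this method is a virtual function that is not usable as it stands; the generated class overrides it.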
629 void
630 RegisterInfoEmitter::emitComposeSubRegIndices (raw_ostream &OS,
631 CodeGenRegBank &RegBank,
632 const std::string &ClName) {
633 const auto &SubRegIndices = RegBank.getSubRegIndices();
634 OS << "unsigned " << ClName
635 << "::composeSubRegIndicesImpl(unsigned IdxA, unsigned IdxB) const {\n";
636
637 // Many sub-register indexes are composition-compatible, meaning that
638 //
639 // compose(IdxA, IdxB) == compose(IdxA', IdxB)
640 //
641 // for many IdxA, IdxA' pairs. Not all sub-register indexes can be composed.
642 // The illegal entries can be use as wildcards to compress the table further.
643
644 // Map each Sub-register index to a compatible table row.
645 SmallVector<unsigned, 4> RowMap;
646 SmallVector<SmallVector<CodeGenSubRegIndex*, 4>, 4> Rows;
647
648 auto SubRegIndicesSize =
649 std::distance(SubRegIndices.begin(), SubRegIndices.end());
650 for ( const auto &Idx : SubRegIndices) {
651 unsigned Found = ~0u;
652 for (unsigned r = 0, re = Rows.size(); r != re; ++r) {
653 if (combine(&Idx, Rows[r])) {
654 Found = r;
655 break ;
656 }
657 }
658 if (Found == ~0u) {
659 Found = Rows.size();
660 Rows.resize(Found + 1);
661 Rows.back().resize(SubRegIndicesSize);
662 combine(&Idx, Rows.back());
663 }
664 RowMap.push_back(Found);
665 }
666
667 // Output the row map if there is multiple rows.
668 if (Rows.size() > 1) {
669 OS << " static const " << getMinimalTypeForRange(Rows.size()) << " RowMap["
670 << SubRegIndicesSize << "] = {\n ";
671 for (unsigned i = 0, e = SubRegIndicesSize; i != e; ++i)
672 OS << RowMap[i] << ", ";
673 OS << "\n };\n";
674 }
675
676 // Output the rows.
677 OS << " static const " << getMinimalTypeForRange(SubRegIndicesSize + 1)
678 << " Rows[" << Rows.size() << "][" << SubRegIndicesSize << "] = {\n";
679 for (unsigned r = 0, re = Rows.size(); r != re; ++r) {
680 OS << " { ";
681 for (unsigned i = 0, e = SubRegIndicesSize; i != e; ++i)
682 if (Rows[r][i])
683 OS << Rows[r][i]->EnumValue << ", ";
684 else
685 OS << "0, ";
686 OS << "},\n";
687 }
688 OS << " };\n\n";
689
690 OS << " --IdxA; assert(IdxA < " << SubRegIndicesSize << ");\n"
691 << " --IdxB; assert(IdxB < " << SubRegIndicesSize << ");\n";
692 if (Rows.size() > 1)
693 OS << " return Rows[RowMap[IdxA]][IdxB];\n";
694 else
695 OS << " return Rows[0][IdxB];\n";
696 OS << "}\n\n";
697 }
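The loop at line 650 walks all sub-register indices. The container Rows holds the mapping from sub-register indices to their composite indices. It is a two-dimensional array: a row stands for a first index IdxA (or for several mutually compatible ones), and within a row the entry at position EnumValue - 1 of a second index IdxB (which is also IdxB's position in the container SubRegIndices) is the composite of IdxA and IdxB. Indices whose composition maps conflict therefore end up in different rows. The loop at line 652 checks whether the current index fits into a row that is already in Rows, calling combine() on each row in turn. combine() compares the index's composition map with the row; if every entry the index needs is either still unset in that row or already holds the same composite, the mapping is recorded in the row at line 624.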
610 static bool combine ( const CodeGenSubRegIndex *Idx,
611 SmallVectorImpl<CodeGenSubRegIndex*> &Vec) {
612 const CodeGenSubRegIndex::CompMap &Map = Idx->getComposites();
613 for ( const auto &I : Map) {
614 CodeGenSubRegIndex *&Entry = Vec[I.first->EnumValue - 1];
615 if (Entry && Entry != I.second)
616 return false;
617 }
618
619 // All entries are compatible. Make it so.
620 for ( const auto &I : Map) {
621 auto *&Entry = Vec[I.first->EnumValue - 1];
622 assert ((!Entry || Entry == I.second) &&
623 "Expected EnumValue to be unique");
624 Entry = I.second;
625 }
626 return true;
627 }
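If, on the other hand, some entry of the row already holds a different composite, combine() returns false at line 616 and the loop at line 652 moves on to the next row. If the loop ends without finding a compatible row, lines 658-663 extend Rows with a new row and store the mapping there. The container RowMap records, for each sub-register index in order, the number of the row it uses in Rows (line 664). RowMap only has to be emitted when Rows has more than one row. The code above then emits a fragment like this (taking ARM as the example):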
unsigned ARMGenRegisterInfo::composeSubRegIndicesImpl(unsigned IdxA, unsigned IdxB) const {
static const uint8_t RowMap[56] = {
0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 4, 0, 2, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 2,
};
static const uint8_t Rows[8][56] = {
{ 1, 2, 3, 4, 5, 0, 7, 0, 0, 0, 0, 0, 13, 14, 0, 0, 17, 18, 19, 20, 21, 22, 23, 24, 0, 0, 27, 28, 0, 0, 31, 32, 33, 34, 35, 36, 37, 38, 0, 0, 0, 0, 43, 0, 45, 0, 0, 0, 0, 0, 51, 0, 0, 0, 0, 0, },
{ 2, 3, 4, 5, 6, 0, 8, 0, 0, 0, 0, 0, 37, 49, 0, 0, 19, 20, 21, 22, 23, 24, 31, 32, 0, 0, 25, 26, 0, 0, 29, 30, 35, 36, 43, 44, 14, 40, 0, 0, 0, 0, 46, 0, 48, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, },
{ 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 14, 15, 0, 0, 21, 22, 23, 24, 31, 32, 29, 30, 0, 0, 0, 0, 0, 0, 27, 28, 43, 44, 46, 47, 49, 0, 0, 0, 0, 0, 51, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },
{ 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 49, 55, 0, 0, 23, 24, 31, 32, 29, 30, 27, 28, 0, 0, 0, 0, 0, 0, 25, 26, 46, 47, 51, 52, 15, 0, 0, 0, 0, 0, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },
{ 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 15, 16, 0, 0, 31, 32, 29, 30, 27, 28, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 51, 52, 53, 54, 55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },
{ 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 0, 0, 0, 29, 30, 27, 28, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 53, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },
{ 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 28, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },
};
--IdxA; assert (IdxA < 56);
--IdxB; assert (IdxB < 56);
return Rows[RowMap[IdxA]][IdxB];
}
A 0 in Rows means that no composition exists (line 685).
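The row sharing performed by combine() and the RowMap lookup can be tried out on a toy input. The sketch below is a simplified re-implementation under stated assumptions, not the TableGen code: plain unsigned values replace CodeGenSubRegIndex pointers, std::map stands in for CompMap, and the names ComposeTable, buildTable and compose are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Composition map of one index A: IdxB -> compose(A, IdxB). Indices are 1-based, 0 = none.
using CompMap = std::map<unsigned, unsigned>;

struct ComposeTable {
  std::vector<unsigned> RowMap;             // position of IdxA -> row number
  std::vector<std::vector<unsigned>> Rows;  // Rows[row][IdxB - 1] -> composite
};

// Try to merge Map into Row; unset (0) cells act as wildcards.
static bool combine(const CompMap &Map, std::vector<unsigned> &Row) {
  for (const auto &I : Map) {
    assert(I.first >= 1 && I.first <= Row.size());
    unsigned Entry = Row[I.first - 1];
    if (Entry && Entry != I.second)
      return false;  // the row already stores a different composite
  }
  for (const auto &I : Map)  // all entries are compatible, so record them
    Row[I.first - 1] = I.second;
  return true;
}

// Composites[i] belongs to the index whose enum value is i + 1.
static ComposeTable buildTable(const std::vector<CompMap> &Composites) {
  ComposeTable T;
  const std::size_t N = Composites.size();
  for (const CompMap &Map : Composites) {
    std::size_t Found = T.Rows.size();  // sentinel: no compatible row yet
    for (std::size_t r = 0; r != T.Rows.size(); ++r)
      if (combine(Map, T.Rows[r])) { Found = r; break; }
    if (Found == T.Rows.size()) {       // open a fresh row
      T.Rows.emplace_back(N, 0u);
      combine(Map, T.Rows.back());
    }
    T.RowMap.push_back(static_cast<unsigned>(Found));
  }
  return T;
}

// Same lookup shape as the generated composeSubRegIndicesImpl().
static unsigned compose(const ComposeTable &T, unsigned IdxA, unsigned IdxB) {
  --IdxA; assert(IdxA < T.RowMap.size());
  --IdxB; assert(IdxB < T.RowMap.size());
  return T.Rows[T.RowMap[IdxA]][IdxB];
}
```

For four hypothetical indices with composition maps {2→3}, {}, {2→4} and {1→2}, the first, second and fourth index share row 0 and only the third needs its own row, so RowMap becomes {0, 0, 1, 0}. Because unset cells are reused as wildcards, a shared row can hold a non-zero value for a pair that was never legal for a particular IdxA; the table is only meaningful for compositions that actually exist.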
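LLVM also provides a composeSubRegIndexLaneMask method: given a sub-register index and the lane mask of another sub-register index, it returns the lane mask of the corresponding composite sub-register index, if one exists.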
544 unsigned composeSubRegIndexLaneMask (unsigned IdxA, unsigned LaneMask) const {
545 if (!IdxA)
546 return LaneMask;
547 return composeSubRegIndexLaneMaskImpl(IdxA, LaneMask);
548 }
Likewise, the real work is done by the TableGen-generated composeSubRegIndexLaneMaskImpl, which is emitted by the method below.
629 void
630 RegisterInfoEmitter::emitComposeSubRegIndexLaneMask (raw_ostream &OS,
631 CodeGenRegBank &RegBank,
632 const std::string &ClName) {
633 // See the comments in computeSubRegLaneMasks() for our goal here.
634 const auto &SubRegIndices = RegBank.getSubRegIndices();
635
636 // Create a list of Mask+Rotate operations, with equivalent entries merged.
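637 SmallVector<unsigned, 4> SubReg2SequenceIndexMap;
638 SmallVector<SmallVector<MaskRolPair, 1>, 4> Sequences;
639 for ( const auto &Idx : SubRegIndices) {
640 const SmallVector<MaskRolPair, 1> &IdxSequence
641 = Idx.CompositionLaneMaskTransform;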
642
643 unsigned Found = ~0u;
644 unsigned SIdx = 0;
645 unsigned NextSIdx;
646 for ( size_t s = 0, se = Sequences.size(); s != se; ++s, SIdx = NextSIdx) {
647 SmallVectorImpl<MaskRolPair> &Sequence = Sequences[s];
648 NextSIdx = SIdx + Sequence.size() + 1;
649 if (Sequence == IdxSequence) {
650 Found = SIdx;
651 break ;
652 }
653 }
654 if (Found == ~0u) {
655 Sequences.push_back(IdxSequence);
656 Found = SIdx;
657 }
658 SubReg2SequenceIndexMap.push_back(Found);
659 }
660
661 OS << "unsigned " << ClName
662 << "::composeSubRegIndexLaneMaskImpl(unsigned IdxA, unsigned LaneMask)"
663 " const {\n";
664
665 OS << " struct MaskRolOp {\n"
666 " unsigned Mask;\n"
667 " uint8_t RotateLeft;\n"
668 " };\n"
669 " static const MaskRolOp Seqs[] = {\n";
670 unsigned Idx = 0;
671 for ( size_t s = 0, se = Sequences.size(); s != se; ++s) {
672 OS << " ";
673 const SmallVectorImpl<MaskRolPair> &Sequence = Sequences[s];
674 for ( size_t p = 0, pe = Sequence.size(); p != pe; ++p) {
675 const MaskRolPair &P = Sequence[p];
676 OS << format("{ 0x%08X, %2u }, ", P.Mask, P.RotateLeft);
677 }
678 OS << "{ 0, 0 }";
679 if (s+1 != se)
680 OS << ", ";
681 OS << " // Sequence " << Idx << "\n";
682 Idx += Sequence.size() + 1;
683 }
684 OS << " };\n"
685 " static const MaskRolOp *const CompositeSequences[] = {\n";
686 for ( size_t i = 0, e = SubRegIndices.size(); i != e; ++i) {
687 OS << " ";
688 unsigned Idx = SubReg2SequenceIndexMap[i];
689 OS << format("&Seqs[%u]", Idx);
690 if (i+1 != e)
691 OS << ",";
692 OS << " // to " << SubRegIndices[i].getName() << "\n";
693 }
694 OS << " };\n\n";
695
696 OS << " --IdxA; assert(IdxA < " << SubRegIndices.size()
697 << " && \"Subregister index out of bounds\");\n"
698 " unsigned Result = 0;\n"
699 " for (const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask != 0; ++Ops)"
700 " {\n"
701 " unsigned Masked = LaneMask & Ops->Mask;\n"
702 " Result |= (Masked << Ops->RotateLeft) & 0xFFFFFFFF;\n"
703 " Result |= (Masked >> ((32 - Ops->RotateLeft) & 0x1F));\n"
704 " }\n"
705 " return Result;\n"
706 "}\n";
707 }
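The first half of the emitter above de-duplicates the per-index sequences and remembers, for each index, the offset at which its sequence starts in one flat, {0, 0}-terminated array. A minimal sketch of that bookkeeping, using std::vector in place of SmallVector and the made-up names FlattenedSequences and flatten:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MaskRolPair {
  std::uint32_t Mask;
  std::uint8_t RotateLeft;
  bool operator==(const MaskRolPair &O) const {
    return Mask == O.Mask && RotateLeft == O.RotateLeft;
  }
};
using Sequence = std::vector<MaskRolPair>;

struct FlattenedSequences {
  std::vector<MaskRolPair> Seqs;         // unique sequences, each followed by {0, 0}
  std::vector<std::size_t> StartOffset;  // per input index: start of its sequence in Seqs
};

// Hypothetical helper: merge equal sequences and compute the start offsets.
static FlattenedSequences flatten(const std::vector<Sequence> &PerIndex) {
  FlattenedSequences F;
  std::vector<Sequence> Unique;          // sequences seen so far
  std::vector<std::size_t> UniqueStart;  // their start offsets in F.Seqs
  for (const Sequence &S : PerIndex) {
    std::size_t Found = Unique.size();   // sentinel: not seen yet
    for (std::size_t u = 0; u != Unique.size(); ++u)
      if (Unique[u] == S) { Found = u; break; }
    if (Found == Unique.size()) {
      Unique.push_back(S);
      UniqueStart.push_back(F.Seqs.size());
      F.Seqs.insert(F.Seqs.end(), S.begin(), S.end());
      F.Seqs.push_back({0, 0});          // terminator entry
    }
    F.StartOffset.push_back(UniqueStart[Found]);
  }
  return F;
}
```

Each unique sequence occupies its own length plus one slot for the terminator, which is why the start offsets advance by Sequence.size() + 1, the same step the emitter uses for NextSIdx and for the "// Sequence N" comments.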
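At line 641, the CompositionLaneMaskTransform container of CodeGenSubRegIndex holds a number of MaskRolPair objects. Each object records how, starting from the lane mask of the partner index, masking with Mask and rotating left by RotateLeft bits yields the lane mask of the composite index (see the relevant earlier section). The loop at line 639 walks all sub-register indices, and the temporary container Sequences collects the distinct MaskRolPair sequences. Note that IdxSequence can never be empty, because CodeGenRegBank::computeSubRegLaneMasks sets the MaskRolPair of every index that has no composite index to {~0u, 0}.
The loops at lines 671 and 674 emit the contents of the MaskRolPair sequences as an array Seqs of MaskRolOp, in which every sequence is terminated by a {0, 0} element (an empty sequence would be emitted as {0, 0} alone). The loop at line 686 then declares the array CompositeSequences, which points each index at the Seqs element where its sequence starts. In the end we obtain a method like the following (taking ARM as the example; the X86 one is too simple to be interesting).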
unsigned ARMGenRegisterInfo::composeSubRegIndexLaneMaskImpl(unsigned IdxA, unsigned LaneMask) const {
struct MaskRolOp {
unsigned Mask;
uint8_t RotateLeft;
};
static const MaskRolOp Seqs[] = {
{ 0xFFFFFFFF, 0 }, { 0, 0 }, // Sequence 0
{ 0xFFFFFFFF, 2 }, { 0, 0 }, // Sequence 2
{ 0xFFFFFFFF, 4 }, { 0, 0 }, // Sequence 4
{ 0xFFFFFFFF, 6 }, { 0, 0 }, // Sequence 6
{ 0xFFFFFFFF, 14 }, { 0, 0 }, // Sequence 8
{ 0xFFFFFFFF, 12 }, { 0, 0 }, // Sequence 10
{ 0xFFFFFFFF, 10 }, { 0, 0 }, // Sequence 12
{ 0xFFFFFFFF, 8 }, { 0, 0 }, // Sequence 14
{ 0x0000000C, 14 }, { 0x00000030, 10 }, { 0x000000C0, 6 }, { 0x00000300, 2 }, { 0, 0 }, // Sequence 16
{ 0x0000000C, 14 }, { 0x00000030, 10 }, { 0, 0 }, // Sequence 21
{ 0x0000000C, 10 }, { 0x00000030, 6 }, { 0, 0 }, // Sequence 24
{ 0x000000CC, 2 }, { 0x00030000, 30 }, { 0, 0 }, // Sequence 27
{ 0x000000CC, 2 }, { 0x00033000, 30 }, { 0, 0 }, // Sequence 30
{ 0x000000FC, 2 }, { 0x00000300, 8 }, { 0, 0 }, // Sequence 33
{ 0x0000000C, 4 }, { 0x000000C0, 10 }, { 0, 0 }, // Sequence 36
{ 0x0000003C, 4 }, { 0x000000C0, 10 }, { 0, 0 }, // Sequence 39
{ 0x0000000C, 4 }, { 0x000000C0, 10 }, { 0x00030000, 28 }, { 0, 0 }, // Sequence 42
{ 0x0000000C, 6 }, { 0x000000C0, 8 }, { 0, 0 }, // Sequence 46
{ 0x0000000C, 6 }, { 0x00000030, 12 }, { 0x000000C0, 8 }, { 0, 0 }, // Sequence 49
{ 0x0000000C, 6 }, { 0x000000C0, 8 }, { 0x00030000, 26 }, { 0, 0 }, // Sequence 53
{ 0x0000000C, 6 }, { 0x00000030, 12 }, { 0, 0 }, // Sequence 57
{ 0x0000000C, 6 }, { 0x00000030, 12 }, { 0x000000C0, 8 }, { 0x00000300, 4 }, { 0, 0 }, // Sequence 60
{ 0x0000000C, 14 }, { 0x000000C0, 6 }, { 0, 0 }, // Sequence 65
{ 0x0000000C, 14 }, { 0x00000030, 10 }, { 0x000000C0, 6 }, { 0, 0 }, // Sequence 68
{ 0x0000000C, 12 }, { 0x000000C0, 4 }, { 0, 0 }, // Sequence 72
{ 0x0000000C, 12 }, { 0x00000030, 8 }, { 0x000000C0, 4 }, { 0, 0 }, // Sequence 75
{ 0x0000000C, 12 }, { 0x00000030, 8 }, { 0, 0 }, // Sequence 79
{ 0x0000003C, 4 }, { 0x000000C0, 10 }, { 0x00000300, 6 }, { 0, 0 } // Sequence 82
};
static const MaskRolOp * const CompositeSequences[] = {
&Seqs[0], // to dsub_0
&Seqs[2], // to dsub_1
&Seqs[4], // to dsub_2
&Seqs[6], // to dsub_3
&Seqs[8], // to dsub_4
&Seqs[10], // to dsub_5
&Seqs[12], // to dsub_6
&Seqs[14], // to dsub_7
&Seqs[0], // to gsub_0
&Seqs[0], // to gsub_1
&Seqs[0], // to qqsub_0
&Seqs[16], // to qqsub_1
&Seqs[0], // to qsub_0
&Seqs[4], // to qsub_1
&Seqs[21], // to qsub_2
&Seqs[24], // to qsub_3
&Seqs[0], // to ssub_0
&Seqs[0], // to ssub_1
&Seqs[0], // to ssub_2
&Seqs[0], // to ssub_3
&Seqs[0], // to dsub_2_then_ssub_0
&Seqs[0], // to dsub_2_then_ssub_1
&Seqs[0], // to dsub_3_then_ssub_0
&Seqs[0], // to dsub_3_then_ssub_1
&Seqs[0], // to dsub_7_then_ssub_0
&Seqs[0], // to dsub_7_then_ssub_1
&Seqs[0], // to dsub_6_then_ssub_0
&Seqs[0], // to dsub_6_then_ssub_1
&Seqs[0], // to dsub_5_then_ssub_0
&Seqs[0], // to dsub_5_then_ssub_1
&Seqs[0], // to dsub_4_then_ssub_0
&Seqs[0], // to dsub_4_then_ssub_1
&Seqs[0], // to dsub_0_dsub_2
&Seqs[0], // to dsub_0_dsub_1_dsub_2
&Seqs[2], // to dsub_1_dsub_3
&Seqs[2], // to dsub_1_dsub_2_dsub_3
&Seqs[2], // to dsub_1_dsub_2
&Seqs[0], // to dsub_0_dsub_2_dsub_4
&Seqs[0], // to dsub_0_dsub_2_dsub_4_dsub_6
&Seqs[27], // to dsub_1_dsub_3_dsub_5
&Seqs[30], // to dsub_1_dsub_3_dsub_5_dsub_7
&Seqs[33], // to dsub_1_dsub_2_dsub_3_dsub_4
&Seqs[36], // to dsub_2_dsub_4
&Seqs[39], // to dsub_2_dsub_3_dsub_4
&Seqs[42], // to dsub_2_dsub_4_dsub_6
&Seqs[46], // to dsub_3_dsub_5
&Seqs[49], // to dsub_3_dsub_4_dsub_5
&Seqs[53], // to dsub_3_dsub_5_dsub_7
&Seqs[57], // to dsub_3_dsub_4
&Seqs[60], // to dsub_3_dsub_4_dsub_5_dsub_6
&Seqs[65], // to dsub_4_dsub_6
&Seqs[68], // to dsub_4_dsub_5_dsub_6
&Seqs[72], // to dsub_5_dsub_7
&Seqs[75], // to dsub_5_dsub_6_dsub_7
&Seqs[79], // to dsub_5_dsub_6
&Seqs[82] // to qsub_1_qsub_2
};
--IdxA; assert (IdxA < 56 && "Subregister index out of bounds");
unsigned Result = 0;
for ( const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask != 0; ++Ops) {
unsigned Masked = LaneMask & Ops->Mask;
Result |= (Masked << Ops->RotateLeft) & 0xFFFFFFFF;
Result |= (Masked >> ((32 - Ops->RotateLeft) & 0x1F));
}
return Result;
}
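Seqs[0] corresponds to the case where no composite index exists. When CompositeSequences yields Seqs[0], Mask is 0xFFFFFFFF and RotateLeft is 0, so the fifth line from the end sets Result to Masked, and the fourth line from the end shifts right by (32 - 0) & 0x1F = 0 bits, which ORs the same Masked value in once more. The method therefore returns Masked, that is, LaneMask unchanged.
Some sequences have more than two entries. These correspond to an index that can be composed from several indices (a sequence with only two entries may also correspond to such a case; what matters is how many 1 bits Mask has). In the for loop above, only one of the entries actually takes effect; the others all contribute 0.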
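The mask-and-rotate loop is easy to check by hand on a few entries from the ARM table. The sketch below copies the loop body of the generated method into a free function; the function name applySequence and the array names are made up for illustration, and the lane-mask values in the usage note are arbitrary test inputs, not real ARM lane masks.

```cpp
#include <cassert>
#include <cstdint>

struct MaskRolOp {
  std::uint32_t Mask;
  std::uint8_t RotateLeft;
};

// Apply a {0, 0}-terminated sequence of mask-and-rotate steps to LaneMask.
static std::uint32_t applySequence(const MaskRolOp *Ops, std::uint32_t LaneMask) {
  std::uint32_t Result = 0;
  for (; Ops->Mask != 0; ++Ops) {
    std::uint32_t Masked = LaneMask & Ops->Mask;
    Result |= Masked << Ops->RotateLeft;                  // bits that stay within 32 bits
    Result |= Masked >> ((32 - Ops->RotateLeft) & 0x1F);  // bits that wrap around
  }
  return Result;
}

// Three sequences taken from the ARM Seqs[] listing above.
static const MaskRolOp Identity[] = {{0xFFFFFFFF, 0}, {0, 0}};                      // Sequence 0
static const MaskRolOp RotateBy2[] = {{0xFFFFFFFF, 2}, {0, 0}};                     // Sequence 2
static const MaskRolOp Seq27[] = {{0x000000CC, 2}, {0x00030000, 30}, {0, 0}};       // Sequence 27
```

With these definitions, applySequence(Identity, m) returns m for any m, applySequence(RotateBy2, 0x3) gives 0xC, and applySequence(Seq27, 0x00010000) gives 0x00004000, because only the second entry's mask matches and a left rotation by 30 is the same as a right rotation by 2.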
V7.0 also introduced the reverseComposeSubRegIndexLaneMask method, the inverse of composeSubRegIndexLaneMask. Assuming Mask is a valid lane mask, the following holds:
X0 = composeSubRegIndexLaneMask(Idx, Mask)
X1 = reverseComposeSubRegIndexLaneMask(Idx, X0)
from which it follows that X1 == Mask
611 LaneBitmask reverseComposeSubRegIndexLaneMask(unsigned IdxA,
612 LaneBitmask LaneMask) const {
613 if (!IdxA)
614 return LaneMask;
615 return reverseComposeSubRegIndexLaneMaskImpl(IdxA, LaneMask);
616 }
The X86 functions generated by v7.0 look like this:
struct MaskRolOp {
LaneBitmask Mask;
uint8_t RotateLeft;
};
static const MaskRolOp LaneMaskComposeSequences[] = {
{ LaneBitmask(0xFFFFFFFF), 0 }, { LaneBitmask::getNone(), 0 }, // Sequence 0
{ LaneBitmask(0xFFFFFFFF), 1 }, { LaneBitmask::getNone(), 0 }, // Sequence 2
{ LaneBitmask(0xFFFFFFFF), 2 }, { LaneBitmask::getNone(), 0 }, // Sequence 4
{ LaneBitmask(0xFFFFFFFF), 3 }, { LaneBitmask::getNone(), 0 }, // Sequence 6
{ LaneBitmask(0xFFFFFFFF), 4 }, { LaneBitmask::getNone(), 0 } // Sequence 8
};
static const MaskRolOp * const CompositeSequences[] = {
&LaneMaskComposeSequences[0], // to sub_8bit
&LaneMaskComposeSequences[2], // to sub_8bit_hi
&LaneMaskComposeSequences[4], // to sub_8bit_hi_phony
&LaneMaskComposeSequences[0], // to sub_16bit
&LaneMaskComposeSequences[6], // to sub_16bit_hi
&LaneMaskComposeSequences[0], // to sub_32bit
&LaneMaskComposeSequences[8], // to sub_xmm
&LaneMaskComposeSequences[0] // to sub_ymm
};
LaneBitmask X86GenRegisterInfo::composeSubRegIndexLaneMaskImpl(unsigned IdxA, LaneBitmask LaneMask) const {
--IdxA; assert (IdxA < 8 && "Subregister index out of bounds");
LaneBitmask Result;
for ( const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask.any(); ++Ops) {
LaneBitmask::Type M = LaneMask.getAsInteger() & Ops->Mask.getAsInteger();
if (unsigned S = Ops->RotateLeft)
Result |= LaneBitmask((M << S) | (M >> (LaneBitmask::BitWidth - S)));
else
Result |= LaneBitmask(M);
}
return Result;
}
LaneBitmask X86GenRegisterInfo::reverseComposeSubRegIndexLaneMaskImpl(unsigned IdxA, LaneBitmask LaneMask) const {
LaneMask &= getSubRegIndexLaneMask(IdxA);
--IdxA; assert (IdxA < 8 && "Subregister index out of bounds");
LaneBitmask Result;
for ( const MaskRolOp *Ops = CompositeSequences[IdxA]; Ops->Mask.any(); ++Ops) {
LaneBitmask::Type M = LaneMask.getAsInteger();
if (unsigned S = Ops->RotateLeft)
Result |= LaneBitmask((M >> S) | (M << (LaneBitmask::BitWidth - S)));
else
Result |= LaneBitmask(M);
}
return Result;
}
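The round-trip property X1 == Mask can be seen directly for the single-step sequences of the X86 table, where composing is a left rotation and reversing is the matching right rotation. The following is a simplified sketch that uses a plain 32-bit integer instead of LaneBitmask; the helper names and the SubRegLaneMask parameter (standing in for the result of getSubRegIndexLaneMask(IdxA)) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned BitWidth = 32;  // stand-in for LaneBitmask::BitWidth

static std::uint32_t rotateLeft(std::uint32_t M, unsigned S) {
  return S ? (M << S) | (M >> (BitWidth - S)) : M;
}

static std::uint32_t rotateRight(std::uint32_t M, unsigned S) {
  return S ? (M >> S) | (M << (BitWidth - S)) : M;
}

// Forward direction for a sequence { all-ones mask, RotateLeft = S }.
static std::uint32_t composeLaneMask(unsigned S, std::uint32_t LaneMask) {
  return rotateLeft(LaneMask, S);
}

// Reverse direction: clamp to the sub-register's lanes, then undo the rotation.
static std::uint32_t reverseComposeLaneMask(unsigned S,
                                            std::uint32_t SubRegLaneMask,
                                            std::uint32_t LaneMask) {
  return rotateRight(LaneMask & SubRegLaneMask, S);
}
```

For example, with S = 1 a mask of 0x1 composes to 0x2, and reversing 0x2 (clamped with a hypothetical sub-register lane mask of 0x2) gives 0x1 back. The explicit S == 0 branch mirrors the generated code, which avoids shifting by the full bit width.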
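The emitComposeSubRegIndexLaneMask method in v7.0 differs considerably from the one in v3.6.1, but the code itself is not complicated and is fairly long, so it is not listed here.
3.3.6.4.4. Defining the getSubClassWithSubReg() method
RegisterInfoEmitter::runTargetDesc() next emits the X86GenRegisterInfo method getSubClassWithSubReg(). Given a register class and a sub-register index, this method returns the largest sub-class of that register class that supports the index.
RegisterInfoEmitter::runTargetDesc (continued)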
1341 // Emit getSubClassWithSubReg.
1342 if (!SubRegIndices.empty()) {
1343 OS << "const TargetRegisterClass *" << ClassName
1344 << "::getSubClassWithSubReg(const TargetRegisterClass *RC, unsigned Idx)"
1345 << " const {\n";
1346 // Use the smallest type that can hold a regclass ID with room for a
1347 // sentinel.
1348 if (RegisterClasses.size() < UINT8_MAX)
1349 OS << " static const uint8_t Table[";
1350 else if (RegisterClasses.size() < UINT16_MAX)
1351 OS << " static const uint16_t Table[";
1352 else
1353 PrintFatalError("Too many register classes.");
1354 OS << RegisterClasses.size() << "][" << SubRegIndicesSize << "] = {\n";
1355 for ( const auto &RC : RegisterClasses) {
1356 OS << " {\t// " << RC.getName() << "\n";
1357 for ( auto &Idx : SubRegIndices) {
1358 if (CodeGenRegisterClass *SRC = RC.getSubClassWithSubReg(&Idx))
1359 OS << " " << SRC->EnumValue + 1 << ",\t// " << Idx.getName()
1360 << " -> " << SRC->getName() << "\n";
1361 else
1362 OS << " 0,\t// " << Idx.getName() << "\n";
1363 }
1364 OS << " },\n";
1365 }
1366 OS << " };\n assert(RC && \"Missing regclass\");\n"
1367 << " if (!Idx) return RC;\n --Idx;\n"
1368 << " assert(Idx < " << SubRegIndicesSize << " && \"Bad subreg\");\n"
1369 << " unsigned TV = Table[RC->getID()][Idx];\n"
1370 << " return TV ? getRegClass(TV - 1) : nullptr;\n}\n\n";
1371 }
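The SubClassWithSubReg container of CodeGenRegisterClass already holds the information we need (see the method CodeGenRegBank::inferSubClassWithSubReg()), so while iterating over the CodeGenRegisterClass objects at line 1355 it is enough to fetch, for each sub-register index, the corresponding entry of that container. For the X86 target, the emitted function is: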
const TargetRegisterClass *X86GenRegisterInfo::getSubClassWithSubReg( const TargetRegisterClass *RC, unsigned Idx) const {
static const uint8_t Table[80][6] = {
{ // GR8
0, // sub_8bit
0, // sub_8bit_hi
0, // sub_16bit
0, // sub_32bit
0, // sub_xmm
0, // sub_ymm
},
…
{ // GR16_ABCD
18, // sub_8bit -> GR16_ABCD
18, // sub_8bit_hi -> GR16_ABCD
0, // sub_16bit
0, // sub_32bit
0, // sub_xmm
0, // sub_ymm
},
…
{ // VR512_with_sub_xmm_in_FR32
0, // sub_8bit
0, // sub_8bit_hi
0, // sub_16bit
0, // sub_32bit
80, // sub_xmm -> VR512_with_sub_xmm_in_FR32
80, // sub_ymm -> VR512_with_sub_xmm_in_FR32
},
};
assert (RC && "Missing regclass");
if (!Idx) return RC;
--Idx;
assert (Idx < 6 && "Bad subreg");
unsigned TV = Table[RC->getID()][Idx];
return TV ? getRegClass(TV - 1) : nullptr;
}
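The method is very large, so we have omitted part of the Table array definition. In the array definition each comment names the corresponding sub-register index; a value of 0 means that the register class has no sub-class supporting that index, otherwise the comment also names that sub-class. Take the GR16_ABCD block as an example: its first two entries are for sub_8bit and sub_8bit_hi (whose sub-registers fall in the register classes GR8 and GR8_NOREX respectively), and 18 - 1 = 17 is the position of GR16_ABCD in the RegisterClasses container (the class itself is obtained through the getRegClass method).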
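The "+1 with 0 as sentinel" encoding of Table can be illustrated with a much smaller table. The sketch below uses a hypothetical target with three register classes and two sub-register indices; it returns a class ID (or -1) where the generated method returns a TargetRegisterClass pointer (or nullptr).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical target: 3 register classes (IDs 0..2), 2 sub-register indices (1..2).
enum : unsigned { NumClasses = 3, NumSubRegIndices = 2 };

// Table[ClassID][Idx - 1] holds (sub-class ID + 1); 0 means "no such sub-class".
static const std::uint8_t Table[NumClasses][NumSubRegIndices] = {
    {0, 0},  // class 0 supports neither index
    {2, 3},  // class 1: index 1 -> class 1 itself, index 2 -> class 2
    {0, 3},  // class 2: index 2 -> class 2 itself
};

// Returns the sub-class ID, or -1 where the generated code returns nullptr.
static int getSubClassWithSubReg(unsigned ClassID, unsigned Idx) {
  assert(ClassID < NumClasses && "Missing regclass");
  if (!Idx) return static_cast<int>(ClassID);  // index 0: the class itself
  --Idx;
  assert(Idx < NumSubRegIndices && "Bad subreg");
  unsigned TV = Table[ClassID][Idx];
  return TV ? static_cast<int>(TV - 1) : -1;
}
```

Storing ID + 1 keeps 0 free as the "not supported" marker, which is also why the emitter needs room for one extra value when it chooses between uint8_t and uint16_t for the table element type.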