2020/08/02
tldr; Unified function calling syntax (UFCS) is useful and elegant. I’ve implemented a variant of UFCS that resembles C#’s “extension methods”, in Clang, which you can check out at https://github.com/dancrn/llvm-project .
Outline
Proposals for UFCS in C++ have been a somewhat perennial discussion ( N1585 , N4165 , N4174 , N4474 , P0079R0 ), with seemingly positive discussion from many, including both Herb Sutter and Bjarne Stroustrup. C# has a take on UFCS called extension methods, and personally, I’ve found them to be overwhelmingly useful. If you’re unfamiliar with C#’s extension methods, they look something like this:
    public static class Extensions
    {
        public static string ValueOrDefault(this string input, string defaultValue)
        {
            return String.IsNullOrWhiteSpace(input) switch
            {
                true => defaultValue,
                false => input
            };
        }
    }

    public string GetValue(string str)
    {
        return str.ValueOrDefault("No value provided");
    }
Extension methods, when invoked, look like first-class methods on the type they are defined on. The example provided doesn’t have much benefit over the existing free-form call (i.e. Extensions.ValueOrDefault(str, "No value provided"), which is still a valid way of calling that method). However, static extension methods come into their own when viewed as a generic approach to extending existing classes that cannot be modified. One such C# library is LanguageExt, which provides functional extensions to the base IEnumerable interface (amongst many other things), although there are a lot of other examples that extend other commonly used libraries.
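For comparison, here is a rough sketch of how the same helper has to be written in standard C++ today. The names value_or_default and get_value are my own stand-ins for the C# methods above, not part of any library, and the whitespace check is only an approximation of String.IsNullOrWhiteSpace:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical C++ counterpart of the C# ValueOrDefault extension.
// Returns fallback when input is empty or contains only whitespace.
std::string value_or_default(const std::string& input, const std::string& fallback) {
    const bool blank = std::all_of(input.begin(), input.end(),
        [](unsigned char c) { return std::isspace(c) != 0; });
    return blank ? fallback : input;
}

// Without UFCS the helper can only be called as a free function;
// str.value_or_default(...) is not valid standard C++.
std::string get_value(const std::string& str) {
    return value_or_default(str, "No value provided");
}

void value_or_default_example() {
    assert(get_value("pasta") == "pasta");
    assert(get_value("   ") == "No value provided");
}
```

The call site reads inside-out as soon as several such helpers are chained, which is the readability gap that extension methods close.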
Unfortunately, whilst proposals resurface every once in a while, activity on unified call syntax seems to have stagnated. I want to see what it takes to implement it, and who knows, if enough people like and use UFCS, it might make it into.. C++30, maybe?
UFCS models
Revzin has an excellent couple of articles that describe UFCS more generally. Essentially, UFCS can be split into two categories of behaviors: “candidate set” functionality describes which functions are considered for a particular invocation style, and “overload resolution” approaches describe how to determine which member or function should be chosen when there is more than one candidate. Without repeating those descriptions, this model can be considered to be CS4 - the addition of syntax to indicate UFCS candidacy - and OR2 - perform overload resolution as normal with all candidates. I won’t spend too much time going into why I’ve made these choices, but briefly:
Choice of CS4
Whilst not strictly the “most pure” decision, I think that it’s sensible to allow users to specify which functions they intend to be used in overload resolution. And, whilst not strictly speaking a priority, keeping the candidate set as small as possible would be beneficial from the perspective of compilation times. It also allows UFCS to be backward compatible with existing code, which is why I think this is the most sensible approach to take.
UFCS syntax additions
There are two obvious ways of using the this keyword to indicate UFCS candidacy: as a parameter qualifier, or as a parameter name. For anyone familiar with C#, it’s clear that CS4, along with a this parameter qualifier, was the style chosen when implementing its idea of UFCS. There are many sensible choices for syntax additions, but these were the two I considered. They look like:
    // 1. 'this' parameter name
    int func(const std::string& this);

    // 2. 'this' qualifier
    int func(this const std::string& param);
There are some drawbacks to option 1:
- The implicit this value generally has access to private & protected members of a class, members that UFCS functions would not have access to.
- The parameter type was chosen to demonstrate an inconsistency: this, when used in a member function, is generally considered to be a pointer, i.e., we use this->value rather than this.value. What should we accept for UFCS functions?
The alternative has a couple of (admittedly smaller) issues:
- UFCS candidacy is most easily seen as a property of a function, not a parameter - why change the parameter?
- this is a keyword that generally represents a value, and whilst C++ has repurposed keywords before (auto, for example), this might not be desirable.
In the end, the first option seemed to present more questions than it answered, so I opted for the second alternative.
Choice of OR2
I think the worst case scenario for UFCS would be one where the member functions of a class change, masking a UFCS call in another part of the code that interacts with values of that type. For instance, consider the following case:
    // from include <some/library.h>,
    // context is defined with a single "read from file" function.
    class context {
    public:
        int read_from_fd(int fd);
    };

    // and in consuming code, has the following extension defined
    int read_from_file(this context& ctx, FILE *fp);
Now, the library is updated to a later version, which includes its own read_from_file method:
    // from include <some/library.h>,
    class context {
    public:
        int read_from_fd(int fd);
        int read_from_file(FILE *fp);
    };

    // this function can only be used with regular function call syntax
    int read_from_file(this context& ctx, FILE *fp);
Preferring member calls over UFCS calls in this case would silently change the behaviour of the consuming code, without any visible change at the read_from_file call sites. Broadly speaking, I don’t think it’s sensible to prefer one type of call over the other, so this would seem to rule out any form of overload resolution that has a preference for one type of call over the other. Therefore OR1 and OR2+ don’t seem like the best approaches, and the choice of CS4 rules out OR3 from being an option. In my implementation, any ambiguity between calls is treated as an error, as it is now.
It should be noted that the choice of OR2 is in contrast with C#’s extension methods, where, in the case of ambiguity between a UFCS candidate and a member function, the member function is always chosen (i.e., C# uses OR2+).
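The silent change can be reproduced in standard C++ by emulating a member-preferring rule with SFINAE. This is only a sketch under my own assumptions: context_v1 and context_v2 stand in for the library type before and after the update, and call_read and read_preferring_member are hypothetical helpers, not anything from the Clang patch. The return values 1 and 2 exist purely so the caller can tell which implementation ran.

```cpp
#include <cassert>
#include <cstdio>

// Two snapshots of the hypothetical library type from the text.
struct context_v1 {
    int read_from_fd(int) { return 0; }
};
struct context_v2 {
    int read_from_fd(int) { return 0; }
    int read_from_file(std::FILE*) { return 2; }  // added by the library update
};

// The consumer's extension, written as plain free functions.
int read_from_file(context_v1&, std::FILE*) { return 1; }
int read_from_file(context_v2&, std::FILE*) { return 1; }

// Preferred overload: only viable when T has a read_from_file member.
template <class T>
auto call_read(T& ctx, std::FILE* fp, int) -> decltype(ctx.read_from_file(fp)) {
    return ctx.read_from_file(fp);
}

// Fallback overload: the free function. It is picked only when the overload
// above is removed by SFINAE, since passing 0 as long needs a conversion.
template <class T>
int call_read(T& ctx, std::FILE* fp, long) {
    return read_from_file(ctx, fp);
}

// Emulates "prefer the member, otherwise use the extension".
template <class T>
int read_preferring_member(T& ctx, std::FILE* fp) {
    return call_read(ctx, fp, 0);
}

void silent_change_example() {
    context_v1 before_update;
    context_v2 after_update;
    assert(read_preferring_member(before_update, nullptr) == 1);  // extension runs
    assert(read_preferring_member(after_update, nullptr) == 2);   // member runs
}
```

The call site is identical in both cases, yet a different function runs after the update. Treating the second case as an ambiguity error makes that change visible at compile time instead.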
UFCS for C++
In summary, the following is what I’m going to be implementing:
- this precedes the declaration specifiers (const, volatile, etc.) of a file/namespace scoped function’s first parameter.
- Class methods cannot be defined to be UFCS candidates (although that could probably be relaxed for non-instance methods).
- Calls of the form x.f(y), in addition to performing member lookup, also perform name lookup for functions of name f, and overload resolution with arguments x and y.
- Overload resolution proceeds as normal, i.e., if the candidate set contains a class method and a UFCS candidate, then there is no preferential treatment of either, and this is an error.
An example
In summary, we will be able to define functions that appear to be methods defined on a class as such:
    class foo {
    private:
        std::string m_bar;

    public:
        foo(const std::string& bar): m_bar(bar) { }

        std::string get_bar() const noexcept {
            return m_bar;
        }
    };

    int get_bar_length(this const foo& val) {
        return val.get_bar().length();
    }
And using these methods looks like:
    void f1() {
        auto val = foo("pasta");

        // the two calls are semantically identical
        assert(val.get_bar_length() == get_bar_length(val));
    }
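For anyone reading without the patched compiler, the same example can be tried in standard C++ by dropping the this qualifier, which leaves only the free-function spelling of the call. This is a sketch of the form that val.get_bar_length() is meant to be equivalent to, not output from the implementation; the explicit cast and the function name f1_standard are my own additions.

```cpp
#include <cassert>
#include <string>

class foo {
private:
    std::string m_bar;

public:
    foo(const std::string& bar): m_bar(bar) { }

    std::string get_bar() const noexcept {
        return m_bar;
    }
};

// Same helper as in the text, minus the 'this' qualifier, so any
// standard compiler accepts it.
int get_bar_length(const foo& val) {
    return static_cast<int>(val.get_bar().length());
}

void f1_standard() {
    auto val = foo("pasta");

    // Only the free-function form is available without UFCS.
    assert(get_bar_length(val) == 5);
}
```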
Clang
I’ll forgo an introduction to Clang here - I expect any readers will be familiar with it. I’ve been motivated to start with Clang rather than GCC primarily because of Saar Raz’s story on getting behind Clang’s implementation of concepts . In any case, Clang seems like a suitable basis for implementation:
- Clang is actively maintained with hundreds of contributors,
- code quality in Clang is widely regarded to be clean and consistent,
- acceptance into Clang, if it were to happen, may encourage discussion on UFCS, and
- it could be fun :)
Implementation
I started UFCS in clang “for real” in around April of this year, although I had been reading and thinking about it on and off probably since September of 2019. In general I thought the code quality in Clang was decent, and whilst the learning curve was probably the steepest I’ve ever encountered, I was impressed with how little you needed to fully understand to make something work - the code is truly quite modular. That said, getting something working versus something that is complete requires understanding very large regions of code. Parsing C++ is what a lot of people would consider to be exotic, and so small changes in one place can have effects in places that you would not expect.
As it stands, I have a working implementation that passes all the tests in make clang-test. Of course, ‘Parse’ and ‘SemaCXX’ tests have been added, cxx-ufcs.cpp and unified-call-syntax.cpp respectively. I’ve added appropriate additional diagnostic messages (albeit as parser errors, rather than semantic analysis errors), though there are some others that I would like to add in. I’ve tested my custom version of Clang on a few projects, and it seems to work as expected, too. Overall, I’m quite satisfied with how it’s turned out, and I (naively) hope someone other than myself will give it a go :)
Using UFCS
If you want to try UFCS, then you can check out and build Clang from https://github.com/dancrn/llvm-project ; there’s nothing extra to configure (although I recommend you don’t install it in the default prefix!). To enable UFCS, you’ll need to pass an additional argument to Clang when invoking it, -fufcs. The front end driver hasn’t been changed at all, so you’ll most likely need to pass it through to the compiler manually:
$ /path/to/clang -Xclang -fufcs file.cpp
Again, given the design of this implementation, there shouldn’t be any issues with compiling existing code. If this is not the case, then feel free to create an issue on GitHub!
Remaining Work
Whilst I’m moderately confident that my changes work as intended, I do not consider this to be “done”. There are a few things that feel not quite right, and, even if this is never merged into clang (which is perhaps a bit hopeful..), I’d like to do it “right”.
General stability
Whilst the changes to Clang to support UFCS aren’t extensive, the implementation remains less tested than I’d like. I would definitely not recommend using this in any form of production code :)
Changes to FunctionDeclBits
Part of my changes add another bit to this bitfield, which specifies whether the function declaration is a UFCS candidate. This is undesirable as it pushes some other dependent types 1 bit over their 8 byte limit. Having read about why this was done, it seems like this is not something I want to stick with. Instead, I think creating a new function declaration type derived from FunctionDecl is probably a better approach.
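To illustrate why a single extra bit matters, here is a small standalone sketch. It is not Clang’s actual FunctionDeclBits layout: the struct and field names are hypothetical, and bit-field layout is implementation-defined, so the sizes below are only what mainstream 64-bit ABIs produce.

```cpp
#include <cstdint>

// Hypothetical flag block: exactly 64 bits packed into one 8-byte word.
struct decl_bits {
    std::uint64_t kind  : 16;
    std::uint64_t flags : 48;
};

// The same block with one more (hypothetical) flag, standing in for an
// "is a UFCS candidate" bit. The 65th bit no longer fits in the first word.
struct decl_bits_with_ufcs {
    std::uint64_t kind  : 16;
    std::uint64_t flags : 48;
    std::uint64_t is_ufcs_candidate : 1;
};

static_assert(sizeof(decl_bits) == 8, "64 bits of flags fit in 8 bytes");
static_assert(sizeof(decl_bits_with_ufcs) > 8, "one extra bit spills into a second word");
```

A full bitfield leaves no room for additions without growing every object that embeds it, which is why moving the flag into a derived declaration type looks like the cleaner option.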
Parser work
The syntax changes proposed for UFCS candidacy have meant that the parser needed to be changed in order to support it. That said, the parser is doing a few checks that I think are probably better suited to be done during semantic analysis.
Explicit namespace qualification for UFCS candidates
It should probably be allowable to explicitly namespace qualify a UFCS candidate. I haven’t implemented this at all currently. It would allow for usages like:
    namespace ext {
        int get_bar(this const foo& x) {
            // ...
        }
    }

    int func() {
        return foo("x").ext::get_bar();
    }
Which could be useful for explicit masking of class methods.
Additional Diagnostics
One of the great features of clang is the lengths that its engineers go to in order to produce useful error messages. One thing that I think is a bit lacking in my changes is warnings: as it stands, you can write a UFCS candidate function that would mask (and hence make ambiguous) a member function. I don’t think this should be an error (maybe it should be?), but it would be nice to at least emit a warning if this was the case:
    class foo {
        int bar(); // note: defined here
    };

    int bar(this foo& f); // warning: UFCS candidate will not mask class method 'foo::bar()'
Wrapping up
C++ is hands-down my favourite language, and getting into the code of a highly popular compiler implementation and modifying it to extend it my own way has been as fun as it has been challenging. Also I have a new appreciation of how damn hard it is to parse C++ :) If anyone feels like checking out the code on GitHub, it can be found at https://github.com/dancrn/llvm-project . I’d be super happy for any feedback anyone has, along with comments or suggestions on how I could improve it.