The tiniest C sort function? (2008)

栏目: IT技术 · 发布时间: 4年前

内容简介:I wondered how small a sort program one can write in C. The best result to date, after investigating several distinctly different algorithms, tweaking a lot, and getting major help from others, is a 61-byte function at step 23. Can you beat it? Or tie it,

The tiniest C sort function?

I wondered how small a sort program one can write in C. The best result to date, after investigating several distinctly different algorithms, tweaking a lot, and getting major help from others, is a 61-byte function at step 23. Can you beat it? Or tie it, with subexponential running time?

<big>  s(a,n)int*a;{n-->1?s(a,n),s(a+1,n),n=*a,*a=a[1],a[n>*a]=n:0;}
</big>
    GCC warning: some of the programs below need option -std=c99 .

    HTML warning. To preserve verbatim C source, naked > symbols have been left in function definitions instead of the proper HTML escape sequence. Internet Explorer, Safari, and Firefox tolerate the lapse.

My original intuition was that something really short would be really cryptic; and much of the code below is. But the current winner, while unconventional in using a conditional expression instead of if , is very clean, with only one deep-C trick—the conditional subscript. The code is "oblivious", doing comparisons in the same order on every input of a given length; and its running time is extraordinary!

Comparison flaw

A technical flaw (pointed out by Bill Mckeeman, who credits Steve Johnson) afflicts many of the following functions, in which pointer b (or c ) can take on the value a-1 . The C standard does not guarantee the result of comparisons of pointers p related to an array a of length n for p outside the range [ a,a+n

]. Code that suffers from this defect is marked CF.

One example of what could go wrong is that if the array address a happens to be 0, then a-1 may compare high to a . However, the situation of an array being allocated at address 0 won't happen in implementations of C that take address 0 to be the null pointer. Trouble could also arise for pointers represented as base plus (nonnegative) offset.

In some cases the flaw only affects empty arrays and can be dispelled by the classic maneuver of declaring it a feature, not a bug: change the precondition to require n>0 .

    Selection sort

    .
  1. It all began with a vanilla selection sort, the precondition for which is a!=NULL&&n>=0 . The only trickery here is routine: eliminating curly brackets by using commas instead of semicolons. The length is 93 bytes, excluding newlines. (CF)
    <big>void sort(int*a,int n){int*b,*c,t;for(b=a+n;--b>a;)
    for(c=b;--c>=a;)if(*c>*b)t=*b,*b=*c,*c=t;}
    </big>
  2. Ditch a separate declaration by using C++/C99 loop initialization; and can variable t by reusing n . (87 bytes, CF)
    <big>void sort(int*a,int n){for(int*b=a+n,*c;--b>a;)
    for(c=b;c-->a;)if(*c>*b)n=*b,*b=*c,*c=n;}
    </big>
  3. Save the calculation of b by changing signature to sort ( first,last+1 ). (83 bytes, CF)
    <big>void sort(int*a,int*b){for(int*c,t;--b>a;)
    for(c=b;c-->a;)if(*c>*b)t=*b,*b=*c,*c=t;}
    </big>
  4. Peter McIlroy moved the declarations of c and t . (82 bytes, CF)
    <big>void sort(int*a,int*b){while(--b>a)
    for(int*c=b,t;c-->a;)if(*c>*b)t=*b,*b=*c,*c=t;}
    </big>
  5. Now something wild: put both loop tests in one for . (81 bytes, CF)
    Some bootless work gets done. The inner loop test comes first, and fails the first time through. Also the if fails the first time it's executed in each iteration of the outer loop.
    <big>void sort(int*a,int*b){for(int*c=a,t;c-->a
    ||(c=--b)>a;)if(*c>*b)t=*b,*b=*c,*c=t;}
    </big>
  6. Peter takes out a common subexpression, *b , by embedding an assignment in a comparison. (80 bytes, CF)
    <big>void sort(int*a,int*b){for(int*c=a,t;c-->a
    ||(c=--b)>a;)if(*c>(t=*b))*b=*c,*c=t;}
    </big>

    Weed out inversions

  7. Do away with one loop. Sweep the whole array, looking for an inversion. If one is found, reverse it and bump the decreasing loop index back to the top. (79 bytes, CF)
    This turns O(N 2 ) behavior into O(N 3 ). But speed isn't our objective, and O(N 2 ) is shabby itself.
    <big>void sort(int*a,int*b){for(int*c=b,t;--c>a;)
    if(t=c[-1],t>*c)c[-1]=*c,*c=t,c=b;}
    </big>
  8. Shorten the name. (76 bytes, CF)
    <big>void s(int*a,int*b){for(int*c=b,t;--c>a;)
    if(t=c[-1],t>*c)c[-1]=*c,*c=t,c=b;}
    </big>
  9. Convert from first,last+1 to first,last to get rid of a decrement operator. Use decrement and subscript [1] instead of two subscripts [-1] . (72 bytes, CF)
    <big>void s(int*a,int*b){for(int*c=b,t;c>a;)
    if(t=*c--,*c>t)c[1]=*c,*c=t,c=b;}
    </big>
  10. Use obsolescent (pre- void ) syntax. (68 bytes)
    <big>s(a,b)int*a,*b;{for(int*c=b,t;c>a;)
    if(t=*c--,*c>t)c[1]=*c,*c=t,c=b;}
    </big>
  11. Live dangerously. Same as step 10, but with modern parameter declarations. (67 bytes, CF)
    Actually I wrote 11 before 10, but erroneously assumed I had to revert to ancient syntax in order to have an int function that doesn't return a value. In fact this is legal; the standard merely forbids use of the missing result.
    <big>s(int*a,int*b){for(int*c=b,t;c>a;)
    if(t=*c--,*c>t)c[1]=*c,*c=t,c=b;}
    </big>
  12. Jeremy Yallop replaced the if with a conditional expression. (66 bytes, CF)

    The if had no else , so the second branch ( :0 ) in the conditional is useless code. Still the trick saves a byte.

    The nesting of commas illustrates an asymmetry of the two branches of a conditional: only the first can contain commas.

    <big>s(int*a,int*b){for(int*c=b,t;c>a;)
    t=*c--,*c>t?c[1]=*c,*c=t,c=b:0;}
    </big>
  13. Then he shed a comma by moving all but the first comma-clause from the body of the for into the "increment" clause. (65 bytes, CF)
    <big>s(int*a,int*b){for(int*c=b,t;c>a;
    *c>t?c[1]=*c,*c=t,c=b:0)t=*c--;}
    </big>

    Cheating

  14. Grzegorz Lukasik observed that at least one compiler (gcc) will infer type specifiers in global object declarations. (64 bytes, CF)
    Error: allowed in the early days of C, this behavior does not comply with current standards. If the code compiles, though, it is likely to work.
    <big>*c,t;s(int*a,int*b){for(c=b;c>a;
    *c>t?c[1]=*c,*c=t,c=b:0)t=*c--;}
    </big>

    Recursion

  15. In this approach the first recursive call sorts all but the first element of the array. Then if the first element isn't smallest, it's swapped with the smallest and the rest of the array is sorted again. The signature is s ( first,last+1 ). (68 bytes)

    With two recursive calls in the body, the code may look as if its running time is O(2 N ), but more detailed analysis reveals it's O(N 3 ), the same as for the previous program.

    Flaw: one element is accessed even if the array is empty.

    <big>s(int*a,int*b){int t=*a;b>++a?s(a,b),
    t>*a?a[-1]=*a,*a=t,s(a,b):0:0;}
    </big>
  16. Several suggestions from a blog discussion can be combined to make a little progress. First, change the signature to s( first,length,dummy ) . (68 bytes)
    This code doesn't do all its own work, as it depends on the caller to allocate temporary storage for it by supplying a dummy argument. Flaws: accesses one element beyond end of array and goes wild on empty arrays.
    <big>s(a,n,t)int*a;{t=*a++;--n?s(a,n,0),
    t>*a?a[-1]=*a,*a=t,s(a,n,0):0:0;}
    </big>
  17. Then suppress the increment of a to facilitate a remarkable conditional swap in place of a conditional expression. Change the guard, --n ⇒ n-- ), to tame one flaw. (67 bytes)
    Big effect of a "small" change: eliminating the conditional expression causes the running time to explode to O (2 N ) .
    Flaw: accesses one element beyond end of array.
    <big>s(a,n,t)int*a;{t=*a;n--?s(a+1,n,0),
    *a=a[1],a[t>*a]=t,s(a+1,n,0):0;}
    </big>
  18. Get rid of the dummy-argument zeroes by putting useful code in their place. (62 bytes)

    The trick was pioneered by a blog commenter who put the first recursive call in the dummy argument of the second:

      s(a,n,t)int*a;{n--&&s(a,n,a[t>(*a=1[s(a+1,n),a])]=t=*a);} Alas, this valiant offering overreaches and stumbles into severely undefined behaviors: calling with the wrong number of arguments, order of side-effects in an expresssion. As the code happened to compile for me, it propagated a[n] throughout the array.

    Flaw (below): accesses one element beyond end of nonempty array.
    <big>s(a,n,t)int*a;{n--?s(a+1,n,t=*a),s(a+1,n,a[t>(*a=a[1])]=t):0;}
    </big>
  19. Correct the flaw by strengthening the guard. (64 bytes)
    <big>s(a,n,t)int*a;{n-->1?s(a+1,n,t=*a),s(a+1,n,a[t>(*a=a[1])]=t):0;}
    </big>

    Hybrid

  20. Ben Carter saw past the ingrained habit of swapping adjacent elements and abolished subscripts by swapping data via existing pointers. Make the last element nonminimal. Bring the minimal element to the front by sorting all but the last, then sort all but the first. The signature is s( first,last ) and the running time is O (2 N ). (70 bytes, CF)
    <big>s(int*a,int*b){int t;if(b>a)t=*a,t>*b?*a=*b,*b=t:0,
    s(a++,b-1),s(a,b);}
    </big>
  21. Replace tail recursion with iteration and split the conditional-swap sequence between the increment clause and the body of the for statement. (64 bytes, CF)
    <big>s(int*a,int*b){for(int t;b>a;t>*b?*a=*b,*b=t:0,s(a++,b-1))t=*a;}
    </big>
  22. Displace the useless 0 with the recursive call. (62 bytes, CF)
    Due to exponential running time, the cost of this change (which causes the tests to be reexecuted after a swap) is less than that of lengthening the array by 1.
    <big>s(int*a,int*b){for(int t;b>a;t>*b?*a=*b,*b=t:s(a++,b-1))t=*a;}
    </big>
  23. Returning to step 19, Ben put the sorts before the swap, which allows t to be eliminated. The two recursive calls order the array unless the minimal element is last, in which case the minimal element ends up in second place and gets swapped home with conditional-swap code from step 17. The signature is s( first,length ) . (61 bytes)
    <big>s(a,n)int*a;{n-->1?s(a,n),s(a+1,n),n=*a,*a=a[1],a[n>*a]=n:0;}
    </big>

    Cheating again

  24. Silas Poulson exploited a recent relaxation in gcc to drop the only keyword, int . (56 bytes)
    Error: as in step 10, this nonstandard program is likely to work if a compiler accepts it.
    <big>s(*a,n){n-->1?s(a,n),s(a+1,n),n=*a,*a=a[1],a[n>*a]=n:0;}
    </big>

Originally written March 2008. "Recursion" section soon shortened, and corrected with regard to asymptotic running time.

Steps 12,13,14,16,17 added January 2010.

Steps 18 and 19 added October 2010.

Steps 20-23 and special treatment of "Comparison flaw" added November 2010.

Step 24 added and some words tweaked April 2017.


以上所述就是小编给大家介绍的《The tiniest C sort function? (2008)》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

数据结构与算法分析

数据结构与算法分析

Mark Allen Weiss / 冯舜玺 / 电子工业出版社 / 2016-8 / 89.00元

本书是数据结构和算法分析的经典教材,书中使用主流的程序设计语言C++作为具体的实现语言。书中内容包括表、栈、队列、树、散列表、优先队列、排序、不相交集算法、图论算法、算法分析、算法设计、摊还分析、查找树算法、k-d树和配对堆等。本书把算法分析与C++程序的开发有机地结合起来,深入分析每种算法,内容全面、缜密严格,并细致讲解精心构造程序的方法。一起来看看 《数据结构与算法分析》 这本书的介绍吧!

HTML 压缩/解压工具
HTML 压缩/解压工具

在线压缩/解压 HTML 代码

CSS 压缩/解压工具
CSS 压缩/解压工具

在线压缩/解压 CSS 代码

MD5 加密
MD5 加密

MD5 加密工具