Was Knuth Really Framed by Jon Bentley?

Category: IT Technology · Published: 6 years ago


Recently, the formal methods specialist Hillel Wayne posted an interesting article discussing whether Donald Knuth was actually framed when Jon Bentley asked him to demonstrate literate programming. (Knuth came up with an eight-page monolithic listing, whereas in a critique Doug McIlroy provided a six-line shell script.) The article makes many interesting and valid points. However, one of the points raised is that the specified problem was ideal for solving with Unix tools, and that a different problem, such as “find the top K pairs of words and print the Levenshtein distance between each pair”, would be much more difficult to solve with Unix commands. As the developer of an edX massive open online course (MOOC) on the use of Unix Tools for data, software and production engineering, I decided to put this claim to the test.

Here is the commented version of the original pipeline that McIlroy devised.

# Split text into words by replacing non-word characters with newlines
tr -cs A-Za-z '\n' |
# Convert uppercase to lowercase
tr A-Z a-z |
# Sort so that identical words occur adjacently
sort |
# Count occurrences of each line
uniq -c |
# Sort numerically by decreasing number of word occurrences
sort -rn |
# Quit after printing the first K lines (K is the script's first argument)
sed ${1}q
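To try the pipeline on a small input, it can be wrapped in a shell function. This is only a sketch: the function name top_words and the sample sentence are illustrative assumptions and are not part of McIlroy's script.

```shell
# Hypothetical wrapper around the six-stage pipeline:
# reads text on standard input and prints the K most frequent words.
top_words() {
  tr -cs A-Za-z '\n' |   # one word per line
  tr A-Z a-z |           # fold to lowercase
  sort |                 # bring identical words together
  uniq -c |              # prefix each distinct word with its count
  sort -rn |             # most frequent first
  sed "${1}q"            # stop after K lines
}

# Sample run: "the" occurs three times and "and" twice.
printf 'The cat and the dog and the bird\n' | top_words 2
```

The two lines printed are the counts 3 and 2 followed by "the" and "and"; the amount of padding before each count depends on the uniq implementation.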

And here is the version solving the problem that Hillel Wayne claimed would be difficult to solve with a Unix pipeline. It turns out that this can also be done in a pipeline of just nine (non-comment) lines.

# Split text into words by replacing non-word characters with newlines
tr -cs A-Za-z '\n' |
# Convert uppercase to lowercase
tr A-Z a-z |
# Make pairs out of words by testing and storing the previous word
awk 'prev {print prev, $1} {prev = $1}' |
# Sort so that identical pairs occur adjacently
sort |
# Count occurrences of each pair
uniq -c |
# Sort numerically by decreasing number of pair occurrences
sort -nr |
# Keep only the K most frequent pairs (K is the script's first argument)
head -n "$1" |
# Remove the occurrence count, keeping the two words
awk '{print $2, $3}' |
# Print the Levenshtein distance between each word pair (-n loops over lines, -a autosplits into @F)
perl -MText::LevenshteinXS -ane 'print distance(@F), "\n"'

One may claim that I cheated above by invoking Perl and using the Text::LevenshteinXS module. But the reuse of existing tools, rather than the building of monoliths, is exactly the Unix command-line philosophy. In fact, one of the reasons I sometimes prefer Perl over Python is that it is very easy to incorporate into modular Unix tool pipelines. In contrast, Python encourages the creation of monoliths of the type McIlroy criticized.
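Text::LevenshteinXS is a CPAN module and may not be installed on every system. As a sketch only, assuming the module is unavailable, the final stage could be swapped for a plain awk implementation of the textbook dynamic-programming edit distance. The function name levenshtein_pairs is hypothetical, and this is a substitute technique, not what the pipeline above uses.

```shell
# Hypothetical stand-in for the Perl stage: reads "word1 word2" lines
# and prints the Levenshtein distance of each pair.
levenshtein_pairs() {
  awk '
  function min3(a, b, c) { return (a < b) ? (a < c ? a : c) : (b < c ? b : c) }
  # Extra parameters after the gap are awk local variables; d is a local array.
  function lev(s, t,    m, n, i, j, d, cost) {
    m = length(s); n = length(t)
    for (i = 0; i <= m; i++) d[i, 0] = i    # delete all of the first i characters
    for (j = 0; j <= n; j++) d[0, j] = j    # insert all of the first j characters
    for (i = 1; i <= m; i++)
      for (j = 1; j <= n; j++) {
        cost = (substr(s, i, 1) == substr(t, j, 1)) ? 0 : 1
        d[i, j] = min3(d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + cost)
      }
    return d[m, n]
  }
  { print lev($1, $2) }'
}

# Sample run on three pairs.
printf 'kitten sitting\nto be\nthe the\n' | levenshtein_pairs
```

The design trade-off is the one the paragraph above describes: the awk version has no external dependency, but it reimplements something that a reusable module already provides.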

Regarding my choice of awk for obtaining word pairs, note that this can also be done with the command sed -n 'H;x;s/\n/ /;p;s/.* //;x'. However, I find the awk version much more readable.
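The pairing step can be seen in isolation by running only the first three stages on a short phrase. This is a minimal sketch; the function name word_pairs and the sample phrase are assumptions made for illustration.

```shell
# Hypothetical helper: emit every pair of adjacent words, one pair per line.
word_pairs() {
  tr -cs A-Za-z '\n' |
  tr A-Z a-z |
  # Print "previous current" once a previous word exists, then remember the current word.
  awk 'prev {print prev, $1} {prev = $1}'
}

printf 'to be or not to be\n' | word_pairs
# Prints:
# to be
# be or
# or not
# not to
# to be
```

By my reading of the sed variant, it also emits an extra first line holding only the first word, because the hold space is still empty when the first line is processed; the awk version skips that case through its prev test.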

Through this demonstration I haven't proven that Bentley didn't frame Knuth; it seems that at some point McIlroy admitted that the criticism was unfair. However, I did show that a counter-example chosen specifically to demonstrate the limits of the Unix pipeline's processing power is in fact quite easy to implement with just three additional commands. So my claim is that the power of the Unix tools is often vastly underestimated.

In my everyday work, I use Unix commands many times a day to perform widely differing tasks. I very rarely encounter tasks that cannot be solved by joining together a couple of commands. The automated editing of a course's videos and animations was one such task. Even in those cases, what I typically do is write a small script or program to complement a Unix tools pipeline or a make-based workflow.
