What makes a word beautiful?

Category: IT Technology · Published: 4 years ago


What makes a word beautiful? Answering this question is not easy because of the inherent complexity and ambiguity in defining what it means to be beautiful. Let's tackle the question with a quantitative approach by introducing the Aesthetic Potential (AP), a metric that aims to quantify the beauty of a word w as follows:

$$AP(w) = \frac{1}{|W^+|} \sum_{w^+ \in W^+} s(w, w^+) - \frac{1}{|W^-|} \sum_{w^- \in W^-} s(w, w^-)$$

where w⁺ is a word labelled as beautiful, w⁻ is a word labelled as ugly, W⁺ and W⁻ are the corresponding sets of labelled words, and s is a similarity function between two words. In a nutshell, AP is the average similarity to beautiful words minus the average similarity to ugly words. The metric is positive for beautiful words and negative for ugly ones.
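As a quick worked example of the definition, the sketch below computes the metric from two small lists of similarity scores. The numbers and the helper name `toy_aesthetic_potential` are invented for this illustration; they do not come from WordNet or from the implementation used later in the article.

```python
from statistics import mean

def toy_aesthetic_potential(sims_to_beautiful, sims_to_ugly):
    """Mean similarity to beautiful words minus mean similarity to ugly words."""
    return mean(sims_to_beautiful) - mean(sims_to_ugly)

# Hypothetical similarity scores, made up for this illustration only.
sims_to_beautiful = [0.9, 0.7, 0.5]  # s(w, w+) for three beautiful words
sims_to_ugly = [0.3, 0.2, 0.1]       # s(w, w-) for three ugly words

ap = toy_aesthetic_potential(sims_to_beautiful, sims_to_ugly)
print(f"AP = {ap:.2f}")  # mean 0.7 minus mean 0.2

# Swapping the two lists flips the sign of the metric.
ap_swapped = toy_aesthetic_potential(sims_to_ugly, sims_to_beautiful)
print(f"AP (swapped) = {ap_swapped:.2f}")
```

A word that sits closer to the beautiful list gets a positive score, and the same word scored against swapped lists gets the opposite score.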

Before we can compute the Aesthetic Potential we need a similarity function s and a set of words labelled as beautiful and ugly. The similarity function we will use defines the similarity of two words as the maximum Lin similarity between all the synonyms (synsets) in WordNet of the two input words (I will not introduce WordNet or the Lin similarity for brevity, but the curious reader is invited to look them up). Here's the Python implementation:

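import numpy as np
from itertools import product
from nltk.corpus import wordnet, wordnet_ic
# Requires the NLTK data packages 'wordnet' and 'wordnet_ic'
# (e.g. nltk.download('wordnet'); nltk.download('wordnet_ic')).
brown_ic = wordnet_ic.ic('ic-brown.dat')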

def similarity(word1, word2):
    """
    returns the similarity between word1 and word2 as the maximum
    Lin similarity between all the synsets of the two words.
    """
    syns1 = wordnet.synsets(word1)
    syns2 = wordnet.synsets(word2)
    sims = []
    for sense1, sense2 in product(syns1, syns2):
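        # Only compare senses with the same part of speech;
        # adjectives ('a', 's') and adverbs ('r') are skipped.
        if sense1.pos() == sense2.pos() and sense1.pos() not in ('a', 'r', 's'):
            d = wordnet.lin_similarity(sense1, sense2, brown_ic)
            sims.append(d)
    # Guard against both an empty list and a list holding only NaNs.
    if len(sims) > 0 and not np.all(np.isnan(sims)):
        return np.nanmax(sims)
    return 0 # no similarity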

print('s(cat, dog) =', similarity('cat', 'dog'))
print('s(cat, bean) =', similarity('cat', 'bean'))
print('s(coffee, bean) =', similarity('coffee', 'bean'))

s(cat, dog) = 0.8768009843733973
s(cat, bean) = 0.3079964716744931
s(coffee, bean) = 0.788150820826125

This function returns a value between 0 and 1. High values indicate that the two words are highly similar, and low values indicate little or no similarity. Looking at the output for the three test pairs, we note that the function considers "cat" and "dog" fairly similar, while "cat" and "bean" are not. Finally, "coffee" and "bean" are considered similar, but not as similar as "cat" and "dog".
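For readers who want a feel for where these numbers come from, here is a minimal sketch of the two ingredients: the Lin formula, 2·IC(lcs) / (IC(c1) + IC(c2)), and the maximum over all pairs of senses. The sense names and information-content (IC) values are hypothetical, and `lin_from_ic` is a helper written for this illustration rather than an NLTK function.

```python
from itertools import product

def lin_from_ic(ic1, ic2, ic_lcs):
    """Lin similarity from information-content values: 2*IC(lcs) / (IC(c1) + IC(c2))."""
    return 2.0 * ic_lcs / (ic1 + ic2)

# Hypothetical senses of two words, with invented IC values.
senses_a = {"a.n.01": 8.0, "a.n.02": 6.0}
senses_b = {"b.n.01": 7.0}

# Hypothetical IC of the most informative common ancestor of each sense pair.
lcs_ic = {("a.n.01", "b.n.01"): 6.0,
          ("a.n.02", "b.n.01"): 2.0}

# Score every pair of senses, then keep the best one, as similarity() does.
scores = [lin_from_ic(senses_a[s1], senses_b[s2], lcs_ic[(s1, s2)])
          for s1, s2 in product(senses_a, senses_b)]
best = max(scores)
print(scores, best)
```

Because a common ancestor is never more informative than either of the two senses, each score stays between 0 and 1, which is why the word-level similarity does too.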

Now we need some words labelled as beautiful and some as ugly. Here I propose two lists of words inspired by the ones used in (Jacobs, 2017) for the German language:

beauty = ['amuse',  'art', 'attractive',
          'authentic', 'beautiful', 'beauty',
          'bliss', 'cheerful', 'culture',
          'delight', 'emotion', 'enjoyment',
          'enthusiasm', 'excellent', 'excited',
          'fascinate', 'fascination', 'flower',
          'fragrance', 'good', 'grace',
          'graceful', 'happy', 'heal',
          'health', 'healthy', 'heart',
          'heavenly', 'hope', 'inspire',
          'light', 'joy', 'love',
          'lovely', 'lullaby', 'lucent',
          'loving', 'luck', 'magnificent',
          'music', 'muse', 'life',
          'paradise', 'perfect', 'perfection',
          'picturesque', 'pleasure',
          'poetic', 'poetry', 'pretty',
          'protect', 'protection',
          'rich', 'spring', 'smile',
          'summer', 'sun', 'surprise',          
          'wealth', 'wonderful']

ugly = ['abuse', 'anger', 'imposition', 'anxiety',
        'awkward', 'bad', 'unlucky', 'blind',
        'chaotic', 'crash', 'crazy',
        'cynical', 'dark', 'disease',
        'deadly', 'decrepit', 'death',
        'despair', 'despise', 'disgust',
        'dispute', 'depression', 'dull',
        'evil', 'fail', 'hate',
        'hideous', 'horrible', 'horror',
        'haunted', 'illness', 'junk',
        'kill', 'less',
        'malicious', 'misery', 'murder',
        'nasty', 'nausea', 'pain',
        'piss', 'poor', 'poverty',
        'puke', 'punishment', 'rot',
        'scam', 'scare', 'shame',
        'spoil', 'spite', 'slaughter',
        'stink', 'terrible', 'trash',
        'trouble', 'ugliness', 'ugly',
        'unattractive', 'virus']

A remark is necessary here. The AP strongly depends on these two lists, and the fact that I made them on my own strongly biases the results towards my personal preferences. If you're interested in a more general approach to labelling your data, the work published by Westbury et al. in 2014 is a good place to start.

We now have all the pieces to compute our Aesthetic Potential:

def aesthetic_potential(word, beauty, ugly):
    """
    returns the aesthetic potential of word
    beauty and ugly must be lists of words
    labelled as beautiful and ugly respectively
    """
    b = np.nanmean([similarity(word, w) for w in beauty])
    u = np.nanmean([similarity(word, w) for w in ugly])
    return (b - u)*100

print('AP(smile) =', aesthetic_potential('smile', beauty, ugly))
print('AP(conjuncture) =', aesthetic_potential('conjuncture', beauty, ugly))
print('AP(hassle) =', aesthetic_potential('hassle', beauty, ugly))

AP(smile) = 2.6615214570040195
AP(conjuncture) = -3.418813636728729e-299
AP(hassle) = -2.7675826881674497

This is a direct implementation of the equation introduced above; the only difference is that the result is multiplied by 100 to express the metric as a percentage for readability. Looking at the results, we see that the metric is positive for the word "smile", indicating that the word tends toward the beautiful side. It's negative for "hassle", meaning it tends toward the ugly side. It's practically 0 for "conjuncture" (on the order of 1e-299), meaning that we can consider it a neutral word. To better understand these results, we can compute the metric for a set of words and plot it against the probability of a value of the metric:
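Since the score for "conjuncture" is technically negative but vanishingly small, a sign check alone would label it as ugly. The sketch below shows one way to treat such values as neutral. The `label` helper and its tolerance `tol` are assumptions made for this illustration, not part of the original metric; the scores are the three values printed above.

```python
import math

def label(ap, tol=1e-6):
    """Map an AP score to a coarse label; `tol` is an assumed tolerance around zero."""
    if math.isclose(ap, 0.0, abs_tol=tol):
        return "neutral"
    return "beautiful" if ap > 0 else "ugly"

# The three scores printed by the article's code.
scores = {
    "smile": 2.6615214570040195,
    "conjuncture": -3.418813636728729e-299,
    "hassle": -2.7675826881674497,
}

labels = {word: label(value) for word, value in scores.items()}
print(labels)
```

Any reasonable tolerance gives the same three labels here, because the neutral score is hundreds of orders of magnitude smaller than the other two.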

test_words = ['hate', 'rain', #'snow', 
         'earth', 'love', 'child', #'clarinettist',
         'sun', 'patience', #'smile', 'touch',
         'coffee', 'regret', #'shepherd', 'man',
         'depression', 'obscure', 'bat', 'woman',
         'dull', 'nothing', 'disillusion',
         'abort', 'blurred', 'cruelness', #'hassle',
         'stalking', 'relevance', #'infected', 
         'conjuncture', 'god', 'moon', #'tortoise',
         'humorist', 'idea', 'poisoning']

ap = [aesthetic_potential(w.lower(), beauty, ugly) for w in test_words]

from scipy.stats import norm
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap, Normalize
# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline

p_score = norm.pdf(ap, loc=0.0, scale=0.7) #params estimated on a larger sample
p_score = p_score / p_score.sum()

normalizer = Normalize(vmin=-10, vmax=10)
colors = ['crimson', 'crimson', 'silver', 'deepskyblue', 'deepskyblue']
cmap = LinearSegmentedColormap.from_list('beauty', colors=colors)

plt.figure(figsize=(8, 12))
plt.title('What makes a word beautiful?',
          loc='left', color='gray', fontsize=22)
plt.scatter(p_score, ap, c='gray', marker='.', alpha=.6)
for prob, potential, word in zip(p_score, ap, test_words):
    plt.text(prob, potential, word.lower(),
             fontsize=(np.log10(np.abs(potential)+2))*30, alpha=.8,
             color=cmap(normalizer(potential)))
plt.text(-0.025, 6, 'beautiful', va='center',
         fontsize=20, rotation=90, color='deepskyblue')
plt.text(-0.025, -6, 'ugly', va='center',
         fontsize=20, rotation=90, color='crimson')
plt.xlabel('P(Aesthetic Potential)', fontsize=20)
plt.ylabel('Aesthetic Potential', fontsize=20)
plt.gca().tick_params(axis='both', which='major', labelsize=14)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.show()

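[Figure: "What makes a word beautiful?" Scatter plot of the test words, with P(Aesthetic Potential) on the x-axis and Aesthetic Potential on the y-axis.]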

This chart gives us better insight into the meaning of the values we just computed. We note that the high-probability values are around 0, hence most words in the vocabulary are neutral. Values above 2 and below -2 have quite a low probability, which tells us that words associated with these values have a strong Aesthetic Potential. From this chart we can see that the words "idea" and "sun" are considered beautiful while "hate" and "poisoning" are ugly (who would disagree with that? :).
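To make the "low probability" remark concrete, the sketch below evaluates the same normal density used in the plotting code (mean 0, standard deviation 0.7) at three illustrative AP values, using only the standard library. The three values are chosen for the example, and the normalisation mirrors the `p_score / p_score.sum()` step, so the weights are relative to the sample rather than true probabilities.

```python
from statistics import NormalDist

# Same location and scale as norm.pdf(ap, loc=0.0, scale=0.7) in the plotting code.
dist = NormalDist(mu=0.0, sigma=0.7)

values = [-2.0, 0.0, 2.0]  # illustrative AP values
density = [dist.pdf(v) for v in values]

# Normalise over the sample so that the weights sum to 1.
total = sum(density)
weights = [d / total for d in density]

for v, w in zip(values, weights):
    print(f"AP = {v:+.1f} -> weight {w:.3f}")
```

Under these parameters the neutral value takes almost all of the weight, while the two values at a distance of 2 from zero share the small remainder equally.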
