Generate names using posterior probabilities

栏目: IT技术 · 发布时间: 6年前

内容简介:If you are building synthetic data and need to generate people names, this article will be a helpful guide. This article is part of a series of articles regarding the R packageInstallThe package

If you are building synthetic data and need to generate people names, this article will be a helpful guide. This article is part of a series of articles regarding the R package conjurer . You can find the first part of this serieshere.

Steps to generate people names

1. Installation

Install conjurer package by using the following code. 

install.packages("conjurer")

2. Training data Vs default data

The package conjurer provides 2 two options to generate names.

    • The first option is to provide a custom training data. 
    • The second option is to use the default training data provided by the package.

If it is people names that you are interested in generating, you are better off using the default training data. However, if you would like to generate names of  items or products (example: pharmaceutical drug names), it is recommended that you build your own training data.

The function that helps in generating names is buildNames . Let us understand the inputs of the function. This function takes the form as given below.

buildNames(dframe, numOfNames, minLength, maxLength)

In this function,

dframe is a dataframe. This dataframe must be a single column dataframe where each row contains a name. These names must only contain english alphabets(upper or lower case) from A to Z but no special characters such as “;” or non ASCII characters. If you do not pass this argument to the function, the function uses the default prior probabilities to generate the names.

numOfNames is a numeric. This specifies the number of names to be generated. It should be a non-zero natural number. 

minLength is a numeric. This specifies the minimum number of alphabets in the name. It must be a non-zero natural number .

maxLength is a numeric. This specifies the maximum number of alphabets in the name. It must be a non-zero natural number

.

3. Example

Let us run this function with an example to see how it works. Let us use the default matrix of prior probabilities for this example. The output would be a list of names as given below.

library(conjurer)
peopleNames <- buildNames(numOfNames = 3, minLength = 5, maxLength = 7)
print(peopleNames)
[1] "ellie"   "bellann" "netar"

Please note that since this is a random generator, you may get other names than displayed in the above example.

4. Consolidated code

Following is the consolidated code for your convenience.

#install latest version
install.packages("conjurer") 

#invoke library
library(conjurer)

#generate names
peopleNames <- buildNames(numOfNames = 3, minLength = 5, maxLength = 7) 

#inspect the names generated
print(peopleNames)

5. Concluding remarks

In this article, we have learnt how to use the R package conjurer and generate names. Since the algorithm relies on prior probabilities, the names that are output may not look exactly like real human names but will phonetically sound like human names. So, go ahead and give it a try. If you like to understand the underlying code that generates these names, you can explore the GitHub repository here . If you are interested in what’s coming next in this package, you can find it in the issues section here


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

浅薄

浅薄

[美] 尼古拉斯·卡尔 / 刘纯毅 / 中信出版社 / 2010-12 / 42.00元

《浅薄:互联网如何毒化了我们的大脑》在我们跟计算机越来越密不可分的过程中,我们越来越多的人生体验通过电脑屏幕上闪烁摇曳、虚无缥缈的符号完成,最大的危险就是我们即将开始丧失我们的人性,牺牲人之所以区别于机器的本质属性。——尼古拉斯•卡尔“谷歌在把我们变傻吗?”当尼古拉斯•卡尔在发表于《大西洋月刊》上赫赫有名的那篇封面文章中提出这个问题的时候,他就开启了人们热切渴望的期盼源泉,让人急于弄清楚互联网是在......一起来看看 《浅薄》 这本书的介绍吧!

MD5 加密
MD5 加密

MD5 加密工具

XML、JSON 在线转换
XML、JSON 在线转换

在线XML、JSON转换工具

HEX HSV 转换工具
HEX HSV 转换工具

HEX HSV 互换工具