Introduction to Bimachines

栏目: IT技术 · 发布时间: 5年前

内容简介:Bimachines are deterministic finite-state devices that represent the class regular string functions and are commonly used for text rewriting. What is fascinating about them is that they process the input text in both directions - from left to right (→) and

Bimachines are deterministic finite-state devices that represent the class regular string functions and are commonly used for text rewriting. What is fascinating about them is that they process the input text in both directions - from left to right (→) and from right to left (←) which makes them very efficient in practice as they guarantee linear time input processing.

Bimachines have less expressive power that the finite-state transducers (FST) because FSTs can also model the class of regular relations (for one input we may have more that one output). FSTs, however, are generally non-deterministic, which means that, unlike bimachines, they might not be as efficient. Unlike finite automata, not every non-deterministic transducer can be converted to an equivalent deterministic one. Also, deterministic FSTs are not powerful enough to cover the class of regular functions, so bimachines stand in-between.

In practice, when implementing algorithms, we are typically more interested in deterministic outputs, which makes the bimachines suitable for a variety of tasks due to also their high performance. In this article, we’ll introduce the concept of a bimachine and see how they work.

Bimachines

A classical bimachine is a triple \( \mathcal{B} = \langle \mathcal{A}_L, \mathcal{A}_R, \psi \rangle \), where \( \mathcal{A}_L, \mathcal{A}_R \) are deterministic finite automata and \( \psi: Q_L \times \Sigma \times Q_R \to \Sigma^* \) is called the output function of the bimachine which given a state of the left automaton, an input symbol and a state of the right automaton, it returns a string. The left automaton of the bimachine scans the text from left to right, whereas the right automaton scans from right to left. Once we get the two paths, we construct the output by concatenating the results of the output function.

Example 1

Let’s go through an example. We have the simple regular string relation \( R_1 = \{ \langle a, x \rangle, \langle ab, y \rangle \} \). What is represents is two text rewritings - either a gets replaced by x or ab gets replaced by y . This s a regular string function which can be expressed as a bimachine ().

Introduction to Bimachines

Figure 1.1: Bimachine representing R 1

Note that the right and left automata’s all states are final, \(\epsilon\) represents the empty string. There are two possible successful simulations of the bimachine - one for the word a and one for the word ab .

Introduction to Bimachines

The top row depicts the recognition path in the left automaton while the text is being scanned from left to right, whereas the bottom row is the path of the right automaton while reading the text in the opposite direction. Inwe depict the equivalent FST.

Introduction to Bimachines

Figure 1.2: Finite-state transducer representing R 1

Example 2

Let’s take a look at another example where bimachines are particularly useful. We want to mark ( <, > ) substrings in a text and we have the following two patterns:

  • a+b - a substring consists of one or more a ’s followed by a b
  • a - a substring contains the single character a

Obviously the rules overlap, so in this case we pick the longest matched substring (this makes the relation functional). Let’s mark this string relation as \( R_2 \) and see a few examples:

  1.     \( R_2(a) = <a> \)
  2.     \( R_2(aab) = <aab> \)
  3.     \( R_2(baaaabba) = b<aaaab>b<a> \)
  4.     \( R_2(aaaaa) = <a><a><a><a><a> \)

The challenge here occurs in the case where we read an a - we can either directly output <a> , or we’d have to continue reading the text until we see a b . In the case when there is no b (example 4), we’d end up reading the whole text and then backtrack all the way which would lead to \( \mathcal{O}(n^2) \) processing time. Bimachines deal with this problem quite elegantly. We construct one insuch that it represents \( R_2 \).

Introduction to Bimachines

Figure 2: Bimachine representing R 2

Let’s see how we process strings using this bimachine. Below the simulation of the bimachine for the input string aaaaa .

Introduction to Bimachines

For baaaabba we end up with the following simulation.

Introduction to Bimachines

Caveats

Bimachines are simple and efficient but, as with everything, they come at a cost. The state-of-the-art algorithms for constructing bimachines rely on building the corresponding transducer first. Due to the bimachines’ deterministic nature, we end up an upper bound of exponential time and space requirements for the construction, more specifically \( \mathcal{O}(2^{|Q|}) \) where \( |Q| \) is the number of states of the transducer. Some approaches that improve on that, given specific conditions are met, but still, the bimachine construction remains a computationally intensive operation.

The simulation procedure also requires us to store the recognition path of the right automaton while we read the text and run the left automaton. For large inputs, this might prove also costly.

Conclusion

Besides plain text rewriting, bimachines can be used for various linguistic tasks, such as text annotation, tagging, tokenization, etc. They can also be generalized in a sense that the output is in a monoid (string output is a just special type called a free monoid ). Although having been around since the 1960s, even nowadays, they remain relatively unexplored. Efficient construction algorithms and applications are a good topic for further research.

References


以上所述就是小编给大家介绍的《Introduction to Bimachines》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

C++ API设计

C++ API设计

[美] Martin Reddy / 刘晓娜、臧秀涛、林健 / 人民邮电出版社 / 2013-8 / 89.00

现代软件开发中的一大难题就是如何编写优质的API。API负责为某个组件提供逻辑接口并隐藏该模块的内部细节。多数程序员依靠的是经验和冒险,从而很难达到健壮、高效、稳定、可扩展性强的要求。Martin Reddy博士在自己多年经验基础之上,对于不同API风格与模式,总结出了API设计的种种最佳策略,着重针对大规模长期开发项目,辅以翔实的代码范例,从而有助于设计决策的成功实施,以及软件项目的健壮性及稳定......一起来看看 《C++ API设计》 这本书的介绍吧!

JSON 在线解析
JSON 在线解析

在线 JSON 格式化工具

随机密码生成器
随机密码生成器

多种字符组合密码

XML 在线格式化
XML 在线格式化

在线 XML 格式化压缩工具