Unicode 13.0.0

2020 March 10 (Announcement)

This page summarizes the important changes for the Unicode Standard, Version 13.0.0. This version supersedes all previous versions of the Unicode Standard.

A. Summary

B. Technical Overview

C. Stability Policy Update

D. Textual Changes and Character Additions

E. Conformance Changes

F. Changes in the Unicode Character Database

G. Changes in the Unicode Standard Annexes

H. Changes in Synchronized Unicode Technical Standards

M. Implications for Migration

A. Summary

Unicode 13.0 adds 5,930 characters, for a total of 143,859 characters. These additions include four new scripts, for a total of 154 scripts, as well as 55 new emoji characters.

The new scripts and characters in Version 13.0 add support for lesser-used languages and unique written requirements worldwide, including numerous symbol additions. Funds from the Adopt-a-Character program provided support for some of these additions. The new scripts and characters include:

Yezidi, historically used in Iraq and Georgia for liturgical purposes, with some modern revival of usage
Chorasmian, historically used in Central Asia across Uzbekistan, Kazakhstan, and Turkmenistan to write an extinct Eastern Iranian language
Dives Akuru, historically used in the Maldives until the 20th century
Khitan Small Script, historically used in northern China
Arabic script additions used to write Hausa, Wolof, and other languages in Africa, and other additions used to write Hindko and Punjabi in Pakistan
A character for Syloti Nagri in South Asia
Bopomofo additions used for Cantonese

Popular symbol additions:

55 emoji characters. For complete statistics regarding all emoji as of Unicode 13.0, see Emoji Counts. For more information about emoji additions in version 13.0, including new emoji ZWJ sequences and emoji modifier sequences, see Emoji Recently Added, v13.0.
Creative Commons license symbols that are used to describe functions, permissions, and concepts related to intellectual property that have extensive use on the web

Other symbol additions include:

Two Vietnamese reading marks that mark ideographs as having a distinct, colloquial reading
214 graphic characters that provide compatibility with various home computers from the mid-1970s to the mid-1980s and with early teletext broadcasting standards

Support for CJK unified ideographs was enhanced in Version 13.0, by the addition of Extension G, which is the first block to be encoded in Plane 3, as well as by significant corrections and improvements to the Unihan database. Changes to the Unihan database include updated regular expressions for many properties, the addition of several new properties, and the removal of three obsolete provisional properties. See UAX #38, Unicode Han Database (Unihan) for more information on the updates.

Additional support for lesser-used languages and scholarly work was extended, including:

A character used in Sinhala to write Sanskrit

Important chart font updates, including:

An update to the code charts for the Adlam script, now using the Ebrima font. That font has an improved design and has gained widespread acceptance in the user community.
A completely updated font for the CJK Radicals Supplement and the Kangxi Radicals blocks. This font is also used to show the radicals in the CJK unified ideographs code charts, as well as in the radical-stroke indexes.

Synchronization

Several other important Unicode specifications have been updated for Version 13.0. The following four Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. All have been updated to Version 13.0:

UTS #10, Unicode Collation Algorithm — sorting Unicode text
UTS #39, Unicode Security Mechanisms — reducing Unicode spoofing
UTS #46, Unicode IDNA Compatibility Processing — compatible processing of non-ASCII URLs
UTS #51, Unicode Emoji — emoji-related data and behavior

Some of the changes in Version 13.0 and associated Unicode Technical Standards may require modifications to implementations. For more information, see the migration and modification sections of UTS #10, UTS #39, UTS #46, and UTS #51.

This version of the Unicode Standard is also synchronized with ISO/IEC 10646:2020, sixth edition.

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.

B. Technical Overview

Version 13.0 of the Unicode Standard consists of:

The core specification
The code charts (delta and archival) for this version
The Unicode Standard Annexes
The Unicode Character Database (UCD)

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Core Specification

The core specification is available as a single PDF for viewing (14 MB). Links are also available in the navigation bar on the left of this page to access individual chapters and appendices of the core specification.

Code Charts

Several sets of code charts are available. They serve different purposes:

The latest set of code charts for the Unicode Standard is available online. Those charts are always the most current code charts available, and may be updated at any time. The charts are organized by scripts and blocks for easy reference. An online index by character name is also provided.

For Unicode 13.0.0 in particular, two additional sets of code chart pages are provided:

A set of delta code charts showing the new blocks and any blocks in which characters were added for Unicode 13.0.0. The new characters are visually highlighted in the charts.
A set of archival code charts that represents the entire set of characters, names and representative glyphs at the time of publication of Unicode 13.0.0.

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Unicode Standard Annexes

Links to the individual Unicode Standard Annexes are available in the navigation bar on the left of this page. The list of significant changes in the content of the Unicode Standard Annexes for Version 13.0 can be found in Section G below.

Unicode Character Database

Data files for Version 13.0 of the Unicode Character Database are available. The ReadMe.txt in that directory provides a roadmap to the functions of the various subdirectories. Zipped versions of the UCD for bulk download are available, as well.

Version References

Version 13.0.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 13.0.0, (Mountain View, CA: The Unicode Consortium, 2020. ISBN 978-1-936213-26-9)

http://www.unicode.org/versions/Unicode13.0.0/

The terms “Version 13.0” or “Unicode 13.0” are abbreviations for the full version reference, Version 13.0.0.

The citation and permalink for the latest published version of the Unicode Standard is:

The Unicode Consortium. The Unicode Standard.

http://www.unicode.org/versions/latest/

A complete specification of the contributory files for Unicode 13.0 is found on the page Components for 13.0.0. That page also provides the recommended reference format for Unicode Standard Annexes. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.

Errata

Errata incorporated into Unicode 13.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 13.0, see the list of current Updates and Errata.

C. Stability Policy Update

There were no significant changes to the Stability Policy of the core specification between Unicode 12.1 and Unicode 13.0.

D. Textual Changes and Character Additions

Four new scripts were added with accompanying new block descriptions:

Script	Number of Characters
Chorasmian	28
Dives Akuru	72
Khitan Small Script	470
Yezidi	47

Changes in the Unicode Standard Annexes are listed in Section G.

Character Assignment Overview

5,930 characters have been added. Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see the delta code charts.

E. Conformance Changes

There are no significant new conformance requirements in Unicode 13.0.

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 13.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.

G. Changes in the Unicode Standard Annexes

In Version 13.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Unicode Standard Annex	Changes
UAX #9 Unicode Bidirectional Algorithm	No significant changes in this version.
UAX #11 East Asian Width	The East_Asian_Width property status was changed from informative to normative.
UAX #14 Unicode Line Breaking Algorithm	Rule LB22 was changed to simply disallow breaking before ellipsis. Rule LB30 was changed to exclude full-width CP and OP. Text was added clarifying the tailorability of line break classes.
UAX #15 Unicode Normalization Forms	The explanation of script-specific exclusions was updated in Section 5.1, Composition Exclusion Types.
UAX #24 Unicode Script Property	Yezi was added to the scx set listed for U+060C.
UAX #29 Unicode Text Segmentation	A number of adjustments were made to values in Table 3, Word_Break Property Values . For consistency, prepended concatenation marks were omitted from the definition of Control in Table 2, Grapheme_Cluster_Break Property Values. Unnecessary external references to UTS #51 were removed.
UAX #31 Unicode Identifier and Pattern Syntax	A qualification was added to the example under UAX31-R1. Default Identifiers. Table 4 was retitled to "Excluded Scripts". Four new scripts were added to Table 4. Rows in Table 4 dedicated to character exclusions not directly associated with scripts were removed from the table, and those exclusions were moved to the derivation rules associated with UTS #39, Unicode Security Mechanisms.
UAX #34 Unicode Named Character Sequences	No significant changes in this version.
UAX #38 Unicode Han Database (Unihan)	The regular expressions for most of the existing IRG Source fields were updated. Documentation was added for new fields: kIRG_SSource, kIRG_UKSource, kTGHZ2013, kUnihanCore2020, and kSpoofingVariant. Documentation was removed for the obsolete fields: kRSJapanese, kRSKanWa, and kRSKorean. The format of the tables in Sections 4.2, 4.3, and 4.4 was revised for legibility, and a "Count" column was added to the tables in Section 4.4.
UAX #41 Common References for Unicode Standard Annexes	All references were updated for Unicode 13.0.
UAX #42 Unicode Character Database in XML	New code point attributes, values, and patterns were added.
UAX #44 Unicode Character Database	Documentation was added for emoji properties. Documentation of the new ccc=6 value was added in Table 15. The Khitan Small Script was added to the list of scripts whose Name property is derived by rule. A note was added indicating that code point labels are included in the scope of the matching rule UAX44-LM2. There were also numerous other small editorial improvements to the text.
UAX #45 U-Source Ideographs	A table was added summarizing U-source prefixes. The "UCI" prefix was marked as obsolete. The semantics of the "UK" prefix were clarified. References to SAT-sourced ideographs were removed. A "Comp" value was added to the list of possible status values. UNC-2013 and UNC-2015 status values were removed. A description of the radical-stroke charts associated with the U-Source ideographs was added.
UAX #50 Unicode Vertical Text Layout	Section 3.2 was significantly reorganized, with new content added regarding layout issues for squared Katakana and ideographic words. Horizontal and vertical glyphs were added for U+32FF SQUARE ERA NAME REIWA.

H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.

Unicode Technical Standard	Changes
UTS #10 Unicode Collation Algorithm	Khitan Small Script and the new Tangut Supplement block were added to the specification for computing implicit weights in Table 16.
UTS #39 Unicode Security Mechanisms	This update systematically corrected various citations of "IdentifierType", "Identifier Type" and "Type" to use "Identifier_Type" consistently, and similarly for "Identifier_Status". Definitions of Identifier_Type values were clarified in Table 1.
UTS #46 Unicode IDNA Compatibility Processing	No significant changes in this version.
UTS #51 Unicode Emoji	A new section was added on how to use ZWJ sequences to change the color of base emoji, to represent such emoji as black cat. Five characters were removed from the explicit gender table, since they were made gender-neutral. RGI sequences were added, showing more skin tone combinations for people holding hands. A definition was added for emoji component. Color was added to the specification of the order of elements in emoji ZWJ sequences.
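
The color ZWJ mechanism added to UTS #51 can be illustrated with a short Python sketch. It assumes the black cat sequence is U+1F408 CAT, then ZWJ, then U+2B1B BLACK LARGE SQUARE as the color element (base emoji first, color after it); check emoji-zwj-sequences.txt for your target version before relying on these values. The helper names are hypothetical.

```python
# Assumed code points for the Emoji 13.0 black cat sequence; verify them
# against emoji-zwj-sequences.txt before use.
CAT = "\U0001F408"             # U+1F408 CAT (base emoji)
ZWJ = "\u200D"                 # U+200D ZERO WIDTH JOINER
BLACK_LARGE_SQUARE = "\u2B1B"  # U+2B1B BLACK LARGE SQUARE (color element)


def color_zwj_sequence(base: str, color: str) -> str:
    """Hypothetical helper: join a base emoji and a color element with ZWJ."""
    return base + ZWJ + color


def code_points(s: str) -> str:
    """Format a string as space-separated hex code points, as in the UCD files."""
    return " ".join(f"{ord(ch):04X}" for ch in s)


black_cat = color_zwj_sequence(CAT, BLACK_LARGE_SQUARE)
print(code_points(black_cat))  # 1F408 200D 2B1B
```

A renderer without a glyph for the whole sequence is expected to fall back to showing the base emoji followed by the color square, so the sequence degrades readably.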

M. Implications for Migration

There are a significant number of changes in Unicode 13.0 which may impact implementations upgrading to Version 13.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.

Script-related Changes

Four new scripts have been added in Unicode 13.0.0. Some of these scripts have particular attributes which may cause issues for implementations. The more important of these attributes are summarized here.

Dives Akuru is a complex script of the Indic type.
Khitan Small Script has rules for stacking characters into phonogram clusters. One new, Khitan-specific format control character is used to distinguish between two patterns for phonogram clusters. The Khitan Small Script is traditionally laid out in vertical orientation.

General Character Property Changes

There are a number of issues related to particular character properties:

A new Canonical_Combining_Class value of ccc=6 has been added for two Vietnamese Han reading marks. Implementations should be checked to ensure that their handling of combining class values does not fail when encountering this new value.
A new value of the Indic_Positional_Category property has been added: Top_And_Bottom_And_Left.
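
A minimal Python sketch of the Canonical Ordering Algorithm shows why ccc=6 needs no special case when combining classes are treated as plain integers rather than validated against a fixed list. The three-entry table is a hypothetical excerpt; it assumes U+16FF0 VIETNAMESE ALTERNATE READING MARK CA is one of the two new ccc=6 marks, and a real implementation would load every value from UnicodeData.txt.

```python
# Hypothetical excerpt of Canonical_Combining_Class values; a real
# implementation would load all of them from UnicodeData.txt (field 3).
CCC = {
    0x0301: 230,  # COMBINING ACUTE ACCENT
    0x0316: 220,  # COMBINING GRAVE ACCENT BELOW
    0x16FF0: 6,   # assumed: VIETNAMESE ALTERNATE READING MARK CA (new, ccc=6)
}


def ccc(cp: int) -> int:
    # Treat ccc as an integer 0..254. Validating it against a hard-coded set
    # of "known" classes is exactly what breaks on a new value such as 6.
    return CCC.get(cp, 0)


def canonical_order(s: str) -> str:
    """Stably sort each run of non-starters (ccc != 0) by combining class."""
    out = []
    run = []
    for cp in (ord(ch) for ch in s):
        if ccc(cp) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(cp)
        else:
            run.append(cp)
    out.extend(sorted(run, key=ccc))
    return "".join(chr(cp) for cp in out)
```

With these values, canonical_order("\u4E00\u0301\U00016FF0") moves the ccc=6 mark ahead of the acute accent (ccc=230), because lower classes sort first.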

Numeric Property Issues

A new set of decimal digits has been added for the Dives Akuru script.
A new set of compatibility decimal digits has been added, for segmented (LED-like) digit display support for legacy computer graphic symbol sets.
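
Implementations that derive digit values from a table of DIGIT ZERO code points must add the new zeros. The Python sketch below relies on decimal digits (Numeric_Type=Decimal) being encoded in contiguous runs of ten from zero to nine. The two new zero code points (U+11950 for Dives Akuru, U+1FBF0 for the segmented digits) are assumptions to verify against UnicodeData.txt 13.0.0, and a real table would list every decimal run, not just three.

```python
# Assumed DIGIT ZERO code points; a complete table would list every
# Numeric_Type=Decimal run in UnicodeData.txt.
DIGIT_ZEROS = (
    0x0030,   # DIGIT ZERO (ASCII)
    0x11950,  # assumed: DIVES AKURU DIGIT ZERO (new in 13.0)
    0x1FBF0,  # assumed: SEGMENTED DIGIT ZERO (new in 13.0)
)


def digit_value(ch: str):
    """Return the decimal value of ch, or None if it is not in a known run."""
    cp = ord(ch)
    for zero in DIGIT_ZEROS:
        if zero <= cp <= zero + 9:
            return cp - zero
    return None


def parse_decimal(s: str) -> int:
    """Parse a digit string whose digits come from any run in DIGIT_ZEROS."""
    value = 0
    for ch in s:
        d = digit_value(ch)
        if d is None:
            raise ValueError(f"not a decimal digit: U+{ord(ch):04X}")
        value = value * 10 + d
    return value
```

For example, parse_decimal("\U00011951\U00011953") yields 13 under these assumed code points.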

CJK/Unihan Changes

Three obsolete provisional properties have been removed: kRSJapanese, kRSKanWa, kRSKorean.
Two new normative source properties have been added: kIRG_SSource, kIRG_UKSource, with values split off from kIRG_USource. These properties involve data for the CJK charts and have some impact on the distribution of sources in those charts.
A new informative property has been added: kUnihanCore2020. This is intended as a more useful indicator of the basic Han set to support, superseding the function of kIICore.
One informative property, kTotalStrokes, has been moved from the Unihan subfile Unihan_DictionaryLikeData.txt to the subfile Unihan_IRGSources.txt. This change may impact implementations that parse for that particular Unihan property value.
There are large changes in the values for kSimplifiedVariant, kTraditionalVariant, and kZVariant, and many additions for the new kSpoofingVariant property.

See UAX #38, Unicode Han Database (Unihan) for further details on these changes, especially Section 4.2, Listing by Date of Addition to the Unicode Standard, and Section 4.3, Listing by Location within Unihan.zip. UAX #38 also has updated regex values for numerous Unihan properties.
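
Because kTotalStrokes moved between Unihan subfiles, a parser that opens one hard-coded file can silently lose the field. The Python sketch below (function name hypothetical) scans every .txt member of Unihan.zip for the requested field instead. It assumes the usual tab-separated line format of code point (U+XXXX), field name, and value, with comment lines starting with #. The same call can load the new kUnihanCore2020 field.

```python
import io
import zipfile


def load_unihan_field(zip_source, field):
    """Collect one Unihan field from every *.txt member of Unihan.zip.

    zip_source may be a path or a binary file object. Scanning all subfiles,
    rather than hard-coding one, survives a field moving to another file.
    """
    values = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(".txt"):
                continue
            with zf.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.rstrip("\n")
                    if not line or line.startswith("#"):
                        continue
                    cp_text, tag, value = line.split("\t", 2)
                    if tag == field:
                        values[int(cp_text[2:], 16)] = value  # strip "U+"
    return values
```

Typical use would be load_unihan_field("Unihan.zip", "kTotalStrokes"), which returns a dict from code point to the raw field value.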

There are multiple new ideographic ranges defined for Version 13.0.0, as well as changes to the end of several existing CJK unified ideograph ranges. Because implementations often hard-code ideographic ranges to short-cut lookups and reduce table sizes, it is especially important that implementers pay close attention to the implications of range changes for Version 13.0.0. See Blocks.txt for details.
There is a second range defined for Tangut ideographs now, for the new Tangut Supplement block. This means that Tangut is the second ideographic script (after Han) which has multiple ranges defined in multiple blocks. The Tangut Supplement block, like the main Tangut block, has character names defined by a rule which is based on the code point: the prefix TANGUT IDEOGRAPH- followed by the code point in hexadecimal.
The Khitan Small Script is a new ideographic script, encoded for the first time in Version 13.0.0. This is the fourth ideographic script (after Han, Tangut, and Nushu) to use the range notation in UnicodeData.txt. This script also has character names defined by a rule which is based on the code point: the prefix KHITAN SMALL SCRIPT CHARACTER- followed by the code point in hexadecimal.
Three existing CJK unified ideographic blocks have small extensions added at the end of the blocks. These extensions increment the end ranges by a few code points for each block: 13 code points for the URO, 10 code points for Extension A, and 7 code points for Extension B. Implementers expect these kinds of extension for the URO, because they have happened for multiple versions of the standard. However, these are the very first such small range additions for both Extension A and Extension B. Note that the addition for Extension A also happens to completely fill the CJK Unified Ideographs Extension A block. See Section 4.4, Listing of Characters Covered by the Unihan Database, in UAX #38 for the version history of all these small CJK unified ideograph additions inside existing blocks.
Finally, the new CJK Unified Ideographs Extension G block is the first block of assigned characters in Plane 3, the Tertiary Ideographic Plane. Implementers should check their assumptions about valid ranges past U+2FFFF, to ensure that code points in the range U+30000..U+3134A are correctly handled.
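
The range and naming changes above can be sketched as a small Python lookup that derives names by rule. The range ends are assumptions taken from UnicodeData.txt 13.0.0 (URO ending at U+9FFC, Extension A at U+4DBF, Extension B at U+2A6DD); Extensions C through F and Nushu are omitted for brevity, and real code should load the ranges from the UCD rather than hard-code them, which is the very pitfall described above.

```python
# (first, last, prefix) for ranges whose character names are derived by rule.
# Assumed from UnicodeData.txt 13.0.0; Extensions C to F and Nushu omitted.
RULE_NAMED_RANGES = [
    (0x3400, 0x4DBF, "CJK UNIFIED IDEOGRAPH-"),            # Extension A (now full)
    (0x4E00, 0x9FFC, "CJK UNIFIED IDEOGRAPH-"),            # URO
    (0x17000, 0x187F7, "TANGUT IDEOGRAPH-"),               # Tangut
    (0x18B00, 0x18CD5, "KHITAN SMALL SCRIPT CHARACTER-"),  # Khitan Small Script
    (0x18D00, 0x18D08, "TANGUT IDEOGRAPH-"),               # Tangut Supplement
    (0x20000, 0x2A6DD, "CJK UNIFIED IDEOGRAPH-"),          # Extension B
    (0x30000, 0x3134A, "CJK UNIFIED IDEOGRAPH-"),          # Extension G (Plane 3)
]


def name_by_rule(cp: int):
    """Return the rule-derived name for cp, or None outside the listed ranges."""
    for first, last, prefix in RULE_NAMED_RANGES:
        if first <= cp <= last:
            return f"{prefix}{cp:04X}"
    return None
```

A table frozen at an earlier version would return None for U+9FFC or U+3134A; with the 13.0 ranges, name_by_rule(0x3134A) gives CJK UNIFIED IDEOGRAPH-3134A.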


Standardized Variation Sequences

Two new standardized variation sequences were added to emoji-variation-sequences.txt to distinguish text presentation and emoji presentation forms of U+26A7 MALE WITH STROKE AND MALE AND FEMALE SIGN. This results from the new use of U+26A7 in an emoji sequence defined for Version 13.0.0.
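
The two variation sequences and the ZWJ sequence that motivated them can be written out in Python. The sketch assumes the new sequence is the transgender flag, U+1F3F3 WAVING WHITE FLAG, VS16, ZWJ, U+26A7, VS16; confirm it against emoji-zwj-sequences.txt. Constant names are illustrative.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: request text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation
ZWJ = "\u200D"   # ZERO WIDTH JOINER
MALE_AND_FEMALE_WITH_STROKE = "\u26A7"  # U+26A7 MALE WITH STROKE AND MALE AND FEMALE SIGN
WAVING_WHITE_FLAG = "\U0001F3F3"        # U+1F3F3 WAVING WHITE FLAG

# The two standardized variation sequences added for U+26A7.
TEXT_STYLE = MALE_AND_FEMALE_WITH_STROKE + VS15
EMOJI_STYLE = MALE_AND_FEMALE_WITH_STROKE + VS16

# Assumed: the 13.0 transgender flag ZWJ sequence that uses U+26A7.
TRANSGENDER_FLAG = WAVING_WHITE_FLAG + VS16 + ZWJ + MALE_AND_FEMALE_WITH_STROKE + VS16


def code_points(s: str) -> str:
    """Format a string as space-separated hex code points, as in the UCD files."""
    return " ".join(f"{ord(ch):04X}" for ch in s)


for label, seq in [("text", TEXT_STYLE), ("emoji", EMOJI_STYLE), ("flag", TRANSGENDER_FLAG)]:
    print(label, code_points(seq))
```

This prints "text 26A7 FE0E", "emoji 26A7 FE0F", and "flag 1F3F3 FE0F 200D 26A7 FE0F".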

Emoji Changes

55 new emoji characters have been added. However, in addition to those individual characters, many new emoji sequences have been recognized, as well. Implementations supporting emoji should be checked to reflect changes in UTS #51, Unicode Emoji, and all of its associated data files.

New Data Files Added to the UCD

Two of the emoji data files have been formally incorporated into the UCD for Version 13.0.0. These files are located in a new emoji/ subdirectory of the main ucd/ directory. See UTS #51 and UAX #44 for details.
emoji-data.txt specifies six emoji-related binary properties, which assist in the identification and parsing of emoji, and which are relevant to Unicode segmentation algorithms.
emoji-variation-sequences.txt specifies the emoji variation sequences, which enable control of emoji presentation versus text presentation of emoji characters. The format of this file is the same as that used for StandardizedVariants.txt.
Other data files related to emoji sequences, as well as the emoji test file, are located in the /Public/emoji/13.0/ directory associated with UTS #51. Implementations should be prepared to adapt to the new locations of some data files.
There have been no significant changes to the format of any of the normative data content of the emoji data files; however, in the comment section of the data lines, emoji version information has replaced the Unicode version information associated with characters and sequences.
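
Since only the comments changed, a parser that ignores everything after # keeps working. The Python sketch below reads both files from an iterable of lines. It assumes emoji-data.txt lines of the form "231A..231B ; Emoji_Presentation # ..." and, for emoji-variation-sequences.txt, the StandardizedVariants.txt layout "26A7 FE0F ; emoji style; # ...". The function names are hypothetical.

```python
from collections import defaultdict


def strip_comment(line):
    """Drop the trailing comment (which now carries emoji versions) and whitespace."""
    return line.split("#", 1)[0].strip()


def parse_emoji_data(lines):
    """Map each property in emoji-data.txt to a list of (first, last) ranges."""
    props = defaultdict(list)
    for line in lines:
        data = strip_comment(line)
        if not data:
            continue
        cps, prop = [field.strip() for field in data.split(";")]
        first, _, last = cps.partition("..")
        props[prop].append((int(first, 16), int(last or first, 16)))
    return dict(props)


def parse_variation_sequences(lines):
    """Map each variation sequence (as a str) to its description, e.g. 'emoji style'."""
    seqs = {}
    for line in lines:
        data = strip_comment(line)
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        seq = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        seqs[seq] = fields[1]
    return seqs
```

Both functions accept any iterable of lines, so an open file object from the new ucd/emoji/ directory can be passed directly.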
	
Code Charts

The font for the Kangxi Radicals and CJK Radicals Supplement blocks has been updated, so that it more accurately represents the actual forms of Kangxi radicals and the variant radicals. This new font is also used for the indexing radical shown in the CJK unified ideograph blocks in the code charts, as well as in the updated radical-stroke indexes for Version 13.0.0.
A new radical-stroke index has been provided: Unihan2020CoreRSIndex.pdf. This provides an index for looking up CJK unified ideographs that are members of the set defined by the new Unihan property kUnihanCore2020.
The format for the Mongolian code chart has been substantially revised, removing all details about positional variants and standardized variation sequences. The old format, showing all the variant glyphs, is preserved in UTR #54, Unicode Mongolian 12.1 Baseline. Note that future updates to the Mongolian model and to the rules for rendering and interpreting variation sequences will be worked out in a separate specification, instead of being documented in the basic code chart for Mongolian.
	
Collation-related Issues

The Default Unicode Collation Element Table (DUCET) was updated to the Unicode 13.0.0 repertoire for UCA 13.0. For the most part, the additions for new scripts and other characters are unremarkable, but the following issue may cause problems for migration of implementations that parse allkeys.txt:

Because of the addition of a second, non-contiguous range of Tangut ideographs to the standard, there are now two @implicitweights statements for Tangut ranges at the top of allkeys.txt associated with the same FB00 base weight. Parsers must accumulate ranges associated with the same base weight, rather than clobbering a prior range assignment when encountering the second range.
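
The accumulation requirement can be sketched in Python by keying a list of ranges on the base weight instead of storing one range per weight. The directive format "@implicitweights 17000..18AFF; FB00 # comment" is assumed from allkeys.txt, and the function names are hypothetical; ordinary collation element lines are simply skipped.

```python
from collections import defaultdict


def parse_implicit_weights(lines):
    """Collect @implicitweights ranges from allkeys.txt, keyed by base weight.

    Ranges sharing a base weight (such as the two Tangut ranges on FB00)
    are appended to a list instead of overwriting one another.
    """
    ranges = defaultdict(list)
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line.startswith("@implicitweights"):
            continue  # collation elements and other directives are ignored here
        body = line[len("@implicitweights"):].strip()
        cp_range, base = [part.strip() for part in body.split(";")]
        first, _, last = cp_range.partition("..")
        ranges[int(base, 16)].append((int(first, 16), int(last or first, 16)))
    return dict(ranges)


def implicit_base(cp, ranges):
    """Return the base weight whose ranges contain cp, or None."""
    for base, spans in ranges.items():
        if any(first <= cp <= last for first, last in spans):
            return base
    return None
```

A parser that used a plain dict assignment per base weight would keep only the last Tangut range, and implicit_base would then miss every code point in the other one.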


                    