Base64 Encoding Study Notes (Java Implementation)

Category: IT Technology · Published: 6 years ago

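Anyone who writes code for a living runs into Base64 sooner or later. We all work with the internet, and Base64 was created so that binary data, such as an image or an audio clip, can travel over media that only support text. Base64 is a lossless, reversible encoding, and its output looks very different from the original content. That is why people often run their contact details through Base64 before posting them online: human readers can still recover the information, while the bots crawling the web are less likely to pick it up. One such self-introduction even thoughtfully includes the full command line for decoding it.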

Theory

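So what are the rules of Base64 encoding? Everything in a computer is ultimately represented as a byte array, so the input to Base64 is naturally a byte array, with 8 bits per byte. Base64 takes the input three bytes (24 bits) at a time and splits each group evenly into four parts of 6 bits each. A 6-bit value has exactly 64 possible values, so each part maps to one of 64 designated ASCII characters. The mapping is as follows (copied from Wikipedia):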

Index	Binary	Char	Index	Binary	Char	Index	Binary	Char	Index	Binary	Char
0	000000	`A`	16	010000	`Q`	32	100000	`g`	48	110000	`w`
1	000001	`B`	17	010001	`R`	33	100001	`h`	49	110001	`x`
2	000010	`C`	18	010010	`S`	34	100010	`i`	50	110010	`y`
3	000011	`D`	19	010011	`T`	35	100011	`j`	51	110011	`z`
4	000100	`E`	20	010100	`U`	36	100100	`k`	52	110100	`0`
5	000101	`F`	21	010101	`V`	37	100101	`l`	53	110101	`1`
6	000110	`G`	22	010110	`W`	38	100110	`m`	54	110110	`2`
7	000111	`H`	23	010111	`X`	39	100111	`n`	55	110111	`3`
8	001000	`I`	24	011000	`Y`	40	101000	`o`	56	111000	`4`
9	001001	`J`	25	011001	`Z`	41	101001	`p`	57	111001	`5`
10	001010	`K`	26	011010	`a`	42	101010	`q`	58	111010	`6`
11	001011	`L`	27	011011	`b`	43	101011	`r`	59	111011	`7`
12	001100	`M`	28	011100	`c`	44	101100	`s`	60	111100	`8`
13	001101	`N`	29	011101	`d`	45	101101	`t`	61	111101	`9`
14	001110	`O`	30	011110	`e`	46	101110	`u`	62	111110	`+`
15	001111	`P`	31	011111	`f`	47	101111	`v`	63	111111	`/`
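
As a minimal sketch of the rule described before the table, the snippet below packs three bytes into a 24-bit integer and reads it back as four 6-bit indices into the alphabet. The class name `SextetDemo` and the helper `encodeGroup` are made up for illustration and are not part of the article's code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class SextetDemo {
    // The 64-character alphabet from the table above, indexed 0..63.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encode exactly three bytes into four Base64 characters.
    static String encodeGroup(byte b0, byte b1, byte b2) {
        // Mask with 0xff so that negative Java bytes do not sign-extend.
        int bits = (b0 & 0xff) << 16 | (b1 & 0xff) << 8 | (b2 & 0xff);
        StringBuilder sb = new StringBuilder(4);
        // Read the 24 bits from the top, six bits at a time.
        for (int shift = 18; shift >= 0; shift -= 6) {
            sb.append(ALPHABET.charAt(bits >> shift & 0x3f));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] man = "Man".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encodeGroup(man[0], man[1], man[2])); // prints TWFu
    }
}
```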

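What if the input length is not a multiple of 3? The last one or two bytes cannot fill a whole group on their own. Base64 handles this by padding the leftover bits with zeros up to a 6-bit boundary and filling the empty character positions at the end with the equals sign =, so the encoded output length is always a multiple of 4.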
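
The "always a multiple of 4" property can be written as a one-line formula: every started group of three bytes costs four characters. The sketch below is only an illustration; `EncodedLengthDemo` and `encodedLength` are hypothetical names, and the method assumes the input length is small enough that `n + 2` does not overflow an `int`.

```java
// Hypothetical demo class; names are illustrative only.
final class EncodedLengthDemo {
    // Number of Base64 characters (padding included) for n input bytes.
    // (n + 2) / 3 is integer division, i.e. n / 3 rounded up.
    static int encodedLength(int n) {
        return (n + 2) / 3 * 4;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 4; n++) {
            System.out.println(n + " bytes -> " + encodedLength(n) + " characters");
        }
    }
}
```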

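In short, encoding a piece of binary data as Base64 takes three steps:

Split the input into groups of three bytes and join each group into a 24-bit string
Split each 24-bit string evenly into four 6-bit values and look up the character for each value in the table above
If one or two bytes are left over at the end, pad their bits with zeros to a multiple of 6, encode them the same way, and append equals signs until the final group has four characters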

Wikipedia also gives conversion examples for each of the different cases.

The input length is an exact multiple of 3 (Man => TWFu)

Source	Text (ASCII)	M								a								n
Source	Octets	77 (0x4d)								97 (0x61)								110 (0x6e)
Bits		0	1	0	0	1	1	0	1	0	1	1	0	0	0	0	1	0	1	1	0	1	1	1	0
Base64 encoded	Sextets	19						22						5						46
	Character	T						W						F						u
	Octets	84 (0x54)						87 (0x57)						70 (0x46)						117 (0x75)

Two bytes are left over at the end (Ma => TWE=)

Source	Text (ASCII)	M								a
Source	Octets	77 (0x4d)								97 (0x61)
Bits		0	1	0	0	1	1	0	1	0	1	1	0	0	0	0	1	0	0
Base64 encoded	Sextets	19						22						4						Padding
	Character	T						W						E						=
	Octets	84 (0x54)						87 (0x57)						69 (0x45)						61 (0x3D)

One byte is left over at the end (M => TQ==)

Source	Text (ASCII)	M
Source	Octets	77 (0x4d)
Bits		0	1	0	0	1	1	0	1	0	0	0	0
Base64 encoded	Sextets	19						16						Padding	Padding
	Character	T						Q						=	=
	Octets	84 (0x54)						81 (0x51)						61 (0x3D)	61 (0x3D)
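
The three examples above can be cross-checked against the JDK's own encoder, `java.util.Base64` (available since Java 8). This is only a verification sketch; the class name `PaddingExamples` and its `encode` helper are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; names are illustrative only.
final class PaddingExamples {
    // Encode an ASCII string with the standard JDK Base64 encoder.
    static String encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(encode("Man")); // prints TWFu
        System.out.println(encode("Ma"));  // prints TWE=
        System.out.println(encode("M"));   // prints TQ==
    }
}
```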

Decoding is simply encoding in reverse, and it can likewise be summarized in three steps:

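Split the Base64 string into groups of four characters, look up the 6-bit code of each character in the table above, and join the codes into a 24-bit string
Split the 24-bit string evenly into three parts of 8 bits each; each part is one byte and goes straight into the corresponding position of the decoded output
For the final group of four characters, strip the trailing equals signs and use their count to decide how many padding zeros to drop from the end: one = means two zero bits and two decoded bytes, two = means four zero bits and one decoded byte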
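
The steps above can be sketched for a single group of four characters as follows. This is an illustration under assumptions, not the article's implementation: `DecodeGroupDemo` and `decodeGroup` are hypothetical names, and the input is assumed to be a valid four-character group (no validation is done).

```java
// Hypothetical demo class; names are illustrative only.
final class DecodeGroupDemo {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Decode one group of four Base64 characters, possibly ending in "=" or "==".
    // Assumes the group is valid; invalid characters are not detected here.
    static byte[] decodeGroup(String group) {
        int padding = group.endsWith("==") ? 2 : group.endsWith("=") ? 1 : 0;
        int bits = 0;
        for (int k = 0; k < 4; k++) {
            // Padding positions contribute six zero bits.
            int sextet = k < 4 - padding ? ALPHABET.indexOf(group.charAt(k)) : 0;
            bits = bits << 6 | sextet;
        }
        // Each '=' removes one byte from the three a full group would yield.
        byte[] out = new byte[3 - padding];
        for (int k = 0; k < out.length; k++) {
            out[k] = (byte) (bits >> (16 - 8 * k) & 0xff);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new String(decodeGroup("TWFu"))); // prints Man
        System.out.println(new String(decodeGroup("TWE="))); // prints Ma
        System.out.println(new String(decodeGroup("TQ=="))); // prints M
    }
}
```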

Practice

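To prove to myself that I can write Java, I spent some spare time writing a simple Base64 encoder and decoder with Java + Maven. To save space, the class definition and the imports are omitted here.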

Encoding

// The 64-character alphabet; the array index is the 6-bit value.
private final static byte[] encodeMap = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".getBytes(java.nio.charset.StandardCharsets.US_ASCII);

    public static byte[] encode(byte[] plainBytes) {
        if (plainBytes.length == 0) {
            return new byte[0];
        }

        // Four output characters for every started group of three input bytes.
        int encodedLength = (plainBytes.length % 3 == 0) ? plainBytes.length/3 * 4 : (plainBytes.length/3 + 1) * 4;
        byte[] encodedBytes = new byte[encodedLength];
        int i = 0, j = 0;
        // Encode all complete three-byte groups.
        while (i < plainBytes.length / 3 * 3) {
            // Java bytes are signed: mask with 0xff so that a byte >= 0x80
            // does not sign-extend and corrupt the higher bits.
            int value = (plainBytes[i] & 0xff) << 16 | (plainBytes[i+1] & 0xff) << 8 | (plainBytes[i+2] & 0xff);
            encodedBytes[j] = encodeMap[value>>18&0x3f];
            encodedBytes[j+1] = encodeMap[value>>12&0x3f];
            encodedBytes[j+2] = encodeMap[value>>6&0x3f];
            encodedBytes[j+3] = encodeMap[value&0x3f];
            i += 3;
            j += 4;
        }

        // Encode the one or two leftover bytes, then pad with '='.
        int remains = plainBytes.length - i;
        if (remains > 0) {
            int value = (plainBytes[i] & 0xff) << 16;
            if (remains == 2) {
                value |= (plainBytes[i+1] & 0xff) << 8;
            }
            encodedBytes[j] = encodeMap[value>>18&0x3f];
            encodedBytes[j+1] = encodeMap[value>>12&0x3f];
            if (remains == 1) {
                encodedBytes[j+2] = '=';
                encodedBytes[j+3] = '=';
            } else if (remains == 2) {
                encodedBytes[j+2] = encodeMap[value>>6&0x3f];
                encodedBytes[j+3] = '=';
            }
        }
        return encodedBytes;
    }
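
One detail matters whenever bytes are packed into an `int` in Java: `byte` is a signed type, so any byte with the high bit set widens to a negative `int`, and OR-ing it in overwrites the bits above it. The sketch below contrasts packing with and without the `& 0xff` mask; `SignExtensionDemo`, `packUnmasked` and `packMasked` are hypothetical names used only for this illustration.

```java
// Hypothetical demo class; names are illustrative only.
final class SignExtensionDemo {
    // Packs three bytes without masking: wrong for bytes >= 0x80.
    static int packUnmasked(byte b0, byte b1, byte b2) {
        return b0 << 16 | b1 << 8 | b2;
    }

    // Packs three bytes after masking each one to its unsigned value.
    static int packMasked(byte b0, byte b1, byte b2) {
        return (b0 & 0xff) << 16 | (b1 & 0xff) << 8 | (b2 & 0xff);
    }

    public static void main(String[] args) {
        byte zero = 0, high = (byte) 0xff;
        // The first sextet of the bytes 00 00 FF should be 0 (character 'A').
        System.out.println(packUnmasked(zero, zero, high) >> 18 & 0x3f); // prints 63
        System.out.println(packMasked(zero, zero, high) >> 18 & 0x3f);   // prints 0
    }
}
```

The difference only shows up with non-ASCII input, which is why tests that use plain text such as "Man" would not catch a missing mask.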

Decoding

// decodeMap is initialized from encodeMap above; it is essentially a map from each ASCII character to its position in encodeMap.
// Requires java.util.HashMap and java.util.Map.
    private final static Map<Byte, Integer> decodeMap = new HashMap<Byte, Integer>();
    static {
        for (int i = 0; i < encodeMap.length; i++) {
            decodeMap.put(encodeMap[i], i);
        }
    }

    // Returns the 6-bit value of one Base64 character and rejects bytes outside the alphabet
    // (including a '=' that appears anywhere other than the trailing padding).
    private static int sextet(byte c) {
        Integer value = decodeMap.get(c);
        if (value == null) {
            throw new IllegalArgumentException("Invalid Base64 character code: " + (c & 0xff));
        }
        return value;
    }

    public static byte[] decode(byte[] encodedBytes) {
        if (encodedBytes.length == 0) {
            return new byte[0];
        }
        // Padded Base64 always arrives in complete groups of four characters.
        if (encodedBytes.length % 4 != 0) {
            throw new IllegalArgumentException("Base64 input length must be a multiple of 4");
        }

        // Every group but the last yields three bytes; the last yields one, two or three depending on padding.
        int decodedLength = (encodedBytes.length - 4) / 4 * 3;
        if (encodedBytes[encodedBytes.length-1] == '=' && encodedBytes[encodedBytes.length-2] == '=') {
            decodedLength += 1;
        } else if (encodedBytes[encodedBytes.length-1] == '=') {
            decodedLength += 2;
        } else {
            decodedLength += 3;
        }

        byte[] decodedBytes = new byte[decodedLength];
        int i = 0, j = 0;
        // Decode every group except the last one, which may contain padding.
        while (i < encodedBytes.length - 4) {
            int value = sextet(encodedBytes[i])<<18 | sextet(encodedBytes[i+1])<<12 | sextet(encodedBytes[i+2])<<6 | sextet(encodedBytes[i+3]);
            decodedBytes[j] = (byte)(value>>16&0xff);
            decodedBytes[j+1] = (byte)(value>>8&0xff);
            decodedBytes[j+2] = (byte)(value&0xff);

            i += 4;
            j += 3;
        }

        // Decode the last group; the number of bytes still missing tells us how much padding it had.
        if (decodedLength - j == 1) {
            int value = sextet(encodedBytes[i])<<18 | sextet(encodedBytes[i+1])<<12;
            decodedBytes[j] = (byte)(value>>16&0xff);
        } else if (decodedLength - j == 2) {
            int value = sextet(encodedBytes[i])<<18 | sextet(encodedBytes[i+1])<<12 | sextet(encodedBytes[i+2])<<6;
            decodedBytes[j] = (byte)(value>>16&0xff);
            decodedBytes[j+1] = (byte)(value>>8&0xff);
        } else {
            int value = sextet(encodedBytes[i])<<18 | sextet(encodedBytes[i+1])<<12 | sextet(encodedBytes[i+2])<<6 | sextet(encodedBytes[i+3]);
            decodedBytes[j] = (byte)(value>>16&0xff);
            decodedBytes[j+1] = (byte)(value>>8&0xff);
            decodedBytes[j+2] = (byte)(value&0xff);
        }

        return decodedBytes;
    }
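
As a possible alternative to the HashMap, the character-to-index lookup can also be done with a plain 256-entry array, which avoids boxing and gives a natural place to mark invalid characters. This is a swapped-in design sketch, not the article's approach; `DecodeTableDemo`, `DECODE` and `sextet` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
final class DecodeTableDemo {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // DECODE[c] is the 6-bit value of character c, or -1 if c is not in the alphabet.
    static final int[] DECODE = new int[256];
    static {
        Arrays.fill(DECODE, -1);
        for (int i = 0; i < ALPHABET.length(); i++) {
            DECODE[ALPHABET.charAt(i)] = i;
        }
    }

    // Look up one character, rejecting anything outside the alphabet.
    static int sextet(byte c) {
        int value = DECODE[c & 0xff]; // mask so negative bytes index safely
        if (value < 0) {
            throw new IllegalArgumentException("Invalid Base64 character code: " + (c & 0xff));
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(sextet((byte) 'A')); // prints 0
        System.out.println(sextet((byte) '/')); // prints 63
    }
}
```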
