Flattening and Filtering JSON for Cleaner Types

Category: IT Technology · Published: 4 years ago


Before I grokked the Unmarshaler interface, it was hard to know how to parse a complex JSON string into a type in one shot, with or without preprocessing. There are many good blog posts on techniques for parsing JSON in Go, but I had to learn this by experimentation before I could finally wrap my head around it.

I’ll use an example from GitHub’s /commits REST API, using the PR ruby/ruby#3365. I’ve saved the response in the repo where I’ve added the full implementation of the example used in this post. The commits response from the GitHub REST API is verbose: its size grows with the PR, and the fields I care about are nested more than one level deep. In the hypothetical application that I’m writing, I need a list of “objects” that have the following information:

type MetaData struct {
	Author string
	Committer string
	SHA string
	Message string
}

That is, I want to parse this response into a []MetaData slice. I do not want to traverse structs shaped like the response in my main “business logic”, as that makes it hard to follow the important bits, and I don’t want to use interface{} as a placeholder either. A better trade-off, in my opinion and for my use case, is to do as much as possible during the parse phase to massage the data into the structure you want. I’m positive that this is a common use case. I ended up learning one way to do this cleanly almost by accident. First, the components involved:

Use anonymous structs

Anonymous structs let you skip defining and naming a concrete type for one-off use cases. They’re heavily used in parsing and marshalling code paths, and in tests. In our case, this technique can be used to define a “dirty” struct on the fly inside the UnmarshalJSON function, and use that for parsing the JSON.
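As a minimal sketch of the idea, the snippet below pulls one nested field out of a JSON object without declaring any named type for the intermediate shape. The function name authorName and the payload are made up for illustration; only the nesting imitates a commit object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// authorName extracts a nested field through an anonymous struct, so no
// named type is needed for the intermediate JSON shape.
// (Hypothetical helper, not part of the original post.)
func authorName(raw []byte) (string, error) {
	var payload struct {
		Commit struct {
			Author struct {
				Name string `json:"name"`
			} `json:"author"`
		} `json:"commit"`
	}
	if err := json.Unmarshal(raw, &payload); err != nil {
		return "", err
	}
	return payload.Commit.Author.Name, nil
}

func main() {
	// Invented payload, not real API output.
	raw := []byte(`{"commit": {"author": {"name": "Jane Doe"}}}`)

	name, err := authorName(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name)
}
```

The anonymous struct lives and dies inside the function, so the nested shape never leaks into the rest of the program.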

Implementing Unmarshaler interface

Any type that has an UnmarshalJSON([]byte) error method implements the json.Unmarshaler interface. Such a type can then be used as the target for parsing a JSON sub tree or the entire JSON itself!
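To make the “sub tree” part concrete, here is a small sketch. ShortSHA and parseSHA are hypothetical names introduced only for this example: the type keeps the first 7 characters of a SHA, and its UnmarshalJSON method runs only for the field it is attached to.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ShortSHA is a hypothetical type that keeps only the first 7 characters
// of a commit SHA.
type ShortSHA string

// Compile-time check that *ShortSHA satisfies json.Unmarshaler.
var _ json.Unmarshaler = (*ShortSHA)(nil)

// UnmarshalJSON receives the raw JSON for this value (a quoted string),
// decodes it as a plain string, and stores the truncated result.
func (s *ShortSHA) UnmarshalJSON(buf []byte) error {
	var full string
	if err := json.Unmarshal(buf, &full); err != nil {
		return err
	}
	if len(full) > 7 {
		full = full[:7]
	}
	*s = ShortSHA(full)
	return nil
}

// parseSHA decodes an object with a "sha" key. The custom method is
// invoked for that sub tree only; the rest uses default decoding.
func parseSHA(raw []byte) (ShortSHA, error) {
	var commit struct {
		SHA ShortSHA `json:"sha"`
	}
	err := json.Unmarshal(raw, &commit)
	return commit.SHA, err
}

func main() {
	sha, err := parseSHA([]byte(`{"sha": "0123456789abcdef"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sha)
}
```

The method is declared on the pointer receiver because it has to modify the value it is called on.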

Implementation

The first step is to mock out the main function:

func main() {
	// This variable contains the raw json bytes that resulted from the
	// API call. I'm not adding the code for the actual network fetch
	// for now, but in the example repository, I read the commits
	// response from a file
	var jsonb []byte
	jsonb = JSONFromSomewhere()

	var metadatas []MetaData
	if err := json.Unmarshal(jsonb, &metadatas); err != nil {
		log.Fatalln("error parsing JSON", err)
	}

	fmt.Println(metadatas)
}

The JSON response of the /commits endpoint is a list of commit objects, and I’m using a slice of MetaData to match that shape. For each commit item in the JSON array, the raw bytes of that item get passed as the argument to the UnmarshalJSON function on MetaData.
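One way to see this per-element behaviour is a type that simply stores whatever bytes it is handed. rawItem and splitElements below are hypothetical names for this sketch, and the input is invented. The standard library's json.RawMessage serves a similar purpose in real code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawItem is a hypothetical type that records the bytes passed to its
// UnmarshalJSON method.
type rawItem string

func (r *rawItem) UnmarshalJSON(buf []byte) error {
	*r = rawItem(buf) // the conversion copies the bytes
	return nil
}

// splitElements decodes a JSON array into a slice of rawItem, so the
// method above is called once per array element.
func splitElements(raw []byte) ([]rawItem, error) {
	var items []rawItem
	if err := json.Unmarshal(raw, &items); err != nil {
		return nil, err
	}
	return items, nil
}

func main() {
	// Invented input with two elements.
	items, err := splitElements([]byte(`[{"sha":"a1"},{"sha":"b2"}]`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, item := range items {
		fmt.Printf("call %d received: %s\n", i, item)
	}
}
```

Each call sees only its own element, never the surrounding array, which is what lets MetaData handle one commit at a time.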

The next step is to implement the UnmarshalJSON function, using an anonymous struct to parse the raw commit object JSON into:

func (m *MetaData) UnmarshalJSON(buf []byte) error {
	var commit struct {
		SHA    string `json:"sha"`
		Commit struct {
			Author struct {
				Name string `json:"name"`
			} `json:"author"`
			Committer struct {
				Name string `json:"name"`
			} `json:"committer"`
			Message string `json:"message"`
		} `json:"commit"`
	}

	if err := json.Unmarshal(buf, &commit); err != nil {
		return errors.Wrap(err, "parsing into MetaData failed")
	}

	// continued
}

The final step is to process the commit struct and set the appropriate fields on the MetaData struct:

func (m *MetaData) UnmarshalJSON(buf []byte) error {
	// same as above

	m.Author = commit.Commit.Author.Name
	m.Committer = commit.Commit.Committer.Name
	m.SHA = commit.SHA
	m.Message = commit.Commit.Message

	return nil
}

That’s it! An additional advantage of narrow types like this is that they’re easier to test.
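For reference, the pieces above can be assembled into one runnable sketch. Some parts are assumptions rather than the original code: sampleCommits is invented data that only imitates the nesting of the /commits response, parseCommits is a hypothetical helper, and fmt.Errorf with %w stands in for errors.Wrap so that the example needs nothing outside the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MetaData is the flat shape the rest of the program works with.
type MetaData struct {
	Author    string
	Committer string
	SHA       string
	Message   string
}

// UnmarshalJSON parses one commit object into a throwaway anonymous
// struct that mirrors the nesting, then copies out the fields we keep.
func (m *MetaData) UnmarshalJSON(buf []byte) error {
	var commit struct {
		SHA    string `json:"sha"`
		Commit struct {
			Author struct {
				Name string `json:"name"`
			} `json:"author"`
			Committer struct {
				Name string `json:"name"`
			} `json:"committer"`
			Message string `json:"message"`
		} `json:"commit"`
	}
	if err := json.Unmarshal(buf, &commit); err != nil {
		return fmt.Errorf("parsing into MetaData failed: %w", err)
	}

	m.Author = commit.Commit.Author.Name
	m.Committer = commit.Commit.Committer.Name
	m.SHA = commit.SHA
	m.Message = commit.Commit.Message
	return nil
}

// sampleCommits is invented data, not real API output.
const sampleCommits = `[
  {"sha": "aaa111", "commit": {"author": {"name": "Alice"}, "committer": {"name": "Bob"}, "message": "Fix parser"}},
  {"sha": "bbb222", "commit": {"author": {"name": "Carol"}, "committer": {"name": "Carol"}, "message": "WIP: refactor"}}
]`

// parseCommits is a hypothetical helper wrapping the json.Unmarshal call.
func parseCommits(raw []byte) ([]MetaData, error) {
	var metadatas []MetaData
	if err := json.Unmarshal(raw, &metadatas); err != nil {
		return nil, err
	}
	return metadatas, nil
}

func main() {
	metadatas, err := parseCommits([]byte(sampleCommits))
	if err != nil {
		fmt.Println("error parsing JSON:", err)
		return
	}
	for _, m := range metadatas {
		fmt.Printf("%s %q (author %s, committer %s)\n", m.SHA, m.Message, m.Author, m.Committer)
	}
}
```

Because all the nesting is handled inside UnmarshalJSON, a test only needs a small JSON literal and a comparison against plain MetaData values.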

Bonus: Filtering the slice further

For bonus points, I want to skip certain []MetaData elements based on a condition. A way to do this, keeping the same principles as above in mind, is to define a type that wraps []MetaData and implements the Unmarshaler interface:

type MetaDatas []MetaData

func (ms *MetaDatas) UnmarshalJSON(buf []byte) error {
	// []MetaData is not the same as MetaDatas, and this difference is
	// important!
	var metadatas []MetaData

	if err := json.Unmarshal(buf, &metadatas); err != nil {
		// return the error to the caller instead of exiting the process
		return errors.Wrap(err, "parsing into MetaDatas failed")
	}

	// filtering without allocations
	// https://github.com/golang/go/wiki/SliceTricks#filtering-without-allocating
	cleanedms := metadatas[:0]
	for _, metadata := range metadatas {
		if !strings.HasPrefix(metadata.Message, "WIP") {
			cleanedms = append(cleanedms, metadata)
		}
	}
	*ms = cleanedms

	return nil
}

Like before, I’m using a temporary variable whose type matches the shape of our main type, and parsing into that. Then I clean out the slice based on a condition: I want to skip all the commits whose message starts with WIP. Note that the metadatas variable inside the UnmarshalJSON function is declared as []MetaData and not as MetaDatas, since the latter would make json.Unmarshal call this same UnmarshalJSON again, resulting in an endless parse loop. By design, var metadatas MetaDatas and var metadatas []MetaData are not the same type.

Finally, the filtered slice gets assigned to the underlying object that the JSON is getting parsed into.
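Here is a self-contained sketch of the filtering step. To keep it short, it assumes a simplified, flat MetaData with struct tags instead of the custom UnmarshalJSON from earlier. The input data and the parseFiltered helper are invented for illustration, and fmt.Errorf replaces errors.Wrap to stay within the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// MetaData is simplified to a flat struct for this sketch.
type MetaData struct {
	SHA     string `json:"sha"`
	Message string `json:"message"`
}

// MetaDatas is a distinct named type, so it can carry its own method.
type MetaDatas []MetaData

func (ms *MetaDatas) UnmarshalJSON(buf []byte) error {
	// Plain []MetaData has no UnmarshalJSON method, so this call uses the
	// default slice decoding. Declaring the variable as MetaDatas would
	// call this method again and recurse without end.
	var all []MetaData
	if err := json.Unmarshal(buf, &all); err != nil {
		return fmt.Errorf("parsing into MetaDatas failed: %w", err)
	}

	// Reuse the backing array of all while dropping WIP commits.
	kept := all[:0]
	for _, m := range all {
		if !strings.HasPrefix(m.Message, "WIP") {
			kept = append(kept, m)
		}
	}
	*ms = kept
	return nil
}

// parseFiltered is a hypothetical helper wrapping the json.Unmarshal call.
func parseFiltered(raw []byte) (MetaDatas, error) {
	var ms MetaDatas
	err := json.Unmarshal(raw, &ms)
	return ms, err
}

func main() {
	// Invented input: the second commit should be filtered out.
	raw := []byte(`[
		{"sha": "a1", "message": "Fix parser"},
		{"sha": "b2", "message": "WIP: refactor"},
		{"sha": "c3", "message": "Add tests"}
	]`)

	ms, err := parseFiltered(raw)
	if err != nil {
		fmt.Println("error parsing JSON:", err)
		return
	}
	for _, m := range ms {
		fmt.Println(m.SHA, m.Message)
	}
}
```

The caller only changes the declared type of its target from []MetaData to MetaDatas; the filtering happens as a side effect of parsing.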

A note about performance

In these examples, the parse flow will create the entire []MetaData slice, even though we filter out many of the elements. To my knowledge, this is a necessary hit to take. I’m not aware of a way to pre-process the incoming bytes so that the filtered-out elements are never allocated in the first place. My reasoning is that if we didn’t filter or clean up the JSON data, all the objects would be allocated anyway, so the difference in allocations may not be large, but that’s just my opinion at this point.
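One direction that might be worth measuring is decoding the array one element at a time with json.Decoder, so that only the elements that pass the filter are ever appended to the result. The sketch below is an untested idea in terms of performance, not a benchmarked improvement: every element is still decoded, and the temporary value is simply discarded when it is filtered out. It assumes a simplified, flat MetaData, and decodeFiltered and decodeFilteredString are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// MetaData is simplified to a flat struct for this sketch.
type MetaData struct {
	SHA     string `json:"sha"`
	Message string `json:"message"`
}

// decodeFiltered reads a JSON array from r element by element and keeps
// only the commits whose message does not start with "WIP".
func decodeFiltered(r io.Reader) ([]MetaData, error) {
	dec := json.NewDecoder(r)

	// Consume the opening '[' and make sure the input is an array.
	tok, err := dec.Token()
	if err != nil {
		return nil, fmt.Errorf("reading array start: %w", err)
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return nil, fmt.Errorf("expected a JSON array, got %v", tok)
	}

	var kept []MetaData
	for dec.More() {
		var m MetaData
		if err := dec.Decode(&m); err != nil {
			return nil, fmt.Errorf("decoding element: %w", err)
		}
		if strings.HasPrefix(m.Message, "WIP") {
			continue // never appended, so it does not grow the slice
		}
		kept = append(kept, m)
	}

	// Consume the closing ']'.
	if _, err := dec.Token(); err != nil {
		return nil, fmt.Errorf("reading array end: %w", err)
	}
	return kept, nil
}

// decodeFilteredString is a convenience wrapper for in-memory input.
func decodeFilteredString(s string) ([]MetaData, error) {
	return decodeFiltered(strings.NewReader(s))
}

func main() {
	// Invented input: the second commit should be skipped.
	kept, err := decodeFilteredString(`[
		{"sha": "a1", "message": "Fix parser"},
		{"sha": "b2", "message": "WIP: refactor"},
		{"sha": "c3", "message": "Add tests"}
	]`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range kept {
		fmt.Println(m.SHA, m.Message)
	}
}
```

The trade-off is that this moves the filtering out of UnmarshalJSON and into the caller, so it fits best where the response is read from a stream rather than from bytes already held in memory.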

