ASTs, Markdown and MDX

Category: IT Technology · Published: 4 years ago

Markdown for documents, React for interaction, MDX for both! But how do Markdown and MDX arrive at HTML and JSX? The answer is Abstract Syntax Trees.

Markdown is the perfect format for writing documents, documentation, blog posts, static content, and more. React, on the other hand, is great for building interactive interfaces. That said, have you ever tried writing a blog post in React/HTML? There's a reason Markdown exists! But what if you want to add some interactive elements to a Markdown document? Maybe an embedded YouTube video, or a chart that pulls in some dynamic data? Or maybe a form to collect contact information on a sales page?

MDX gives you the best of both worlds. Write your documents in Markdown, but feel free to import and use React components right there inside of your document. Beautiful.

In this article we're going to go beyond the surface and dive into some of the inner workings of Markdown and MDX. How does a Markdown file get converted into HTML, and how does MDX get converted into JSX?

We are going to explore Abstract Syntax Trees (ASTs) and what Markdown and MDX have to do with them. The code samples in this article can be found here.

MDX Real-World Usage (A Warning)

The examples in this article are meant to provide a glimpse of what MDX is doing behind the scenes and what ASTs look like and are used for. If you'd like to use MDX in Gatsby, Next.js, or Create React App, the MDX website provides examples and documentation on how to use it within your app.

Syntax Trees

The ability to view code as data - rather than simply some text in a file - opens up a world of possibilities. Take Prettier for example. It is able to take some poorly formatted JavaScript or Markdown and give you something nicely formatted in return. You may think the conversion goes from ugly Markdown directly to formatted Markdown, but the key to this process is the intermediary step, a data structure called an Abstract Syntax Tree (AST).

Think of what you can produce from a Markdown file. Yes, you can produce HTML, but you can also produce formatted Markdown (as Prettier does), check it for lint errors, count its words, and more.

Markdown -> AST -> HTML
Markdown -> AST -> Formatted Markdown
Markdown -> AST -> Lint Errors
Markdown -> AST -> Word Counts

It is with ASTs that MDX is able to combine Markdown and React so beautifully.

Abstract Syntax Trees in Action

To see ASTs in action, let's look at this small Markdown example with a Level 1 Heading and a Paragraph:

# Welcome

A paragraph.

If we process this Markdown with unified and the remark-parse plugin, we turn the Markdown input into an AST that represents it.

import unified from "unified";
import markdown from "remark-parse";

const input = `
# Welcome

A paragraph.
`;

const tree = unified()
  .use(markdown)
  .parse(input);

If you do this yourself, you'll see all sorts of data about the position and line of the characters, but I have stripped this out to make it a bit more digestible. Each node (an object) in this tree contains a number of properties:

  • type : What data type is this node? Heading, Paragraph, Emphasis, Strong, etc.
  • children : Nested nodes contained within the current one. Imagine an Image inside of a Link, or a Link within a Paragraph
  • depth : Used to differentiate Level 1, 2, 3 Headings (h1, h2, h3)
  • value : Text nodes have a value attribute which contains their actual text

{
  "type": "root",
  "children": [
    {
      "type": "heading",
      "depth": 1,
      "children": [
        {
          "type": "text",
          "value": "Welcome"
        }
      ]
    },
    {
      "type": "paragraph",
      "children": [
        {
          "type": "text",
          "value": "A paragraph."
        }
      ]
    }
  ]
}
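The position data mentioned above lives in a position property on every node. As a minimal sketch of stripping it (assuming the parser returns plain objects, as remark-parse does), a small recursive helper can copy the tree without those properties. stripPosition is a hypothetical name, and the position values in the sample below are illustrative:

```javascript
// Hypothetical helper: returns a copy of an mdast-style tree with every
// `position` property removed. The input tree is not modified.
function stripPosition(node) {
  // Copy every property except `position`
  const { position, ...rest } = node;
  if (Array.isArray(rest.children)) {
    // Recurse into nested nodes, building new child objects
    rest.children = rest.children.map(stripPosition);
  }
  return rest;
}

// Part of a parsed tree, with illustrative position data still attached
const parsed = {
  type: "root",
  position: { start: { line: 1, column: 1, offset: 0 }, end: { line: 2, column: 10, offset: 10 } },
  children: [
    {
      type: "heading",
      depth: 1,
      position: { start: { line: 2, column: 1, offset: 1 }, end: { line: 2, column: 10, offset: 10 } },
      children: [
        {
          type: "text",
          value: "Welcome",
          position: { start: { line: 2, column: 3, offset: 3 }, end: { line: 2, column: 10, offset: 10 } }
        }
      ]
    }
  ]
};

console.log(JSON.stringify(stripPosition(parsed), null, 2));
```

Returning a copy rather than deleting in place keeps the original tree (and its positions) available for tools that need them, such as linters reporting line numbers.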

Using the AST for Calculations

We can process the AST to count how many of each type we see (recursive function alert):

function counts(acc, node) {
  // add 1 to an initial or existing value
  acc[node.type] = (acc[node.type] || 0) + 1;

  // find and add up the counts from all of this node's children
  return (node.children || []).reduce(
    (childAcc, childNode) => counts(childAcc, childNode),
    acc
  );
}

Which, depending on your input, produces something like:

{
  "root": 1,
  "heading": 1,
  "text": 7,
  "paragraph": 3,
  "strong": 1,
  "emphasis": 1
}
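To see where numbers like these come from, here is the same counts function applied to the small Welcome tree from earlier, written out by hand so no parser is needed. It is only a sketch; a real document will produce different totals:

```javascript
// The counts function from above, repeated so this snippet runs on its own
function counts(acc, node) {
  // add 1 to an initial or existing value
  acc[node.type] = (acc[node.type] || 0) + 1;
  // find and add up the counts from all of this node's children
  return (node.children || []).reduce(
    (childAcc, childNode) => counts(childAcc, childNode),
    acc
  );
}

// The position-free AST for "# Welcome\n\nA paragraph." shown earlier
const tree = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Welcome" }] },
    { type: "paragraph", children: [{ type: "text", value: "A paragraph." }] }
  ]
};

// Start from an empty accumulator object
console.log(counts({}, tree));
// → { root: 1, heading: 1, text: 2, paragraph: 1 }
```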

Counting Words with the AST

The word count tool I'm using in VS Code right now counts ## Welcome as 2 words, even though it is really a single word that happens to be in an h2 heading. Using an AST, we can get a more accurate word count by counting only the text values.

import unified from "unified";
import markdown from "remark-parse";

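function wordCount(count, node) {
  if (node.type === "text") {
    // Split on any run of whitespace (spaces, newlines) and drop empty
    // strings, so extra spaces or soft line breaks don't skew the count
    return count + node.value.trim().split(/\s+/).filter(Boolean).length;
  } else {
    // Not a text node: add up the words found in its children
    return (node.children || []).reduce(
      (childCount, childNode) => wordCount(childCount, childNode),
      count
    );
  }
}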

// Our markdown input
const input = `## Welcome`;

// Convert markdown into an AST
const tree = unified()
  .use(markdown)
  .parse(input);

// Extract Word Count from AST
const words = wordCount(0, tree);
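The difference between the two approaches is easy to check in isolation. This self-contained sketch compares a naive count over the raw source with an AST-based count, using a hand-written tree for ## Welcome in place of the remark-parse output. The function names are hypothetical, and splitting on whitespace is one reasonable definition of a word, not a standard:

```javascript
// Naive approach: split the raw Markdown source on whitespace
function naiveWordCount(source) {
  return source.trim().split(/\s+/).filter(Boolean).length;
}

// AST approach: only count words inside text nodes
function astWordCount(count, node) {
  if (node.type === "text") {
    return count + node.value.trim().split(/\s+/).filter(Boolean).length;
  }
  return (node.children || []).reduce(
    (childCount, childNode) => astWordCount(childCount, childNode),
    count
  );
}

const source = "## Welcome";
// Hand-written mdast for "## Welcome": a level 2 heading with one text node
const headingTree = {
  type: "root",
  children: [
    { type: "heading", depth: 2, children: [{ type: "text", value: "Welcome" }] }
  ]
};

console.log(naiveWordCount(source)); // → 2 ("##" is counted as a word)
console.log(astWordCount(0, headingTree)); // → 1
```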

Visualizing the AST

With this AST we can also create a React component called Node, which renders a node and its children (using padding to show the tree-like structure):

const Node = ({ node }) => (
  <div style={{ paddingLeft: `15px` }}>
    <strong>
      {node.type}
      {node.depth && <span> (d{node.depth})</span>}
    </strong>

    {node.value && <div style={{ paddingLeft: "15px" }}>{node.value}</div>}

    {/* Render additional Nodes for each child */}
    {node.children &&
      node.children.map(child => {
        const { line, column, offset } = child.position.start;
        return <Node key={`${line}-${column}-${offset}`} node={child} />;
      })}
  </div>
);

This output allows us to see how the tree is structured and indented:

root
  heading (d1)
    text
      Welcome
  paragraph
    text
      A paragraph.
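If you only want to inspect a tree in a terminal rather than a browser, the same recursion works without React. A minimal sketch, assuming the position-free Welcome tree from earlier; renderTree is a hypothetical name that mirrors the Node component with two spaces per level:

```javascript
// Render an mdast-style tree as indented plain text: one line per node
// type (plus heading depth), and one extra-indented line for text values.
function renderTree(node, indent = 0) {
  const pad = " ".repeat(indent);
  let label = node.type;
  if (node.depth) label += ` (d${node.depth})`;
  const lines = [pad + label];
  if (node.value) lines.push(" ".repeat(indent + 2) + node.value);
  // Each child is rendered one level deeper
  for (const child of node.children || []) {
    lines.push(renderTree(child, indent + 2));
  }
  return lines.join("\n");
}

const tree = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Welcome" }] },
    { type: "paragraph", children: [{ type: "text", value: "A paragraph." }] }
  ]
};

console.log(renderTree(tree));
```

For this input it prints the same indented outline as the React version.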

MDX

If you came here for MDX and not Markdown, you're in luck! We're now going to transition into exploring how MDX works and how it is related to the Markdown examples shown above.

AST Explorer

For all the visual learners, there is a great website called AST Explorer which allows you to visualize the AST produced by a number of different input formats such as Markdown and MDX. We're going to be diving into MDX a bit further now, so let's take a look at the AST produced by an MDX file.

MDAST, HAST, MDXAST, MDXHAST... What??

That's a lot of acronyms! But what do they mean and what does this have to do with Markdown and MDX? In order to convert Markdown into an AST, we need a specification, or a set of rules to follow so we know what types of Nodes are available (heading, paragraph, link, etc.) and what properties they might have (type, children, value).

This set of rules for Markdown is called mdast. Similarly, there is a set of rules for HTML, called hast. With both specifications in hand, someone can write code that converts a Markdown AST (mdast) into an HTML AST (hast), which is exactly what remark-rehype does.
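To make the mdast/hast relationship concrete, here is a deliberately tiny converter for the node types in the Welcome example. It is a sketch, not how remark-rehype is implemented (remark-rehype builds on mdast-util-to-hast and covers the whole specification); toHast is a hypothetical name:

```javascript
// Hypothetical, heavily simplified mdast -> hast conversion that only
// understands root, heading, paragraph and text nodes.
function toHast(node) {
  // Convert children first so every branch can reuse them
  const children = (node.children || []).map(toHast);
  switch (node.type) {
    case "root":
      return { type: "root", children };
    case "heading":
      // hast models HTML elements as { type: "element", tagName, properties, children }
      return { type: "element", tagName: `h${node.depth}`, properties: {}, children };
    case "paragraph":
      return { type: "element", tagName: "p", properties: {}, children };
    case "text":
      // Text nodes have the same shape in mdast and hast
      return { type: "text", value: node.value };
    default:
      throw new Error(`Unsupported mdast node type: ${node.type}`);
  }
}

// The position-free mdast for "# Welcome\n\nA paragraph."
const mdastTree = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Welcome" }] },
    { type: "paragraph", children: [{ type: "text", value: "A paragraph." }] }
  ]
};

const hastTree = toHast(mdastTree);
console.log(JSON.stringify(hastTree, null, 2));
```

The key change is that Markdown-specific types like heading (with its depth) become generic HTML elements identified by tagName.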

MDX is a superset of Markdown, meaning that everything you can do in Markdown you can also do in MDX, plus three additional features, which are:

  • jsx (replacing html)
  • import statements
  • export statements

This specification is called MDXAST.

Compiling MDX into an AST

Unless you are developing a plugin for MDX, you probably won't need to deal directly with the MDX AST, but since this article is about learning, let's write some code which produces an AST.

const { createMdxAstCompiler } = require("@mdx-js/mdx");

// A "unified" compiler
const compiler = createMdxAstCompiler({ remarkPlugins: [] });
const input = `
import YouTube from "./YouTube";

# Welcome

<YouTube id="123" />
`;

const ast = compiler.parse(input);
const astString = JSON.stringify(ast, null, 2);
console.log(astString);

After we strip out some of the position data, the AST ends up looking like the data below. Notice that we are seeing two of the custom MDX node types: import and jsx.

{
  "type": "root",
  "children": [
    {
      "type": "import",
      "value": "import YouTube from \"./YouTube\";"
    },
    {
      "type": "heading",
      "depth": 1,
      "children": [
        {
          "type": "text",
          "value": "Welcome"
        }
      ]
    },
    {
      "type": "jsx",
      "value": "<YouTube id=\"123\" />"
    }
  ]
}
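Because the MDX-specific nodes are ordinary objects in the tree, finding them takes the same kind of recursive walk used for counting earlier. A sketch, run against the position-free AST above written out by hand; collectMdxNodes is a hypothetical helper, not part of @mdx-js/mdx:

```javascript
// The node types MDX adds on top of mdast
const MDX_TYPES = new Set(["import", "export", "jsx"]);

// Walk an MDXAST-style tree and collect the MDX-specific nodes in document order
function collectMdxNodes(node, found = []) {
  if (MDX_TYPES.has(node.type)) {
    found.push({ type: node.type, value: node.value });
  }
  for (const child of node.children || []) {
    collectMdxNodes(child, found);
  }
  return found;
}

// The AST shown above, without position data
const mdxTree = {
  type: "root",
  children: [
    { type: "import", value: 'import YouTube from "./YouTube";' },
    { type: "heading", depth: 1, children: [{ type: "text", value: "Welcome" }] },
    { type: "jsx", value: '<YouTube id="123" />' }
  ]
};

console.log(collectMdxNodes(mdxTree));
```

A walk like this could, for example, list which components a document imports before it is compiled.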

Compiling MDX into JSX

What we really want from MDX is JSX, not an AST. This code is similar to the previous example, but it adds the utility mdxHastToJsx, which takes the tree from the previous step and produces JSX.

const { createMdxAstCompiler } = require("@mdx-js/mdx");
const mdxHastToJsx = require("@mdx-js/mdx/mdx-hast-to-jsx");

const input = `
import YouTube from "./YouTube";

# Welcome

<YouTube id="123" />
`;

const compiler = createMdxAstCompiler({ remarkPlugins: [] }).use(mdxHastToJsx);
const jsx = compiler.processSync(input).toString();
console.log(jsx);

What is produced is valid JSX, which looks like:

import YouTube from "./YouTube";

const layoutProps = {};
const MDXLayout = "wrapper";
export default function MDXContent({ components, ...props }) {
  return (
    <MDXLayout
      {...layoutProps}
      {...props}
      components={components}
      mdxType="MDXLayout"
    >
      <h1>{`Welcome`}</h1>
      <YouTube id="123" mdxType="YouTube" />
    </MDXLayout>
  );
}
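mdxHastToJsx handles much more than this (layouts, components, the mdxType props), but at its heart it walks the tree and turns each node into a string, which a toy version can illustrate. This is a hypothetical, much-simplified serializer, not the MDX implementation: toJsx is an invented name, the input is a hand-written hast-style tree carrying the MDX import and jsx nodes, and the output is a bare fragment rather than the MDXContent component shown above:

```javascript
// Toy serializer: elements become JSX tags, text is wrapped in a template
// literal (as in the real output above), and import/jsx nodes pass through.
function toJsx(node) {
  switch (node.type) {
    case "root": {
      // Hoist imports to the top, then wrap everything else in a fragment
      const imports = node.children.filter(c => c.type === "import").map(c => c.value);
      const body = node.children.filter(c => c.type !== "import").map(toJsx);
      const lines = [...imports];
      if (imports.length) lines.push("");
      lines.push("<>", ...body.map(b => "  " + b), "</>");
      return lines.join("\n");
    }
    case "element":
      return `<${node.tagName}>${node.children.map(toJsx).join("")}</${node.tagName}>`;
    case "text":
      // Escape backticks, backslashes and "${" so the template literal stays valid
      return "{`" + node.value.replace(/[`\\]|\$\{/g, m => "\\" + m) + "`}";
    case "jsx":
    case "import":
      return node.value;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

const hast = {
  type: "root",
  children: [
    { type: "import", value: 'import YouTube from "./YouTube";' },
    { type: "element", tagName: "h1", properties: {}, children: [{ type: "text", value: "Welcome" }] },
    { type: "jsx", value: '<YouTube id="123" />' }
  ]
};

console.log(toJsx(hast));
```

Passing the jsx node's value through untouched is what lets the YouTube component appear in the output exactly as it was written in the MDX source.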

Conclusion

I hope you've enjoyed learning about ASTs and the role they play with Markdown and MDX. With ASTs we're able to process and tweak our code on its way to the desired result. It could be as simple as counting how many words are in a Markdown document, or as complex as Prettier or Babel. They open the door to a number of possibilities, which may have at one point seemed like a far-fetched idea. Take MDX itself for example. It was just an idea that a few people had, and with the help of ASTs and some hard work by some smart people, became a reality.
