mirror of https://github.com/tooot-app/app
59 lines
2.6 KiB
TypeScript
59 lines
2.6 KiB
TypeScript
/**
|
|
* Parses an HTML string, calling the callbacks to notify of tags and text.
|
|
*
|
|
* ## History
|
|
*
|
|
* This file previously used a regular expression to find html tags in the input
|
|
* text. Unfortunately, we ran into a bunch of catastrophic backtracking issues
|
|
* with certain input text, causing Autolinker to either hang or just take a
|
|
* really long time to parse the string.
|
|
*
|
|
* The current code is intended to be a O(n) algorithm that walks through
|
|
* the string in one pass, and tries to be as cheap as possible. We don't need
|
|
* to implement the full HTML spec, but rather simply determine where the string
|
|
* looks like an HTML tag, and where it looks like text (so that we can autolink
|
|
* that).
|
|
*
|
|
* This state machine parser is intended just to be a simple but performant
|
|
* parser of HTML for the subset of requirements we have. We simply need to:
|
|
*
|
|
* 1. Determine where HTML tags are
|
|
* 2. Determine the tag name (Autolinker specifically only cares about <a>,
|
|
* <script>, and <style> tags, so as not to link any text within them)
|
|
*
|
|
* We don't need to:
|
|
*
|
|
* 1. Create a parse tree
|
|
* 2. Auto-close tags with invalid markup
|
|
* 3. etc.
|
|
*
|
|
* The other intention behind this is that we didn't want to add external
|
|
* dependencies on the Autolinker utility which would increase its size. For
|
|
* instance, adding htmlparser2 adds 125kb to the minified output file,
|
|
* increasing its final size from 47kb to 172kb (at the time of writing). It
|
|
* also doesn't work exactly correctly, treating the string "<3 blah blah blah"
|
|
* as an HTML tag.
|
|
*
|
|
* Reference for HTML spec:
|
|
*
|
|
* https://www.w3.org/TR/html51/syntax.html#sec-tokenization
|
|
*
|
|
* @param {String} html The HTML to parse
|
|
* @param {Object} callbacks
|
|
* @param {Function} callbacks.onOpenTag Callback function to call when an open
|
|
* tag is parsed. Called with the tagName as its argument.
|
|
* @param {Function} callbacks.onCloseTag Callback function to call when a close
|
|
* tag is parsed. Called with the tagName as its argument. If a self-closing
|
|
* tag is found, `onCloseTag` is called immediately after `onOpenTag`.
|
|
* @param {Function} callbacks.onText Callback function to call when text (i.e
|
|
* not an HTML tag) is parsed. Called with the text (string) as its first
|
|
* argument, and offset (number) into the string as its second.
|
|
*/
|
|
export declare function parseHtml(html: string, { onOpenTag, onCloseTag, onText, onComment, onDoctype }: {
|
|
onOpenTag: (tagName: string, offset: number) => void;
|
|
onCloseTag: (tagName: string, offset: number) => void;
|
|
onText: (text: string, offset: number) => void;
|
|
onComment: (offset: number) => void;
|
|
onDoctype: (offset: number) => void;
|
|
}): void;
|