`@telegraf/entity`

Convert Telegram entities to HTML or Markdown (and back).

⚠️ Before you start using this module, consider using copyMessage instead.

This module will produce Telegram-compatible HTML or MarkdownV2. However, it is better to simply pass the text and entities back to Telegram rather than converting to HTML or Markdown.

This module is really for the rare cases where you want to convert Telegram-formatted text for consumption outside of Telegram.

npm install @telegraf/entity

Simple usage

Usage is very straightforward!

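import { message } from "telegraf/filters";
import { toHTML, toMarkdownV2 } from "@telegraf/entity";
// if Deno:
// import { toHTML, toMarkdownV2 } from "https://deno.land/x/telegraf_entity/mod.ts";

// `bot` below is assumed to be an existing Telegraf instance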

bot.on(message("text"), async ctx => {
    const html = toHTML(ctx.message); // convert text to HTML string
    const md = toMarkdownV2(ctx.message); // convert text to MarkdownV2 string
});

Both functions will also just work with captioned messages like photos or videos.

bot.on(message("photo"), async ctx => {
    const html = toHTML(ctx.message); // convert caption to HTML string
    const md = toMarkdownV2(ctx.message); // convert caption to MarkdownV2 string
});

You can also directly pass just a text and entities object:

toHTML({ text: '...', entities: [...] }); // HTML string
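
To make the shape of that object concrete, here is a small TypeScript sketch. It assumes the Bot API convention that each entity's offset and length are counted in UTF-16 code units, which is also how JavaScript indexes strings. `SketchEntity`, `TextWithEntities`, and `coveredText` are hypothetical names used only for illustration; they are not part of this module.

```typescript
// Hypothetical local types mirroring the Bot API's MessageEntity shape.
interface SketchEntity {
    type: string;
    offset: number; // start position, in UTF-16 code units
    length: number; // span length, in UTF-16 code units
}

interface TextWithEntities {
    text: string;
    entities: SketchEntity[];
}

// List which substring of the text each entity covers.
function coveredText(input: TextWithEntities): string[] {
    return input.entities.map(
        e => `${e.type}: ${input.text.slice(e.offset, e.offset + e.length)}`,
    );
}

const example: TextWithEntities = {
    text: "Hello world",
    entities: [
        { type: "bold", offset: 0, length: 5 },
        { type: "italic", offset: 6, length: 5 },
    ],
};

const covered = coveredText(example);
```

An object shaped like `example` is the kind of value that can be handed to toHTML in place of a full message.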

Advanced usage

toHTML and toMarkdownV2 produce HTML or MarkdownV2 compatible with Telegram because it’s a sensible default for a Telegram library. You may want to serialise differently, to target a different system. This module exposes a way to do this: serialiseWith.

To use this, you must first implement a serialiser with the following type:

import type { Serialiser } from "@telegraf/entity";

const myHTMLSerialiser: Serialiser = (match, node) => {
    // implement: return `match` wrapped however you want for this node
    return match;
};

Each matched node will be passed to your function, and you only need to wrap it however you want.

Refer to the implementation of the builtin serialisers for something you can simply copy-paste and edit to satisfaction.
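
As a rough illustration of the wrapping idea, the sketch below targets a made-up plain-text convention rather than Telegram HTML. It assumes each node exposes a `type` field naming the entity kind (such as "bold" or "code") and that link nodes carry a `url`. `SketchNode`, `SketchSerialiser`, and `plainSerialiser` are hypothetical names defined locally so the snippet runs on its own; the real `Serialiser` type exported by the module may differ in detail.

```typescript
// Assumed, simplified stand-ins for the module's node and serialiser types.
interface SketchNode {
    type: string;
    url?: string; // assumed to be present on text_link nodes
}
type SketchSerialiser = (match: string, node?: SketchNode) => string;

// Wrap each matched piece of text according to its entity type.
const plainSerialiser: SketchSerialiser = (match, node) => {
    switch (node?.type) {
        case "bold":
            return `*${match}*`;
        case "italic":
            return `_${match}_`;
        case "code":
            return "`" + match + "`";
        case "text_link": {
            const url = node?.url;
            return url ? `${match} (${url})` : match;
        }
        default:
            // Unknown or missing node: pass the text through untouched.
            return match;
    }
};
```

A function of this shape is what would be supplied as the serialiser argument to serialiseWith.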

The builtin escapers are also exported for your convenience:

import { escapers, type Escaper } from "@telegraf/entity";

escapers.HTML(text); // HTML escaped text
escapers.MarkdownV2(text); // escaped for Telegram's MarkdownV2

// or
const yourEscaper: Escaper = match => { /* implement */ };
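
For a non-Telegram target, a custom escaper might look like the sketch below. It assumes an escaper is a plain function from a string to its escaped form, as the `match => ...` signature above suggests. `SketchEscaper` and `escapePlain` are hypothetical names, and the set of escaped characters belongs to an invented plain-text format in which `*`, `_`, and the backtick are markup characters.

```typescript
// Assumed shape of an escaper: take raw text, return it with special characters neutralised.
type SketchEscaper = (text: string) => string;

// Backslash-escape the characters that carry meaning in the hypothetical target format.
// "$&" in the replacement refers to the matched character itself.
const escapePlain: SketchEscaper = text => text.replace(/[\\*_`]/g, "\\$&");

const escaped = escapePlain("2 * 3 = 6, snake_case");
```

The backslash itself is in the escaped set so that text already containing backslashes does not accidentally form escape sequences in the output.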

By using both of these tools, you can implement your own HTML serialiser like so:

import { serialiseWith, escapers } from "@telegraf/entity";

const serialise = serialiseWith(myHTMLSerialiser, escapers.HTML);
serialise(ctx.message);

Parsing HTML and Markdown into entities

We now have a fully Bot API-compliant parser for HTML and Markdown (MarkdownV2 not supported yet). This was ported over from tdlib, so it should parse exactly like Bot API does, and throw the same errors, but natively in JavaScript.

This is quite an advanced use case, and because it’s just as strict as the official API, you will not be able to use this to leniently parse HTML or Markdown (for instance, to massage LLM output into entities for Telegram).

Some use cases enabled by this:

Telegram client-like applications that want to parse exactly like Telegram. It’s quite trivial to translate the resulting entities into TL types for use with MTProto, for instance.
Statically checking any piece of HTML or Markdown to ensure it’s valid.
Folding existing HTML/Markdown into entities, to be used with Telegraf’s fmt helpers which construct entities directly.

import { parse_html, parse_markdown } from "@telegraf/entity";

const fromHTML = parse_html(html); // { text: string, entities: MessageEntity[] }
// or
const fromMarkdown = parse_markdown(markdown); // { text: string, entities: MessageEntity[] }
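
The static-check use case can be sketched as a small wrapper. Since the parser is described as throwing the same errors as Bot API on invalid input, validity checking reduces to a try/catch. To keep the snippet runnable on its own, the parser is passed in as a parameter; in practice that would be parse_html or parse_markdown. `checkMarkup`, `ParseFn`, `CheckResult`, and `fakeParser` are hypothetical names, and the fake parser's rule is made up purely for demonstration.

```typescript
// Assumed result shape, following the comments in the example above.
interface Parsed {
    text: string;
    entities: unknown[];
}
type ParseFn = (input: string) => Parsed;

type CheckResult =
    | { ok: true; parsed: Parsed }
    | { ok: false; error: string };

// Run a strict parser and report success or the error message instead of throwing.
function checkMarkup(parse: ParseFn, input: string): CheckResult {
    try {
        return { ok: true, parsed: parse(input) };
    } catch (err) {
        return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
}

// Stand-in parser for demonstration only: rejects input with an unclosed "<b>".
const fakeParser: ParseFn = input => {
    if (input.includes("<b>") && !input.includes("</b>")) {
        throw new Error("unclosed <b> tag");
    }
    return { text: input.replace(/<\/?b>/g, ""), entities: [] };
};

const good = checkMarkup(fakeParser, "<b>hi</b>");
const bad = checkMarkup(fakeParser, "<b>hi");
```

On success, the `text` and `entities` fields of the parsed result can be used together wherever a text-and-entities pair is expected.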

Thanks to codinary.org for commissioning this feature.