Skip to content

Recognize supplementary (non-BMP) punctuations around emphasis delimiter runs #860

Open
@tats-u

Description

@tats-u

public static void CheckOpenCloseDelimiter(char pc, char c, bool enableWithinWord, out bool canOpen, out bool canClose)
{
pc.CheckUnicodeCategory(out bool prevIsWhiteSpace, out bool prevIsPunctuation);
c.CheckUnicodeCategory(out bool nextIsWhiteSpace, out bool nextIsPunctuation);
var prevIsExcepted = prevIsPunctuation && IsPunctuationException(pc);
var nextIsExcepted = nextIsPunctuation && IsPunctuationException(c);
// A left-flanking delimiter run is a delimiter run that is
// (1) not followed by Unicode whitespace, and either
// (2a) not followed by a punctuation character or
// (2b) followed by a punctuation character and preceded by Unicode whitespace or a punctuation character.
// For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
canOpen = !nextIsWhiteSpace &&
((!nextIsPunctuation || nextIsExcepted) || prevIsWhiteSpace || prevIsPunctuation);
// A right-flanking delimiter run is a delimiter run that is
// (1) not preceded by Unicode whitespace, and either
// (2a) not preceded by a punctuation character, or
// (2b) preceded by a punctuation character and followed by Unicode whitespace or a punctuation character.
// For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
canClose = !prevIsWhiteSpace &&
((!prevIsPunctuation || prevIsExcepted) || nextIsWhiteSpace || nextIsPunctuation);

The specs says "A character is a Unicode code point." However, the char type is just a UTF-16 code unit, which does not cover supplementary characters (whose code points are U+100000 or greater).

https://spec.commonmark.org/0.31.2/#character

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions