
words

This module contains a function to retrieve words from HTML text.

Functions:

| Name | Description |
|---|---|
| `get_words` | Get words in HTML text. |

get_words(html: str, *, known_words: set[str] | None = None, min_length: int = 2, max_capital: int = 1, ignore_code: bool = True, allow_unicode: bool = True) -> list[str]

Get words in HTML text.

Parameters:

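| Name | Type | Description | Default |
|---|---|---|---|
| `html` | `str` | The HTML text. | required |
| `known_words` | `set[str] \| None` | Words to exclude from the result. They are subtracted after the extracted words are lowercased, so entries should be lowercase. | `None` |
| `min_length` | `int` | Minimum word length. | `2` |
| `max_capital` | `int` | Maximum number of capital letters a word may contain. | `1` |
| `ignore_code` | `bool` | Whether to ignore words inside code tags. | `True` |
| `allow_unicode` | `bool` | Whether to keep Unicode characters. | `True` |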
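
The `keep = partial(_keep_word, ...)` line in the source binds `min_length` and `max_capital` into a single filter predicate. The body of the private `_keep_word` helper is not shown on this page, so `keep_word` below is a hypothetical stand-in. It assumes that words shorter than `min_length`, or with more than `max_capital` uppercase letters (acronyms such as "HTML"), are dropped:

```python
from functools import partial


def keep_word(word: str, *, min_length: int, max_capital: int) -> bool:
    """Hypothetical stand-in for the private ``_keep_word`` helper (assumed behavior)."""
    if len(word) < min_length:
        return False
    # Assumption: words with too many capitals, e.g. acronyms, are rejected.
    return sum(char.isupper() for char in word) <= max_capital


# Bind the options once, as get_words does with functools.partial.
keep = partial(keep_word, min_length=2, max_capital=1)
candidates = ["a", "Python", "HTML", "spellcheck", "MkDocs"]
print(list(filter(keep, candidates)))  # ['Python', 'spellcheck']
```

With the defaults, "a" is too short, while "HTML" and "MkDocs" exceed one capital letter.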

Returns:

| Type | Description |
|---|---|
| `list[str]` | A sorted list of unique, lowercased words, with `known_words` removed. |
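
The last two lines of `get_words` lowercase and deduplicate the filtered words, subtract `known_words`, and sort the rest. Because the subtraction happens after lowercasing, a `known_words` entry containing uppercase letters, such as `"Python"`, can never match. The snippet below replays only these final steps on a hand-made list of already-filtered words (the list itself is illustrative):

```python
# Illustrative input: words that already passed the length/capital filter.
filtered = ["Python", "python", "spellcheck", "mkdocs"]
known_words = {"mkdocs", "Python"}

words = {word.lower() for word in filtered}  # lowercase and deduplicate
result = sorted(words - known_words)         # exclude known words, then sort
print(result)  # ['python', 'spellcheck']
```

"mkdocs" is excluded, but "python" survives because the known word was given as "Python"; supplying `known_words` in lowercase avoids this.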

Source code in src/mkdocs_spellcheck/words.py
```python
def get_words(
    html: str,
    *,
    known_words: set[str] | None = None,
    min_length: int = 2,
    max_capital: int = 1,
    ignore_code: bool = True,
    allow_unicode: bool = True,
) -> list[str]:
    """Get words in HTML text.

    Parameters:
        html: The HTML text.
        known_words: Words to exclude.
        min_length: Words minimum length.
        max_capital: Maximum number of capital letters.
        ignore_code: Ignore words in code tags.
        allow_unicode: Keep unicode characters.

    Returns:
        A list of words.
    """
    known_words = known_words or set()
    keep = partial(_keep_word, min_length=min_length, max_capital=max_capital)
    filtered = filter(keep, _normalize(_strip_tags(html, ignore_code), allow_unicode).split("-"))
    words = {word.lower() for word in filtered}
    return sorted(words - known_words)
```
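
The private helpers `_strip_tags`, `_normalize` and `_keep_word` are not documented on this page, so the end-to-end pipeline below is a self-contained sketch built on assumptions, not the module's actual code. The names `strip_tags`, `normalize`, `keep_word` and `get_words_sketch` are hypothetical. Tag stripping uses the standard library's `html.parser` and skips text inside `<code>` elements when `ignore_code` is true. Normalization is assumed to collapse punctuation and whitespace into single hyphens, which would explain why `get_words` splits on `"-"`; with `allow_unicode=False` it assumes accented characters are decomposed and non-ASCII bytes dropped.

```python
from __future__ import annotations

import re
import unicodedata
from functools import partial
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text data, optionally skipping anything inside <code> elements."""

    def __init__(self, ignore_code: bool) -> None:
        super().__init__()
        self.ignore_code = ignore_code
        self.code_depth = 0  # how many <code> elements we are currently inside
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self.code_depth += 1

    def handle_endtag(self, tag):
        if tag == "code" and self.code_depth:
            self.code_depth -= 1

    def handle_data(self, data):
        if not (self.ignore_code and self.code_depth):
            self.chunks.append(data)


def strip_tags(html: str, ignore_code: bool) -> str:
    """Hypothetical stand-in for ``_strip_tags``: return the text content of ``html``."""
    parser = _TextExtractor(ignore_code)
    parser.feed(html)
    parser.close()  # flush any trailing text
    return " ".join(parser.chunks)


def normalize(text: str, allow_unicode: bool) -> str:
    """Hypothetical stand-in for ``_normalize``: join words with single hyphens."""
    if not allow_unicode:
        # Decompose accented characters, then drop anything that is not ASCII.
        text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s-]", " ", text)  # punctuation becomes whitespace
    return re.sub(r"[-\s]+", "-", text).strip("-")  # runs of spaces/hyphens become one hyphen


def keep_word(word: str, *, min_length: int, max_capital: int) -> bool:
    """Hypothetical stand-in for ``_keep_word`` (assumed length and capital-letter checks)."""
    return len(word) >= min_length and sum(c.isupper() for c in word) <= max_capital


def get_words_sketch(
    html: str,
    *,
    known_words: set[str] | None = None,
    min_length: int = 2,
    max_capital: int = 1,
    ignore_code: bool = True,
    allow_unicode: bool = True,
) -> list[str]:
    """Same control flow as ``get_words``, wired to the hypothetical helpers above."""
    known_words = known_words or set()
    keep = partial(keep_word, min_length=min_length, max_capital=max_capital)
    filtered = filter(keep, normalize(strip_tags(html, ignore_code), allow_unicode).split("-"))
    words = {word.lower() for word in filtered}
    return sorted(words - known_words)


html = "<p>Spell-check the <code>mkdocs_config</code> page, café and HTML!</p>"
print(get_words_sketch(html))  # ['and', 'café', 'check', 'page', 'spell', 'the']
```

Here "mkdocs_config" is skipped because it sits in a `<code>` element, "HTML" is rejected by the capital-letter check, and the hyphenated "Spell-check" yields two words. Passing `allow_unicode=False` would turn "café" into "cafe" under the assumed normalization.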