pyfcstm.highlight.pygments_lexer

Pygments lexer implementation for FCSTM DSL syntax highlighting.

This module defines FcstmLexer, a Pygments lexer tailored for the FCSTM (Finite State Machine) DSL. The lexer mirrors the FCSTM surface syntax defined by Grammar.g4 and provides highlighting support for Sphinx documentation as well as other Pygments-based tools.

The module exposes the following public component:

  • FcstmLexer - Regex-based lexer for FCSTM DSL tokens and comments

Note

The lexer is designed for use with Pygments and Sphinx’s code-block directive. It does not parse or validate DSL input. In particular, FcstmLexer.analyse_text() must remain a pure string/token heuristic and must not call the FCSTM parser/model loader, so malformed but still recognizably FCSTM snippets can continue to be detected.

Example:

>>> from pygments import highlight
>>> from pygments.formatters import HtmlFormatter
>>> from pygments.lexers import get_lexer_by_name
>>> code = 'state Root { import "./worker.fcstm" as Worker; }'
>>> lexer = get_lexer_by_name("fcstm")
>>> html = highlight(code, lexer, HtmlFormatter())

Usage in Sphinx documentation:

.. code-block:: fcstm

    state Root {
        import "./worker.fcstm" as Worker {
            def counter -> shared_counter;
            event /Start -> Start named "Mapped Start";
        }
    }

__all__

pyfcstm.highlight.pygments_lexer.__all__ = ['FcstmLexer']

FcstmLexer

class pyfcstm.highlight.pygments_lexer.FcstmLexer(*args, **kwds)

Lexer for FCSTM (Finite State Machine) DSL.

This lexer provides syntax highlighting for hierarchical state machine definitions in the FCSTM DSL. It recognizes keywords, operators, numbers, strings, comments (including nested multiline comments), and identifiers. The implementation uses stateful regular expressions via pygments.lexer.RegexLexer.

The lexer supports:

  • Variable definitions and types (def, int, float)

  • State and import definitions (state, pseudo, import, as, named)

  • Transitions and lifecycle actions (enter, during, exit)

  • Aspect-oriented actions (before, after, >>)

  • Guards and effects (if, effect)

  • Import mapping blocks (def mapping, event mapping, $n / ${n} templates)

  • Logical and arithmetic expressions

  • Events and scoped references (::)
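A few of the constructs listed above can be tried in isolation with the standard `re` module. The sketch below copies three patterns from the lexer's `tokens` table; the constant names and the two input fragments are made up for illustration and are not validated FCSTM.

```python
import re

# Patterns copied from the lexer's `tokens` table; the constant names are illustrative.
TEMPLATE = re.compile(r"\$\{[0-9]+\}|\$[0-9]+")  # $n / ${n} import templates
INIT_MARKER = re.compile(r"\[\*\]")               # the [*] pseudo keyword
SCOPE = re.compile(r"::")                         # scoped reference operator

# Made-up fragments, used only to exercise the patterns (not validated FCSTM).
mapping_fragment = "def sensor_* -> io_$1; def ch_* -> bus_${1};"
transition_fragment = "[*] -> Idle; Idle -> Running :: Start;"

templates = TEMPLATE.findall(mapping_fragment)
markers = INIT_MARKER.findall(transition_fragment)
scopes = SCOPE.findall(transition_fragment)
print(templates, markers, scopes)
```

In the `root` state the `::` rule is listed before the single `:` rule, so a scoped reference is emitted as one operator token rather than two punctuation tokens.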

Example:

>>> from pygments.lexers import get_lexer_by_name
>>> lexer = get_lexer_by_name("fcstm")
>>> list(lexer.get_tokens('state Root { import "./worker.fcstm" as Worker; }'))[:5]
[(Token.Keyword.Declaration, 'state'), ...]

Note

The lexer includes a heuristic analyse_text() method used by Pygments to guess if input text is likely FCSTM code.

aliases = ['fcstm', 'fcsm']

A list of short, unique identifiers that can be used to look up the lexer from a list, e.g., using get_lexer_by_name().

static analyse_text(text)

Analyze text to determine if it is likely FCSTM code.

This method is used by Pygments to heuristically determine whether the input should be lexed by FcstmLexer. It scans for key tokens and computes a confidence score between 0.0 and 1.0.

The heuristic balances recall (detecting FCSTM files) against precision (avoiding false positives from other languages such as C++, Rust, and Java). It deliberately uses only string and token-stream operations, which keeps detection tolerant of incomplete or slightly broken FCSTM input and avoids depending on a successful DSL parse/load round-trip.

Parameters:

text (str) – Text content to analyze

Returns:

Confidence score indicating likelihood of FCSTM syntax

Return type:

float

Example:

>>> # FCSTM code - should score high
>>> fcstm_code = '''
... def int counter = 0;
... state MyState {
...     enter { counter = 0; }
...     [*] -> Active;
... }
... '''
>>> FcstmLexer.analyse_text(fcstm_code)
1.0

>>> # C++ code - should score low
>>> cpp_code = '''
... class MyClass {
...     void enter() { counter = 0; }
...     std::vector<int> data;
... };
... '''
>>> FcstmLexer.analyse_text(cpp_code)
0.0

>>> # Python code - should score low
>>> python_code = '''
... def enter():
...     counter = 0
...     state = "active"
... '''
>>> FcstmLexer.analyse_text(python_code)
0.0

>>> # Java code - should score low
>>> java_code = '''
... public class State {
...     private int counter = 0;
...     public void enter() { counter = 0; }
... }
... '''
>>> FcstmLexer.analyse_text(java_code)
0.0

>>> # Rust code - should score low
>>> rust_code = '''
... struct State {
...     counter: i32,
... }
... impl State {
...     fn enter(&mut self) { self.counter = 0; }
... }
... '''
>>> FcstmLexer.analyse_text(rust_code)
0.0
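
The kind of pure string heuristic described above could look roughly like the following. This is a minimal sketch, not the actual analyse_text() implementation: the function name `score_fcstm`, the cue patterns, and the weights are all assumptions chosen for illustration.

```python
import re

# Hypothetical cues and weights; the real analyse_text() may use different ones.
STRONG_CUES = [
    (re.compile(r"\[\*\]\s*->"), 0.5),                              # [*] -> Target
    (re.compile(r"^\s*def\s+(?:int|float)\s+\w+\s*=", re.M), 0.3),  # def int x = ...
    (re.compile(r"^\s*state\s+\w+\s*[{;]", re.M), 0.3),             # state Name { ... }
]

# Hypothetical veto: keywords and shapes that point at other languages.
FOREIGN_CUES = re.compile(
    r"\b(?:class|struct|impl|fn|public|private)\b|^\s*def\s+\w+\s*\(", re.M
)


def score_fcstm(text):
    """Return a confidence in [0.0, 1.0] using regex searches only (no parsing)."""
    if FOREIGN_CUES.search(text):
        return 0.0
    score = sum(weight for pattern, weight in STRONG_CUES if pattern.search(text))
    return min(1.0, score)


# An unterminated snippet still scores above zero because nothing is parsed.
broken = "state Root {\n    [*] -> Idle;\n"
print(score_fcstm(broken))
```

Because the sketch never builds a syntax tree, the missing closing brace in `broken` does not affect the score, which is the tolerance property the description asks for.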

filenames = ['*.fcstm']

A list of fnmatch patterns that match filenames which contain content for this lexer. The patterns in this list should be unique among all lexers.
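
As a rough illustration of how an fnmatch pattern list selects files, the sketch below checks names against the pattern from `filenames` using the standard library. The helper name `matches_lexer` and the sample filenames are hypothetical, and this is not how Pygments itself performs the lookup.

```python
from fnmatch import fnmatch

FILENAME_PATTERNS = ["*.fcstm"]  # copied from FcstmLexer.filenames


def matches_lexer(filename, patterns=FILENAME_PATTERNS):
    """Return True if `filename` matches any of the fnmatch patterns."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


print(matches_lexer("traffic_light.fcstm"))
print(matches_lexer("traffic_light.py"))
```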

mimetypes = ['text/x-fcstm']

A list of MIME types for content that can be lexed with this lexer.

name = 'FCSTM'

Full name of the lexer, in human-readable form.

tokens = {'comment-multiline': [('[^*/]+', ('Comment', 'Multiline')), ('/\\*', ('Comment', 'Multiline'), '#push'), ('\\*/', ('Comment', 'Multiline'), '#pop'), ('[*/]', ('Comment', 'Multiline'))], 'comments': [('/\\*', ('Comment', 'Multiline'), 'comment-multiline'), ('//[^\\r\\n]*', ('Comment', 'Single')), ('#[^\\r\\n]*', ('Comment', 'Single'))], 'import-block': ['whitespace', 'comments', ('\\bdef\\b', ('Keyword', 'Declaration'), 'import-def-selector'), ('\\bevent\\b', ('Keyword', 'Declaration')), ('\\bnamed\\b', ('Keyword', 'Declaration')), ('->', ('Operator',)), ('\\$\\{[0-9]+\\}|\\$[0-9]+', ('Name', 'Variable')), ('[a-zA-Z_][a-zA-Z0-9_]*\\*(?:[a-zA-Z0-9_*]*)', ('Name', 'Variable')), ('\\*', ('Operator',)), ('/', ('Operator',)), ('\\.', ('Punctuation',)), (';', ('Punctuation',)), ('\\{', ('Punctuation',)), ('\\}', ('Punctuation',), '#pop'), ('"([^"\\\\]|\\\\[btnfr"\\\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*"', ('Literal', 'String', 'Double')), ('\'([^\'\\\\]|\\\\[btnfr\\"\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*\'', ('Literal', 'String', 'Single')), ('[a-zA-Z_][a-zA-Z0-9_]*', ('Name',))], 'import-def-selector': ['whitespace', 'comments', ('->', ('Operator',), 'import-def-target'), ('\\{', ('Punctuation',)), ('\\}', ('Punctuation',)), (',', ('Punctuation',)), ('[a-zA-Z_][a-zA-Z0-9_]*\\*(?:[a-zA-Z0-9_*]*)', ('Name', 'Variable')), ('\\*[a-zA-Z0-9_][a-zA-Z0-9_*]*', ('Name', 'Variable')), ('[a-zA-Z_][a-zA-Z0-9_]*', ('Name',)), ('\\*', ('Operator',))], 'import-def-target': ['whitespace', 'comments', (';', ('Punctuation',), ('#pop', '#pop')), ('\\$\\{[0-9]+\\}|\\$[0-9]+', ('Name', 'Variable')), ('[a-zA-Z_][a-zA-Z0-9_]*(?:(?:\\$\\{[0-9]+\\}|\\$[0-9]+|\\*)(?:[a-zA-Z0-9_]*))*', ('Name', 'Variable')), ('\\*', ('Operator',)), ('[a-zA-Z_][a-zA-Z0-9_]*', ('Name',))], 'import-header': ['whitespace', 'comments', ('\\b(?:as)\\b', ('Keyword', 'Declaration')), ('\\bnamed\\b', ('Keyword', 'Declaration')), 
('"([^"\\\\]|\\\\[btnfr"\\\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*"', ('Literal', 'String', 'Double')), ('\'([^\'\\\\]|\\\\[btnfr\\"\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*\'', ('Literal', 'String', 'Single')), ('\\{', ('Punctuation',), ('#pop', 'import-block')), (';', ('Punctuation',), '#pop'), ('[a-zA-Z_][a-zA-Z0-9_]*', ('Name',))], 'root': ['whitespace', 'comments', ('\\bimport\\b', ('Keyword', 'Declaration'), 'import-header'), (<pygments.lexer.words object>, ('Keyword', 'Declaration')), (<pygments.lexer.words object>, ('Keyword', 'Reserved')), (<pygments.lexer.words object>, ('Keyword', 'Namespace')), (<pygments.lexer.words object>, ('Keyword', 'Type')), (<pygments.lexer.words object>, ('Keyword', 'Reserved')), (<pygments.lexer.words object>, ('Operator', 'Word')), (<pygments.lexer.words object>, ('Keyword', 'Constant')), (<pygments.lexer.words object>, ('Name', 'Constant')), (<pygments.lexer.words object>, ('Name', 'Builtin')), ('>>', ('Operator', 'Word')), ('->', ('Operator',)), ('\\[\\*\\]', ('Keyword', 'Pseudo')), ('::', ('Operator',)), (':', ('Punctuation',)), ('/', ('Operator',)), ('0x[0-9a-fA-F]+', ('Literal', 'Number', 'Hex')), ('[0-9]+\\.[0-9]*([eE][+-]?[0-9]+)?', ('Literal', 'Number', 'Float')), ('\\.[0-9]+([eE][+-]?[0-9]+)?', ('Literal', 'Number', 'Float')), ('[0-9]+[eE][+-]?[0-9]+', ('Literal', 'Number', 'Float')), ('[0-9]+', ('Literal', 'Number', 'Integer')), ('\\*\\*', ('Operator',)), ('<<', ('Operator',)), ('<=|>=|==|!=', ('Operator',)), ('&&|\\|\\|', ('Operator',)), ('!', ('Operator', 'Word')), ('[+\\-*/%&|^~<>]', ('Operator',)), ('=|\\?', ('Operator',)), ('[{}()\\[\\];,.]', ('Punctuation',)), ('"([^"\\\\]|\\\\[btnfr"\\\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*"', ('Literal', 'String', 'Double')), ('\'([^\'\\\\]|\\\\[btnfr\\"\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*\'', ('Literal', 'String', 'Single')), ('[a-zA-Z_][a-zA-Z0-9_]*', ('Name',))], 
'whitespace': [('\\s+', ('Text', 'Whitespace'))]}

At all times, there is a stack of states. Initially, the stack contains a single state ‘root’. The top of the stack is called “the current state”.

Dict of {'state': [(regex, tokentype, new_state), ...], ...}

new_state can be omitted to signify no state transition. If new_state is a string, it is pushed on the stack. This ensures that the new current state is new_state. If new_state is a tuple of strings, all of those strings are pushed on the stack and the current state will be the last element of the tuple. new_state can also be combined('state1', 'state2', ...) to signify a new, anonymous state combined from the rules of two or more existing ones. Furthermore, it can be ‘#pop’ to signify going back one step in the state stack, or ‘#push’ to push the current state on the stack again. Note that if you push while in a combined state, the combined state itself is pushed, and not only the state in which the rule is defined.

The tuple can also be replaced with include('state'), in which case the rules from the state named by the string are included in the current one.
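
The stack behaviour described above can be sketched without Pygments. The toy tokenizer below reuses the four `comment-multiline` rules from the `tokens` table to show how ‘#push’ and ‘#pop’ handle nested comments. The simplified `root` state, the `tokenize` function, and the string token names are assumptions made for this illustration; `include` and `combined` are not modelled.

```python
import re

# Rules are (regex, token name, new_state); new_state None means "stay".
RULES = {
    "root": [
        (r"/\*", "Comment.Multiline", "comment-multiline"),  # push a named state
        (r"[^/]+|/", "Text", None),                            # simplified catch-all
    ],
    "comment-multiline": [
        (r"[^*/]+", "Comment.Multiline", None),
        (r"/\*", "Comment.Multiline", "#push"),  # nested opener: push current state again
        (r"\*/", "Comment.Multiline", "#pop"),   # closer: go back one step
        (r"[*/]", "Comment.Multiline", None),    # stray '*' or '/'
    ],
}


def tokenize(text):
    """Return (token, value) pairs; the first matching rule of the current state wins."""
    stack = ["root"]
    pos = 0
    out = []
    while pos < len(text):
        for pattern, token, new_state in RULES[stack[-1]]:
            match = re.compile(pattern).match(text, pos)
            if match is None:
                continue
            out.append((token, match.group()))
            pos = match.end()
            if new_state == "#pop":
                stack.pop()
            elif new_state == "#push":
                stack.append(stack[-1])
            elif new_state is not None:
                stack.append(new_state)
            break
        else:
            # No rule matched: emit one character as an error and move on.
            out.append(("Error", text[pos]))
            pos += 1
    return out


tokens = tokenize("a /* b /* c */ d */ e")
comment = "".join(value for token, value in tokens if token == "Comment.Multiline")
plain = [value for token, value in tokens if token == "Text"]
print(comment)
print(plain)
```

Each nested `/*` adds one more copy of `comment-multiline` to the stack, so the inner `*/` only pops back to the outer comment and the text ` d ` is still reported as comment content.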