pyfcstm.highlight.pygments_lexer
Pygments lexer implementation for FCSTM DSL syntax highlighting.
This module defines FcstmLexer, a Pygments lexer tailored for the
FCSTM (Finite State Machine) DSL. The lexer is based on the ANTLR grammar in
Grammar.g4 and provides syntax highlighting support for Sphinx
documentation as well as other Pygments-based tools.
The module exposes the following public component:
FcstmLexer - Regex-based lexer for FCSTM DSL tokens and comments
Note
The lexer is designed for use with Pygments and Sphinx’s code-block
directive. It does not perform parsing or validation; it only assigns
token types based on regular expressions.
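The following is a minimal, stdlib-only sketch of first-match regex tokenization, with no Pygments involved. It shows what "assigns token types based on regular expressions" means in practice: input that is not valid FCSTM is still tokenized without complaint. The operator, punctuation, number and name patterns are copied from FcstmLexer.tokens['root']. The keyword list is a hypothetical subset, and the token types are plain strings (for example "Keyword" rather than Token.Keyword.Declaration).

```python
import re

# Simplified rule list. Operator, punctuation, number and name patterns are
# copied from FcstmLexer.tokens['root']; the keyword subset is hypothetical.
RULES = [(re.compile(p), t) for p, t in [
    (r"\s+", "Whitespace"),
    (r"//[^\r\n]*", "Comment.Single"),
    (r"\b(?:state|def|enter|during|exit)\b", "Keyword"),
    (r"->", "Operator"),
    (r"\[\*\]", "Keyword.Pseudo"),
    (r"::", "Operator"),
    (r":", "Punctuation"),
    (r"[0-9]+", "Number.Integer"),
    (r"=|\?", "Operator"),
    (r"[{}()\[\];,.]", "Punctuation"),
    (r"[a-zA-Z_][a-zA-Z0-9_]*", "Name"),
]]


def tokenize(text):
    """Assign a token type to each lexeme. No grammar checks are made."""
    pos, tokens = 0, []
    while pos < len(text):
        for regex, ttype in RULES:
            m = regex.match(text, pos)
            if m:  # first matching rule wins, as in RegexLexer
                if ttype != "Whitespace":  # whitespace dropped for brevity
                    tokens.append((ttype, m.group()))
                pos = m.end()
                break
        else:
            # Like Pygments, emit an Error token for an unmatched character
            tokens.append(("Error", text[pos]))
            pos += 1
    return tokens
```

For example, `tokenize("state -> ; 42 }")` returns five ordinary tokens even though the input is not a valid FCSTM statement.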
Example:
>>> from pygments import highlight
>>> from pygments.formatters import HtmlFormatter
>>> from pygments.lexers import get_lexer_by_name
>>> code = "state MyState { enter { counter = 0; } }"
>>> lexer = get_lexer_by_name("fcstm")
>>> html = highlight(code, lexer, HtmlFormatter())
Usage in Sphinx documentation:
.. code-block:: fcstm

   def int counter = 0;
   state MyState {
       enter { counter = 0; }
   }
__all__
- pyfcstm.highlight.pygments_lexer.__all__ = ['FcstmLexer']
FcstmLexer
- class pyfcstm.highlight.pygments_lexer.FcstmLexer(*args, **kwds)
Lexer for FCSTM (Finite State Machine) DSL.
This lexer provides syntax highlighting for hierarchical state machine definitions in the FCSTM DSL. It recognizes keywords, operators, numbers, strings, comments (including nested multiline comments), and identifiers. The implementation uses stateful regular expressions via pygments.lexer.RegexLexer.
The lexer supports:
- Variable definitions and types (def, int, float)
- State definitions (state, pseudo, named)
- Transitions and lifecycle actions (enter, during, exit)
- Aspect-oriented actions (before, after, >>)
- Guards and effects (if, effect)
- Logical and arithmetic expressions
- Events and scoped references (::)
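Nested multiline comments are where the lexer's state stack matters. The 'comment-multiline' state pushes itself again on `/*` ('#push') and pops on `*/` ('#pop'). Because it only ever pushes that same state, a depth counter is enough to mimic it. The stdlib sketch below uses the four patterns from that state, in the same order. The function name block_comment_end is hypothetical and not part of pyfcstm.

```python
import re

# Rules copied, in order, from FcstmLexer.tokens['comment-multiline'].
# Second element = change in nesting depth: '#push' -> +1, '#pop' -> -1.
COMMENT_RULES = [
    (re.compile(r"[^*/]+"), 0),
    (re.compile(r"/\*"), +1),
    (re.compile(r"\*/"), -1),
    (re.compile(r"[*/]"), 0),
]


def block_comment_end(text, pos=0):
    """Return the index just past a (possibly nested) /* ... */ comment
    starting at pos. An unterminated comment runs to the end of text."""
    if not text.startswith("/*", pos):
        raise ValueError("no block comment starts at pos")
    depth, pos = 1, pos + 2
    while depth and pos < len(text):
        for regex, delta in COMMENT_RULES:
            m = regex.match(text, pos)
            if m:  # every character matches one of the four rules
                depth += delta
                pos = m.end()
                break
    return pos
```

Applied to `/* outer /* inner */ still comment */ state A;`, the comment ends after the second `*/`. A C-style, non-nesting scanner would stop after the first one.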
Example:
>>> from pygments.lexers import get_lexer_by_name
>>> lexer = get_lexer_by_name("fcstm")
>>> list(lexer.get_tokens("state A { enter { x = 1; } }"))[:5]
[(Token.Keyword.Declaration, 'state'), ...]
Note
The lexer includes a heuristic analyse_text() method that Pygments uses to guess whether input text is likely FCSTM code.
- aliases = ['fcstm', 'fcsm']
A list of short, unique identifiers that can be used to look up the lexer from a list, e.g., using get_lexer_by_name().
- static analyse_text(text)
Analyze text to determine if it is likely FCSTM code.
This method is used by Pygments to heuristically determine whether the input should be lexed by FcstmLexer. It scans for key tokens and computes a confidence score in the range 0.0 to 1.0. The heuristic balances recall (detecting FCSTM files) against precision (avoiding false positives from other languages such as C++, Rust, and Java).
- Parameters:
text (str) – Text content to analyze
- Returns:
Confidence score indicating likelihood of FCSTM syntax
- Return type:
float
Example:
>>> # FCSTM code - should score high (0.95)
>>> fcstm_code = '''
... def int counter = 0;
... state MyState {
...     enter { counter = 0; }
...     [*] -> Active;
... }
... '''
>>> FcstmLexer.analyse_text(fcstm_code)
0.95
>>> # C++ code - should score low (0.00)
>>> cpp_code = '''
... class MyClass {
...     void enter() { counter = 0; }
...     std::vector<int> data;
... };
... '''
>>> FcstmLexer.analyse_text(cpp_code)
0.0
>>> # Python code - should score low (0.00)
>>> python_code = '''
... def enter():
...     counter = 0
...     state = "active"
... '''
>>> FcstmLexer.analyse_text(python_code)
0.0
>>> # Java code - should score low (0.00)
>>> java_code = '''
... public class State {
...     private int counter = 0;
...     public void enter() { counter = 0; }
... }
... '''
>>> FcstmLexer.analyse_text(java_code)
0.0
>>> # Rust code - should score low (0.00)
>>> rust_code = '''
... struct State {
...     counter: i32,
... }
... impl State {
...     fn enter(&mut self) { self.counter = 0; }
... }
... '''
>>> FcstmLexer.analyse_text(rust_code)
0.0
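The actual patterns and weights used by FcstmLexer.analyse_text are not documented here. The sketch below illustrates only the general approach described above: add confidence for FCSTM-specific constructs, subtract it for constructs typical of C++, Java, Python, or Rust, then clamp the result to 0.0..1.0. Every pattern, weight, and the name guess_fcstm are hypothetical, and the sketch does not reproduce the exact scores in the example. (Pygments itself also wraps analyse_text so that its return value is clamped to that range.)

```python
import re

# Hypothetical FCSTM signals and weights, for illustration only.
POSITIVE = [
    (re.compile(r"\bstate\s+\w+\s*\{"), 0.3),              # state Name {
    (re.compile(r"\[\*\]\s*->"), 0.4),                     # [*] -> Target
    (re.compile(r"\bdef\s+(?:int|float)\s+\w+\s*="), 0.3),  # def int x =
    (re.compile(r"\b(?:enter|during|exit)\s*\{"), 0.2),    # enter { ... }
]
# Hypothetical signals from other languages that lower the score.
NEGATIVE = [
    (re.compile(r"\bclass\b"), 0.5),
    (re.compile(r"\bstd::"), 0.5),
    (re.compile(r"\b(?:public|private)\b"), 0.5),
    (re.compile(r"\bfn\s+\w+\s*\("), 0.5),   # Rust function
    (re.compile(r"\bimpl\b"), 0.5),
    (re.compile(r"\bdef\s+\w+\s*\("), 0.5),  # Python function
]


def guess_fcstm(text: str) -> float:
    """Return a confidence score in [0.0, 1.0] that text is FCSTM."""
    score = sum(w for rx, w in POSITIVE if rx.search(text))
    score -= sum(w for rx, w in NEGATIVE if rx.search(text))
    return min(max(score, 0.0), 1.0)
```

Keying on multi-token shapes such as `state Name {` or `enter {` rather than on bare keywords is what keeps `def enter():` or `void enter()` from counting as FCSTM evidence.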
- filenames = ['*.fcstm']
A list of fnmatch patterns that match filenames which contain content for this lexer. The patterns in this list should be unique among all lexers.
- mimetypes = ['text/x-fcstm']
A list of MIME types for content that can be lexed with this lexer.
- name = 'FCSTM'
Full name of the lexer, in human-readable form
- tokens = {'comment-multiline': [('[^*/]+', ('Comment', 'Multiline')), ('/\\*', ('Comment', 'Multiline'), '#push'), ('\\*/', ('Comment', 'Multiline'), '#pop'), ('[*/]', ('Comment', 'Multiline'))], 'comments': [('/\\*', ('Comment', 'Multiline'), 'comment-multiline'), ('//[^\\r\\n]*', ('Comment', 'Single')), ('#[^\\r\\n]*', ('Comment', 'Single'))], 'root': ['whitespace', 'comments', (<pygments.lexer.words object>, ('Keyword', 'Declaration')), (<pygments.lexer.words object>, ('Keyword', 'Reserved')), (<pygments.lexer.words object>, ('Keyword', 'Namespace')), (<pygments.lexer.words object>, ('Keyword', 'Type')), (<pygments.lexer.words object>, ('Keyword', 'Reserved')), (<pygments.lexer.words object>, ('Operator', 'Word')), (<pygments.lexer.words object>, ('Keyword', 'Constant')), (<pygments.lexer.words object>, ('Name', 'Constant')), (<pygments.lexer.words object>, ('Name', 'Builtin')), ('>>', ('Operator', 'Word')), ('->', ('Operator',)), ('\\[\\*\\]', ('Keyword', 'Pseudo')), ('::', ('Operator',)), (':', ('Punctuation',)), ('/', ('Operator',)), ('0x[0-9a-fA-F]+', ('Literal', 'Number', 'Hex')), ('[0-9]+\\.[0-9]*([eE][+-]?[0-9]+)?', ('Literal', 'Number', 'Float')), ('\\.[0-9]+([eE][+-]?[0-9]+)?', ('Literal', 'Number', 'Float')), ('[0-9]+[eE][+-]?[0-9]+', ('Literal', 'Number', 'Float')), ('[0-9]+', ('Literal', 'Number', 'Integer')), ('\\*\\*', ('Operator',)), ('<<', ('Operator',)), ('<=|>=|==|!=', ('Operator',)), ('&&|\\|\\|', ('Operator',)), ('!', ('Operator', 'Word')), ('[+\\-*/%&|^~<>]', ('Operator',)), ('=|\\?', ('Operator',)), ('[{}()\\[\\];,.]', ('Punctuation',)), ('"([^"\\\\]|\\\\[btnfr"\\\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*"', ('Literal', 'String', 'Double')), ('\'([^\'\\\\]|\\\\[btnfr\\"\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*\'', ('Literal', 'String', 'Single')), ('[a-zA-Z_][a-zA-Z0-9_]*', ('Name',))], 'whitespace': [('\\s+', ('Text', 'Whitespace'))]}
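Because a RegexLexer takes the first rule that matches rather than the longest match, the order of the number rules above matters. The stdlib sketch below copies the number and double-quoted string patterns, in order, from tokens['root'] and shows that moving the integer rule first would split `0x1F` into just `0`. The helper first_match is hypothetical, and token types are shortened to strings.

```python
import re

# Number and double-quoted string rules, copied in order from
# FcstmLexer.tokens['root'] (token types shortened to strings).
RULES = [
    (re.compile(r"0x[0-9a-fA-F]+"), "Number.Hex"),
    (re.compile(r"[0-9]+\.[0-9]*([eE][+-]?[0-9]+)?"), "Number.Float"),
    (re.compile(r"\.[0-9]+([eE][+-]?[0-9]+)?"), "Number.Float"),
    (re.compile(r"[0-9]+[eE][+-]?[0-9]+"), "Number.Float"),
    (re.compile(r"[0-9]+"), "Number.Integer"),
    (re.compile(r'"([^"\\]|\\[btnfr"\'\\]|\\[0-7]{1,3}'
                r'|\\u[0-9a-fA-F]{4}|\\x[0-9a-fA-F]{2})*"'), "String.Double"),
]


def first_match(text, rules=RULES):
    """Return (token_type, lexeme) for the first rule matching at index 0."""
    for regex, ttype in rules:
        m = regex.match(text)
        if m:
            return ttype, m.group()
    return None


# Same rules with the integer rule moved to the front, to show why order matters.
REORDERED = [RULES[4]] + RULES[:4] + RULES[5:]
```

With the original order, `first_match("0x1F;")` yields the hex token. With REORDERED, the integer rule claims only the leading `0`.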
At all times there is a stack of states. Initially, the stack contains a single state ‘root’. The top of the stack is called “the current state”.
Dict of {'state': [(regex, tokentype, new_state), ...], ...}
new_state can be omitted to signify no state transition. If new_state is a string, it is pushed on the stack. This ensures the new current state is new_state. If new_state is a tuple of strings, all of those strings are pushed on the stack and the current state will be the last element of the list. new_state can also be combined('state1', 'state2', ...) to signify a new, anonymous state combined from the rules of two or more existing ones. Furthermore, it can be ‘#pop’ to signify going back one step in the state stack, or ‘#push’ to push the current state on the stack again. Note that if you push while in a combined state, the combined state itself is pushed, and not only the state in which the rule is defined.
The tuple can also be replaced with include('state'), in which case the rules from the state named by the string are included in the current one.
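To make the stack rules above concrete, here is a small stdlib-only interpreter for a tokens dict. It covers string pushes, tuple pushes, '#push', '#pop', and include(). It deliberately omits combined() and '#pop:n'. It is a teaching sketch, not Pygments' implementation: the include class is a hypothetical stand-in for pygments.lexer.include, and lex and the toy TOKENS grammar are illustrative names only.

```python
import re


class include(str):
    """Hypothetical stand-in for pygments.lexer.include: names a state whose
    rules are spliced into the current state's rule list."""


def expand(tokens, state):
    """Flatten include() entries into a plain list of rule tuples."""
    rules = []
    for rule in tokens[state]:
        if isinstance(rule, include):
            rules.extend(expand(tokens, str(rule)))
        else:
            rules.append(rule)
    return rules


def lex(tokens, text):
    """Tokenize text with a state stack that starts as ['root']."""
    table = {name: [(re.compile(r[0]), r[1], r[2] if len(r) > 2 else None)
                    for r in expand(tokens, name)]
             for name in tokens}
    stack, pos, out = ["root"], 0, []
    while pos < len(text):
        for regex, ttype, new_state in table[stack[-1]]:
            m = regex.match(text, pos)
            if not m:
                continue
            out.append((ttype, m.group()))
            pos = m.end()
            if new_state == "#pop":
                if len(stack) > 1:  # never pop the 'root' state
                    stack.pop()
            elif new_state == "#push":
                stack.append(stack[-1])  # push the current state again
            elif isinstance(new_state, str):
                stack.append(new_state)
            elif isinstance(new_state, tuple):
                stack.extend(new_state)  # last element becomes current
            break
        else:
            out.append(("Error", text[pos]))  # no rule matched here
            pos += 1
    return out


# Toy grammar: braces open nested blocks; '{{' pushes two states at once.
TOKENS = {
    "whitespace": [(r"\s+", "Whitespace")],
    "root": [
        include("whitespace"),
        (r"\{\{", "Punctuation", ("block", "block")),
        (r"\{", "Punctuation", "block"),
        (r"[a-z]+", "Name"),
    ],
    "block": [
        include("whitespace"),
        (r"\{", "Punctuation", "#push"),
        (r"\}", "Punctuation", "#pop"),
        (r"[a-z]+", "Name.Local"),
    ],
}
```

Running `lex(TOKENS, "a { b { c } d } e {{ x }} y")` tags a, e and y as Name and everything inside braces as Name.Local. Each `}` undoes exactly one push, including the two states pushed by `{{`.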