pyfcstm.highlight.pygments_lexer
Pygments lexer implementation for FCSTM DSL syntax highlighting.
This module defines FcstmLexer, a Pygments lexer tailored for the
FCSTM (Finite State Machine) DSL. The lexer mirrors the FCSTM surface syntax
defined by Grammar.g4 and provides highlighting support for Sphinx
documentation as well as other Pygments-based tools.
The module exposes the following public component:
FcstmLexer - Regex-based lexer for FCSTM DSL tokens and comments
Note
The lexer is designed for use with Pygments and Sphinx’s code-block
directive. It does not parse or validate DSL input. In particular,
FcstmLexer.analyse_text() must remain a pure string/token heuristic
and must not call the FCSTM parser/model loader, so malformed but still
recognizably FCSTM snippets can continue to be detected.
Example:
>>> from pygments import highlight
>>> from pygments.formatters import HtmlFormatter
>>> from pygments.lexers import get_lexer_by_name
>>> code = 'state Root { import "./worker.fcstm" as Worker; }'
>>> lexer = get_lexer_by_name("fcstm")
>>> html = highlight(code, lexer, HtmlFormatter())
Usage in Sphinx documentation:
.. code-block:: fcstm

    state Root {
        import "./worker.fcstm" as Worker {
            def counter -> shared_counter;
            event /Start -> Start named "Mapped Start";
        }
    }
__all__
- pyfcstm.highlight.pygments_lexer.__all__ = ['FcstmLexer']
FcstmLexer
- class pyfcstm.highlight.pygments_lexer.FcstmLexer(*args, **kwds)[source]
Lexer for FCSTM (Finite State Machine) DSL.
This lexer provides syntax highlighting for hierarchical state machine definitions in the FCSTM DSL. It recognizes keywords, operators, numbers, strings, comments (including nested multiline comments), and identifiers. The implementation uses stateful regular expressions via pygments.lexer.RegexLexer.
The lexer supports:
- Variable definitions and types (def, int, float)
- State and import definitions (state, pseudo, import, as, named)
- Transitions and lifecycle actions (enter, during, exit)
- Aspect-oriented actions (before, after, >>)
- Guards and effects (if, effect)
- Import mapping blocks (def mapping, event mapping, $n/${n} templates)
- Logical and arithmetic expressions
- Events and scoped references (::)
Example:
>>> from pygments.lexers import get_lexer_by_name
>>> lexer = get_lexer_by_name("fcstm")
>>> list(lexer.get_tokens('state Root { import "./worker.fcstm" as Worker; }'))[:5]
[(Token.Keyword.Declaration, 'state'), ...]
Note
The lexer includes a heuristic analyse_text() method used by Pygments to guess if input text is likely FCSTM code.
- aliases = ['fcstm', 'fcsm']
A list of short, unique identifiers that can be used to look up the lexer from a list, e.g., using get_lexer_by_name().
- static analyse_text(text)
Analyze text to determine if it is likely FCSTM code.
This method is used by Pygments to heuristically determine whether the input should be lexed by FcstmLexer. It scans for key tokens and constructs a confidence score in the range 0.0 to 1.0.
The heuristic balances recall (detecting FCSTM files) with precision (avoiding false positives from other languages like C++, Rust, Java). It deliberately uses only string and token-stream operations. This keeps detection tolerant of incomplete or slightly broken FCSTM input without depending on a successful DSL parse/load round-trip.
- Parameters:
text (str) – Text content to analyze
- Returns:
Confidence score indicating likelihood of FCSTM syntax
- Return type:
float
Example:
>>> # FCSTM code - should score high
>>> fcstm_code = '''
... def int counter = 0;
... state MyState {
...     enter { counter = 0; }
...     [*] -> Active;
... }
... '''
>>> FcstmLexer.analyse_text(fcstm_code)
1.0
>>> # C++ code - should score low
>>> cpp_code = '''
... class MyClass {
...     void enter() { counter = 0; }
...     std::vector<int> data;
... };
... '''
>>> FcstmLexer.analyse_text(cpp_code)
0.0
>>> # Python code - should score low
>>> python_code = '''
... def enter():
...     counter = 0
...     state = "active"
... '''
>>> FcstmLexer.analyse_text(python_code)
0.0
>>> # Java code - should score low
>>> java_code = '''
... public class State {
...     private int counter = 0;
...     public void enter() { counter = 0; }
... }
... '''
>>> FcstmLexer.analyse_text(java_code)
0.0
>>> # Rust code - should score low
>>> rust_code = '''
... struct State {
...     counter: i32,
... }
... impl State {
...     fn enter(&mut self) { self.counter = 0; }
... }
... '''
>>> FcstmLexer.analyse_text(rust_code)
0.0
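To illustrate what a "pure string heuristic" can look like, here is a minimal sketch that uses only the standard library. It is not the pyfcstm implementation: the function name toy_fcstm_score, the chosen patterns, and all weights are assumptions made up for this example. It only shows the general idea of adding confidence for FCSTM-looking markers, subtracting it for markers typical of other languages, and clamping the result to the range 0.0 to 1.0.

```python
import re


def toy_fcstm_score(text: str) -> float:
    """Hypothetical string-only heuristic; NOT the real analyse_text()."""
    score = 0.0
    # Markers that look specific to FCSTM (weights are illustrative guesses).
    if re.search(r"\[\*\]\s*->", text):                           # initial transition
        score += 0.5
    if re.search(r"\bstate\s+[A-Za-z_]\w*\s*\{", text):           # state block
        score += 0.3
    if re.search(r"\bdef\s+(?:int|float)\s+[A-Za-z_]\w*\s*=", text):  # typed def
        score += 0.3
    # Keywords typical of other languages lower the confidence.
    if re.search(r"\b(?:class|struct|impl|fn|public|private)\b", text):
        score -= 0.5
    # Clamp to the range Pygments expects from a guessing heuristic.
    return max(0.0, min(1.0, score))


fcstm_code = """
def int counter = 0;
state MyState {
    enter { counter = 0; }
    [*] -> Active;
}
"""
broken_fcstm = "state Root { [*] -> Idle"   # unbalanced braces on purpose
cpp_code = "class MyClass { void enter() { counter = 0; } };"

print(toy_fcstm_score(fcstm_code))
print(toy_fcstm_score(broken_fcstm) > 0.5)
print(toy_fcstm_score(cpp_code))
```

Because nothing in the sketch parses the input, the unbalanced snippet still scores well above zero, which is the property the description above asks of the real heuristic.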
- filenames = ['*.fcstm']
A list of fnmatch patterns that match filenames which contain content for this lexer. The patterns in this list should be unique among all lexers.
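As a rough illustration of how an fnmatch pattern such as '*.fcstm' selects files, the sketch below uses the standard library fnmatch module directly. The helper name matches_fcstm is hypothetical, and Pygments' own filename lookup may differ in details such as how it treats directory parts and letter case.

```python
from fnmatch import fnmatchcase

# Pattern copied from the ``filenames`` attribute documented above.
FCSTM_PATTERNS = ["*.fcstm"]


def matches_fcstm(filename: str) -> bool:
    """Return True if the file name matches any FCSTM pattern (case-sensitive)."""
    return any(fnmatchcase(filename, pattern) for pattern in FCSTM_PATTERNS)


print(matches_fcstm("worker.fcstm"))      # True
print(matches_fcstm("worker.fcstm.bak"))  # False
print(matches_fcstm("notes.txt"))         # False
```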
- mimetypes = ['text/x-fcstm']
A list of MIME types for content that can be lexed with this lexer.
- name = 'FCSTM'
Full name of the lexer, in human-readable form
- tokens = {'comment-multiline': [('[^*/]+', ('Comment', 'Multiline')), ('/\\*', ('Comment', 'Multiline'), '#push'), ('\\*/', ('Comment', 'Multiline'), '#pop'), ('[*/]', ('Comment', 'Multiline'))], 'comments': [('/\\*', ('Comment', 'Multiline'), 'comment-multiline'), ('//[^\\r\\n]*', ('Comment', 'Single')), ('#[^\\r\\n]*', ('Comment', 'Single'))], 'import-block': ['whitespace', 'comments', ('\\bdef\\b', ('Keyword', 'Declaration'), 'import-def-selector'), ('\\bevent\\b', ('Keyword', 'Declaration')), ('\\bnamed\\b', ('Keyword', 'Declaration')), ('->', ('Operator',)), ('\\$\\{[0-9]+\\}|\\$[0-9]+', ('Name', 'Variable')), ('[a-zA-Z_][a-zA-Z0-9_]*\\*(?:[a-zA-Z0-9_*]*)', ('Name', 'Variable')), ('\\*', ('Operator',)), ('/', ('Operator',)), ('\\.', ('Punctuation',)), (';', ('Punctuation',)), ('\\{', ('Punctuation',)), ('\\}', ('Punctuation',), '#pop'), ('"([^"\\\\]|\\\\[btnfr"\\\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*"', ('Literal', 'String', 'Double')), ('\'([^\'\\\\]|\\\\[btnfr\\"\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*\'', ('Literal', 'String', 'Single')), ('[a-zA-Z_][a-zA-Z0-9_]*', ('Name',))], 'import-def-selector': ['whitespace', 'comments', ('->', ('Operator',), 'import-def-target'), ('\\{', ('Punctuation',)), ('\\}', ('Punctuation',)), (',', ('Punctuation',)), ('[a-zA-Z_][a-zA-Z0-9_]*\\*(?:[a-zA-Z0-9_*]*)', ('Name', 'Variable')), ('\\*[a-zA-Z0-9_][a-zA-Z0-9_*]*', ('Name', 'Variable')), ('[a-zA-Z_][a-zA-Z0-9_]*', ('Name',)), ('\\*', ('Operator',))], 'import-def-target': ['whitespace', 'comments', (';', ('Punctuation',), ('#pop', '#pop')), ('\\$\\{[0-9]+\\}|\\$[0-9]+', ('Name', 'Variable')), ('[a-zA-Z_][a-zA-Z0-9_]*(?:(?:\\$\\{[0-9]+\\}|\\$[0-9]+|\\*)(?:[a-zA-Z0-9_]*))*', ('Name', 'Variable')), ('\\*', ('Operator',)), ('[a-zA-Z_][a-zA-Z0-9_]*', ('Name',))], 'import-header': ['whitespace', 'comments', ('\\b(?:as)\\b', ('Keyword', 'Declaration')), ('\\bnamed\\b', ('Keyword', 'Declaration')), 
('"([^"\\\\]|\\\\[btnfr"\\\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*"', ('Literal', 'String', 'Double')), ('\'([^\'\\\\]|\\\\[btnfr\\"\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*\'', ('Literal', 'String', 'Single')), ('\\{', ('Punctuation',), ('#pop', 'import-block')), (';', ('Punctuation',), '#pop'), ('[a-zA-Z_][a-zA-Z0-9_]*', ('Name',))], 'root': ['whitespace', 'comments', ('\\bimport\\b', ('Keyword', 'Declaration'), 'import-header'), (<pygments.lexer.words object>, ('Keyword', 'Declaration')), (<pygments.lexer.words object>, ('Keyword', 'Reserved')), (<pygments.lexer.words object>, ('Keyword', 'Namespace')), (<pygments.lexer.words object>, ('Keyword', 'Type')), (<pygments.lexer.words object>, ('Keyword', 'Reserved')), (<pygments.lexer.words object>, ('Operator', 'Word')), (<pygments.lexer.words object>, ('Keyword', 'Constant')), (<pygments.lexer.words object>, ('Name', 'Constant')), (<pygments.lexer.words object>, ('Name', 'Builtin')), ('>>', ('Operator', 'Word')), ('->', ('Operator',)), ('\\[\\*\\]', ('Keyword', 'Pseudo')), ('::', ('Operator',)), (':', ('Punctuation',)), ('/', ('Operator',)), ('0x[0-9a-fA-F]+', ('Literal', 'Number', 'Hex')), ('[0-9]+\\.[0-9]*([eE][+-]?[0-9]+)?', ('Literal', 'Number', 'Float')), ('\\.[0-9]+([eE][+-]?[0-9]+)?', ('Literal', 'Number', 'Float')), ('[0-9]+[eE][+-]?[0-9]+', ('Literal', 'Number', 'Float')), ('[0-9]+', ('Literal', 'Number', 'Integer')), ('\\*\\*', ('Operator',)), ('<<', ('Operator',)), ('<=|>=|==|!=', ('Operator',)), ('&&|\\|\\|', ('Operator',)), ('!', ('Operator', 'Word')), ('[+\\-*/%&|^~<>]', ('Operator',)), ('=|\\?', ('Operator',)), ('[{}()\\[\\];,.]', ('Punctuation',)), ('"([^"\\\\]|\\\\[btnfr"\\\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*"', ('Literal', 'String', 'Double')), ('\'([^\'\\\\]|\\\\[btnfr\\"\'\\\\]|\\\\[0-7]{1,3}|\\\\u[0-9a-fA-F]{4}|\\\\x[0-9a-fA-F]{2})*\'', ('Literal', 'String', 'Single')), ('[a-zA-Z_][a-zA-Z0-9_]*', ('Name',))], 
'whitespace': [('\\s+', ('Text', 'Whitespace'))]}
At all times there is a stack of states. Initially, the stack contains a single state ‘root’. The top of the stack is called “the current state”.
Dict of {'state': [(regex, tokentype, new_state), ...], ...}
new_state can be omitted to signify no state transition. If new_state is a string, it is pushed on the stack. This ensures the new current state is new_state. If new_state is a tuple of strings, all of those strings are pushed on the stack and the current state will be the last element of the tuple. new_state can also be combined('state1', 'state2', ...) to signify a new, anonymous state combined from the rules of two or more existing ones. Furthermore, it can be ‘#pop’ to signify going back one step in the state stack, or ‘#push’ to push the current state on the stack again. Note that if you push while in a combined state, the combined state itself is pushed, and not only the state in which the rule is defined.
The tuple can also be replaced with include('state'), in which case the rules from the state named by the string are included in the current one.
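The state stack described above is what lets the 'comment-multiline' state handle nested comments. The following is a minimal standard-library sketch of that mechanism, not Pygments' actual engine: the function toy_tokenize is hypothetical, the 'root' state is trimmed to the comment rules, and the catch-all "Other" rule is an assumption added so the sketch can consume arbitrary text. The 'comment-multiline' patterns are copied from the tokens table.

```python
import re

# Each rule is (regex, token_name, new_state); new_state may be None,
# a state name, '#push', or '#pop', mirroring the description above.
RULES = {
    "root": [
        (r"/\*", "Comment.Multiline", "comment-multiline"),
        (r"//[^\r\n]*", "Comment.Single", None),
        (r"[^/]+|/", "Other", None),  # catch-all, assumed for this sketch
    ],
    "comment-multiline": [
        (r"[^*/]+", "Comment.Multiline", None),
        (r"/\*", "Comment.Multiline", "#push"),
        (r"\*/", "Comment.Multiline", "#pop"),
        (r"[*/]", "Comment.Multiline", None),
    ],
}


def toy_tokenize(text):
    """Yield (token_name, value, stack_depth_after_rule) tuples."""
    stack = ["root"]
    pos = 0
    while pos < len(text):
        # Try the rules of the current state (top of the stack) in order.
        for pattern, token, new_state in RULES[stack[-1]]:
            match = re.compile(pattern).match(text, pos)
            if match is None:
                continue
            if new_state == "#push":
                stack.append(stack[-1])  # push the current state again
            elif new_state == "#pop":
                stack.pop()              # go back one step
            elif new_state is not None:
                stack.append(new_state)  # push a named state
            yield token, match.group(), len(stack)
            pos = match.end()
            break
        else:
            # No rule matched: emit one character so the loop always advances.
            yield "Error", text[pos], len(stack)
            pos += 1


tokens = list(toy_tokenize("a /* x /* y */ z */ b"))
for token in tokens:
    print(token)
```

In this run the inner "/*" raises the stack depth to 3 and each "*/" lowers it by one, so the text " z " is still treated as comment content and only " b" returns to the 'root' state.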