pyfcstm.utils.text
String normalization utilities for converting arbitrary strings to valid identifiers.
This module provides helper functions to normalize strings into valid identifier formats that can be used in programming contexts. It converts non-ASCII characters using transliteration, replaces invalid characters with underscores, and optionally enforces identifier rules such as not starting with a digit.
The module contains the following main components:
normalize() - Convenience wrapper for non-strict identifier conversion
to_identifier() - Full identifier conversion with strict mode support
to_c_identifier() - Conservative C/C++-safe identifier conversion
to_python_identifier() - Python-safe identifier conversion
to_java_identifier() - Java-safe identifier conversion
to_ruby_identifier() - Ruby-safe identifier conversion
to_ts_identifier() - TypeScript-safe identifier conversion
to_js_identifier() - JavaScript-safe identifier conversion
to_rust_identifier() - Rust-safe identifier conversion
to_go_identifier() - Go-safe identifier conversion
Example:
>>> from pyfcstm.utils.text import normalize, to_identifier
>>> normalize("Hello World!")
'Hello_World'
>>> to_identifier("123 Test", strict_mode=True)
'_123_Test'
>>> to_identifier("class", keyword_safe_for=['python', 'java'])
'class_'
IdentifierKeywordLanguage
- pyfcstm.utils.text.IdentifierKeywordLanguage
alias of
Literal['c', 'cpp', 'python', 'java', 'ruby', 'ts', 'js', 'rust', 'go']
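As a rough illustration of how a Literal alias like this can drive the ValueError check described for keyword_safe_for, the sketch below validates a language list against the alias's members. The names KeywordLanguage and check_languages are hypothetical and defined locally; this is not pyfcstm's own code.

```python
from typing import List, Literal, Optional, get_args

# Local stand-in for the documented alias (hypothetical name).
KeywordLanguage = Literal['c', 'cpp', 'python', 'java', 'ruby', 'ts', 'js', 'rust', 'go']


def check_languages(languages: Optional[List[str]]) -> List[str]:
    """Return the selected languages, rejecting any unsupported value."""
    supported = get_args(KeywordLanguage)  # tuple of the literal strings
    selected = list(languages or [])
    for lang in selected:
        if lang not in supported:
            raise ValueError(f"unsupported language: {lang!r}")
    return selected
```

Deriving the supported set from the alias with typing.get_args keeps the type hint and the runtime check from drifting apart.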
normalize
- pyfcstm.utils.text.normalize(input_string: str, keyword_safe_for: List[Literal['c', 'cpp', 'python', 'java', 'ruby', 'ts', 'js', 'rust', 'go']] | None = None) -> str
Normalize a string to a valid identifier format.
This is a convenience wrapper around to_identifier() with strict_mode set to False. It replaces non-alphanumeric characters with underscores while allowing identifiers to start with digits and allowing empty input to return an empty string. When requested, it also avoids reserved words for selected target languages.
- Parameters:
input_string (str) – The string to be normalized
keyword_safe_for (Optional[List[Literal['c', 'cpp', 'python', 'java', 'ruby', 'ts', 'js', 'rust', 'go']]], optional) – Optional list of target languages whose reserved words should be avoided conservatively. Supported values are 'c', 'cpp', 'python', 'java', 'ruby', 'ts', 'js', 'rust', and 'go'.
- Returns:
A normalized identifier string
- Return type:
str
- Raises:
TypeError – If input_string is not a string.
ValueError – If an unsupported language is listed in keyword_safe_for.
Example:
>>> normalize("Hello World!")
'Hello_World'
>>> normalize("123 Test")
'123_Test'
>>> normalize("class", keyword_safe_for=['python'])
'class_'
to_identifier
- pyfcstm.utils.text.to_identifier(input_string: str, strict_mode: bool = True, keyword_safe_for: List[Literal['c', 'cpp', 'python', 'java', 'ruby', 'ts', 'js', 'rust', 'go']] | None = None) -> str
Convert any string to a valid identifier format [0-9a-zA-Z_]+.
Rules:
Preserve all letters and numbers after transliteration
Convert spaces and special characters to underscores
If strict_mode is True, ensure the first character is not a number
If strict_mode is True, handle empty strings by returning "_empty"
Avoid multiple consecutive underscores by collapsing them
Optionally avoid reserved words for selected target languages
- Parameters:
input_string (str) – The string to be converted
strict_mode (bool, optional) – When True, applies additional rules to ensure identifier validity across most languages. When False, allows empty strings and identifiers starting with numbers.
keyword_safe_for (Optional[List[Literal['c', 'cpp', 'python', 'java', 'ruby', 'ts', 'js', 'rust', 'go']]], optional) – Optional list of target languages whose reserved words should be avoided conservatively. Supported values are 'c', 'cpp', 'python', 'java', 'ruby', 'ts', 'js', 'rust', and 'go'.
- Returns:
A valid identifier string
- Return type:
str
- Raises:
TypeError – If input_string is not a string.
ValueError – If an unsupported language is listed in keyword_safe_for.
Example:
>>> to_identifier("Hello World!", strict_mode=True)
'Hello_World'
>>> to_identifier("123 Test", strict_mode=True)
'_123_Test'
>>> to_identifier("", strict_mode=True)
'_empty'
>>> to_identifier("class", keyword_safe_for=['python', 'java'])
'class_'
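The rules listed for to_identifier() can be approximated in a few lines of standard-library Python. The sketch below is an illustrative re-implementation under stated assumptions, not pyfcstm's actual code: sketch_to_identifier is a hypothetical name, transliteration is approximated with Unicode decomposition (the library may use a different transliterator), and trimming of leading and trailing underscores is inferred from the 'Hello World!' example rather than documented. Keyword avoidance is left out.

```python
import re
import unicodedata


def sketch_to_identifier(text: str, strict_mode: bool = True) -> str:
    """Approximate the documented normalization rules (illustration only)."""
    if not isinstance(text, str):
        raise TypeError("input must be a string")
    # Assumption: approximate transliteration by decomposing accented
    # characters and dropping anything that is still non-ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace runs of invalid characters with an underscore, then
    # collapse any consecutive underscores into one.
    ident = re.sub(r"[^0-9a-zA-Z_]+", "_", ascii_text)
    ident = re.sub(r"_+", "_", ident)
    # Assumption: edge underscores are trimmed, inferred from
    # "Hello World!" -> "Hello_World" in the examples above.
    ident = ident.strip("_")
    if strict_mode:
        if not ident:
            return "_empty"
        if ident[0].isdigit():
            ident = "_" + ident
    return ident
```

With strict_mode=False the same function returns an empty string for empty input and leaves a leading digit in place, which mirrors the behavior described for normalize().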
to_c_identifier
- pyfcstm.utils.text.to_c_identifier(input_string: str, strict_mode: bool = True) -> str
Convert a string to an identifier that is safe for both C and C++ codegen.
This function first normalizes the value with to_identifier(), then avoids identifiers that collide with common C/C++ reserved words by appending a trailing underscore when necessary.
- Parameters:
input_string (str) – Source text to normalize.
strict_mode (bool, optional) – Whether to apply strict identifier normalization.
- Returns:
A C/C++-safe identifier.
- Return type:
str
to_cpp_identifier
- pyfcstm.utils.text.to_cpp_identifier(input_string: str, strict_mode: bool = True) -> str
Convert a string to an identifier that is safe for C/C++ codegen.
The cpp selector is treated as an alias of the conservative C/C++ reserved-word superset used by to_c_identifier().
- Parameters:
input_string (str) – Source text to normalize.
strict_mode (bool, optional) – Whether to apply strict identifier normalization.
- Returns:
A C/C++-safe identifier.
- Return type:
str
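One way to picture the shared C/C++ superset is a single reserved-word set that both selectors point at. The sketch below assumes this layout purely for illustration: the keyword sets are tiny hand-picked subsets, and C_FAMILY_RESERVED, RESERVED_BY_SELECTOR, and avoid_c_family_keyword are hypothetical names, not part of pyfcstm.

```python
# Deliberately small, illustrative subsets; real reserved-word lists are longer.
C_KEYWORDS = frozenset({"int", "struct", "return", "while"})
CPP_ONLY_KEYWORDS = frozenset({"class", "namespace", "template", "new"})

# Conservative superset: a name reserved in either language is avoided in both.
C_FAMILY_RESERVED = C_KEYWORDS | CPP_ONLY_KEYWORDS

# Both selectors share one set, so 'cpp' behaves as an alias of 'c'.
RESERVED_BY_SELECTOR = {"c": C_FAMILY_RESERVED, "cpp": C_FAMILY_RESERVED}


def avoid_c_family_keyword(identifier: str, selector: str = "c") -> str:
    """Append a trailing underscore when the identifier is reserved."""
    if identifier in RESERVED_BY_SELECTOR[selector]:
        return identifier + "_"
    return identifier
```

Under this arrangement a C++-only word such as class is also rewritten when the c selector is used, which is what makes the result safe to emit into either language.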
to_python_identifier
- pyfcstm.utils.text.to_python_identifier(input_string: str, strict_mode: bool = True) -> str
Convert a string to an identifier that avoids Python reserved words.
- Parameters:
input_string (str) – Source text to normalize.
strict_mode (bool, optional) – Whether to apply strict identifier normalization.
- Returns:
A Python-safe identifier.
- Return type:
str
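For Python specifically, the standard library already exposes the reserved-word list, so the keyword-avoidance step can be sketched without hard-coding anything. The helper name avoid_python_keyword is hypothetical, and whether pyfcstm also treats soft keywords such as match as reserved is not stated here; this sketch checks hard keywords only.

```python
import keyword


def avoid_python_keyword(identifier: str) -> str:
    """Append a trailing underscore if the identifier is a Python keyword."""
    # keyword.iskeyword covers hard keywords only (e.g. 'class', 'None').
    if keyword.iskeyword(identifier):
        return identifier + "_"
    return identifier
```

The trailing-underscore form matches the 'class_' result shown in the to_identifier() example and follows the usual PEP 8 convention for names that clash with keywords.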
to_java_identifier
- pyfcstm.utils.text.to_java_identifier(input_string: str, strict_mode: bool = True) -> str
Convert a string to an identifier that avoids Java reserved words.
- Parameters:
input_string (str) – Source text to normalize.
strict_mode (bool, optional) – Whether to apply strict identifier normalization.
- Returns:
A Java-safe identifier.
- Return type:
str
to_ruby_identifier
- pyfcstm.utils.text.to_ruby_identifier(input_string: str, strict_mode: bool = True) -> str
Convert a string to an identifier that avoids Ruby reserved words.
- Parameters:
input_string (str) – Source text to normalize.
strict_mode (bool, optional) – Whether to apply strict identifier normalization.
- Returns:
A Ruby-safe identifier.
- Return type:
str
to_ts_identifier
- pyfcstm.utils.text.to_ts_identifier(input_string: str, strict_mode: bool = True) -> str
Convert a string to an identifier that avoids TypeScript reserved words.
- Parameters:
input_string (str) – Source text to normalize.
strict_mode (bool, optional) – Whether to apply strict identifier normalization.
- Returns:
A TypeScript-safe identifier.
- Return type:
str
to_js_identifier
- pyfcstm.utils.text.to_js_identifier(input_string: str, strict_mode: bool = True) -> str
Convert a string to an identifier that avoids JavaScript reserved words.
- Parameters:
input_string (str) – Source text to normalize.
strict_mode (bool, optional) – Whether to apply strict identifier normalization.
- Returns:
A JavaScript-safe identifier.
- Return type:
str
to_rust_identifier
- pyfcstm.utils.text.to_rust_identifier(input_string: str, strict_mode: bool = True) -> str
Convert a string to an identifier that avoids Rust reserved words.
- Parameters:
input_string (str) – Source text to normalize.
strict_mode (bool, optional) – Whether to apply strict identifier normalization.
- Returns:
A Rust-safe identifier.
- Return type:
str
to_go_identifier
- pyfcstm.utils.text.to_go_identifier(input_string: str, strict_mode: bool = True) -> str
Convert a string to an identifier that avoids Go reserved words.
- Parameters:
input_string (str) – Source text to normalize.
strict_mode (bool, optional) – Whether to apply strict identifier normalization.
- Returns:
A Go-safe identifier.
- Return type:
str