pyfcstm.utils.text
String normalization utilities for converting arbitrary strings to valid identifiers.
This module provides helper functions to normalize strings into valid identifier formats that can be used in programming contexts. It converts non-ASCII characters using transliteration, replaces invalid characters with underscores, and optionally enforces identifier rules such as not starting with a digit.
The module contains the following main components:
- normalize(): Convenience wrapper for non-strict identifier conversion
- to_identifier(): Full identifier conversion with strict mode support
Example:
>>> from pyfcstm.utils.text import normalize, to_identifier
>>> normalize("Hello World!")
'Hello_World'
>>> to_identifier("123 Test", strict_mode=True)
'_123_Test'
normalize
- pyfcstm.utils.text.normalize(input_string: str) -> str
Normalize a string to a valid identifier format.
This is a convenience wrapper around to_identifier() with strict_mode set to False. It replaces non-alphanumeric characters with underscores, allows identifiers to start with a digit, and returns an empty string for empty input.
- Parameters:
input_string (str) – The string to be normalized
- Returns:
A normalized identifier string
- Return type:
str
- Raises:
TypeError – If input_string is not a string
Example:
>>> normalize("Hello World!")
'Hello_World'
>>> normalize("123 Test")
'123_Test'
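Because a non-string argument raises TypeError, callers that receive mixed values (numbers, enum members, and so on) may want to coerce them first. The following is a minimal sketch of one way to do that; `normalize_any` is a hypothetical helper that is not part of pyfcstm, and it takes the normalizing function as a parameter so the sketch stays self-contained.

```python
from typing import Callable


def normalize_any(value: object, normalize_fn: Callable[[str], str]) -> str:
    """Hypothetical helper (not part of pyfcstm): coerce, then normalize."""
    # Strings pass through unchanged; anything else is converted with str()
    # so that normalize_fn never receives a non-string argument.
    text = value if isinstance(value, str) else str(value)
    return normalize_fn(text)
```

In practice one would presumably pass the library function itself, for example `normalize_any(42, normalize)` after `from pyfcstm.utils.text import normalize`.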
to_identifier
- pyfcstm.utils.text.to_identifier(input_string: str, strict_mode: bool = True) -> str
Convert any string to a valid identifier format [0-9a-zA-Z_]+.
Rules:
- Preserve all letters and numbers after transliteration
- Convert spaces and special characters to underscores
- If strict_mode is True, ensure the first character is not a digit by prefixing an underscore
- If strict_mode is True, handle empty strings by returning "_empty"
- Avoid multiple consecutive underscores by collapsing them
- Parameters:
input_string (str) – The string to be converted
strict_mode (bool, optional) – When True, applies additional rules to ensure identifier validity across most languages. When False, allows empty strings and identifiers starting with numbers.
- Returns:
A valid identifier string
- Return type:
str
- Raises:
TypeError – If input_string is not a string
Example:
>>> to_identifier("Hello World!", strict_mode=True)
'Hello_World'
>>> to_identifier("123 Test", strict_mode=True)
'_123_Test'
>>> to_identifier("", strict_mode=True)
'_empty'
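To make the conversion rules concrete, here is a rough sketch of how they could be implemented with the standard library alone. It is not pyfcstm's actual code: `sketch_to_identifier` is a hypothetical name, transliteration is approximated with Unicode NFKD decomposition (so characters without an ASCII decomposition are simply dropped), and trimming of leading and trailing underscores is an assumption inferred from the documented output 'Hello_World' for "Hello World!".

```python
import re
import unicodedata


def sketch_to_identifier(input_string: str, strict_mode: bool = True) -> str:
    """Hypothetical re-implementation of the documented rules."""
    if not isinstance(input_string, str):
        raise TypeError(
            f"input_string must be str, got {type(input_string).__name__}"
        )

    # Assumption: approximate transliteration by decomposing accented
    # characters (NFKD) and discarding whatever is still non-ASCII.
    decomposed = unicodedata.normalize("NFKD", input_string)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")

    # Replace each run of invalid characters with one underscore, then
    # collapse repeated underscores that were already present in the input.
    result = re.sub(r"[^0-9a-zA-Z_]+", "_", ascii_text)
    result = re.sub(r"_+", "_", result)

    # Assumption: trim underscores at both ends (see the lead-in).
    result = result.strip("_")

    if strict_mode:
        if not result:
            return "_empty"  # documented placeholder for empty input
        if result[0].isdigit():
            result = "_" + result  # identifiers must not start with a digit
    return result
```

Under these assumptions the sketch reproduces the documented examples, and calling it with `strict_mode=False` mirrors what normalize() is described as doing. The real function may treat edge cases such as leading underscores or non-Latin scripts differently.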