A simple, deterministic, and extensible approach to inverse text normalization (ITN) for numbers.
This package converts raw spoken-form text (speech recognition output) into user-friendly written-form text. It works best for converting spoken numbers into numerical digits, or for other translation tasks that do not modify word ordering. A CSV file is provided to define the basic rules for transforming spoken tokens into written tokens, and extra pre/post-processing may be applied for more specific formatting requirements, e.g. dates, measurements, or money.
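To illustrate the general idea of rule-driven, order-preserving token conversion, here is a minimal sketch. It is not this package's API: the `spoken,written` column layout, the `RULES_CSV` table, and the functions `load_rules`, `convert`, and `join_digits` are all hypothetical names made up for this example, and the package's own CSV file and parser may be organized quite differently.

```python
import csv
import io
import re

# Hypothetical rule table; the package's real CSV layout may differ.
RULES_CSV = """spoken,written
zero,0
one,1
two,2
three,3
"""


def load_rules(text):
    """Read spoken -> written token rules from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["spoken"]: row["written"] for row in reader}


def convert(sentence, rules):
    """Replace each token that has a rule; keep other tokens and the word order."""
    tokens = sentence.split()
    return " ".join(rules.get(tok.lower(), tok) for tok in tokens)


def join_digits(text):
    """Example post-processing step: merge space-separated digits into one number."""
    return re.sub(r"(?<=\d) (?=\d)", "", text)


rules = load_rules(RULES_CSV)
written = convert("call one two three now", rules)
print(written)               # call 1 2 3 now
print(join_digits(written))  # call 123 now
```

Because every token is looked up independently and emitted in place, the output is deterministic and the word order never changes; formatting that spans several tokens (here, joining digits) is left to a separate post-processing step.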

These examples were produced by running this script.
This package supports Python 3.7 and later.
To install from PyPI:
pip install itnpy2
To install locally:
pip install -e .
To run tests, use pytest in the root folder of this repository:
pytest
This package has been verified on a limited set of test cases. If you find a translation mistake, feel free to open a pull request that updates failing.csv with the input, the expected output, and the mistake; thanks!
If you find this work useful, please consider citing it.
@misc{hsu2022itn,
title = {A simple, deterministic, and extensible approach to inverse text normalization for numbers},
author = {Brandhsu},
howpublished = {https://github.com/barseghyanartur/itnpy},
year = {2022}
}