Skip to content

Unix CLI that prints lines that contain non-ASCII characters.

Notifications You must be signed in to change notification settings

mklement0/print-nonascii

Folders and files

NameName
Last commit message
Last commit date

Latest commit

fae7677 · Dec 27, 2022

History

5 Commits
Sep 11, 2017
Sep 11, 2017
Sep 11, 2017
Sep 10, 2017
Sep 10, 2017
Sep 10, 2017
Sep 11, 2017
Sep 10, 2017
Jul 25, 2018
Dec 27, 2022
Sep 11, 2017
Sep 11, 2017

Repository files navigation

npm versionlicense

Contents

print-nonascii: print lines that contain non-ASCII characters.

print-nonascii is a Unix CLI that locates lines in text files or
stdin input that contain non-ASCII characters, which is helpful when
diagnosing character encoding problems.

Lines can be printed as-is and/or using abstract representations of non-ASCII characters in one of several formats; namely:

  • -v, --caret ... the same representation cat -v uses, based on caret notation.
  • --bash ... per-byte two-digit hex. escape sequences such as \xc3
  • --psh ... PowerShell Unicode escape sequences such as `u{20ac} for

Note: --psh only works correctly with properly UTF-8-encoded input.

Line numbers can be prepended on request, and output for multiple input files is by default preceded with headers identifying each input file.

Caveat: For now, no automated tests are run before releases.

Examples

# Create a test file with 1 line containing a non-ASCII character.
$ cat <<'EOF' > /tmp/test.txt
one
twö
three
EOF

# Print only lines that have non-ASCII characters, as-is.
$ print-nonascii /tmp/test.txt
twö

# Print only lines that have non-ASCII characters, with line numbers:
$ print-nonascii -n /tmp/test.txt
2:twö

# Print only lines that have non-ASCII characters, using PowerShell 
# Unicode escape-sequence notation (--psh), preceded by the 
# line as-is (--raw).
# The Unicode code point of character "ö" is U+00F6:
$ print-nonascii --psh --raw /tmp/test.txt
twö
tw`u{f6}

# Ditto with line numbers and per-byte Bash escape sequences:
$ print-nonascii --bash --raw /tmp/test.txt
twö
tw\xc3\xb6

# Simulate input from multiple files by specifying the same file
# twice, so as to show the headers identifying each input file 
# (suppress with -b).
# Note that each header line (invisibly) starts with control 
# character U+0001, so as to allow more predictable
# identification of header lines in the output.
$ print-nonascii -n /tmp/test.txt /tmp/test.txt 
###	/tmp/test.txt
2:twö
�###	/tmp/test.txt
2:twö

Installation

Prerequisites

  • When installing from the npm registry: macOS and Linux
  • When installing manually: any Unix platform with bash that also has perl installed.

Installation from the npm registry

With Node.js installed, install the package as follows:

[sudo] npm install print-nonascii -g

Note:

Note: Even if you don't use Node.js, its package manager, npm, works across platforms and is easy to install; try curl -L https://git.io/n-install | bash

  • Whether you need sudo depends on how you installed Node.js / io.js and whether you've changed permissions later; if you get an EACCES error, try again with sudo.
  • The -g ensures global installation and is needed to put print-nonascii in your system's $PATH.

Manual installation

  • Download the CLI as print-nonascii.
  • Make it executable with chmod +x print-nonascii.
  • Move it or symlink it to a folder in your $PATH, such as /usr/local/bin (macOS) or /usr/bin (Linux).

Usage

Find concise usage information below; for complete documentation, read the manual online, or, once installed, run man print-nonascii (print-nonascii --man if installed manually).

$ print-nonascii --help


Prints lines that contain non-ASCII characters.

    print-nonascii [--<mode> [-r]] [-n] [-b] [file ...]
    print-nonascii -q                        [file ...]

    --<mode> prints abstract representations of non-ASCII chars.; one of:
      --caret, -v ... use caret notation, as cat -v would.
      --bash ... represent non-ASCII bytes as \xhh 
      --psh ... (PowerShell) represent non-ASCII Unicode characters as  
                Unicode escape sequences: <backtick>u{h...}
    
    -r, --raw ... with --<mode>, print each matching line as-is too, first.

    -n, --line-number ... prefix the output lines with their line number from  
     the original file, using format "<line-number>:" - decimal line numbers,  
     no padding, no space before or after the ":"

    -b, --bare ... suppress per-input-filename headers

    -q ... quiet mode: produce no output; signal presence of non-ASCII chars.  
           with exit code 0; exit code 100 signals that there are none.

Standard options: --help, --man, --version, --home

License

Copyright (c) 2017 Michael Klement [email protected] (http://same2u.net), released under the MIT license.

Acknowledgements

This project gratefully depends on the following open-source components, according to the terms of their respective licenses.

npm dependencies below have an optional suffix denoting the type of dependency: the absence of a suffix denotes a required run-time dependency; (D) denotes a development-time-only dependency, (O) an optional dependency, and (P) a peer dependency.

npm dependencies

Changelog

Versioning complies with semantic versioning (semver).

  • v0.0.3 (2017-09-11):

    • [enhancement] Header lines are now only printed for input files that produce at least 1 output line.
  • v0.0.2 (2017-09-10):

    • [fix] Header line is no longer printed twice when --<mode> is combined with --raw.
    • Header line now uses a tab char. to separate prefix ### from the filename.
  • v0.0.1 (2017-09-10):

    • Initial release.

About

Unix CLI that prints lines that contain non-ASCII characters.

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published