Finding illegal characters in a directory
- Error message produced by HTML proofer
- Finding all illegal (non-ascii) characters in html files inside a folder
Sometimes you need to grep an entire directory for files that contain invalid characters. I personally came across this issue when running htmlproofer to validate this blog’s generated files. Let’s take a look at the error message and then the solution.
Error message produced by HTML proofer
(....)
'reencode'
/Users/joaorocha/.rvm/gems/ruby-2.5.1/gems/nokogumbo-2.0.2/lib/nokogumbo/html5.rb:164:in 'encode': "\xC3" on US-ASCII (Encoding::InvalidByteSequenceError)
After that very enlightening error message (sarcasm), I decided to look for the problematic file in my _site/ directory, which contains the HTML generated by Jekyll.
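For context, the "\xC3" in the message is the first byte of a two-byte UTF-8 sequence, which is how accented Latin letters such as é are stored. The sketch below is only an illustration: `utf8_hex` is a hypothetical helper name, not something htmlproofer or Nokogumbo provides, and it simply dumps the bytes of its argument as hex.

```shell
# Hypothetical helper: print the bytes of the first argument as lowercase hex.
utf8_hex() {
    # od prints one hex pair per byte; tr strips the padding and newlines.
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# 'é' (U+00E9) written as its two UTF-8 bytes in octal escapes: 0xC3 0xA9.
utf8_hex "$(printf '\303\251')"
```

Any byte above 0x7F, like the 0xC3 here, is invalid when the file is read as US-ASCII, which is what triggers the error.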
Finding all illegal (non-ascii) characters in html files inside a folder
First, install pcregrep on macOS:
brew install pcre
Then we find every file with an html extension inside the ./_site directory and pass the paths to a pcregrep command, which prints each line containing a byte outside the ASCII range (\x00-\x7F).
find "./_site" -name "*.html" -print0 | xargs -0 pcregrep --color='auto' -n '[^\x00-\x7F]'
Credits for the pcregrep section here.
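If installing pcregrep is not an option, a similar scan can probably be done with plain grep. This is a minimal sketch rather than the method used above: `find_non_ascii` is a hypothetical function name, and the pattern is slightly broader than the pcregrep one, because it also flags ASCII control characters other than whitespace.

```shell
# Hypothetical helper: list lines with non-ASCII bytes in *.html files.
# Usage: find_non_ascii [directory]   (defaults to ./_site)
find_non_ascii() {
    # LC_ALL=C makes grep treat the input as raw bytes, so the bracket
    # expression matches any byte that is not printable ASCII or whitespace.
    # -H always prints the file name, -n prints the line number.
    find "${1:-./_site}" -name '*.html' \
        -exec env LC_ALL=C grep -Hn '[^[:print:][:space:]]' {} +
}
```

Calling `find_non_ascii ./_site` prints matches as path:line:content. Note that the exit status is non-zero when a batch of files has no matches, since grep reports "no match" that way.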