Blue Mountain
Resident Skeptical Hobbit
I've finally found some time to update the process to save the retrieved pages in UTF-8. Here's the updated _ae911.bat file:
Code:
REM Sub-process #1 for "get_ae911.bat" to retrieve a JSON file containing
REM data from ae911truth.org and format it into a .txt file
REM Parameters:
REM %1: JSON file to process, without the trailing '.json'
REM %2: Name of file to which output is appended
echo Processing %1 list and writing to %2
set URL=https://ae911truth.org/signatures
powershell "(Invoke-Webrequest -Uri %URL%/%1.json -UseBasicParsing).content | Set-Content -Path _temp.txt"
sed -f _ae911.sed _temp.txt | awk -f _ae911.awk -v list=%1 >>%2
del _temp.txt
Unfortunately, it didn't solve as many encoding issues as I had hoped. At some point a lot of the text appears to have been incorrectly converted from UTF-8 to ISO-8859-1 (that is, the UTF-8 bytes were read as if they were ISO-8859-1). The result of the incorrect conversion was then stored in a Windows database encoded as Windows-1252, with the result that invalid ISO-8859-1 characters got converted to "?". So that's why a word like "École" now appears as "Ã?cole" in the spreadsheet.
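To illustrate, here is a minimal Python sketch of the chain of conversions I suspect happened. The function name `mangle` is just something I made up for the illustration, and the two steps are my guess at what the upstream system did, not something I can confirm.

```python
def mangle(text: str) -> str:
    """Reproduce the suspected double conversion (an assumption, not confirmed)."""
    # Step 1: the UTF-8 bytes are misread as ISO-8859-1, so each byte
    # becomes its own character (bytes 0x80-0x9F become C1 control codes).
    misread = text.encode("utf-8").decode("iso-8859-1")
    # Step 2: the misread text is stored as Windows-1252. Characters that
    # Windows-1252 cannot represent (such as U+0089) are replaced by "?".
    stored = misread.encode("cp1252", errors="replace")
    # Reading the stored bytes back gives what ends up in the spreadsheet.
    return stored.decode("cp1252")
```

Calling `mangle("École")` returns `"Ã?cole"`: the "É" is the two bytes C3 89 in UTF-8, C3 reads as "Ã" in ISO-8859-1, and 89 is a control code that has no Windows-1252 equivalent. Once a character has been replaced by "?" the original byte is gone, so those cases can't be repaired automatically.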