Description
I am running Ubuntu under WSL.
I was running the following to find files with Windows-1252 encoding and convert them to UTF-8:
LC_ALL=C.UTF-8 find . -type f \( -name '*txt' -or -name '*html' -or -name '*htm' \) -exec grep -laxv '.*' {} + | xargs uchardet | grep WINDOWS-1252 | cut -d: -f1 | xargs -n 1 recode -t 'windows-1252..UTF-8'
Unfortunately, it seems that recode does not work properly on NTFS filesystems at all.
I ended up with hundreds of these messages:
recode: chmod (./path/path/rec4308.tmp): Operation not permitted
recode: chmod (./path/path/rec4309.tmp): Operation not permitted
recode: chmod (./path/path/rec4310.tmp): Operation not permitted
recode: chmod (./path/path/rec4311.tmp): Operation not permitted
recode: chmod (./path/path/rec4312.tmp): Operation not permitted
recode: chmod (./path/path/rec4313.tmp): Operation not permitted
recode: chmod (./path/path/rec4314.tmp): Operation not permitted
recode: chmod (./path/path/rec4315.tmp): Operation not permitted
recode: chmod (./path/path/rec4316.tmp): Operation not permitted
recode: chmod (./path/path/rec4317.tmp): Operation not permitted
recode: chmod (./path/path/rec4318.tmp): Operation not permitted
recode: chmod (./path/path/rec4319.tmp): Operation not permitted
recode: chmod (./path/path/rec4320.tmp): Operation not permitted
recode: chmod (./path/path/rec4321.tmp): Operation not permitted
All of the original files (and all filename information) are gone.
The .tmp files are in fact UTF-8, but there's no way to know what the original filenames were, so the files are effectively useless.
Even if I had the original filenames, there's no way to know which .tmp file corresponds to which original.
It's not uncommon for Linux programs to not work perfectly on NTFS, but I've never encountered anything this bad before.
I "lost" nearly 400 files, and it would have been more if I hadn't noticed the errors and aborted the job.
Here's an example using a single file:
$ file testfile
testfile: HTML document, ASCII text, with very long lines, with LF, NEL line terminators
$ uchardet testfile
WINDOWS-1250
$ recode -t 'windows-1250..UTF-8' testfile
recode: chmod (rec5087.tmp): Operation not permitted
$ ls testfile
ls: cannot access 'testfile': No such file or directory
$ ls *.tmp
rec5087.tmp
$ file *.tmp
rec5087.tmp: HTML document, UTF-8 Unicode text, with very long lines
$ uchardet *.tmp
WINDOWS-1250
$
With a single file it's not a big deal to rename the .tmp file back to the original filename (as long as you have the original filename) but when many files are affected it seems impossible to recover from, especially if you don't have the original filenames.
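A possible workaround, sketched here and not tested on NTFS or exFAT under WSL: convert with iconv into a sibling file whose name is derived from the original, and rename it over the original only after the conversion succeeds. The function name `convert_keep_name` and the `.utf8.part` suffix are made up for illustration; the sketch assumes iconv is installed and that a plain rename works on the target filesystem.

```shell
# Hypothetical helper: convert one file to UTF-8 without removing the
# original until the converted copy is complete.
# Assumptions: iconv is installed; "convert_keep_name" and ".utf8.part"
# are illustrative names, not part of recode or iconv.
convert_keep_name() {
    src_enc=$1
    file=$2
    part="$file.utf8.part"   # derived from the original name, so it stays traceable

    # iconv reads stdin and writes stdout; the original is only read here.
    if iconv -f "$src_enc" -t UTF-8 < "$file" > "$part"; then
        # Rename over the original only after a successful conversion.
        mv -f -- "$part" "$file"
    else
        echo "conversion failed, original left untouched: $file" >&2
        rm -f -- "$part"
        return 1
    fi
}
```

For example, `convert_keep_name WINDOWS-1252 ./path/file.html` converts a single file. If the process is interrupted, any leftover `.utf8.part` file still carries the original filename, so it can be matched back to its source.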
I verified that the same thing happens even without the -t option.
I verified that this happens on both NTFS and exFAT but does NOT happen on FAT32.
This issue might or might not be specific to WSL; a native Linux system with an NTFS or exFAT filesystem mounted might behave differently, but I'm unable to test this.
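For anyone who wants to check another system, here is a rough probe that prints the filesystem type of a directory and whether chmod succeeds on a scratch file created there. It is only a sketch: it assumes GNU coreutils (`stat -f -c %T`, `mktemp`), the name `probe_chmod` is made up, and a successful chmod here does not by itself prove that recode would behave correctly.

```shell
# Hypothetical probe: report the filesystem type of a directory and whether
# chmod works on a scratch file created in it.
# Assumptions: GNU coreutils; "probe_chmod" is an illustrative name.
probe_chmod() {
    dir=${1:-.}
    echo "filesystem: $(stat -f -c %T "$dir")"

    # Create a scratch file inside the directory being probed.
    scratch=$(mktemp "$dir/probe.XXXXXX") || return 1

    if chmod 644 "$scratch" 2>/dev/null; then
        echo "chmod: ok"
    else
        echo "chmod: failed"
    fi
    rm -f -- "$scratch"
}
```

Running `probe_chmod /path/to/mounted/dir` on the NTFS, exFAT, and FAT32 mounts would show whether the chmod failure lines up with the filesystems where files were lost.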