bug-gnu-utils

sed UTF-8 processing problem


From: Klaus Dechet
Subject: sed UTF-8 processing problem
Date: Mon, 14 Jun 2021 23:15:31 +0200
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.0

Hi GNU team,

I have the following problem:

I am running sed in the Windows 10 cmd terminal.

sed --version
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.

In cmd terminal I enter the following:

D:\Temp>chcp 65001
D:\Temp>echo aΣb
aΣb
D:\Temp>echo aΣb > utf82.txt
The file utf82.txt is UTF-8 encoded, with Σ (U+03A3) stored as 2 bytes.

D:\Temp>echo aΣb | sed s/./X/g
XXXXX

This shows that sed is not processing UTF-8 properly: it appears to replace individual bytes rather than whole characters.
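
The byte-versus-character difference can be reproduced outside sed. The following is a minimal Python sketch, not a description of sed's internals. It assumes the line reaching sed is `aΣb` followed by the single trailing space that cmd's echo emits before the pipe; the function names are hypothetical and exist only for this illustration.

```python
import re

# Assumed input line: the text from the report plus the trailing space
# that cmd's echo places before the pipe (an assumption, not confirmed
# by the report itself).
line = "aΣb "


def replace_per_byte(text: str) -> str:
    """Mimic a tool that treats every byte of the UTF-8 data as one character."""
    raw = text.encode("utf-8")
    return re.sub(rb".", b"X", raw).decode("ascii")


def replace_per_character(text: str) -> str:
    """Mimic a tool that decodes UTF-8 first and matches whole characters."""
    return re.sub(r".", "X", text)


if __name__ == "__main__":
    print("Σ".encode("utf-8").hex())    # cea3 (two bytes)
    print(replace_per_byte(line))       # XXXXX
    print(replace_per_character(line))  # XXXX
```

Under that assumption, the byte-wise variant yields the same five X's seen in the output above, while a UTF-8 aware replacement would yield four.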


D:\Temp>echo aΣb | sed s/./X/g > sedoutput.txt

The resulting sedoutput.txt is ANSI (Windows-1252) encoded.


Question: How do I get sed to read and write UTF-8 encoded files by default?

Additional background: I installed sed and its libraries from here:

http://gnuwin32.sourceforge.net/packages/sed.htm

Thank you.

Klaus




