sed UTF-8 processing problem

bug-gnu-utils



From:	Klaus Dechet
Subject:	sed UTF-8 processing problem
Date:	Mon, 14 Jun 2021 23:15:31 +0200
User-agent:	Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.0

Hi GNU team,

I have the following problem:

I am running sed in a Windows 10 cmd terminal:

sed --version
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.

In the cmd terminal I enter the following:

D:\Temp>chcp 65001
D:\Temp>echo aΣb
aΣb
D:\Temp>echo aΣb > utf82.txt
The file utf82.txt is UTF-8 encoded, with Σ (U+03A3) stored as 2 bytes.
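To make the byte count concrete, here is a short Python sketch (Python is used only for illustration and is not part of the setup described in this report) that encodes the test string and prints its length in characters and in bytes:

```python
# Show how the test string "aΣb" is encoded in UTF-8.
text = "a\u03a3b"           # a, GREEK CAPITAL LETTER SIGMA (U+03A3), b
encoded = text.encode("utf-8")

print(len(text))            # number of characters: 3
print(len(encoded))         # number of bytes: 4
print(encoded.hex(" "))     # byte values (hex separator needs Python 3.8+): 61 ce a3 62
```

The three characters become four bytes because Σ is encoded as the two-byte sequence CE A3.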

D:\Temp>echo aΣb | sed s/./X/g
XXXXX

This shows that sed is not handling UTF-8 properly: it appears to treat each byte, not each character, as a match for ".".
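The difference between byte-wise and character-wise matching can be sketched with Python's re module standing in for sed (an illustration only, not how sed is implemented). It assumes that cmd's echo passes the space before "|" on to sed, so the line sed sees is "aΣb " (four characters, five UTF-8 bytes):

```python
import re

# Assumption: cmd's echo includes the space before "|", so sed receives "aΣb ".
line = "a\u03a3b "

# Byte-oriented matching: "." matches any single byte, so Σ counts twice.
byte_result = re.sub(rb".", b"X", line.encode("utf-8"))

# Character-oriented matching: "." matches one Unicode character (UTF-8-aware behaviour).
char_result = re.sub(r".", "X", line)

print(byte_result.decode("ascii"))  # XXXXX
print(char_result)                  # XXXX
```

Under this assumption, the observed XXXXX matches the byte-oriented result, while a UTF-8-aware sed would be expected to print XXXX.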


D:\Temp>echo aΣb | sed s/./X/g > sedoutput.txt

sedoutput.txt is ANSI (Windows-1252) encoded.
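One point worth keeping in mind when judging the output file: XXXXX consists only of ASCII characters, and ASCII text has exactly the same bytes in UTF-8 and in Windows-1252. A tool that inspects the file may therefore label it "ANSI" without sed having chosen that encoding. A minimal Python check of this:

```python
# The output reported above: five X characters.
output = "XXXXX"

utf8_bytes = output.encode("utf-8")
cp1252_bytes = output.encode("cp1252")

# Pure-ASCII text produces identical bytes in both encodings,
# so the file contents alone cannot tell them apart.
print(utf8_bytes == cp1252_bytes)  # True
```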

Question: How do I get sed to handle and produce UTF-8 encoded files by default?


Additional background: I installed sed and its libraries from here:

http://gnuwin32.sourceforge.net/packages/sed.htm

Thank you.

Klaus


Current thread:

- sed UTF-8 processing problem, Klaus Dechet
- Re: sed UTF-8 processing problem, Eli Zaretskii, 2021/06/15
