Re: [bug-gawk] feature request: iconv/recode dynamic extension
From: Wolfgang Laun
Subject: Re: [bug-gawk] feature request: iconv/recode dynamic extension
Date: Sat, 22 Dec 2018 08:34:55 +0100

Why don't you write the transliteration as an awk function? It'll certainly
be much faster than calling a subprocess.
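BEGIN {
    # Map each accented character to its plain ASCII counterpart.
    dd["ü"] = "u"; dd["ö"] = "o"; dd["ó"] = "o";
    dd["ä"] = "a"; dd["ě"] = "e"; dd["š"] = "s";
    dd["č"] = "c"; dd["ř"] = "r"; dd["ž"] = "z";
    dd["ý"] = "y"; dd["á"] = "a"; dd["í"] = "i";
    dd["é"] = "e"; dd["ú"] = "u"; dd["ů"] = "u";
}
# Return s with every character listed in dd replaced by its ASCII
# counterpart. i, c, r and n are local variables. This needs a multibyte
# (e.g. UTF-8) locale so that substr() steps through characters, not bytes.
function dedia(s,    i, c, r, n) {
    r = "";
    n = length(s);
    for (i = 1; i <= n; ++i) {
        c = substr(s, i, 1);
        if (c in dd) {    # characters without a mapping pass through unchanged
            c = dd[c];
        }
        r = r c;
    }
    return r;
}
{ print dedia($0); }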
$ gawk -f try.awk <<< "üöóäěščřžýáíéúů the quick brown fox"
uooaescrzyaieuu the quick brown fox
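
A variant of the same idea, as a rough sketch only: instead of walking the string character by character, it runs one gsub() per table entry. Because each key is matched as a literal sequence, this should not depend on the awk implementation being multibyte-aware. The names dedia_gsub, map and p are made up for this illustration, and the shell function wrapper is only there so the sketch can be run in one piece.

```shell
# Hypothetical helper: filter stdin, replacing accented characters by
# plain ASCII. Uses only POSIX awk features (split, gsub, for-in).
dedia_gsub() {
    awk '
    BEGIN {
        # Alternating "accented plain" pairs, separated by single spaces.
        n = split("ü u ö o ó o ä a ě e š s č c ř r ž z ý y á a í i é e ú u ů u", p, " ")
        for (k = 1; k < n; k += 2)
            map[p[k]] = p[k + 1]
    }
    # Replace every key of map that occurs in s by its value.
    # The keys contain no regex metacharacters, so they can be used
    # directly as patterns; k is a local variable.
    function strip_dia(s,    k) {
        for (k in map)
            gsub(k, map[k], s)
        return s
    }
    { print strip_dia($0) }
    '
}

printf '%s\n' "üöóäěščřžýáíéúů the quick brown fox" | dedia_gsub
```

Whether this is faster than the per-character loop depends on line length and table size, so it is worth timing both on real data.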
On Sat, 22 Dec 2018 at 02:55, Franta Hanzlík <address@hidden> wrote:
> Hello,
> I'm not sure whether this is a good idea, but I think it may be useful to
> others as well: I'm doing some word processing in gawk, and part of
> it is comparing two strings. These strings are plain ASCII
> strings obtained by removing diacritics from the original Latin-1
> and Latin-2 strings, so I need a conversion such as
> "äáéěóöščýíüúů" -> "aaeeooscyiuuu".
> For now I solve this by calling an external conversion program, such as
>
> iconv -f UTF-8 -t US-ASCII//TRANSLIT <<< "üöóäěščřžýáíéúů"
> or
> recode -f u8..flat <<< "üöóäěščřžýáíéúů"
>
> but for thousands of strings this is too slow (and resource-intensive).
>
> There are probably many similar text conversion cases where a gawk
> dynamic extension for this purpose would be very useful.
>
> If this idea isn't totally bad, I can try to program
> it, but I have no programming skills, so could you please give me
> some advice on how to do this?
> --
> Thanks in advance, Franta Hanzlik
>
>