bug-gawk

Re: [bug-gawk] feature request: iconv/recode dynamic extension


From: Wolfgang Laun
Subject: Re: [bug-gawk] feature request: iconv/recode dynamic extension
Date: Sat, 22 Dec 2018 08:34:55 +0100

Why don't you write the transliteration as an awk function? It'll certainly
be much faster than calling a subprocess.

BEGIN {
    dd["ü"] = "u"; dd["ö"] = "o"; dd["ó"] = "o";
    dd["ä"] = "a"; dd["ě"] = "e"; dd["š"] = "s";
    dd["č"] = "c"; dd["ř"] = "r"; dd["ž"] = "z";
    dd["ý"] = "y"; dd["á"] = "a"; dd["í"] = "i";
    dd["é"] = "e"; dd["ú"] = "u"; dd["ů"] = "u";
}

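# Extra parameters r, i, c, n act as local variables (awk idiom),
# so the function does not clobber globals of the same name.
function dedia(s,    r, i, c, n){
    r = "";
    n = length(s);
    for( i = 1; i <= n; ++i ){
        c = substr( s, i, 1 );
        # Anything above "~" is non-ASCII: replace it from dd.
        # Characters missing from dd are dropped, as before; the "in"
        # test only avoids creating empty dd entries as a side effect.
        if( c > "~" ){
            c = (c in dd) ? dd[c] : "";
        }
        r = r c;
    }
    return r;
}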

{ print dedia($0); }

$ gawk -f try.awk  <<< "üöóäěščřžýáíéúů the quick brown fox"
uooaescrzyaieuu the quick brown fox
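
One caveat, with a possible workaround: the substr() loop above steps through
whole characters only when gawk runs in a UTF-8 locale. In the C locale it
sees single bytes, so no dd key matches. The sketch below replaces each key
with gsub() instead, which should also work byte-wise. The wrapper name
dedia_gsub is made up for this example, and unlike dedia() it leaves unmapped
characters in place rather than dropping them. It has not been benchmarked.

```shell
# Hypothetical helper (name is an assumption): strip diacritics from stdin.
dedia_gsub() {
    awk '
    BEGIN {
        # Same mapping as the dd[] table in try.awk.
        dd["ü"] = "u"; dd["ö"] = "o"; dd["ó"] = "o"
        dd["ä"] = "a"; dd["ě"] = "e"; dd["š"] = "s"
        dd["č"] = "c"; dd["ř"] = "r"; dd["ž"] = "z"
        dd["ý"] = "y"; dd["á"] = "a"; dd["í"] = "i"
        dd["é"] = "e"; dd["ú"] = "u"; dd["ů"] = "u"
    }
    {
        # Replace every occurrence of each key in the current record.
        # The keys contain no regex metacharacters, so they can be used
        # directly as dynamic regular expressions.
        for (k in dd)
            gsub(k, dd[k])
        print
    }'
}

printf '%s\n' "üöóäěščřžýáíéúů the quick brown fox" | dedia_gsub
```

This makes one gsub() pass per table entry for every line, so for a large
table the per-character loop may well be the faster of the two.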


On Sat, 22 Dec 2018 at 02:55, Franta Hanzlík <address@hidden> wrote:

> Hello,
> not sure whether it is a good idea, but I think this may be useful for
> others as well: I'm doing some word processing in gawk, and part of
> it is comparing two strings. These strings are plain ASCII
> strings obtained by removing diacritics from the original Latin-1
> and Latin-2 strings, so I need a conversion such as
>  "äáéěóöščýíüúů" -> "aaeeooscyiuuu".
> For now I solve this by calling an external conversion program, e.g.
>
> iconv -f UTF-8 -t US-ASCII//TRANSLIT <<< "üöóäěščřžýáíéúů"
>    or
> recode -f u8..flat <<< "üöóäěščřžýáíéúů"
>
> but for thousands of strings it is too slow (and resource-expensive).
>
> There are perhaps many similar text-conversion cases where a gawk
> dynamic extension for this purpose would be very useful.
>
> If this idea isn't totally bad, I can try to program
> it, but I have no programming skills, so can you please give me
> some advice on how to do this?
> --
> Thanks in advance, Franta Hanzlik
>
>

