bug-gawk

Re: script to convert separators for CSV processing


From: Ed Morton
Subject: Re: script to convert separators for CSV processing
Date: Sat, 11 Nov 2023 10:57:18 -0600
User-agent: Mozilla Thunderbird

Looks like the script got crushed onto 1 line in transit so trying again:

----------
$ cat changeSeps.awk
BEGIN {
    FS = OFS = "\""

    if ( (old == "") || (new == "") ) {
        printf "Error: old=\047%s\047 and/or new=\047%s\047 separator string missing.\n", old, new >"/dev/stderr"
        printf "Usage: awk -v old=\047;\047 -v new=\047,\047 -f changeSeps.awk infile [> outfile]\n" >"/dev/stderr"
        err = 1
        exit
    }

    sanitized_old = old
    sanitized_new = new

    # Ensure all regexp and replacement chars get treated as literal
    gsub(/[^^\\]/,"[&]",sanitized_old)  # regexp: char other than ^ or \ -> [char]
    gsub(/\\/,"\\\\",sanitized_old)     # regexp: \ -> \\
    gsub(/\^/,"\\^",sanitized_old)      # regexp: ^ -> \^
    gsub(/[&]/,"\\\\&",sanitized_new)   # replacement: & -> \\&
}
{
    $0 = prev ors $0
    prev = $0
    ors = ORS
}
NF%2 {
    for ( i=1; i<=NF; i+=2 ) {
        cnt += gsub(sanitized_old,sanitized_new,$i)
    }
    print
    prev = ors = ""
}
END {
    if ( !err ) {
        printf "Converted %d \047%s\047 field separators to \047%s\047s.\n", cnt+0, old, new >"/dev/stderr"
    }
    exit err
}
---------
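
The script's central trick is setting FS to a double quote: every odd-numbered field then lies outside quotes and every even-numbered field inside them, which is why the loop steps `i` by 2 and why a record is only treated as complete when NF is odd (an even number of quotes seen so far). A minimal sketch of that splitting, separate from the script itself (the `show_fields` helper name is made up for illustration):

```shell
# Hypothetical helper: split one line on double quotes and label each
# field by whether it sits outside or inside a quoted section.
show_fields() {
    printf '%s\n' "$1" |
    awk -F'"' '{
        for (i = 1; i <= NF; i++)
            printf "%d %s <%s>\n", i, (i % 2 ? "outside" : "inside"), $i
    }'
}

show_fields 'a;"b;c";d'
# 1 outside <a;>
# 2 inside <b;c>
# 3 outside <;d>
```

Only fields 1 and 3 contain real separators here; the `;` in field 2 is data and must be left alone.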

On 11/11/2023 10:54 AM, Ed Morton wrote:
The new `--csv` processing mode is great, but it doesn't handle separators other than commas, so I expect many people will want to know how to convert their TSV, `;`-separated, `|`-separated, etc. files to/from `,`-separated so they can use the new functionality. So that multiple people don't have to figure it out for themselves, here's a suggested script that you could include in the documentation. It converts string-separated input into CSV (or other string-separated) output without reading all of the input into memory at once, for input files that otherwise follow CSV quoting/separator rules:

[script snipped: this copy arrived crushed onto one line; see the re-sent version above]

You'd call it as:
-----
awk -v old='<old separator>' -v new='<new separator>' -f changeSeps.awk file
-----

e.g. to convert TSV to CSV:

-----
$ printf '"foo\tbar"\tetc\n'
"foo    bar"    etc

$ printf '"foo\tbar"\tetc\n' | awk -v old='\t' -v new=',' -f changeSeps.awk
"foo    bar",etc
Converted 1 '   ' field separators to ','s.
-----
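
Because `old` is used as a regexp by gsub(), the BEGIN block first wraps each of its characters in a bracket expression so that separators such as `|` or `.` match literally. A small sketch of just that bracketing step, using the same gsub() call as the script (the `sanitize` function name is only for illustration, and the `^` and `\` handling is left out for brevity):

```shell
# Hypothetical helper: show the regexp that the bracketing step
# produces for a given separator string.
sanitize() {
    awk -v old="$1" 'BEGIN {
        s = old
        gsub(/[^^\\]/, "[&]", s)   # every char other than ^ or \ -> [char]
        print s
    }'
}

sanitize '|'     # prints [|]
sanitize '.;'    # prints [.][;]
```

Without this step a `|` separator would be read as regexp alternation and a `.` would match any character.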

Regards,

    Ed.

