From: Eric Fischer
Subject: Re: Multibyte support for sort, uniq, join, tr, cut, paste, expand, unexpand, fmt, fold, and pr
Date: Wed, 10 Jan 2018 12:20:33 -0800

You were right that I needed to pay attention to character widths. My changes in https://github.com/ericfischer/coreutils/tree/multibyte will now handle character widths in all the places where POSIX counts "column positions" instead of characters.

I have also introduced a "grapheme" abstraction to handle raw bytes transparently when the input contains character encoding errors. Having this structured character type has also been useful for finding a few additional places that assumed that text was bytes.

I think the only work left to do is a little more on tr, to eliminate its need to know the largest possible wide character encoding.

I have requested and received the copyright assignment paperwork, but my employer would like to dedicate my changes to the public domain or as CC0 rather than assign or disclaim copyright. Would this be acceptable?

Eric
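
For readers unfamiliar with the two ideas mentioned above (counting column positions rather than characters, and carrying undecodable bytes through as raw units), here is a minimal illustrative sketch. It is written in Python rather than the C used by coreutils, and it is not taken from the branch: the names `units`, `column_width`, and `display_columns` are hypothetical, the input is assumed to be UTF-8, a raw byte is assumed to occupy one column, and control characters are not treated specially.

```python
import unicodedata


def units(data: bytes):
    """Yield (value, is_raw) pairs for UTF-8 input.

    Valid sequences come out as one-character strings. Bytes that are not
    valid UTF-8 come out unchanged as one-byte `bytes` objects, so no input
    is lost or replaced.
    """
    # surrogateescape maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, which lets us recover the original byte.
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            yield bytes([cp - 0xDC00]), True
        else:
            yield ch, False


def column_width(value, is_raw: bool) -> int:
    """Approximate number of column positions one unit occupies."""
    if is_raw:
        return 1  # assumption: an undecodable byte takes one column
    if unicodedata.combining(value):
        return 0  # combining marks attach to the previous character
    if unicodedata.east_asian_width(value) in ("W", "F"):
        return 2  # wide and fullwidth characters take two columns
    return 1


def display_columns(data: bytes) -> int:
    """Total column positions for a byte string."""
    return sum(column_width(value, is_raw) for value, is_raw in units(data))
```

With this sketch, a tool that wraps or truncates at a column limit would sum `column_width` over the units of a line instead of counting bytes or characters, and an invalid byte would still be written back out exactly as it was read.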