split the linebreak module

bug-gnulib

From:	Bruno Haible
Subject:	split the linebreak module
Date:	Sat, 10 May 2008 14:27:04 +0200
User-agent:	KMail/1.5.4
This commit splits up the 'linebreak' module, introducing a separate module
for each of the provided functions, so that users who don't want the u16_*
or u32_* functions don't need to have them.

It also makes use of 'ucs4_t', 'uint32_t' etc. instead of 'unsigned int' where
applicable.


2008-05-10  Bruno Haible  <address@hidden>

        Split up 'linebreak' module.
        * lib/unilbrk.h: New file, based on lib/linebreak.h.
        * lib/unilbrk/lbrkprop1.h: New file, extracted from lib/lbrkprop.h.
        * lib/unilbrk/lbrkprop2.h: New file, renamed from lib/lbrkprop.h with
        modifications.
        * lib/unilbrk/tables.h: New file, extracted from lib/linebreak.c.
        * lib/unilbrk/tables.c: New file, extracted from lib/linebreak.c.
        * lib/unilbrk/u8-possible-linebreaks.c: New file, extracted from
        lib/linebreak.c.
        * lib/unilbrk/u16-possible-linebreaks.c: New file, extracted from
        lib/linebreak.c.
        * lib/unilbrk/u32-possible-linebreaks.c: New file, extracted from
        lib/linebreak.c.
        * lib/unilbrk/ulc-common.h: New file, extracted from lib/linebreak.c.
        * lib/unilbrk/ulc-common.c: New file, extracted from lib/linebreak.c.
        * lib/unilbrk/ulc-possible-linebreaks.c: New file, extracted from
        lib/linebreak.c.
        * lib/unilbrk/u8-width-linebreaks.c: New file, extracted from
        lib/linebreak.c.
        * lib/unilbrk/u16-width-linebreaks.c: New file, extracted from
        lib/linebreak.c.
        * lib/unilbrk/u32-width-linebreaks.c: New file, extracted from
        lib/linebreak.c.
        * lib/unilbrk/ulc-width-linebreaks.c: New file, extracted from
        lib/linebreak.c.
        * modules/unilbrk/base: New file.
        * modules/unilbrk/tables: New file.
        * modules/unilbrk/u8-possible-linebreaks: New file.
        * modules/unilbrk/u16-possible-linebreaks: New file.
        * modules/unilbrk/u32-possible-linebreaks: New file.
        * modules/unilbrk/ulc-common: New file.
        * modules/unilbrk/ulc-possible-linebreaks: New file.
        * modules/unilbrk/u8-width-linebreaks: New file.
        * modules/unilbrk/u16-width-linebreaks: New file.
        * modules/unilbrk/u32-width-linebreaks: New file.
        * modules/unilbrk/ulc-width-linebreaks: New file.
        * lib/linebreak.h: Remove file.
        * lib/linebreak.c: Remove file.
        * m4/linebreak.m4: Remove file.
        * modules/linebreak: Remove file.
        * NEWS: Mention the changes.

*** NEWS.orig   2008-05-10 14:12:47.000000000 +0200
--- NEWS        2008-05-10 03:53:52.000000000 +0200
***************
*** 6,11 ****
--- 6,17 ----
  
  Date        Modules         Changes
  
+ 2008-05-10  linebreak       The module is split into several modules unilbrk/*.
+                             The include file is changed from "linebreak.h" to
+                             "unilbrk.h". Two functions are renamed:
+                             mbs_possible_linebreaks -> ulc_possible_linebreaks
+                             mbs_width_linebreaks -> ulc_width_linebreaks
+ 
  2008-04-28  rpmatch         The include file is now <stdlib.h>.
  
  2008-04-28  inet_ntop       The include file is changed from "inet_ntop.h"
*** lib/unilbrk.h.orig  2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk.h       2008-05-10 14:08:07.000000000 +0200
***************
*** 0 ****
--- 1,110 ----
+ /* Line breaking of Unicode strings.
+    Copyright (C) 2001-2003, 2005-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #ifndef _UNILBRK_H
+ #define _UNILBRK_H
+ 
+ /* Get size_t.  */
+ #include <stddef.h>
+ 
+ #include "unitypes.h"
+ 
+ /* Get locale_charset() declaration.  */
+ #include "localcharset.h"
+ 
+ 
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
+ 
+ 
+ /* These functions are locale dependent.  The encoding argument identifies
+    the encoding (e.g. "ISO-8859-2" for Polish).  */
+ 
+ 
+ /* Line breaking.  */
+ 
+ enum
+ {
+   UC_BREAK_UNDEFINED,
+   UC_BREAK_PROHIBITED,
+   UC_BREAK_POSSIBLE,
+   UC_BREAK_MANDATORY,
+   UC_BREAK_HYPHENATION
+ };
+ 
+ /* Determine the line break points in S, and store the result at p[0..n-1].
+    p[i] = UC_BREAK_MANDATORY means that s[i] is a line break character.
+    p[i] = UC_BREAK_POSSIBLE means that a line break may be inserted between
+           s[i-1] and s[i].
+    p[i] = UC_BREAK_HYPHENATION means that a hyphen and a line break may be
+           inserted between s[i-1] and s[i].  But beware of language dependent
+           hyphenation rules.
+    p[i] = UC_BREAK_PROHIBITED means that s[i-1] and s[i] must not be separated.
+  */
+ extern void
+        u8_possible_linebreaks (const uint8_t *s, size_t n,
+                              const char *encoding, char *p);
+ extern void
+        u16_possible_linebreaks (const uint16_t *s, size_t n,
+                               const char *encoding, char *p);
+ extern void
+        u32_possible_linebreaks (const uint32_t *s, size_t n,
+                               const char *encoding, char *p);
+ extern void
+        ulc_possible_linebreaks (const char *s, size_t n,
+                               const char *encoding, char *p);
+ 
+ /* Choose the best line breaks, assuming the uc_width function.
+    The string is s[0..n-1].  The maximum number of columns per line is given
+    as WIDTH.  The starting column of the string is given as START_COLUMN.
+    If the algorithm shall keep room after the last piece, they can be given
+    as AT_END_COLUMNS.
+    o is an optional override; if o[i] != UC_BREAK_UNDEFINED, o[i] takes
+    precedence over p[i] as returned by the *_possible_linebreaks function.
+    The given ENCODING is used for disambiguating widths in uc_width.
+    Return the column after the end of the string, and store the result at
+    p[0..n-1].
+  */
+ extern int
+        u8_width_linebreaks (const uint8_t *s, size_t n, int width,
+                           int start_column, int at_end_columns,
+                           const char *o, const char *encoding,
+                           char *p);
+ extern int
+        u16_width_linebreaks (const uint16_t *s, size_t n, int width,
+                            int start_column, int at_end_columns,
+                            const char *o, const char *encoding,
+                            char *p);
+ extern int
+        u32_width_linebreaks (const uint32_t *s, size_t n, int width,
+                            int start_column, int at_end_columns,
+                            const char *o, const char *encoding,
+                            char *p);
+ extern int
+        ulc_width_linebreaks (const char *s, size_t n, int width,
+                            int start_column, int at_end_columns,
+                            const char *o, const char *encoding,
+                            char *p);
+ 
+ 
+ #ifdef __cplusplus
+ }
+ #endif
+ 
+ 
+ #endif /* _UNILBRK_H */
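
Not part of the patch: a minimal sketch of how a caller might consume the
p[] array described in the comments above. It assumes p has already been
filled in, for example by one of the *_width_linebreaks functions, which
(judging from the implementations in this patch) leave UC_BREAK_POSSIBLE
only at the breaks they chose. The helper name apply_breaks is hypothetical,
and the enum is repeated locally so that the sketch compiles without gnulib.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same names and order as the enum in unilbrk.h.  */
enum
{
  UC_BREAK_UNDEFINED,
  UC_BREAK_PROHIBITED,
  UC_BREAK_POSSIBLE,
  UC_BREAK_MANDATORY,
  UC_BREAK_HYPHENATION
};

/* Hypothetical helper, not part of gnulib.
   Copy s[0..n-1] to OUT, inserting '\n' before every s[i] for which
   p[i] == UC_BREAK_POSSIBLE.  OUT must have room for 2*n+1 bytes.
   Return the number of bytes written, not counting the trailing NUL.  */
static size_t
apply_breaks (const char *s, size_t n, const char *p, char *out)
{
  size_t len = 0;
  size_t i;

  assert (s != NULL && p != NULL && out != NULL);

  for (i = 0; i < n; i++)
    {
      if (p[i] == UC_BREAK_POSSIBLE)
        out[len++] = '\n';
      /* For UC_BREAK_MANDATORY, s[i] is itself the line break character,
         so copying it is all that is needed.  */
      out[len++] = s[i];
    }
  out[len] = '\0';
  return len;
}
```

With s = "hello world" and only p[6] set to UC_BREAK_POSSIBLE, the helper
produces "hello \nworld": the break goes before s[6], and the space stays
at the end of the first line.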

*** lib/unilbrk/lbrkprop1.h.orig        2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/lbrkprop1.h     2008-05-10 12:20:46.000000000 +0200
***************
*** 0 ****
--- 1,32 ----
+ /* Line breaking properties of Unicode characters.  */
+ /* Generated automatically by gen-lbrkprop for Unicode 3.1.0.  */
+ 
+ /* Copyright (C) 2000-2004, 2008 Free Software Foundation, Inc.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #define lbrkprop_header_0 16
+ #define lbrkprop_header_1 15
+ #define lbrkprop_header_2 7
+ #define lbrkprop_header_3 511
+ #define lbrkprop_header_4 127
+ 
+ typedef struct
+   {
+     int level1[15];
+     int level2[4 << 9];
+     unsigned char level3[100 << 7];
+   }
+ lbrkprop_t;
+ extern const lbrkprop_t unilbrkprop;

*** lib/unilbrk/lbrkprop2.h.orig        2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/lbrkprop2.h     2008-05-10 12:20:25.000000000 +0200
***************
*** 0 ****
--- 1,1883 ----
+ /* Line breaking properties of Unicode characters.  */
+ /* Generated automatically by gen-lbrkprop for Unicode 3.1.0.  */
+ 
+ /* Copyright (C) 2000-2004, 2008 Free Software Foundation, Inc.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ const lbrkprop_t unilbrkprop =
+ ...

*** lib/unilbrk/tables.c.orig   2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/tables.c        2008-05-10 12:21:23.000000000 +0200
***************
*** 0 ****
--- 1,53 ----
+ /* Line breaking auxiliary tables.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #include <config.h>
+ 
+ /* Specification.  */
+ #include "unilbrk/tables.h"
+ 
+ /* Define unilbrkprop, table of line breaking properties.  */
+ #include "unilbrk/lbrkprop2.h"
+ 
+ const unsigned char unilbrk_table[19][19] =
+ {
+                               /* after */
+       /* ZW IN GL BA BB B2 HY NS OP CL QU EX ID NU IS SY AL PR PO */
+ /* ZW */ { P, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, },
+ /* IN */ { P, I, I, I, D, D, I, I, D, P, I, P, D, D, P, P, D, D, D, },
+ /* GL */ { P, I, I, I, I, I, I, I, I, P, I, P, I, I, P, P, I, I, I, },
+ /* BA */ { P, D, I, I, D, D, I, I, D, P, I, P, D, D, P, P, D, D, D, },
+ /* BB */ { P, I, I, I, I, I, I, I, I, P, I, P, I, I, P, P, I, I, I, },
+ /* B2 */ { P, D, I, I, D, P, I, I, D, P, I, P, D, D, P, P, D, D, D, },
+ /* HY */ { P, D, I, I, D, D, I, I, D, P, I, P, D, D, P, P, D, D, D, },
+ /* NS */ { P, D, I, I, D, D, I, I, D, P, I, P, D, D, P, P, D, D, D, },
+ /* OP */ { P, P, P, P, P, P, P, P, P, P, P, P, P, P, P, P, P, P, P, },
+ /* CL */ { P, D, I, I, D, D, I, P, D, P, I, P, D, D, P, P, D, D, I, },
+ /* QU */ { P, I, I, I, I, I, I, I, P, P, I, P, I, I, P, P, I, I, I, },
+ /* EX */ { P, D, I, I, D, D, I, I, D, P, I, P, D, D, P, P, D, D, D, },
+ /* ID */ { P, I, I, I, D, D, I, I, D, P, I, P, D, D, P, P, D, D, I, },
+ /* NU */ { P, I, I, I, D, D, I, I, D, P, I, P, D, I, P, P, I, D, I, },
+ /* IS */ { P, D, I, I, D, D, I, I, D, P, I, P, D, I, P, P, D, D, D, },
+ /* SY */ { P, D, I, I, D, D, I, I, D, P, I, P, D, I, P, P, D, D, D, },
+ /* AL */ { P, I, I, I, D, D, I, I, D, P, I, P, D, I, P, P, I, D, D, },
+ /* PR */ { P, D, I, I, D, D, I, I, I, P, I, P, I, I, P, P, I, D, D, },
+ /* PO */ { P, D, I, I, D, D, I, I, D, P, I, P, D, D, P, P, D, D, D, },
+ /* "" */
+ /* before */
+ };
+ /* Note: The (B2,B2) entry should probably be D instead of P.  */
+ /* Note: The (PR,ID) entry should probably be D instead of I.  */

*** lib/unilbrk/tables.h.orig   2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/tables.h        2008-05-10 14:11:29.000000000 +0200
***************
*** 0 ****
--- 1,87 ----
+ /* Line breaking auxiliary tables.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #include "unitypes.h"
+ 
+ /* Line breaking classification.  */
+ 
+ enum
+ {
+   /* Values >= 20 are resolved at run time. */
+   LBP_BK =  0, /* mandatory break */
+ /*LBP_CR,         carriage return - not used here because it's a DOSism */
+ /*LBP_LF,         line feed - not used here because it's a DOSism */
+   LBP_CM = 20, /* attached characters and combining marks */
+ /*LBP_SG,         surrogates - not used here because they are not characters */
+   LBP_ZW =  1, /* zero width space */
+   LBP_IN =  2, /* inseparable */
+   LBP_GL =  3, /* non-breaking (glue) */
+   LBP_CB = 22, /* contingent break opportunity */
+   LBP_SP = 21, /* space */
+   LBP_BA =  4, /* break opportunity after */
+   LBP_BB =  5, /* break opportunity before */
+   LBP_B2 =  6, /* break opportunity before and after */
+   LBP_HY =  7, /* hyphen */
+   LBP_NS =  8, /* non starter */
+   LBP_OP =  9, /* opening punctuation */
+   LBP_CL = 10, /* closing punctuation */
+   LBP_QU = 11, /* ambiguous quotation */
+   LBP_EX = 12, /* exclamation/interrogation */
+   LBP_ID = 13, /* ideographic */
+   LBP_NU = 14, /* numeric */
+   LBP_IS = 15, /* infix separator (numeric) */
+   LBP_SY = 16, /* symbols allowing breaks */
+   LBP_AL = 17, /* ordinary alphabetic and symbol characters */
+   LBP_PR = 18, /* prefix (numeric) */
+   LBP_PO = 19, /* postfix (numeric) */
+   LBP_SA = 23, /* complex context (South East Asian) */
+   LBP_AI = 24, /* ambiguous (alphabetic or ideograph) */
+   LBP_XX = 25  /* unknown */
+ };
+ 
+ #include "lbrkprop1.h"
+ 
+ static inline unsigned char
+ unilbrkprop_lookup (ucs4_t uc)
+ {
+   unsigned int index1 = uc >> lbrkprop_header_0;
+   if (index1 < lbrkprop_header_1)
+     {
+       int lookup1 = unilbrkprop.level1[index1];
+       if (lookup1 >= 0)
+       {
+         unsigned int index2 = (uc >> lbrkprop_header_2) & lbrkprop_header_3;
+         int lookup2 = unilbrkprop.level2[lookup1 + index2];
+         if (lookup2 >= 0)
+           {
+             unsigned int index3 = uc & lbrkprop_header_4;
+             return unilbrkprop.level3[lookup2 + index3];
+           }
+       }
+     }
+   return LBP_XX;
+ }
+ 
+ /* Table indexed by two line breaking classifications.  */
+ #define D 1  /* direct break opportunity, empty in table 7.3 of UTR #14 */
+ #define I 2  /* indirect break opportunity, '%' in table 7.3 of UTR #14 */
+ #define P 3  /* prohibited break,           '^' in table 7.3 of UTR #14 */
+ 
+ extern const unsigned char unilbrk_table[19][19];
+ 
+ /* We don't support line breaking of complex-context dependent characters
+    (Thai, Lao, Myanmar, Khmer) yet, because it requires dictionary lookup. */
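
Not part of the patch: a toy version of the three-level lookup used by
unilbrkprop_lookup above, in case the indexing scheme is not obvious from
the generated header. The shifts, masks and table contents below are
assumptions chosen to keep the tables tiny (the real lbrkprop1.h uses
shifts 16 and 7 with masks 511 and 127), and the property values are
illustrative, not real Unicode data.

```c
#include <assert.h>
#include <stdint.h>

/* Toy parameters, assumptions for this sketch only.  */
#define TOY_SHIFT1 8    /* level1 index = uc >> 8 */
#define TOY_LIMIT1 2    /* number of level1 entries */
#define TOY_SHIFT2 4    /* level2 index = (uc >> 4) & 15 */
#define TOY_MASK2  15
#define TOY_MASK3  15   /* level3 index = uc & 15 */

enum { TOY_DEFAULT = 25 };  /* plays the role of LBP_XX */

typedef struct
{
  int level1[TOY_LIMIT1];        /* offset into level2, or -1 if absent */
  int level2[1 << 4];            /* offset into level3, or -1 if absent */
  unsigned char level3[1 << 4];  /* property values */
}
toy_table_t;

static const toy_table_t toy_table =
{
  /* Only code points 0x00..0xFF have a level2 block.  */
  { 0, -1 },
  /* Only the block 0x20..0x2F has a level3 block.  */
  { -1, -1, 0, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 },
  /* Values for 0x20..0x2F: 21 for ' ', 12 for '!', 17 otherwise.  */
  { 21, 12, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17 }
};

/* Same control flow as unilbrkprop_lookup, on the toy table.  */
static unsigned char
toy_lookup (uint32_t uc)
{
  unsigned int index1 = uc >> TOY_SHIFT1;
  if (index1 < TOY_LIMIT1)
    {
      int lookup1 = toy_table.level1[index1];
      if (lookup1 >= 0)
        {
          unsigned int index2 = (uc >> TOY_SHIFT2) & TOY_MASK2;
          int lookup2 = toy_table.level2[lookup1 + index2];
          if (lookup2 >= 0)
            {
              unsigned int index3 = uc & TOY_MASK3;
              assert (lookup2 + index3 < sizeof toy_table.level3);
              return toy_table.level3[lookup2 + index3];
            }
        }
    }
  /* Any missing level falls through to the default property.  */
  return TOY_DEFAULT;
}
```

The point of the -1 entries is that whole ranges sharing the default
property cost one int each instead of a full block of values.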

*** lib/unilbrk/u16-possible-linebreaks.c.orig  2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/u16-possible-linebreaks.c       2008-05-10 11:38:12.000000000 +0200
***************
*** 0 ****
--- 1,140 ----
+ /* Line breaking of UTF-16 strings.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #include <config.h>
+ 
+ /* Specification.  */
+ #include "unilbrk.h"
+ 
+ #include <stdlib.h>
+ #include <string.h>
+ 
+ #include "unilbrk/tables.h"
+ #include "uniwidth/cjk.h"
+ #include "unistr.h"
+ 
+ void
+ u16_possible_linebreaks (const uint16_t *s, size_t n, const char *encoding, char *p)
+ {
+   int LBP_AI_REPLACEMENT = (is_cjk_encoding (encoding) ? LBP_ID : LBP_AL);
+   const uint16_t *s_end = s + n;
+   int last_prop = LBP_BK; /* line break property of last non-space character */
+   char *seen_space = NULL; /* Was a space seen after the last non-space character? */
+   char *seen_space2 = NULL; /* At least two spaces after the last non-space? */
+ 
+   /* Don't break inside multibyte characters.  */
+   memset (p, UC_BREAK_PROHIBITED, n);
+ 
+   while (s < s_end)
+     {
+       ucs4_t uc;
+       int count = u16_mbtouc_unsafe (&uc, s, s_end - s);
+       int prop = unilbrkprop_lookup (uc);
+ 
+       if (prop == LBP_BK)
+       {
+         /* Mandatory break.  */
+         *p = UC_BREAK_MANDATORY;
+         last_prop = LBP_BK;
+         seen_space = NULL;
+         seen_space2 = NULL;
+       }
+       else
+       {
+         char *q;
+ 
+         /* Resolve property values whose behaviour is not fixed.  */
+         switch (prop)
+           {
+           case LBP_AI:
+             /* Resolve ambiguous.  */
+             prop = LBP_AI_REPLACEMENT;
+             break;
+           case LBP_CB:
+             /* This is arbitrary.  */
+             prop = LBP_ID;
+             break;
+           case LBP_SA:
+             /* We don't handle complex scripts yet.
+                Treat LBP_SA like LBP_XX.  */
+           case LBP_XX:
+             /* This is arbitrary.  */
+             prop = LBP_AL;
+             break;
+           }
+ 
+         /* Deal with combining characters.  */
+         q = p;
+         if (prop == LBP_CM)
+           {
+             /* Don't break just before a combining character.  */
+             *p = UC_BREAK_PROHIBITED;
+             /* A combining character turns a preceding space into LBP_AL.  */
+             if (seen_space != NULL)
+               {
+                 q = seen_space;
+                 seen_space = seen_space2;
+                 prop = LBP_AL;
+                 goto lookup_via_table;
+               }
+           }
+         else if (prop == LBP_SP)
+           {
+             /* Don't break just before a space.  */
+             *p = UC_BREAK_PROHIBITED;
+             seen_space2 = seen_space;
+             seen_space = p;
+           }
+         else
+           {
+            lookup_via_table:
+             /* prop must be usable as an index for table 7.3 of UTR #14.  */
+             if (!(prop >= 1 && prop <= sizeof (unilbrk_table) / sizeof (unilbrk_table[0])))
+               abort ();
+ 
+             if (last_prop == LBP_BK)
+               {
+                 /* Don't break at the beginning of a line.  */
+                 *q = UC_BREAK_PROHIBITED;
+               }
+             else
+               {
+                 switch (unilbrk_table [last_prop-1] [prop-1])
+                   {
+                   case D:
+                     *q = UC_BREAK_POSSIBLE;
+                     break;
+                   case I:
+                     *q = (seen_space != NULL ? UC_BREAK_POSSIBLE : UC_BREAK_PROHIBITED);
+                     break;
+                   case P:
+                     *q = UC_BREAK_PROHIBITED;
+                     break;
+                   default:
+                     abort ();
+                   }
+               }
+             last_prop = prop;
+             seen_space = NULL;
+             seen_space2 = NULL;
+           }
+       }
+ 
+       s += count;
+       p += count;
+     }
+ }
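
Not part of the patch: a reduced sketch of the pair-table step in the
function above, restricted to ASCII and to three classes. The toy_* names
and the character classification are assumptions for illustration. The
nine table cells are copied from the AL, OP and CL rows and columns of
unilbrk_table in lib/unilbrk/tables.c. Mandatory breaks, combining marks
and the resolution of LBP_AI, LBP_CB, LBP_SA and LBP_XX are left out.

```c
#include <assert.h>
#include <stddef.h>

enum { UC_BREAK_UNDEFINED, UC_BREAK_PROHIBITED, UC_BREAK_POSSIBLE };

/* Toy classes; T_BK only marks "start of line" here.  */
enum { T_BK = 0, T_AL = 1, T_OP = 2, T_CL = 3, T_SP = 21 };

/* D = direct break, I = indirect break (only across spaces),
   P = prohibited break.  */
enum { D = 1, I = 2, P = 3 };

static const unsigned char toy_pair_table[3][3] =
{
         /* after:  AL OP CL */
  /* AL */        { I, D, P },
  /* OP */        { P, P, P },
  /* CL */        { D, D, P }
  /* before */
};

/* Assumed classification: parentheses, space, everything else AL.  */
static int
toy_class (char c)
{
  switch (c)
    {
    case ' ': return T_SP;
    case '(': return T_OP;
    case ')': return T_CL;
    default:  return T_AL;
    }
}

static void
toy_possible_linebreaks (const char *s, size_t n, char *p)
{
  int last_prop = T_BK;  /* class of the last non-space character */
  int seen_space = 0;    /* was a space seen since then? */
  size_t i;

  for (i = 0; i < n; i++)
    {
      int prop = toy_class (s[i]);

      if (prop == T_SP)
        {
          /* Don't break just before a space.  */
          p[i] = UC_BREAK_PROHIBITED;
          seen_space = 1;
          continue;
        }

      assert (prop >= 1 && prop <= 3);
      if (last_prop == T_BK)
        /* Don't break at the beginning of a line.  */
        p[i] = UC_BREAK_PROHIBITED;
      else
        switch (toy_pair_table[last_prop - 1][prop - 1])
          {
          case D:
            p[i] = UC_BREAK_POSSIBLE;
            break;
          case I:
            p[i] = (seen_space ? UC_BREAK_POSSIBLE : UC_BREAK_PROHIBITED);
            break;
          default:
            p[i] = UC_BREAK_PROHIBITED;
            break;
          }
      last_prop = prop;
      seen_space = 0;
    }
}
```

For "ab (cd) ef" this marks a possible break before '(' and before 'e'
and nothing else. For "a( b" the space after '(' does not help, because
(OP,AL) is P rather than I.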

*** lib/unilbrk/u16-width-linebreaks.c.orig     2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/u16-width-linebreaks.c  2008-05-10 11:38:21.000000000 +0200
***************
*** 0 ****
--- 1,108 ----
+ /* Line breaking of UTF-16 strings.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #include <config.h>
+ 
+ /* Specification.  */
+ #include "unilbrk.h"
+ 
+ #include "unistr.h"
+ #include "uniwidth.h"
+ 
+ int
+ u16_width_linebreaks (const uint16_t *s, size_t n,
+                     int width, int start_column, int at_end_columns,
+                     const char *o, const char *encoding,
+                     char *p)
+ {
+   const uint16_t *s_end;
+   char *last_p;
+   int last_column;
+   int piece_width;
+ 
+   u16_possible_linebreaks (s, n, encoding, p);
+ 
+   s_end = s + n;
+   last_p = NULL;
+   last_column = start_column;
+   piece_width = 0;
+   while (s < s_end)
+     {
+       ucs4_t uc;
+       int count = u16_mbtouc_unsafe (&uc, s, s_end - s);
+ 
+       /* Respect the override.  */
+       if (o != NULL && *o != UC_BREAK_UNDEFINED)
+       *p = *o;
+ 
+       if (*p == UC_BREAK_POSSIBLE || *p == UC_BREAK_MANDATORY)
+       {
+         /* An atomic piece of text ends here.  */
+         if (last_p != NULL && last_column + piece_width > width)
+           {
+             /* Insert a line break.  */
+             *last_p = UC_BREAK_POSSIBLE;
+             last_column = 0;
+           }
+       }
+ 
+       if (*p == UC_BREAK_MANDATORY)
+       {
+         /* uc is a line break character.  */
+         /* Start a new piece at column 0.  */
+         last_p = NULL;
+         last_column = 0;
+         piece_width = 0;
+       }
+       else
+       {
+         /* uc is not a line break character.  */
+         int w;
+ 
+         if (*p == UC_BREAK_POSSIBLE)
+           {
+             /* Start a new piece.  */
+             last_p = p;
+             last_column += piece_width;
+             piece_width = 0;
+             /* No line break for the moment, may be turned into
+                UC_BREAK_POSSIBLE later, via last_p. */
+           }
+ 
+         *p = UC_BREAK_PROHIBITED;
+ 
+         w = uc_width (uc, encoding);
+         if (w >= 0) /* ignore control characters in the string */
+           piece_width += w;
+       }
+ 
+       s += count;
+       p += count;
+       if (o != NULL)
+       o += count;
+     }
+ 
+   /* The last atomic piece of text ends here.  */
+   if (last_p != NULL && last_column + piece_width + at_end_columns > width)
+     {
+       /* Insert a line break.  */
+       *last_p = UC_BREAK_POSSIBLE;
+       last_column = 0;
+     }
+ 
+   return last_column + piece_width;
+ }
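
Not part of the patch: a stripped-down sketch of the greedy loop above,
for readers who want to trace it by hand. It assumes every character is
exactly one column wide (the real code asks uc_width), takes no override
array, and expects p to hold the possible breaks already. The function
name toy_width_linebreaks is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum
{
  UC_BREAK_UNDEFINED,
  UC_BREAK_PROHIBITED,
  UC_BREAK_POSSIBLE,
  UC_BREAK_MANDATORY
};

/* On entry, p[0..n-1] holds the possible and mandatory breaks.
   On return, UC_BREAK_POSSIBLE remains only at the chosen breaks.
   Return the column after the end of the string.  */
static int
toy_width_linebreaks (size_t n, int width, int start_column,
                      int at_end_columns, char *p)
{
  char *p_end = p + n;
  char *last_p = NULL;             /* start of the current piece */
  int last_column = start_column;  /* column where that piece starts */
  int piece_width = 0;

  assert (width > 0);

  for (; p < p_end; p++)
    {
      if (*p == UC_BREAK_POSSIBLE || *p == UC_BREAK_MANDATORY)
        {
          /* An atomic piece ends here.  If it did not fit on the line,
             break before it.  */
          if (last_p != NULL && last_column + piece_width > width)
            {
              *last_p = UC_BREAK_POSSIBLE;
              last_column = 0;
            }
        }

      if (*p == UC_BREAK_MANDATORY)
        {
          /* Start a new piece at column 0.  */
          last_p = NULL;
          last_column = 0;
          piece_width = 0;
        }
      else
        {
          if (*p == UC_BREAK_POSSIBLE)
            {
              /* Start a new piece; the decision is deferred.  */
              last_p = p;
              last_column += piece_width;
              piece_width = 0;
            }
          *p = UC_BREAK_PROHIBITED;
          piece_width += 1;  /* assumption: one column per character */
        }
    }

  /* The last piece ends here.  */
  if (last_p != NULL && last_column + piece_width + at_end_columns > width)
    {
      *last_p = UC_BREAK_POSSIBLE;
      last_column = 0;
    }
  return last_column + piece_width;
}
```

For the 11 characters of "aaa bbb ccc", with possible breaks before each
'b' and 'c' group (p[4] and p[8]) and width 7, only p[4] is kept and the
result is 7, i.e. the lines are "aaa " and "bbb ccc". Note that the
trailing space of a piece counts towards its width.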

*** lib/unilbrk/u32-possible-linebreaks.c.orig  2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/u32-possible-linebreaks.c       2008-05-10 11:38:30.000000000 +0200
***************
*** 0 ****
--- 1,134 ----
+ /* Line breaking of UTF-32 strings.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #include <config.h>
+ 
+ /* Specification.  */
+ #include "unilbrk.h"
+ 
+ #include <stdlib.h>
+ 
+ #include "unilbrk/tables.h"
+ #include "uniwidth/cjk.h"
+ 
+ void
+ u32_possible_linebreaks (const uint32_t *s, size_t n, const char *encoding, char *p)
+ {
+   int LBP_AI_REPLACEMENT = (is_cjk_encoding (encoding) ? LBP_ID : LBP_AL);
+   const uint32_t *s_end = s + n;
+   int last_prop = LBP_BK; /* line break property of last non-space character */
+   char *seen_space = NULL; /* Was a space seen after the last non-space character? */
+   char *seen_space2 = NULL; /* At least two spaces after the last non-space? */
+ 
+   while (s < s_end)
+     {
+       ucs4_t uc = *s;
+       int prop = unilbrkprop_lookup (uc);
+ 
+       if (prop == LBP_BK)
+       {
+         /* Mandatory break.  */
+         *p = UC_BREAK_MANDATORY;
+         last_prop = LBP_BK;
+         seen_space = NULL;
+         seen_space2 = NULL;
+       }
+       else
+       {
+         char *q;
+ 
+         /* Resolve property values whose behaviour is not fixed.  */
+         switch (prop)
+           {
+           case LBP_AI:
+             /* Resolve ambiguous.  */
+             prop = LBP_AI_REPLACEMENT;
+             break;
+           case LBP_CB:
+             /* This is arbitrary.  */
+             prop = LBP_ID;
+             break;
+           case LBP_SA:
+             /* We don't handle complex scripts yet.
+                Treat LBP_SA like LBP_XX.  */
+           case LBP_XX:
+             /* This is arbitrary.  */
+             prop = LBP_AL;
+             break;
+           }
+ 
+         /* Deal with combining characters.  */
+         q = p;
+         if (prop == LBP_CM)
+           {
+             /* Don't break just before a combining character.  */
+             *p = UC_BREAK_PROHIBITED;
+             /* A combining character turns a preceding space into LBP_AL.  */
+             if (seen_space != NULL)
+               {
+                 q = seen_space;
+                 seen_space = seen_space2;
+                 prop = LBP_AL;
+                 goto lookup_via_table;
+               }
+           }
+         else if (prop == LBP_SP)
+           {
+             /* Don't break just before a space.  */
+             *p = UC_BREAK_PROHIBITED;
+             seen_space2 = seen_space;
+             seen_space = p;
+           }
+         else
+           {
+            lookup_via_table:
+             /* prop must be usable as an index for table 7.3 of UTR #14.  */
+             if (!(prop >= 1 && prop <= sizeof (unilbrk_table) / sizeof (unilbrk_table[0])))
+               abort ();
+ 
+             if (last_prop == LBP_BK)
+               {
+                 /* Don't break at the beginning of a line.  */
+                 *q = UC_BREAK_PROHIBITED;
+               }
+             else
+               {
+                 switch (unilbrk_table [last_prop-1] [prop-1])
+                   {
+                   case D:
+                     *q = UC_BREAK_POSSIBLE;
+                     break;
+                   case I:
+                     *q = (seen_space != NULL ? UC_BREAK_POSSIBLE : UC_BREAK_PROHIBITED);
+                     break;
+                   case P:
+                     *q = UC_BREAK_PROHIBITED;
+                     break;
+                   default:
+                     abort ();
+                   }
+               }
+             last_prop = prop;
+             seen_space = NULL;
+             seen_space2 = NULL;
+           }
+       }
+ 
+       s++;
+       p++;
+     }
+ }

*** lib/unilbrk/u32-width-linebreaks.c.orig     2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/u32-width-linebreaks.c  2008-05-10 11:38:40.000000000 +0200
***************
*** 0 ****
--- 1,106 ----
+ /* Line breaking of UTF-32 strings.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #include <config.h>
+ 
+ /* Specification.  */
+ #include "unilbrk.h"
+ 
+ #include "uniwidth.h"
+ 
+ int
+ u32_width_linebreaks (const uint32_t *s, size_t n,
+                     int width, int start_column, int at_end_columns,
+                     const char *o, const char *encoding,
+                     char *p)
+ {
+   const uint32_t *s_end;
+   char *last_p;
+   int last_column;
+   int piece_width;
+ 
+   u32_possible_linebreaks (s, n, encoding, p);
+ 
+   s_end = s + n;
+   last_p = NULL;
+   last_column = start_column;
+   piece_width = 0;
+   while (s < s_end)
+     {
+       ucs4_t uc = *s;
+ 
+       /* Respect the override.  */
+       if (o != NULL && *o != UC_BREAK_UNDEFINED)
+       *p = *o;
+ 
+       if (*p == UC_BREAK_POSSIBLE || *p == UC_BREAK_MANDATORY)
+       {
+         /* An atomic piece of text ends here.  */
+         if (last_p != NULL && last_column + piece_width > width)
+           {
+             /* Insert a line break.  */
+             *last_p = UC_BREAK_POSSIBLE;
+             last_column = 0;
+           }
+       }
+ 
+       if (*p == UC_BREAK_MANDATORY)
+       {
+         /* uc is a line break character.  */
+         /* Start a new piece at column 0.  */
+         last_p = NULL;
+         last_column = 0;
+         piece_width = 0;
+       }
+       else
+       {
+         /* uc is not a line break character.  */
+         int w;
+ 
+         if (*p == UC_BREAK_POSSIBLE)
+           {
+             /* Start a new piece.  */
+             last_p = p;
+             last_column += piece_width;
+             piece_width = 0;
+             /* No line break for the moment, may be turned into
+                UC_BREAK_POSSIBLE later, via last_p. */
+           }
+ 
+         *p = UC_BREAK_PROHIBITED;
+ 
+         w = uc_width (uc, encoding);
+         if (w >= 0) /* ignore control characters in the string */
+           piece_width += w;
+       }
+ 
+       s++;
+       p++;
+       if (o != NULL)
+       o++;
+     }
+ 
+   /* The last atomic piece of text ends here.  */
+   if (last_p != NULL && last_column + piece_width + at_end_columns > width)
+     {
+       /* Insert a line break.  */
+       *last_p = UC_BREAK_POSSIBLE;
+       last_column = 0;
+     }
+ 
+   return last_column + piece_width;
+ }
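
For readers who want the greedy fitting loop of u32_width_linebreaks in isolation, here is a minimal standalone sketch. It is not the gnulib code: the BRK_* constants and the name fit_width are hypothetical stand-ins, every element is assumed to be exactly one column wide (the real function asks uc_width), and the override array and at_end_columns are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the UC_BREAK_* constants of unilbrk.h;
   the numeric values are arbitrary.  */
enum { BRK_PROHIBITED = 0, BRK_POSSIBLE = 1, BRK_MANDATORY = 2 };

/* On entry, p[i] marks every place where a break before element i is
   allowed.  On exit, only the breaks actually chosen remain marked as
   BRK_POSSIBLE; mandatory breaks are left untouched.
   Assumption: every element occupies exactly one column.
   Returns the column reached at the end of the text.  */
int
fit_width (size_t n, int width, int start_column, char *p)
{
  char *last_p = NULL;            /* last break opportunity not yet used */
  int last_column = start_column; /* column where the current piece starts */
  int piece_width = 0;            /* width of the current atomic piece */
  size_t i;

  assert (p != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      if (p[i] == BRK_POSSIBLE || p[i] == BRK_MANDATORY)
        {
          /* A piece ends here.  If it overflows, break before it.  */
          if (last_p != NULL && last_column + piece_width > width)
            {
              *last_p = BRK_POSSIBLE;
              last_column = 0;
            }
        }
      if (p[i] == BRK_MANDATORY)
        {
          /* Element i is a newline: restart at column 0.  */
          last_p = NULL;
          last_column = 0;
          piece_width = 0;
        }
      else
        {
          if (p[i] == BRK_POSSIBLE)
            {
              /* Remember the opportunity, decide about it later.  */
              last_p = &p[i];
              last_column += piece_width;
              piece_width = 0;
            }
          p[i] = BRK_PROHIBITED;
          piece_width += 1;
        }
    }
  /* The last piece ends at the end of the text.  */
  if (last_p != NULL && last_column + piece_width > width)
    {
      *last_p = BRK_POSSIBLE;
      last_column = 0;
    }
  return last_column + piece_width;
}
```

With the 11 elements of "aaa bbb ccc", opportunities before the two later words and width 7, the sketch keeps only the break before "bbb" and ends in column 7. The decision about an opportunity is always deferred until the following piece is known, which is why last_p is needed.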

*** lib/unilbrk/u8-possible-linebreaks.c.orig   2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/u8-possible-linebreaks.c        2008-05-10 13:37:13.000000000 +0200
***************
*** 0 ****
--- 1,237 ----
+ /* Line breaking of UTF-8 strings.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #include <config.h>
+ 
+ /* Specification.  */
+ #include "unilbrk.h"
+ 
+ #include <stdlib.h>
+ #include <string.h>
+ 
+ #include "unilbrk/tables.h"
+ #include "uniwidth/cjk.h"
+ #include "unistr.h"
+ 
+ void
+ u8_possible_linebreaks (const uint8_t *s, size_t n, const char *encoding, char *p)
+ {
+   int LBP_AI_REPLACEMENT = (is_cjk_encoding (encoding) ? LBP_ID : LBP_AL);
+   const uint8_t *s_end = s + n;
+   int last_prop = LBP_BK; /* line break property of last non-space character */
+   char *seen_space = NULL; /* Was a space seen after the last non-space character? */
+   char *seen_space2 = NULL; /* At least two spaces after the last non-space? */
+ 
+   /* Don't break inside multibyte characters.  */
+   memset (p, UC_BREAK_PROHIBITED, n);
+ 
+   while (s < s_end)
+     {
+       ucs4_t uc;
+       int count = u8_mbtouc_unsafe (&uc, s, s_end - s);
+       int prop = unilbrkprop_lookup (uc);
+ 
+       if (prop == LBP_BK)
+       {
+         /* Mandatory break.  */
+         *p = UC_BREAK_MANDATORY;
+         last_prop = LBP_BK;
+         seen_space = NULL;
+         seen_space2 = NULL;
+       }
+       else
+       {
+         char *q;
+ 
+         /* Resolve property values whose behaviour is not fixed.  */
+         switch (prop)
+           {
+           case LBP_AI:
+             /* Resolve ambiguous.  */
+             prop = LBP_AI_REPLACEMENT;
+             break;
+           case LBP_CB:
+             /* This is arbitrary.  */
+             prop = LBP_ID;
+             break;
+           case LBP_SA:
+             /* We don't handle complex scripts yet.
+                Treat LBP_SA like LBP_XX.  */
+           case LBP_XX:
+             /* This is arbitrary.  */
+             prop = LBP_AL;
+             break;
+           }
+ 
+         /* Deal with combining characters.  */
+         q = p;
+         if (prop == LBP_CM)
+           {
+             /* Don't break just before a combining character.  */
+             *p = UC_BREAK_PROHIBITED;
+             /* A combining character turns a preceding space into LBP_AL.  */
+             if (seen_space != NULL)
+               {
+                 q = seen_space;
+                 seen_space = seen_space2;
+                 prop = LBP_AL;
+                 goto lookup_via_table;
+               }
+           }
+         else if (prop == LBP_SP)
+           {
+             /* Don't break just before a space.  */
+             *p = UC_BREAK_PROHIBITED;
+             seen_space2 = seen_space;
+             seen_space = p;
+           }
+         else
+           {
+            lookup_via_table:
+             /* prop must be usable as an index for table 7.3 of UTR #14.  */
+             if (!(prop >= 1 && prop <= sizeof (unilbrk_table) / sizeof (unilbrk_table[0])))
+               abort ();
+ 
+             if (last_prop == LBP_BK)
+               {
+                 /* Don't break at the beginning of a line.  */
+                 *q = UC_BREAK_PROHIBITED;
+               }
+             else
+               {
+                 switch (unilbrk_table [last_prop-1] [prop-1])
+                   {
+                   case D:
+                     *q = UC_BREAK_POSSIBLE;
+                     break;
+                   case I:
+                     *q = (seen_space != NULL ? UC_BREAK_POSSIBLE : UC_BREAK_PROHIBITED);
+                     break;
+                   case P:
+                     *q = UC_BREAK_PROHIBITED;
+                     break;
+                   default:
+                     abort ();
+                   }
+               }
+             last_prop = prop;
+             seen_space = NULL;
+             seen_space2 = NULL;
+           }
+       }
+ 
+       s += count;
+       p += count;
+     }
+ }
+ 
+ 
+ #ifdef TEST
+ 
+ #include <stdio.h>
+ #include <string.h>
+ 
+ /* Read the contents of an input stream, and return it, terminated with a NUL
+    byte. */
+ char *
+ read_file (FILE *stream)
+ {
+ #define BUFSIZE 4096
+   char *buf = NULL;
+   int alloc = 0;
+   int size = 0;
+   int count;
+ 
+   while (! feof (stream))
+     {
+       if (size + BUFSIZE > alloc)
+       {
+         alloc = alloc + alloc / 2;
+         if (alloc < size + BUFSIZE)
+           alloc = size + BUFSIZE;
+         buf = realloc (buf, alloc);
+         if (buf == NULL)
+           {
+             fprintf (stderr, "out of memory\n");
+             exit (1);
+           }
+       }
+       count = fread (buf + size, 1, BUFSIZE, stream);
+       if (count == 0)
+       {
+         if (ferror (stream))
+           {
+             perror ("fread");
+             exit (1);
+           }
+       }
+       else
+       size += count;
+     }
+   buf = realloc (buf, size + 1);
+   if (buf == NULL)
+     {
+       fprintf (stderr, "out of memory\n");
+       exit (1);
+     }
+   buf[size] = '\0';
+   return buf;
+ #undef BUFSIZE
+ }
+ 
+ int
+ main (int argc, char * argv[])
+ {
+   if (argc == 1)
+     {
+       /* Display all the break opportunities in the input string.  */
+       char *input = read_file (stdin);
+       int length = strlen (input);
+       char *breaks = malloc (length);
+       int i;
+ 
+       u8_possible_linebreaks ((uint8_t *) input, length, "UTF-8", breaks);
+ 
+       for (i = 0; i < length; i++)
+       {
+         switch (breaks[i])
+           {
+           case UC_BREAK_POSSIBLE:
+             /* U+2027 in UTF-8 encoding */
+             putc (0xe2, stdout); putc (0x80, stdout); putc (0xa7, stdout);
+             break;
+           case UC_BREAK_MANDATORY:
+             /* U+21B2 (or U+21B5) in UTF-8 encoding */
+             putc (0xe2, stdout); putc (0x86, stdout); putc (0xb2, stdout);
+             break;
+           case UC_BREAK_PROHIBITED:
+             break;
+           default:
+             abort ();
+           }
+         putc (input[i], stdout);
+       }
+ 
+       free (breaks);
+ 
+       return 0;
+     }
+   else
+     return 1;
+ }
+ 
+ #endif /* TEST */
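
The table lookup in u8_possible_linebreaks can be hard to follow without the tables at hand, so here is a minimal sketch of the same idea on a toy scale. Everything in it is an assumption made for illustration: the three character classes, the classify function and the 3x3 table are invented and are NOT the table of UTR #14; only the direct / indirect / prohibited distinction and the space tracking mirror the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the UC_BREAK_* constants.  */
enum { BRK_PROHIBITED = 0, BRK_POSSIBLE = 1 };

/* Toy character classes (invented for this sketch).  */
enum { CLS_AL, CLS_OP, CLS_CL, CLS_COUNT, CLS_SP };

/* Table entries: direct break, indirect break (only across spaces),
   prohibited break.  */
enum { ACT_DIRECT, ACT_INDIRECT, ACT_PROHIBITED };

/* Toy pair table, indexed [class before][class after].  Not UTR #14.  */
static const unsigned char toy_table[CLS_COUNT][CLS_COUNT] = {
  /*          AL              OP              CL             */
  /* AL */ { ACT_INDIRECT,   ACT_INDIRECT,   ACT_PROHIBITED },
  /* OP */ { ACT_PROHIBITED, ACT_PROHIBITED, ACT_PROHIBITED },
  /* CL */ { ACT_INDIRECT,   ACT_DIRECT,     ACT_PROHIBITED }
};

static int
classify (char c)
{
  if (c == ' ')
    return CLS_SP;
  if (c == '(')
    return CLS_OP;
  if (c == ')')
    return CLS_CL;
  return CLS_AL;
}

/* Store in p[i] whether a break before s[i] is allowed.  */
void
toy_possible_linebreaks (const char *s, size_t n, char *p)
{
  int last_cls = -1;   /* class of last non-space character, -1 at start */
  int seen_space = 0;  /* was a space seen after that character? */
  size_t i;

  assert (s != NULL && p != NULL);
  memset (p, BRK_PROHIBITED, n);
  for (i = 0; i < n; i++)
    {
      int cls = classify (s[i]);

      if (cls == CLS_SP)
        {
          /* Never break just before a space; only remember it.  */
          seen_space = 1;
          continue;
        }
      if (last_cls >= 0)
        switch (toy_table[last_cls][cls])
          {
          case ACT_DIRECT:
            p[i] = BRK_POSSIBLE;
            break;
          case ACT_INDIRECT:
            p[i] = (seen_space ? BRK_POSSIBLE : BRK_PROHIBITED);
            break;
          default:
            p[i] = BRK_PROHIBITED;
            break;
          }
      last_cls = cls;
      seen_space = 0;
    }
}
```

For the input "ab (c) d" this marks a possible break before '(' and before 'd' and nowhere else. The spaces themselves never carry a break; they only turn an indirect entry into a permitted break at the next non-space character.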

*** lib/unilbrk/u8-width-linebreaks.c.orig      2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/u8-width-linebreaks.c   2008-05-10 13:36:34.000000000 +0200
***************
*** 0 ****
--- 1,204 ----
+ /* Line breaking of UTF-8 strings.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #include <config.h>
+ 
+ /* Specification.  */
+ #include "unilbrk.h"
+ 
+ #include "unistr.h"
+ #include "uniwidth.h"
+ 
+ int
+ u8_width_linebreaks (const uint8_t *s, size_t n,
+                    int width, int start_column, int at_end_columns,
+                    const char *o, const char *encoding,
+                    char *p)
+ {
+   const uint8_t *s_end;
+   char *last_p;
+   int last_column;
+   int piece_width;
+ 
+   u8_possible_linebreaks (s, n, encoding, p);
+ 
+   s_end = s + n;
+   last_p = NULL;
+   last_column = start_column;
+   piece_width = 0;
+   while (s < s_end)
+     {
+       ucs4_t uc;
+       int count = u8_mbtouc_unsafe (&uc, s, s_end - s);
+ 
+       /* Respect the override.  */
+       if (o != NULL && *o != UC_BREAK_UNDEFINED)
+       *p = *o;
+ 
+       if (*p == UC_BREAK_POSSIBLE || *p == UC_BREAK_MANDATORY)
+       {
+         /* An atomic piece of text ends here.  */
+         if (last_p != NULL && last_column + piece_width > width)
+           {
+             /* Insert a line break.  */
+             *last_p = UC_BREAK_POSSIBLE;
+             last_column = 0;
+           }
+       }
+ 
+       if (*p == UC_BREAK_MANDATORY)
+       {
+         /* uc is a line break character.  */
+         /* Start a new piece at column 0.  */
+         last_p = NULL;
+         last_column = 0;
+         piece_width = 0;
+       }
+       else
+       {
+         /* uc is not a line break character.  */
+         int w;
+ 
+         if (*p == UC_BREAK_POSSIBLE)
+           {
+             /* Start a new piece.  */
+             last_p = p;
+             last_column += piece_width;
+             piece_width = 0;
+             /* No line break for the moment, may be turned into
+                UC_BREAK_POSSIBLE later, via last_p. */
+           }
+ 
+         *p = UC_BREAK_PROHIBITED;
+ 
+         w = uc_width (uc, encoding);
+         if (w >= 0) /* ignore control characters in the string */
+           piece_width += w;
+       }
+ 
+       s += count;
+       p += count;
+       if (o != NULL)
+       o += count;
+     }
+ 
+   /* The last atomic piece of text ends here.  */
+   if (last_p != NULL && last_column + piece_width + at_end_columns > width)
+     {
+       /* Insert a line break.  */
+       *last_p = UC_BREAK_POSSIBLE;
+       last_column = 0;
+     }
+ 
+   return last_column + piece_width;
+ }
+ 
+ 
+ #ifdef TEST
+ 
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <string.h>
+ 
+ /* Read the contents of an input stream, and return it, terminated with a NUL
+    byte. */
+ char *
+ read_file (FILE *stream)
+ {
+ #define BUFSIZE 4096
+   char *buf = NULL;
+   int alloc = 0;
+   int size = 0;
+   int count;
+ 
+   while (! feof (stream))
+     {
+       if (size + BUFSIZE > alloc)
+       {
+         alloc = alloc + alloc / 2;
+         if (alloc < size + BUFSIZE)
+           alloc = size + BUFSIZE;
+         buf = realloc (buf, alloc);
+         if (buf == NULL)
+           {
+             fprintf (stderr, "out of memory\n");
+             exit (1);
+           }
+       }
+       count = fread (buf + size, 1, BUFSIZE, stream);
+       if (count == 0)
+       {
+         if (ferror (stream))
+           {
+             perror ("fread");
+             exit (1);
+           }
+       }
+       else
+       size += count;
+     }
+   buf = realloc (buf, size + 1);
+   if (buf == NULL)
+     {
+       fprintf (stderr, "out of memory\n");
+       exit (1);
+     }
+   buf[size] = '\0';
+   return buf;
+ #undef BUFSIZE
+ }
+ 
+ int
+ main (int argc, char * argv[])
+ {
+   if (argc == 2)
+     {
+       /* Insert line breaks for a given width.  */
+       int width = atoi (argv[1]);
+       char *input = read_file (stdin);
+       int length = strlen (input);
+       char *breaks = malloc (length);
+       int i;
+ 
+       u8_width_linebreaks ((uint8_t *) input, length, width, 0, 0, NULL, "UTF-8", breaks);
+ 
+       for (i = 0; i < length; i++)
+       {
+         switch (breaks[i])
+           {
+           case UC_BREAK_POSSIBLE:
+             putc ('\n', stdout);
+             break;
+           case UC_BREAK_MANDATORY:
+             break;
+           case UC_BREAK_PROHIBITED:
+             break;
+           default:
+             abort ();
+           }
+         putc (input[i], stdout);
+       }
+ 
+       free (breaks);
+ 
+       return 0;
+     }
+   else
+     return 1;
+ }
+ 
+ #endif /* TEST */
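
The TEST main above shows how a caller is expected to consume the result array: a newline is emitted before each byte marked UC_BREAK_POSSIBLE, while bytes marked UC_BREAK_MANDATORY are already newlines in the input. A minimal sketch of the same step writing into a buffer instead of stdout follows. The name apply_breaks and the BRK_* constants are hypothetical and not part of unilbrk.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the UC_BREAK_* constants.  */
enum { BRK_PROHIBITED = 0, BRK_POSSIBLE = 1, BRK_MANDATORY = 2 };

/* Copy the n bytes of s to out, inserting '\n' before every byte whose
   entry in p is BRK_POSSIBLE.  out must have room for 2 * n + 1 bytes.
   Returns the number of bytes written, not counting the final NUL.  */
size_t
apply_breaks (const char *s, const char *p, size_t n, char *out)
{
  size_t len = 0;
  size_t i;

  assert (s != NULL && p != NULL && out != NULL);
  for (i = 0; i < n; i++)
    {
      if (p[i] == BRK_POSSIBLE)
        out[len++] = '\n';
      /* BRK_MANDATORY needs no action: s[i] is itself the newline.  */
      out[len++] = s[i];
    }
  out[len] = '\0';
  return len;
}
```

Note that, as in the TEST main, the space before a chosen break stays at the end of the line; trimming it is left to the caller.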

*** lib/unilbrk/ulc-common.c.orig       2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/ulc-common.c    2008-05-10 11:54:22.000000000 +0200
***************
*** 0 ****
--- 1,169 ----
+ /* Line breaking auxiliary functions.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #include <config.h>
+ 
+ /* Specification.  */
+ #include "unilbrk/ulc-common.h"
+ 
+ #include <stdlib.h>
+ 
+ #include "c-ctype.h"
+ #include "streq.h"
+ 
+ int
+ is_utf8_encoding (const char *encoding)
+ {
+   if (STREQ (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0, 0))
+     return 1;
+   return 0;
+ }
+ 
+ #if HAVE_ICONV
+ 
+ # include <errno.h>
+ 
+ size_t
+ iconv_string_length (iconv_t cd, const char *s, size_t n)
+ {
+ # define TMPBUFSIZE 4096
+   size_t count = 0;
+   char tmpbuf[TMPBUFSIZE];
+   const char *inptr = s;
+   size_t insize = n;
+ 
+   while (insize > 0)
+     {
+       char *outptr = tmpbuf;
+       size_t outsize = TMPBUFSIZE;
+       size_t res = iconv (cd, (ICONV_CONST char **) &inptr, &insize, &outptr, &outsize);
+       if (res == (size_t)(-1) && errno != E2BIG
+ # if !defined _LIBICONV_VERSION && !defined __GLIBC__
+         /* Irix iconv() inserts a NUL byte if it cannot convert.
+            NetBSD iconv() inserts a question mark if it cannot convert.
+            Only GNU libiconv and GNU libc are known to prefer to fail rather
+            than doing a lossy conversion.  */
+         || res > 0
+ # endif
+        )
+       return (size_t)(-1);
+       count += outptr - tmpbuf;
+     }
+   /* Avoid glibc-2.1 bug and Solaris 7 through 9 bug.  */
+ # if defined _LIBICONV_VERSION \
+      || !((__GLIBC__ - 0 == 2 && __GLIBC_MINOR__ - 0 <= 1) || defined __sun)
+   {
+     char *outptr = tmpbuf;
+     size_t outsize = TMPBUFSIZE;
+     size_t res = iconv (cd, NULL, NULL, &outptr, &outsize);
+     if (res == (size_t)(-1))
+       return (size_t)(-1);
+     count += outptr - tmpbuf;
+   }
+   /* Return to the initial state.  */
+   iconv (cd, NULL, NULL, NULL, NULL);
+ # endif
+   return count;
+ # undef TMPBUFSIZE
+ }
+ 
+ void
+ iconv_string_keeping_offsets (iconv_t cd, const char *s, size_t n,
+                             size_t *offtable, char *t, size_t m)
+ {
+   size_t i;
+   const char *s_end;
+   const char *inptr;
+   char *outptr;
+   size_t outsize;
+   /* Avoid glibc-2.1 bug.  */
+ # if !defined _LIBICONV_VERSION && (__GLIBC__ - 0 == 2 && __GLIBC_MINOR__ - 0 <= 1)
+   const size_t extra = 1;
+ # else
+   const size_t extra = 0;
+ # endif
+ 
+   for (i = 0; i < n; i++)
+     offtable[i] = (size_t)(-1);
+ 
+   s_end = s + n;
+   inptr = s;
+   outptr = t;
+   outsize = m + extra;
+   while (inptr < s_end)
+     {
+       const char *saved_inptr;
+       size_t insize;
+       size_t res;
+ 
+       offtable[inptr - s] = outptr - t;
+ 
+       saved_inptr = inptr;
+       res = (size_t)(-1);
+       for (insize = 1; inptr + insize <= s_end; insize++)
+       {
+         res = iconv (cd, (ICONV_CONST char **) &inptr, &insize, &outptr, &outsize);
+         if (!(res == (size_t)(-1) && errno == EINVAL))
+           break;
+         /* We expect that no input bytes have been consumed so far.  */
+         if (inptr != saved_inptr)
+           abort ();
+       }
+       /* After we verified the convertibility and computed the translation's
+        size m, there shouldn't be any conversion error here. */
+       if (res == (size_t)(-1)
+ # if !defined _LIBICONV_VERSION && !defined __GLIBC__
+         /* Irix iconv() inserts a NUL byte if it cannot convert.
+            NetBSD iconv() inserts a question mark if it cannot convert.
+            Only GNU libiconv and GNU libc are known to prefer to fail rather
+            than doing a lossy conversion.  */
+         || res > 0
+ # endif
+        )
+       abort ();
+     }
+   /* Avoid glibc-2.1 bug and Solaris 7 bug.  */
+ # if defined _LIBICONV_VERSION \
+      || !((__GLIBC__ - 0 == 2 && __GLIBC_MINOR__ - 0 <= 1) || defined __sun)
+   if (iconv (cd, NULL, NULL, &outptr, &outsize) == (size_t)(-1))
+     abort ();
+ # endif
+   /* We should have produced exactly m output bytes.  */
+   if (outsize != extra)
+     abort ();
+ }
+ 
+ #endif /* HAVE_ICONV */
+ 
+ #if C_CTYPE_ASCII
+ 
+ /* Tests whether a string is entirely ASCII.  Returns 1 if yes.
+    Returns 0 if the string is in an 8-bit encoding or an ISO-2022 encoding.  */
+ int
+ is_all_ascii (const char *s, size_t n)
+ {
+   for (; n > 0; s++, n--)
+     {
+       unsigned char c = (unsigned char) *s;
+ 
+       if (!(c_isprint (c) || c_isspace (c)))
+       return 0;
+     }
+   return 1;
+ }
+ 
+ #endif /* C_CTYPE_ASCII */
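
The purpose of the offset table filled by iconv_string_keeping_offsets is easier to see with a concrete conversion. The sketch below is an assumption-laden illustration, not the gnulib code: it replaces iconv with a hand-written ISO-8859-1 to UTF-8 conversion (chosen only because it needs no library), and the function names and BRK_* constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the UC_BREAK_* constants.  */
enum { BRK_PROHIBITED = 0, BRK_POSSIBLE = 1 };

/* Convert the n bytes of the ISO-8859-1 string s to UTF-8 in t, which
   must have room for 2 * n bytes.  offtable[i] receives the offset in t
   at which the character starting at s[i] was written.
   Returns the length of the UTF-8 string.  */
size_t
latin1_to_utf8_keeping_offsets (const char *s, size_t n,
                                size_t *offtable, char *t)
{
  size_t m = 0;
  size_t i;

  assert (s != NULL && offtable != NULL && t != NULL);
  for (i = 0; i < n; i++)
    {
      unsigned char c = (unsigned char) s[i];

      offtable[i] = m;
      if (c < 0x80)
        t[m++] = (char) c;
      else
        {
          /* U+0080..U+00FF take two bytes in UTF-8.  */
          t[m++] = (char) (0xC0 | (c >> 6));
          t[m++] = (char) (0x80 | (c & 0x3F));
        }
    }
  return m;
}

/* Map the break flags q, computed on the UTF-8 string, back to the
   original string.  Entries equal to (size_t)(-1) mark bytes of s that
   do not start a character; they stay BRK_PROHIBITED.  */
void
translate_breaks_back (const size_t *offtable, size_t n,
                       const char *q, char *p)
{
  size_t i;

  memset (p, BRK_PROHIBITED, n);
  for (i = 0; i < n; i++)
    if (offtable[i] != (size_t)(-1))
      p[i] = q[offtable[i]];
}
```

In ISO-8859-1 every byte starts a character, so no (size_t)(-1) entries arise in this sketch; with a multibyte source encoding, the trailing bytes of each character would keep that marker, which is why the real code initializes the whole table to (size_t)(-1) first.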

*** lib/unilbrk/ulc-common.h.orig       2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/ulc-common.h    2008-05-10 11:44:56.000000000 +0200
***************
*** 0 ****
--- 1,47 ----
+ /* Line breaking auxiliary functions.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ /* Get size_t.  */
+ #include <stddef.h>
+ 
+ #include "c-ctype.h"
+ 
+ #define is_utf8_encoding unilbrk_is_utf8_encoding
+ extern int is_utf8_encoding (const char *encoding);
+ 
+ #if HAVE_ICONV
+ 
+ # include <iconv.h>
+ 
+ /* Luckily, the encoding's name is platform independent.  */
+ # define UTF8_NAME "UTF-8"
+ 
+ /* Return the length of a string after conversion through an iconv_t.  */
+ # define iconv_string_length unilbrk_iconv_string_length
+ extern size_t iconv_string_length (iconv_t cd, const char *s, size_t n);
+ 
+ # define iconv_string_keeping_offsets unilbrk_iconv_string_keeping_offsets
+ extern void iconv_string_keeping_offsets (iconv_t cd, const char *s, size_t n, size_t *offtable, char *t, size_t m);
+ 
+ #endif /* HAVE_ICONV */
+ 
+ #if C_CTYPE_ASCII
+ 
+ # define is_all_ascii unilbrk_is_all_ascii
+ extern int is_all_ascii (const char *s, size_t n);
+ 
+ #endif /* C_CTYPE_ASCII */

*** lib/unilbrk/ulc-possible-linebreaks.c.orig  2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/ulc-possible-linebreaks.c       2008-05-10 12:28:16.000000000 +0200
***************
*** 0 ****
--- 1,233 ----
+ /* Line breaking of strings.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #include <config.h>
+ 
+ /* Specification.  */
+ #include "unilbrk.h"
+ 
+ #include <stdlib.h>
+ #if HAVE_ICONV
+ # include <iconv.h>
+ #endif
+ 
+ #include "c-ctype.h"
+ #include "streq.h"
+ #include "xsize.h"
+ #include "unilbrk/ulc-common.h"
+ 
+ /* Line breaking of a string in an arbitrary encoding.
+ 
+    We convert the input string to Unicode.
+ 
+    The standardized Unicode encodings are UTF-8, UCS-2, UCS-4, UTF-16,
+    UTF-16BE, UTF-16LE, UTF-7.  UCS-2 supports only characters up to
+    \U0000FFFF.  UTF-16 and variants support only characters up to
+    \U0010FFFF.  UTF-7 is way too complex and not supported by glibc-2.1.
+    UCS-4 specification leaves doubts about endianness and byte order mark.
+    glibc currently interprets it as big endian without byte order mark,
+    but this is not backed by an RFC.  So we use UTF-8. It supports
+    characters up to \U7FFFFFFF and is unambiguously defined.  */
+ 
+ void
+ ulc_possible_linebreaks (const char *s, size_t n, const char *encoding,
+                        char *p)
+ {
+   if (n == 0)
+     return;
+   if (is_utf8_encoding (encoding))
+     u8_possible_linebreaks ((const uint8_t *) s, n, encoding, p);
+   else
+     {
+ #if HAVE_ICONV
+       iconv_t to_utf8;
+       /* Avoid glibc-2.1 bug with EUC-KR.  */
+ # if (__GLIBC__ - 0 == 2 && __GLIBC_MINOR__ - 0 <= 1) && !defined _LIBICONV_VERSION
+       if (STREQ (encoding, "EUC-KR", 'E', 'U', 'C', '-', 'K', 'R', 0, 0, 0))
+       to_utf8 = (iconv_t)(-1);
+       else
+ # endif
+       /* Avoid Solaris 9 bug with GB2312, EUC-TW, BIG5, BIG5-HKSCS, GBK,
+        GB18030.  */
+ # if defined __sun && !defined _LIBICONV_VERSION
+       if (   STREQ (encoding, "GB2312", 'G', 'B', '2', '3', '1', '2', 0, 0, 0)
+         || STREQ (encoding, "EUC-TW", 'E', 'U', 'C', '-', 'T', 'W', 0, 0, 0)
+         || STREQ (encoding, "BIG5", 'B', 'I', 'G', '5', 0, 0, 0, 0, 0)
+         || STREQ (encoding, "BIG5-HKSCS", 'B', 'I', 'G', '5', '-', 'H', 'K', 'S', 'C')
+         || STREQ (encoding, "GBK", 'G', 'B', 'K', 0, 0, 0, 0, 0, 0)
+         || STREQ (encoding, "GB18030", 'G', 'B', '1', '8', '0', '3', '0', 0, 0))
+       to_utf8 = (iconv_t)(-1);
+       else
+ # endif
+       to_utf8 = iconv_open (UTF8_NAME, encoding);
+       if (to_utf8 != (iconv_t)(-1))
+       {
+         /* Determine the length of the resulting UTF-8 string.  */
+         size_t m = iconv_string_length (to_utf8, s, n);
+         if (m != (size_t)(-1))
+           {
+             /* Convert the string to UTF-8 and build a translation table
+                from offsets into s to offsets into the translated string.  */
+             size_t memory_size = xsum3 (xtimes (n, sizeof (size_t)), m, m);
+             char *memory =
+               (size_in_bounds_p (memory_size) ? malloc (memory_size) : NULL);
+             if (memory != NULL)
+               {
+                 size_t *offtable = (size_t *) memory;
+                 char *t = (char *) (offtable + n);
+                 char *q = (char *) (t + m);
+                 size_t i;
+ 
+                 iconv_string_keeping_offsets (to_utf8, s, n, offtable, t, m);
+ 
+                 /* Determine the possible line breaks of the UTF-8 string.  */
+                 u8_possible_linebreaks ((const uint8_t *) t, m, encoding, q);
+ 
+                 /* Translate the result back to the original string.  */
+                 memset (p, UC_BREAK_PROHIBITED, n);
+                 for (i = 0; i < n; i++)
+                   if (offtable[i] != (size_t)(-1))
+                     p[i] = q[offtable[i]];
+ 
+                 free (memory);
+                 iconv_close (to_utf8);
+                 return;
+               }
+           }
+         iconv_close (to_utf8);
+       }
+ #endif
+       /* Impossible to convert.  */
+ #if C_CTYPE_ASCII
+       if (is_all_ascii (s, n))
+       {
+         /* ASCII is a subset of UTF-8.  */
+         u8_possible_linebreaks ((const uint8_t *) s, n, encoding, p);
+         return;
+       }
+ #endif
+       /* We have a non-ASCII string and cannot convert it.
+        Don't produce line breaks except those already present in the
+        input string.  All we assume here is that the encoding is
+        minimally ASCII compatible.  */
+       {
+       const char *s_end = s + n;
+       while (s < s_end)
+         {
+           *p = (*s == '\n' ? UC_BREAK_MANDATORY : UC_BREAK_PROHIBITED);
+           s++;
+           p++;
+         }
+       }
+     }
+ }
+ 
+ 
+ #ifdef TEST
+ 
+ #include <stdio.h>
+ #include <locale.h>
+ #include <string.h>
+ 
+ /* Read the contents of an input stream, and return it, terminated with a NUL
+    byte. */
+ char *
+ read_file (FILE *stream)
+ {
+ #define BUFSIZE 4096
+   char *buf = NULL;
+   int alloc = 0;
+   int size = 0;
+   int count;
+ 
+   while (! feof (stream))
+     {
+       if (size + BUFSIZE > alloc)
+       {
+         alloc = alloc + alloc / 2;
+         if (alloc < size + BUFSIZE)
+           alloc = size + BUFSIZE;
+         buf = realloc (buf, alloc);
+         if (buf == NULL)
+           {
+             fprintf (stderr, "out of memory\n");
+             exit (1);
+           }
+       }
+       count = fread (buf + size, 1, BUFSIZE, stream);
+       if (count == 0)
+       {
+         if (ferror (stream))
+           {
+             perror ("fread");
+             exit (1);
+           }
+       }
+       else
+       size += count;
+     }
+   buf = realloc (buf, size + 1);
+   if (buf == NULL)
+     {
+       fprintf (stderr, "out of memory\n");
+       exit (1);
+     }
+   buf[size] = '\0';
+   return buf;
+ #undef BUFSIZE
+ }
+ 
+ int
+ main (int argc, char * argv[])
+ {
+   setlocale (LC_CTYPE, "");
+   if (argc == 1)
+     {
+       /* Display all the break opportunities in the input string.  */
+       char *input = read_file (stdin);
+       int length = strlen (input);
+       char *breaks = malloc (length);
+       int i;
+ 
+       ulc_possible_linebreaks (input, length, locale_charset (), breaks);
+ 
+       for (i = 0; i < length; i++)
+       {
+         switch (breaks[i])
+           {
+           case UC_BREAK_POSSIBLE:
+             putc ('|', stdout);
+             break;
+           case UC_BREAK_MANDATORY:
+             break;
+           case UC_BREAK_PROHIBITED:
+             break;
+           default:
+             abort ();
+           }
+         putc (input[i], stdout);
+       }
+ 
+       free (breaks);
+ 
+       return 0;
+     }
+   else
+     return 1;
+ }
+ 
+ #endif /* TEST */
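
The last two fallbacks of ulc_possible_linebreaks (the all-ASCII check, and breaking only at newlines already present) are small enough to sketch standalone. The following is an illustration under stated assumptions: is_ascii_text is a hypothetical stand-in for is_all_ascii that spells out the ASCII ranges by hand instead of using c_isprint and c_isspace, and the BRK_* constants are hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the UC_BREAK_* constants.  */
enum { BRK_PROHIBITED = 0, BRK_MANDATORY = 2 };

/* Return 1 if all n bytes of s are printable ASCII or ASCII whitespace,
   0 otherwise.  The ranges are written out by hand for this sketch.  */
int
is_ascii_text (const char *s, size_t n)
{
  for (; n > 0; s++, n--)
    {
      unsigned char c = (unsigned char) *s;
      int printable = (c >= 0x20 && c <= 0x7e);
      int space = (c == ' ' || (c >= '\t' && c <= '\r'));

      if (!(printable || space))
        return 0;
    }
  return 1;
}

/* Last resort for a string that can neither be converted nor treated
   as ASCII: report only the newlines already present in the input.  */
void
newline_only_linebreaks (const char *s, size_t n, char *p)
{
  size_t i;

  assert (s != NULL && p != NULL);
  for (i = 0; i < n; i++)
    p[i] = (s[i] == '\n' ? BRK_MANDATORY : BRK_PROHIBITED);
}
```

The last resort only assumes that the byte '\n' means a newline in the unknown encoding, which matches the "minimally ASCII compatible" remark in the comment above.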

*** lib/unilbrk/ulc-width-linebreaks.c.orig     2003-09-23 19:59:22.000000000 +0200
--- lib/unilbrk/ulc-width-linebreaks.c  2008-05-10 12:03:49.000000000 +0200
***************
*** 0 ****
--- 1,256 ----
+ /* Line breaking of strings.
+    Copyright (C) 2001-2003, 2006-2008 Free Software Foundation, Inc.
+    Written by Bruno Haible <address@hidden>, 2001.
+ 
+    This program is free software: you can redistribute it and/or modify it
+    under the terms of the GNU Lesser General Public License as published
+    by the Free Software Foundation; either version 3 of the License, or
+    (at your option) any later version.
+ 
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+    Lesser General Public License for more details.
+ 
+    You should have received a copy of the GNU Lesser General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+ 
+ #include <config.h>
+ 
+ /* Specification.  */
+ #include "unilbrk.h"
+ 
+ #include <stdlib.h>
+ #include <string.h>
+ #if HAVE_ICONV
+ # include <iconv.h>
+ #endif
+ 
+ #include "c-ctype.h"
+ #include "streq.h"
+ #include "xsize.h"
+ #include "unilbrk/ulc-common.h"
+ 
+ /* Line breaking of a string in an arbitrary encoding.
+ 
+    We convert the input string to Unicode.
+ 
+    The standardized Unicode encodings are UTF-8, UCS-2, UCS-4, UTF-16,
+    UTF-16BE, UTF-16LE, UTF-7.  UCS-2 supports only characters up to
+    \U0000FFFF.  UTF-16 and variants support only characters up to
+    \U0010FFFF.  UTF-7 is way too complex and not supported by glibc-2.1.
+    UCS-4 specification leaves doubts about endianness and byte order mark.
+    glibc currently interprets it as big endian without byte order mark,
+    but this is not backed by an RFC.  So we use UTF-8. It supports
+    characters up to \U7FFFFFFF and is unambiguously defined.  */
+ 
+ int
+ ulc_width_linebreaks (const char *s, size_t n,
+                     int width, int start_column, int at_end_columns,
+                     const char *o, const char *encoding,
+                     char *p)
+ {
+   if (n == 0)
+     return start_column;
+   if (is_utf8_encoding (encoding))
+     return u8_width_linebreaks ((const uint8_t *) s, n, width, start_column, at_end_columns, o, encoding, p);
+   else
+     {
+ #if HAVE_ICONV
+       iconv_t to_utf8;
+       /* Avoid glibc-2.1 bug with EUC-KR.  */
+ # if (__GLIBC__ - 0 == 2 && __GLIBC_MINOR__ - 0 <= 1) && !defined _LIBICONV_VERSION
+       if (STREQ (encoding, "EUC-KR", 'E', 'U', 'C', '-', 'K', 'R', 0, 0, 0))
+       to_utf8 = (iconv_t)(-1);
+       else
+ # endif
+       /* Avoid Solaris 9 bug with GB2312, EUC-TW, BIG5, BIG5-HKSCS, GBK,
+        GB18030.  */
+ # if defined __sun && !defined _LIBICONV_VERSION
+       if (   STREQ (encoding, "GB2312", 'G', 'B', '2', '3', '1', '2', 0, 0, 0)
+         || STREQ (encoding, "EUC-TW", 'E', 'U', 'C', '-', 'T', 'W', 0, 0, 0)
+         || STREQ (encoding, "BIG5", 'B', 'I', 'G', '5', 0, 0, 0, 0, 0)
+         || STREQ (encoding, "BIG5-HKSCS", 'B', 'I', 'G', '5', '-', 'H', 'K', 'S', 'C')
+         || STREQ (encoding, "GBK", 'G', 'B', 'K', 0, 0, 0, 0, 0, 0)
+         || STREQ (encoding, "GB18030", 'G', 'B', '1', '8', '0', '3', '0', 0, 0))
+       to_utf8 = (iconv_t)(-1);
+       else
+ # endif
+       to_utf8 = iconv_open (UTF8_NAME, encoding);
+       if (to_utf8 != (iconv_t)(-1))
+       {
+         /* Determine the length of the resulting UTF-8 string.  */
+         size_t m = iconv_string_length (to_utf8, s, n);
+         if (m != (size_t)(-1))
+           {
+             /* Convert the string to UTF-8 and build a translation table
+                from offsets into s to offsets into the translated string.  */
+             size_t memory_size =
+               xsum4 (xtimes (n, sizeof (size_t)), m, m,
+                      (o != NULL ? m : 0));
+             char *memory =
+               (char *)
+               (size_in_bounds_p (memory_size) ? malloc (memory_size) : NULL);
+             if (memory != NULL)
+               {
+                 size_t *offtable = (size_t *) memory;
+                 char *t = (char *) (offtable + n);
+                 char *q = (char *) (t + m);
+                 char *o8 = (o != NULL ? (char *) (q + m) : NULL);
+                 int res_column;
+                 size_t i;
+ 
+                 iconv_string_keeping_offsets (to_utf8, s, n, offtable, t, m);
+ 
+                 /* Translate the overrides to the UTF-8 string.  */
+                 if (o != NULL)
+                   {
+                     memset (o8, UC_BREAK_UNDEFINED, m);
+                     for (i = 0; i < n; i++)
+                       if (offtable[i] != (size_t)(-1))
+                         o8[offtable[i]] = o[i];
+                   }
+ 
+                 /* Determine the line breaks of the UTF-8 string.  */
+                 res_column =
+                   u8_width_linebreaks ((const uint8_t *) t, m, width, start_column, at_end_columns, o8, encoding, q);
+ 
+                 /* Translate the result back to the original string.  */
+                 memset (p, UC_BREAK_PROHIBITED, n);
+                 for (i = 0; i < n; i++)
+                   if (offtable[i] != (size_t)(-1))
+                     p[i] = q[offtable[i]];
+ 
+                 free (memory);
+                 iconv_close (to_utf8);
+                 return res_column;
+               }
+           }
+         iconv_close (to_utf8);
+       }
+ #endif
+       /* Impossible to convert.  */
+ #if C_CTYPE_ASCII
+       if (is_all_ascii (s, n))
+       {
+         /* ASCII is a subset of UTF-8.  */
+         return u8_width_linebreaks ((const uint8_t *) s, n, width, start_column, at_end_columns, o, encoding, p);
+       }
+ #endif
+       /* We have a non-ASCII string and cannot convert it.
+        Don't produce line breaks except those already present in the
+        input string.  All we assume here is that the encoding is
+        minimally ASCII compatible.  */
+       {
+       const char *s_end = s + n;
+       while (s < s_end)
+         {
+           *p = ((o != NULL && *o == UC_BREAK_MANDATORY) || *s == '\n'
+                 ? UC_BREAK_MANDATORY
+                 : UC_BREAK_PROHIBITED);
+           s++;
+           p++;
+           if (o != NULL)
+             o++;
+         }
+       /* We cannot compute widths in this case.  */
+       return start_column;
+       }
+     }
+ }
+ 
+ 
+ #ifdef TEST
+ 
+ #include <stdio.h>
+ #include <locale.h>
+ 
+ /* Read the contents of an input stream, and return it, terminated with a NUL
+    byte. */
+ char *
+ read_file (FILE *stream)
+ {
+ #define BUFSIZE 4096
+   char *buf = NULL;
+   int alloc = 0;
+   int size = 0;
+   int count;
+ 
+   while (! feof (stream))
+     {
+       if (size + BUFSIZE > alloc)
+       {
+         alloc = alloc + alloc / 2;
+         if (alloc < size + BUFSIZE)
+           alloc = size + BUFSIZE;
+         buf = realloc (buf, alloc);
+         if (buf == NULL)
+           {
+             fprintf (stderr, "out of memory\n");
+             exit (1);
+           }
+       }
+       count = fread (buf + size, 1, BUFSIZE, stream);
+       if (count == 0)
+       {
+         if (ferror (stream))
+           {
+             perror ("fread");
+             exit (1);
+           }
+       }
+       else
+       size += count;
+     }
+   buf = realloc (buf, size + 1);
+   if (buf == NULL)
+     {
+       fprintf (stderr, "out of memory\n");
+       exit (1);
+     }
+   buf[size] = '\0';
+   return buf;
+ #undef BUFSIZE
+ }
+ 
+ int
+ main (int argc, char * argv[])
+ {
+   setlocale (LC_CTYPE, "");
+   if (argc == 2)
+     {
+       /* Insert line breaks for a given width.  */
+       int width = atoi (argv[1]);
+       char *input = read_file (stdin);
+       int length = strlen (input);
+       char *breaks = malloc (length);
+       int i;
+ 
+       ulc_width_linebreaks (input, length, width, 0, 0, NULL, locale_charset (), breaks);
+ 
+       for (i = 0; i < length; i++)
+       {
+         switch (breaks[i])
+           {
+           case UC_BREAK_POSSIBLE:
+             putc ('\n', stdout);
+             break;
+           case UC_BREAK_MANDATORY:
+             break;
+           case UC_BREAK_PROHIBITED:
+             break;
+           default:
+             abort ();
+           }
+         putc (input[i], stdout);
+       }
+ 
+       free (breaks);
+ 
+       return 0;
+     }
+   else
+     return 1;
+ }
+ 
+ #endif /* TEST */
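
A note on the offset-table step in ulc_width_linebreaks above: the break
flags are computed on the UTF-8 copy and then copied back to the original
string through offtable.  The following is a minimal, self-contained sketch
of that idea, not the gnulib code.  It assumes Latin-1 input so that no
iconv is needed; latin1_to_utf8_keeping_offsets, translate_breaks_back and
the BREAK_* constants are hypothetical stand-ins for
iconv_string_keeping_offsets and the UC_BREAK_* values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values, standing in for UC_BREAK_PROHIBITED and
   UC_BREAK_POSSIBLE.  */
enum { BREAK_PROHIBITED = 0, BREAK_POSSIBLE = 1 };

/* Convert the N Latin-1 bytes at S to UTF-8, storing the result in T
   (which must have room for 2 * N bytes).  For each input offset I,
   OFFTABLE[I] receives the offset in T at which the character starting
   at S[I] begins.  Returns the length of the UTF-8 string.
   Assumption: Latin-1 input, so every input byte starts a character.
   With a multibyte source encoding, the non-initial bytes of a character
   would get (size_t)(-1) instead.  */
static size_t
latin1_to_utf8_keeping_offsets (const char *s, size_t n,
                                size_t *offtable, char *t)
{
  size_t m = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      unsigned char c = (unsigned char) s[i];

      offtable[i] = m;
      if (c < 0x80)
        t[m++] = (char) c;
      else
        {
          /* U+0080..U+00FF take two bytes in UTF-8.  */
          t[m++] = (char) (0xC0 | (c >> 6));
          t[m++] = (char) (0x80 | (c & 0x3F));
        }
    }
  return m;
}

/* Copy the break flags Q, computed on the UTF-8 string, back to P, which
   has one entry per byte of the original string.  Offsets that do not
   start a character keep BREAK_PROHIBITED.  */
static void
translate_breaks_back (const size_t *offtable, size_t n,
                       const char *q, char *p)
{
  size_t i;

  memset (p, BREAK_PROHIBITED, n);
  for (i = 0; i < n; i++)
    if (offtable[i] != (size_t)(-1))
      p[i] = q[offtable[i]];
}
```

With s = "caf\xE9 au lait" (12 Latin-1 bytes) the UTF-8 copy has 13 bytes
and offtable[4] is 5, so a flag set at q[6] ends up at p[5].
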
*** modules/unilbrk/base.orig   2003-09-23 19:59:22.000000000 +0200
--- modules/unilbrk/base        2008-05-10 03:24:03.000000000 +0200
***************
*** 0 ****
--- 1,23 ----
+ Description:
+ Base layer for line breaking.
+ 
+ Files:
+ lib/unilbrk.h
+ 
+ Depends-on:
+ unitypes
+ localcharset
+ 
+ configure.ac:
+ 
+ Makefile.am:
+ 
+ Include:
+ "unilbrk.h"
+ 
+ License:
+ LGPL
+ 
+ Maintainer:
+ Bruno Haible
+ 

*** modules/unilbrk/tables.orig 2003-09-23 19:59:22.000000000 +0200
--- modules/unilbrk/tables      2008-05-10 14:12:15.000000000 +0200
***************
*** 0 ****
--- 1,26 ----
+ Description:
+ Line breaking auxiliary tables.
+ 
+ Files:
+ lib/unilbrk/tables.h
+ lib/unilbrk/tables.c
+ lib/unilbrk/lbrkprop1.h
+ lib/unilbrk/lbrkprop2.h
+ 
+ Depends-on:
+ 
+ configure.ac:
+ AC_REQUIRE([AC_C_INLINE])
+ 
+ Makefile.am:
+ lib_SOURCES += unilbrk/tables.c
+ 
+ Include:
+ "unilbrk/tables.h"
+ 
+ License:
+ LGPL
+ 
+ Maintainer:
+ Bruno Haible
+ 

*** modules/unilbrk/u16-possible-linebreaks.orig        2003-09-23 19:59:22.000000000 +0200
--- modules/unilbrk/u16-possible-linebreaks     2008-05-10 05:02:05.000000000 +0200
***************
*** 0 ****
--- 1,27 ----
+ Description:
+ Line breaking of UTF-16 strings.
+ 
+ Files:
+ lib/unilbrk/u16-possible-linebreaks.c
+ lib/uniwidth/cjk.h
+ 
+ Depends-on:
+ unilbrk/base
+ unilbrk/tables
+ utf16-ucs4-unsafe
+ streq 
+ 
+ configure.ac:
+ 
+ Makefile.am:
+ lib_SOURCES += unilbrk/u16-possible-linebreaks.c
+ 
+ Include:
+ "unilbrk.h"
+ 
+ License:
+ LGPL
+ 
+ Maintainer:
+ Bruno Haible
+ 

*** modules/unilbrk/u16-width-linebreaks.orig   2003-09-23 19:59:22.000000000 +0200
--- modules/unilbrk/u16-width-linebreaks        2008-05-10 04:21:55.000000000 +0200
***************
*** 0 ****
--- 1,26 ----
+ Description:
+ Line breaking of UTF-16 strings.
+ 
+ Files:
+ lib/unilbrk/u16-width-linebreaks.c
+ 
+ Depends-on:
+ unilbrk/base
+ unilbrk/u16-possible-linebreaks
+ uniwidth/width
+ utf16-ucs4-unsafe
+ 
+ configure.ac:
+ 
+ Makefile.am:
+ lib_SOURCES += unilbrk/u16-width-linebreaks.c
+ 
+ Include:
+ "unilbrk.h"
+ 
+ License:
+ LGPL
+ 
+ Maintainer:
+ Bruno Haible
+ 

*** modules/unilbrk/u32-possible-linebreaks.orig        2003-09-23 19:59:22.000000000 +0200
--- modules/unilbrk/u32-possible-linebreaks     2008-05-10 05:02:06.000000000 +0200
***************
*** 0 ****
--- 1,26 ----
+ Description:
+ Line breaking of UTF-32 strings.
+ 
+ Files:
+ lib/unilbrk/u32-possible-linebreaks.c
+ lib/uniwidth/cjk.h
+ 
+ Depends-on:
+ unilbrk/base
+ unilbrk/tables
+ streq 
+ 
+ configure.ac:
+ 
+ Makefile.am:
+ lib_SOURCES += unilbrk/u32-possible-linebreaks.c
+ 
+ Include:
+ "unilbrk.h"
+ 
+ License:
+ LGPL
+ 
+ Maintainer:
+ Bruno Haible
+ 

*** modules/unilbrk/u32-width-linebreaks.orig   2003-09-23 19:59:22.000000000 +0200
--- modules/unilbrk/u32-width-linebreaks        2008-05-10 04:21:56.000000000 +0200
***************
*** 0 ****
--- 1,25 ----
+ Description:
+ Line breaking of UTF-32 strings.
+ 
+ Files:
+ lib/unilbrk/u32-width-linebreaks.c
+ 
+ Depends-on:
+ unilbrk/base
+ unilbrk/u32-possible-linebreaks
+ uniwidth/width
+ 
+ configure.ac:
+ 
+ Makefile.am:
+ lib_SOURCES += unilbrk/u32-width-linebreaks.c
+ 
+ Include:
+ "unilbrk.h"
+ 
+ License:
+ LGPL
+ 
+ Maintainer:
+ Bruno Haible
+ 

*** modules/unilbrk/u8-possible-linebreaks.orig 2003-09-23 19:59:22.000000000 +0200
--- modules/unilbrk/u8-possible-linebreaks      2008-05-10 05:02:04.000000000 +0200
***************
*** 0 ****
--- 1,27 ----
+ Description:
+ Line breaking of UTF-8 strings.
+ 
+ Files:
+ lib/unilbrk/u8-possible-linebreaks.c
+ lib/uniwidth/cjk.h
+ 
+ Depends-on:
+ unilbrk/base
+ unilbrk/tables
+ utf8-ucs4-unsafe
+ streq 
+ 
+ configure.ac:
+ 
+ Makefile.am:
+ lib_SOURCES += unilbrk/u8-possible-linebreaks.c
+ 
+ Include:
+ "unilbrk.h"
+ 
+ License:
+ LGPL
+ 
+ Maintainer:
+ Bruno Haible
+ 

*** modules/unilbrk/u8-width-linebreaks.orig    2003-09-23 19:59:22.000000000 +0200
--- modules/unilbrk/u8-width-linebreaks 2008-05-10 04:21:53.000000000 +0200
***************
*** 0 ****
--- 1,26 ----
+ Description:
+ Line breaking of UTF-8 strings.
+ 
+ Files:
+ lib/unilbrk/u8-width-linebreaks.c
+ 
+ Depends-on:
+ unilbrk/base
+ unilbrk/u8-possible-linebreaks
+ uniwidth/width
+ utf8-ucs4-unsafe
+ 
+ configure.ac:
+ 
+ Makefile.am:
+ lib_SOURCES += unilbrk/u8-width-linebreaks.c
+ 
+ Include:
+ "unilbrk.h"
+ 
+ License:
+ LGPL
+ 
+ Maintainer:
+ Bruno Haible
+ 

*** modules/unilbrk/ulc-common.orig     2003-09-23 19:59:22.000000000 +0200
--- modules/unilbrk/ulc-common  2008-05-10 03:47:41.000000000 +0200
***************
*** 0 ****
--- 1,26 ----
+ Description:
+ Line breaking auxiliary functions.
+ 
+ Files:
+ lib/unilbrk/ulc-common.h
+ lib/unilbrk/ulc-common.c
+ 
+ Depends-on:
+ c-ctype
+ iconv
+ streq
+ 
+ configure.ac:
+ 
+ Makefile.am:
+ lib_SOURCES += unilbrk/ulc-common.c
+ 
+ Include:
+ "unilbrk/ulc-common.h"
+ 
+ License:
+ LGPL
+ 
+ Maintainer:
+ Bruno Haible
+ 

*** modules/unilbrk/ulc-possible-linebreaks.orig        2003-09-23 19:59:22.000000000 +0200
--- modules/unilbrk/ulc-possible-linebreaks     2008-05-10 04:05:22.000000000 +0200
***************
*** 0 ****
--- 1,29 ----
+ Description:
+ Line breaking of strings.
+ 
+ Files:
+ lib/unilbrk/ulc-possible-linebreaks.c
+ 
+ Depends-on:
+ unilbrk/base
+ unilbrk/u8-possible-linebreaks
+ unilbrk/ulc-common
+ c-ctype
+ iconv_open
+ streq
+ xsize
+ 
+ configure.ac:
+ 
+ Makefile.am:
+ lib_SOURCES += unilbrk/ulc-possible-linebreaks.c
+ 
+ Include:
+ "unilbrk.h"
+ 
+ License:
+ LGPL
+ 
+ Maintainer:
+ Bruno Haible
+ 

*** modules/unilbrk/ulc-width-linebreaks.orig   2003-09-23 19:59:22.000000000 +0200
--- modules/unilbrk/ulc-width-linebreaks        2008-05-10 04:05:46.000000000 +0200
***************
*** 0 ****
--- 1,29 ----
+ Description:
+ Line breaking of strings.
+ 
+ Files:
+ lib/unilbrk/ulc-width-linebreaks.c
+ 
+ Depends-on:
+ unilbrk/base
+ unilbrk/u8-width-linebreaks
+ unilbrk/ulc-common
+ c-ctype
+ iconv_open
+ streq
+ xsize
+ 
+ configure.ac:
+ 
+ Makefile.am:
+ lib_SOURCES += unilbrk/ulc-width-linebreaks.c
+ 
+ Include:
+ "unilbrk.h"
+ 
+ License:
+ LGPL
+ 
+ Maintainer:
+ Bruno Haible
+
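
A note on the 'xsize' dependency listed for the ulc-* modules: in
ulc_width_linebreaks it guards the single malloc that holds the offset
table, the UTF-8 copy, the break flags and (optionally) the translated
overrides.  The following is a rough, self-contained sketch of that
pattern, not the gnulib code.  sat_add and sat_mul are hypothetical
stand-ins for the saturating helpers from xsize.h (xsum4, xtimes), and
struct workspace and workspace_alloc are names invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Saturating addition: returns SIZE_MAX if the sum overflows or if an
   operand is already saturated.  */
static size_t
sat_add (size_t a, size_t b)
{
  size_t sum = a + b;
  return sum >= a ? sum : SIZE_MAX;
}

/* Saturating multiplication of an element count by an element size.  */
static size_t
sat_mul (size_t count, size_t elsize)
{
  return (elsize == 0 || count <= SIZE_MAX / elsize
          ? count * elsize
          : SIZE_MAX);
}

/* One allocation, carved into four consecutive regions.  */
struct workspace
{
  char *memory;       /* the block to pass to free() */
  size_t *offtable;   /* n entries */
  char *t;            /* m bytes: converted string */
  char *q;            /* m bytes: break flags for t */
  char *o8;           /* m bytes: translated overrides, or NULL */
};

/* Allocate room for an input of N bytes whose UTF-8 form has M bytes.
   Returns 0 on success, -1 on size overflow or allocation failure.
   N is assumed to be nonzero; the patch returns early when n == 0.  */
static int
workspace_alloc (struct workspace *w, size_t n, size_t m, int with_overrides)
{
  size_t size = sat_mul (n, sizeof (size_t));

  size = sat_add (size, m);
  size = sat_add (size, m);
  size = sat_add (size, with_overrides ? m : 0);
  if (size == SIZE_MAX)
    return -1;

  w->memory = malloc (size);
  if (w->memory == NULL)
    return -1;

  /* The size_t array comes first, so that it is suitably aligned.  */
  w->offtable = (size_t *) w->memory;
  w->t = (char *) (w->offtable + n);
  w->q = w->t + m;
  w->o8 = with_overrides ? w->q + m : NULL;
  return 0;
}
```

A caller frees w.memory once when done; the four region pointers must not
be freed individually.
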