
Re: [Groff] Spanish hyphenation


From: Paco Andres Verdu
Subject: Re: [Groff] Spanish hyphenation
Date: Sun, 10 Sep 2000 21:16:13 +0200 (CEST)

Hi,

On Thu, 7 Sep 2000 address@hidden wrote:

> Does anyone out there understand TeX hyphenation files well enough
> to say what a Spanish hyphenation file for groff -- say "hyphen.es"
> (corresponding to groff's US hyphenation file ../groff/tmac/hyphen.us,
> identical to the patterns file ushyph1.tex in TeX) should look like?
> 
> I have been contemplating TeX's sphyph.tex without being able to make
> too much sense of it.

    The shyphen.sh shell script, which is used to generate the sphyph.tex file
usually shipped with TeX distributions, is attached to this message. You can get
the original versions of both files at
ftp://ftp.cdrom.com/pub/tex/ctan/languages/spanish/hyphen/

     Maybe you will find it easier to modify the script that generates the
hyphenation file than the hyphenation file itself.
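If the output is meant for groff rather than TeX, it may be enough to keep only the \patterns{...} block, since the \begingroup, \catcode/\lccode and \endgroup lines around it are TeX-specific. A minimal sketch, assuming that groff's .hpf request needs nothing beyond that block (check the groff documentation for your version); extract_patterns is a hypothetical helper, not part of groff or of the script:

```shell
#!/bin/sh
# Hypothetical helper: copy only the \patterns{ ... } block of a TeX
# hyphenation file from stdin to stdout, skipping %-comment lines inside it.
# Everything outside the block (\begingroup, \catcode, \endgroup) is dropped.
extract_patterns() {
    sed -n '/^\\patterns{/,/^}/{
/^%/d
p
}'
}

# Example (assumes shyphen.sh is in the current directory):
#   sh shyphen.sh isolatin1 ugly | extract_patterns > hyphen.es
```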

     I've used groff for Spanish documents, but without hyphenation, and
I've never used hyphenation in either groff or TeX, so I won't be able
to help with the development. But if it is helpful, I can test your
hyphenation file with some of my Spanish-language groff files and tell you
whether the result is right or what the flaws are.
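For checking a patterns file on individual words before running whole documents through groff, it helps to know how the patterns are read. In TeX's scheme (Liang's algorithm; the question notes that groff's hyphen.us is identical to TeX's ushyph1.tex), a digit between two letters scores that gap, odd values allow a break, even values forbid it, the highest score from all matching patterns wins, and a dot matches the start or end of the word. The sketch below is a simplified, hypothetical checker written with sh and awk, not groff's or TeX's own code; it assumes one-byte lowercase letters and the left/right minimum of 2 that the script's comments give for Spanish:

```shell
#!/bin/sh
# Hypothetical checker. Usage: hyphenate WORD < patterns
# Patterns are whitespace-separated; lines starting with %, \ or }
# (comments and TeX wrapper lines) are skipped.
hyphenate() {
    awk -v word="$1" -v lmin=2 -v rmin=2 '
    {
        first = substr($0, 1, 1)
        if (first == "%" || first == "\\" || first == "}") next
        for (f = 1; f <= NF; f++) {
            k++; n = 0; letters = ""; v[k, 0] = 0
            # Split e.g. "1b2ra" into letters "bra" plus the digit in each
            # gap before, between and after them (missing digits count as 0).
            for (i = 1; i <= length($f); i++) {
                c = substr($f, i, 1)
                if (c ~ /[0-9]/) v[k, n] = c + 0
                else { letters = letters c; n++; v[k, n] = 0 }
            }
            pat[k] = letters; plen[k] = n
        }
    }
    END {
        w = "." word "."          # dots mark the word boundaries
        L = length(w)
        for (p = 0; p <= L; p++) score[p] = 0
        # Each match raises the score of every gap it covers; highest wins.
        for (q = 1; q <= k; q++)
            for (s = 1; s + plen[q] - 1 <= L; s++)
                if (substr(w, s, plen[q]) == pat[q])
                    for (j = 0; j <= plen[q]; j++)
                        if (v[q, j] > score[s - 1 + j])
                            score[s - 1 + j] = v[q, j]
        # Odd score: break allowed; even score: break forbidden.
        m = length(word); out = ""
        for (i = 1; i <= m; i++) {
            out = out substr(word, i, 1)
            if (i < m && score[i + 1] % 2 == 1 && i >= lmin && m - i >= rmin)
                out = out "-"
        }
        print out
    }'
}

printf '%s\n' '1ca 1ba 1ra' '1b2ra' | hyphenate cabra    # prints ca-bra
```

In the cabra example, 1ra puts a 1 before the r, but 1b2ra (from rule SR3) puts a 2 there, so br stays together and the only break left is ca-bra.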


Paco

 
--
                              Regards
-----------------------------------------------------------------------------
Paco Andrés Verdú                                     address@hidden
Alicante (Spain)                                              


#!/bin/sh
# file: shyphen.sh Version: 1.2
#          Got at:     91/09/25 13:23:04
#          Delta made: 91/09/25 13:23:03
version=1.2
#    This script generates TeX hyphenation patterns for Spanish
#    This script is Copyright (c) GMV, 1991
#    The copyright notice below applies to this script as well,
#    read it before using this software.
#
# Usage: script [TeX] [ftc] [isolatin1] [ugly] [hiatus]
#
#       TeX     diacritics are done as in plain TeX and LaTeX but
#               without the escape character: 'a 'e 'i 'o 'u "u ~n
#       ftc     diacritics for the above are specified using the
#               ftc conventions: 'a 'e 'i 'o 'u :u 'n
#       isolatin1 means using the respective character codes in
#               IS 8859/1 (ISO Latin Alphabet 1)
#       ugly    will prevent legal but undesirable breaks.
#       hiatus  Allow break between strong vowels. Don't do it.
#
# Default is no support for diacritics. You can use combinations
# of the above and the number of patterns will grow fast.
#
# Recommended options:
#
# isolatin1 ugly        if you have TeX 3.0 with DC/EC fonts or ML-TeX
# TeX ugly              if you don't have the above
# ftc ugly              if you are used to ftc and don't have the above
#
# h is not here.
consonants="b c d f g j k l m n p q r s t v w x y z"
# Open vowels: a e o plus all accented letters
vop="a e o"
# Closed vowels: i u plus diaeresis-u
vcl="i u"
# Groups that cannot be broken. Deleted tl.
legal="ch ll rr bl br cl cr dr fl fr gl gr kl kr pl pr tr"
isolatin1=0
ftc=0
TeX=0
ugly=0
hiatus=0
common=0
options="basic"
for i
do
        if [ $i = "ftc" ]
        then
                common=1
                ftc=1
                options="$options ftc"
        elif [ $i = "TeX" ]
        then
                common=1
                TeX=1
                options="$options TeX"
        elif [ $i = "isolatin1" ]
        then
                isolatin1=1
                options="$options isolatin1"
        elif [ $i = "ugly" ]
        then
                ugly=1
                options="$options ugly"
        elif [ $i = "hiatus" ]
        then
                hiatus=1
                options="$options hiatus"
        else
                echo -n Usage: `basename $0`
                echo " [TeX] [ftc] [isolatin1] [ugly] [hiatus]"
                exit 1
        fi
done
if [ $common -ne 0 ]
then
        vop="$vop 'a 'e 'i 'o 'u"
fi
if [ $ftc -ne 0 ]
then
        vcl="$vcl :u"
        consonants="$consonants 'n"
fi
if [ $TeX -ne 0 ]
then
        vcl="$vcl \"u"
        consonants="$consonants ~n"
fi
if [ $isolatin1 -ne 0 ]
then
        vop="$vop ^^e1 ^^e9 ^^ed ^^f3 ^^fa"
        vcl="$vcl ^^fc"
        consonants="$consonants ^^f1"
fi
vowels="$vop $vcl"
echo "\
% Hyphenation patterns for Spanish.
% Compiled by Julio Sanchez (address@hidden) on September 1991.
%
% These patterns have been derived from \"On Word Division in Spanish\",
% Jos'e A. Ma~nas, Communications of the ACM, and implemented in his
% package ftc. You can get ftc and a draft of the abovementioned
% paper from goya.dit.upm.es in src/text.proc/ftc.Z. FTP access may
% be available. Otherwise, send "help" to address@hidden for
% details on use of the mail server.
%
% Rules mentioned below are those described in that paper. After
% several unsatisfactory attempts to pretend I knew better, these 
% patterns closely follow that paper. Pattern 'tl' is not considered. 
% It is conflictive and ftc does not use it either.
%
% These patterns have been generated by shyphen.sh version $version, 
% shyphen.sh is a sh script that allows a number of choices. 
% Full benefit from some of these options can only be
% obtained if appropriate fonts are available.
%
% Follows a copyright notice. This is not in the public domain,
% but the copyright is essentially a hold-harmless clause. That
% is, use it at will, but don't sue me if you don't like it.
%
%                       COPYRIGHT NOTICE
%
% These patterns and the generating sh script are Copyright (c) GMV 1991
% These patterns were developed for internal GMV use and are made
% public in the hope that they will benefit others. Also, spreading
% these patterns throughout the Spanish-language TeX community is
% expected to provide back-benefits to GMV in that it can help keeping
% GMV in the mainstream of spanish users. However, this is given
% for free and WITHOUT ANY WARRANTY. Under no circumstances can Julio
% Sanchez, GMV, Jos'e A. Ma~nas or any agents or representatives thereof 
% be held responsible for any errors in this software nor for any damages
% derived from its use, even in case any of the above has been notified
% of the possibility of such damages. If any such situation arises, you
% responsible for repair. Use of this software is an explicit
% acceptance of these conditions. 
% 
% You can use this software for any purpose. You cannot delete this
% copyright notice. If you change this software, you must include 
% comments explaining who, when and why. You are kindly requested to 
% send any changes to address@hidden If you change the generating 
% script, you must include code in it such that any output is clearly
% labeled as generated by a modified script.
%
% Despite the lack of warranty, we would like to hear about any
% problem you find. Please report problems to address@hidden
%
%               END OF COPYRIGHT NOTICE
%
% Options included in this set: $options
% Open vowels: $vop
% Closed vowels: $vcl
% Consonants: $consonants
%
% Some of the patterns below represent combinations that never
% happen in Spanish. Would they happen, they would be hyphenated
% according to the rules."
echo
echo "\
% This keeps {cat|lc}code changes, if any, local. Nice to users of
% multilingual versions. These are the minimum changes needed to process
% the patterns. These and other changes will have to be re-enacted when
% Spanish be established as the current language. See the babel docs if
% you don't understand this.
\begingroup"
if [ $common -ne 0 ]
then
        echo "\catcode\`'=12 \lccode\`'=\`'"
fi
if [ $ftc -ne 0 ]
then
        echo "\catcode\`:=12 \lccode\`:=\`:"
fi
if [ $TeX -ne 0 ]
then
        echo "\catcode\`\"=12 \lccode\`\"=\`\""
        echo "\catcode\`~=12 \lccode\`~=\`~"
fi
if [ $isolatin1 -ne 0 ]
then
        echo "\
\catcode\`\^^e1=11 \lccode\`\^^e1=\`\^^e1    % 'a
\catcode\`\^^e9=11 \lccode\`\^^e9=\`\^^e9    % 'e
\catcode\`\^^ed=11 \lccode\`\^^ed=\`\^^ed    % 'i
\catcode\`\^^f1=11 \lccode\`\^^f1=\`\^^f1    % 'o
\catcode\`\^^f3=11 \lccode\`\^^f3=\`\^^f3    % ~n
\catcode\`\^^fa=11 \lccode\`\^^fa=\`\^^fa    % 'u
\catcode\`\^^fc=11 \lccode\`\^^fc=\`\^^fc    % \"u"
fi
echo "\
\patterns{
% Rule SR1
% Vowels are kept together by the defaults"
if [ $hiatus -ne 0 ]
then
echo "\
% We break here diphthongs and the like"
for i in $vop
do
        for j in $vop
        do
                echo -n ${i}1${j}" "
        done
        echo
        for j in $vop
        do
                echo -n ${i}1h${j}" "
        done
        echo
done
fi
echo "\
% Rule SR2
% Attach vowel groups to left consonant"
for i in $consonants
do
        for j in $vowels
        do
                echo -n 1${i}${j}" "
        done
        echo
done
echo "\
% Rule SR3
% Build legal consonant groups, leave other consonants bound to 
% the previous group. This overrides part of the SR2 pattern
% group."
for i in $legal
do
        set `echo $i | sed -e 's/^./& /'`
        for j in $vowels
        do
                echo -n 1${1}2${2}${j}" "
        done
        echo
done
echo "\
% Rule SR4 is implicitly implemented by the default values
% Rule HE1 is implemented by TeX parameters \lefthyphenmin and
% \righthyphenmin. Help yourself. The correct values for
% Spanish are 2 and 2. If you set them below these values,
% incorrect breaks will happen.
% Rule HE2
% Break between a consonant and an h"
for i in `echo $consonants | sed -e 's/c//'`
do
        echo -n ${i}1h" "
done
echo
echo "\
% We now avoid some problematic breaks.
su2b2r su2b2l"
if [ $ugly -ne 0 ]
then
echo "\
% These are included here to avoid ugly, though legal, breaks
% They were taken from the sphyphen.tex (silaba.tex) produced
% by Aurion Tecnologia and other sources.
2caca. 2cacas.
2caga. 2cagas.
2cago. 2cerdo
2cola. 2colas.
2culo. 2culos.
2cular.
2loco. 2locos. 2loca. 2locas.
2moco. 2mocos.
2mula. 2mulas.
2pedo. 2pedos. 2peda. 2pedas.
2pito. 2pitos.
2puto. 2putos. 2puta. 2putas.
.caca2"
fi
echo "}"
echo "\endgroup"
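The TeX and isolatin1 options spell the same letters two ways: TeX-style 'a, "u, ~n and so on, or ISO 8859-1 codes in TeX's ^^xx hexadecimal notation. A small sketch that converts the first spelling into the second, assuming the codes used in the script (^^e1 á, ^^e9 é, ^^ed í, ^^f3 ó, ^^fa ú, ^^fc ü, ^^f1 ñ); tex_to_hathat is a hypothetical name. By the same table, the % comments on the ^^f1 and ^^f3 lines of the script's isolatin1 catcode block appear to be swapped (0xF1 is ñ, 0xF3 is ó), although the vop and consonants lists use both codes correctly.

```shell
#!/bin/sh
# Hypothetical converter: rewrite patterns written with the TeX option's
# spellings ('a 'e 'i 'o 'u "u ~n) into ISO 8859-1 codes in TeX's ^^xx
# notation, the form the isolatin1 option writes.
tex_to_hathat() {
    sed -e "s/'a/^^e1/g" \
        -e "s/'e/^^e9/g" \
        -e "s/'i/^^ed/g" \
        -e "s/'o/^^f3/g" \
        -e "s/'u/^^fa/g" \
        -e 's/"u/^^fc/g' \
        -e 's/~n/^^f1/g'
}

printf '%s\n' "1~na 1b'a 1g\"ue" | tex_to_hathat    # prints 1^^f1a 1b^^e1 1g^^fce
```

For instance, 1~na, which rule SR2 generates under the TeX option, becomes 1^^f1a, the pattern the isolatin1 option generates for the same syllable.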

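The usage notes warn that combining options makes the number of patterns grow fast. Counting what each loop emits shows how: rule SR2 writes one pattern per consonant and vowel, SR3 one per legal group and vowel, HE2 one per consonant except c, two more for su2b2r and su2b2l, and the hiatus option adds two per ordered pair of open vowels. A sketch of that arithmetic (pattern_count and count_words are hypothetical names; the fixed list added by ugly is not counted):

```shell
#!/bin/sh
# Count the words in a space-separated list (globbing is switched off
# while splitting so that list items are never expanded as file names).
count_words() {
    set -f
    set -- $1
    set +f
    echo "$#"
}

# Hypothetical: number of patterns the script's loops would emit.
# $1 consonants, $2 open vowels, $3 closed vowels, $4 legal groups,
# $5 hiatus flag (0 or 1). Assumes c is among the consonants (HE2 skips it).
pattern_count() {
    nc=$(count_words "$1")
    nvop=$(count_words "$2")
    nvcl=$(count_words "$3")
    nl=$(count_words "$4")
    nv=$((nvop + nvcl))
    total=$((nc * nv + nl * nv + (nc - 1) + 2))   # SR2 + SR3 + HE2 + su2b2r/su2b2l
    if [ "$5" -ne 0 ]; then
        total=$((total + 2 * nvop * nvop))        # hiatus: x1y and x1hy
    fi
    echo "$total"
}

consonants="b c d f g j k l m n p q r s t v w x y z"
legal="ch ll rr bl br cl cr dr fl fr gl gr kl kr pl pr tr"
pattern_count "$consonants" "a e o" "i u" "$legal" 0                               # 206
pattern_count "$consonants ~n" "a e o 'a 'e 'i 'o 'u" "i u \"u" "$legal" 0          # 440
```

With these lists the basic set comes to 206 patterns and the TeX option alone raises it to 440, mostly because every added vowel multiplies through the SR2 and SR3 loops.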