[Emacs-diffs] emacs/doc/lispref nonascii.texi


From: Eli Zaretskii
Subject: [Emacs-diffs] emacs/doc/lispref nonascii.texi
Date: Sat, 29 Nov 2008 17:03:54 +0000

CVSROOT:        /cvsroot/emacs
Module name:    emacs
Changes by:     Eli Zaretskii <eliz>    08/11/29 17:03:54

Modified files:
        doc/lispref    : nonascii.texi 

Log message:
        (Character Properties): New Section.
        (Specifying Coding Systems): Document `coding-system-priority-list',
        `set-coding-system-priority', and `with-coding-priority'.
        (Lisp and Coding Systems): Document `check-coding-systems-region' and
        `coding-system-charset-list'.
        (Coding System Basics): Document `coding-system-aliases'.

CVSWeb URLs:
http://cvs.savannah.gnu.org/viewcvs/emacs/doc/lispref/nonascii.texi?cvsroot=emacs&r1=1.12&r2=1.13

Patches:
Index: nonascii.texi
===================================================================
RCS file: /cvsroot/emacs/emacs/doc/lispref/nonascii.texi,v
retrieving revision 1.12
retrieving revision 1.13
diff -u -b -r1.12 -r1.13
--- nonascii.texi       29 Nov 2008 12:18:14 -0000      1.12
+++ nonascii.texi       29 Nov 2008 17:03:54 -0000      1.13
@@ -19,6 +19,8 @@
 * Selecting a Representation::  Treating a byte sequence as unibyte or multi.
 * Character Codes::         How unibyte and multibyte relate to
                                 codes of individual characters.
+* Character Properties::    Character attributes that define their
+                                behavior and handling.
 * Character Sets::          The space of possible character codes
                                 is divided into various character sets.
 * Scanning Charsets::       Which character sets are used in a buffer?
@@ -344,6 +346,184 @@
 string instead of the current buffer.
 @end defun
 
+@node Character Properties
+@section Character Properties
+@cindex character properties
+A @dfn{character property} is a named attribute of a character that
+specifies how the character behaves and how it should be handled
+during text processing and display.  Thus, character properties are an
+important part of specifying the character's semantics.
+
+  Emacs generally follows the Unicode Standard in its implementation
+of character properties.  In particular, Emacs supports the
+@uref{http://www.unicode.org/reports/tr23/, Unicode Character Property
+Model}, and the Emacs character property database is derived from the
+Unicode Character Database (@acronym{UCD}).  See the
+@uref{http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf, Character
+Properties chapter of the Unicode Standard}, for more details about
+Unicode character properties and their meaning.
+
+  The facilities documented in this section are useful for setting and
+retrieving properties of characters.
+
+  In Emacs, each property has a name, which is a symbol, and a set of
+possible values, whose types depend on the property.  Here's the full
+list of character properties that Emacs knows about:
+
+@table @code
+@item name
+The character's canonical unique name.  The value of the property is a
+string consisting of upper-case Latin letters A to Z, digits, spaces,
+and hyphen @samp{-} characters.
+
+@item general-category
+This property assigns the character to one of the major classes, such
+as letters, punctuation, and symbols, and its important subclasses.
+The value is a symbol whose name is a 2-letter abbreviation.  The
+first letter specifies the character's major class and the second
+letter designates a subclass of that major class.
+
+@item canonical-combining-class
+This property classifies combining characters into several classes,
+depending on the details of their behavior in sequences of combining
+characters.  The property's value is an integer number.
+
+@item bidi-class
+This property specifies character attributes required for correct
+display of @dfn{bidirectional text} used by right-to-left scripts,
+such as Arabic and Hebrew.  The value is a symbol whose name is the
+Unicode @dfn{directional type} of the character.
+
+@item decomposition
+This property defines a mapping from a character to a sequence of one
+or more characters that is a canonical or compatibility equivalent to
+it.  The value is a list, whose first element may be a symbol
+representing a compatibility formatting tag, such as @code{<small>};
+the other elements are characters that give the compatibility
+decomposition sequence.
+
+@item decimal-digit-value
+This property specifies a numeric value of characters that represent
+decimal digits.  The value is an integer number.
+
+@item digit-value
+This property specifies a numeric value of characters that represent
+digits, but not necessarily decimal.  Examples include compatibility
+subscript and superscript digits.  The value is an integer number.
+
+@item numeric-value
+This property specifies whether the character represents a number.
+Examples of characters that do include fractions, subscripts,
+superscripts, Roman numerals, currency numerators, and encircled
+numbers.  The value is a symbol whose name gives the numeric value;
+for example, the value of this property for the character
+@code{U+2155} (@sc{vulgar fraction one fifth}) is the symbol
+@samp{1/5}.
+
+@item mirrored
+This is a property of characters such as parentheses, which need to be
+mirrored horizontally in right-to-left scripts.  The value is a
+symbol, either @samp{Y} or @samp{N}.
+
+@item old-name
+This property's value specifies the name, if any, of the character in
+the old version 1.0 of the Unicode Standard.  The value is a string.
+
+@item iso-10646-comment
+This character's comment field from the ISO 10646 standard.  The value
+is a string, or @code{nil} if there's no comment.
+
+@item uppercase
+If this character has an upper-case equivalent that is a single
+character, then the value of this property is that upper-case
+equivalent.  Otherwise, the value is @code{nil}.
+
+@item lowercase
+If this character has a lower-case equivalent that is a single
+character, then the value of this property is that lower-case
+equivalent.  Otherwise, the value is @code{nil}.
+
+@item titlecase
+@dfn{Title case} is a special form of a character used when the first
+character of a word needs to be capitalized.  If a character has a
+title-case equivalent that is a single character, then the value of
+this property is that title-case equivalent.  Otherwise, the value is
+@code{nil}.
+@end table
+
+@defun get-char-code-property char propname
+This function returns the value of @var{char}'s @var{propname} property.
+
+@example
+@group
+(get-char-code-property ?  'general-category)
+     @result{} Zs
+@end group
+@group
+(get-char-code-property ?1  'general-category)
+     @result{} Nd
+@end group
+@group
+(get-char-code-property ?\u2084 'digit-value) ; subscript 4
+     @result{} 4
+@end group
+@group
+(get-char-code-property ?\u2155 'numeric-value) ; one fifth
+     @result{} 1/5
+@end group
+@group
+(get-char-code-property ?\u2163 'numeric-value) ; Roman IV
+     @result{} \4
+@end group
+@end example
+@end defun
+
+@defun char-code-property-description prop value
+This function returns the description string of property @var{prop}'s
+@var{value}, or @code{nil} if @var{value} has no description.
+
+@example
+@group
+(char-code-property-description 'general-category 'Zs)
+     @result{} "Separator, Space"
+@end group
+@group
+(char-code-property-description 'general-category 'Nd)
+     @result{} "Number, Decimal Digit"
+@end group
+@group
+(char-code-property-description 'numeric-value '1/5)
+     @result{} nil
+@end group
+@end example
+@end defun
+
+@defun put-char-code-property char propname value
+This function stores @var{value} as the value of the property
+@var{propname} for the character @var{char}.
+@end defun
+
+@defvar char-script-table
+The value of this variable is a char-table (@pxref{Char-Tables}) that
+specifies, for each character, a symbol whose name is the script to
+which the character belongs, according to the Unicode Standard
+classification of the Unicode code space into script-specific blocks.
+This char-table has a single extra slot whose value is the list of all
+script symbols.
+@end defvar
+
+@defvar char-width-table
+The value of this variable is a char-table that specifies the width of
+each character in columns that it will occupy on the screen.
+@end defvar
+
+@defvar printable-chars
+The value of this variable is a char-table that specifies, for each
+character, whether it is printable or not.  That is, if evaluating
+@code{(aref printable-chars char)} results in @code{t}, the character
+is printable, and if it results in @code{nil}, it is not.
+@end defvar
+
 @node Character Sets
 @section Character Sets
 @cindex character sets
@@ -692,6 +872,10 @@
 as an alias for the coding system.
 @end defun
 
+@defun coding-system-aliases coding-system
+This function returns the list of aliases of @var{coding-system}.
+@end defun
+
 @node Encoding and I/O
 @subsection Encoding and I/O
 
@@ -865,6 +1049,22 @@
 encode all the character sets in the list @var{charsets}.
 @end defun
 
+@defun check-coding-systems-region start end coding-system-list
+This function checks whether coding systems in the list
+@var{coding-system-list} can encode all the characters in the region
+between @var{start} and @var{end}.  If all of the coding systems in
+the list can encode the specified text, the function returns
+@code{nil}.  If some coding systems cannot encode some of the
+characters, the value is an alist, each element of which has the form
+@code{(@var{coding-system1} @var{pos1} @var{pos2} @dots{})}, meaning
+that @var{coding-system1} cannot encode characters at buffer positions
+@var{pos1}, @var{pos2}, @enddots{}.
+
+@var{start} may be a string, in which case @var{end} is ignored and
+the returned value references string indices instead of buffer
+positions.
+@end defun
+
 @defun detect-coding-region start end &optional highest
 This function chooses a plausible coding system for decoding the text
 from @var{start} to @var{end}.  This text should be a byte sequence,
@@ -888,6 +1088,26 @@
 operates on the contents of @var{string} instead of bytes in the buffer.
 @end defun
 
+@defun coding-system-charset-list coding-system
+This function returns the list of character sets (@pxref{Character
+Sets}) supported by @var{coding-system}.  Some coding systems that
+support too many character sets to list them all yield special values:
+@itemize @bullet
+@item
+If @var{coding-system} supports all the ISO-2022 charsets, the value
+is @code{iso-2022}.
+@item
+If @var{coding-system} supports all Emacs characters, the value is
+@code{(emacs)}.
+@item
+If @var{coding-system} supports all emacs-mule characters, the value
+is @code{emacs-mule}.
+@item
+If @var{coding-system} supports all Unicode characters, the value is
+@code{(unicode)}.
+@end itemize
+@end defun
+
   @xref{Coding systems for a subprocess,, Process Information}, in
 particular the description of the functions
 @code{process-coding-system} and @code{set-process-coding-system}, for
@@ -1179,6 +1399,33 @@
 decoding functions (@pxref{Explicit Encoding}).
 @end defvar
 
+@cindex priority order of coding systems
+@cindex coding systems, priority
+  Sometimes, you need to prefer several coding systems for some
+operation, rather than fix a single one.  Emacs lets you specify a
+priority order for using coding systems.  This ordering affects the
+sorting of lists of coding systems returned by functions such as
+@code{find-coding-systems-region} (@pxref{Lisp and Coding Systems}).
+
+@defun coding-system-priority-list &optional highestp
+This function returns the list of coding systems in the order of their
+current priorities.  Optional argument @var{highestp}, if
+non-@code{nil}, means return only the highest priority coding system.
+@end defun
+
+@defun set-coding-system-priority &rest coding-systems
+This function puts @var{coding-systems} at the beginning of the
+priority list for coding systems, thus making their priority higher
+than all the rest.
+@end defun
+
+@defmac with-coding-priority coding-systems &rest body@dots{}
+This macro executes @var{body}, like @code{progn} does
+(@pxref{Sequencing, progn}), with @var{coding-systems} at the front of
+the priority list for coding systems.  @var{coding-systems} should be
+a list of coding systems to prefer during execution of @var{body}.
+@end defmac
+
 @node Explicit Encoding
 @subsection Explicit Encoding and Decoding
 @cindex encoding in coding systems
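
For readers who want to try the newly documented functions, here is a minimal sketch in Emacs Lisp.  It is not part of the patch.  The helper name `my-char-category-description' is hypothetical, and the choice of `utf-8' is only for illustration; the functions and the macro it calls are the ones the patch documents.

```elisp
;; Hypothetical helper (not part of the patch): look up CHAR's
;; general-category symbol, then ask for its description string.
(defun my-char-category-description (char)
  "Return the description of CHAR's Unicode general category, or nil."
  (let ((category (get-char-code-property char 'general-category)))
    (char-code-property-description 'general-category category)))

;; Per the examples in the patch, a space has category Zs,
;; described as "Separator, Space".
(my-char-category-description ?\s)

;; Temporarily put utf-8 (an arbitrary choice for illustration) at the
;; front of the priority list, and read back the highest-priority
;; coding system while that preference is in effect.
(with-coding-priority '(utf-8)
  (coding-system-priority-list t))
```

The second form reads the priority list inside the macro body because, as documented in the patch, the preference applies during execution of the body.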



