octave-bug-tracker
From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #55195] "help" for core functions contains odd symbols for non-ASCII characters
Date: Mon, 10 Dec 2018 12:48:55 -0500 (EST)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0

URL:
  <https://savannah.gnu.org/bugs/?55195>

                 Summary: "help" for core functions contains odd symbols for
non-ASCII characters
                 Project: GNU Octave
            Submitted by: mmuetzel
            Submitted on: Mon 10 Dec 2018 05:48:53 PM UTC
                Category: Interpreter
                Severity: 4 - Important
                Priority: 5 - Normal
              Item Group: Regression
                  Status: None
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: dev
        Operating System: Any

    _______________________________________________________

Details:

TL;DR:
Character encoding strikes again. Does the lexer keep track of whether .m
files are from core?



When Octave is configured to use an mfile_encoding other than UTF-8, the help
text of function files that are themselves encoded in UTF-8 is displayed with
odd characters. On Windows, this happens with Octave's default settings; other
systems are affected only if the user explicitly configures a different
encoding.

E.g.: "help sym" displays a lot of scrambled characters. That is because the
file is encoded in UTF-8, but we assume it is encoded in the configured
mfile_encoding. Converting it from SYSTEM (CP1252 in my case) to UTF-8
produces these odd characters.
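As a sketch of the mechanism (in Python, with a made-up help string, not the actual contents of the sym file):

```python
# A UTF-8 encoded help string as it would sit in a .m file on disk.
utf8_bytes = "Müller".encode("utf-8")     # b'M\xc3\xbcller'

# Octave assumes the configured mfile_encoding (CP1252 here) and
# converts the bytes to UTF-8, i.e. it decodes them as CP1252:
misread = utf8_bytes.decode("cp1252")

print(misread)  # MÃ¼ller -- the scrambled characters from "help sym"
```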

This is a regression. (Before, we didn't worry about encoding but had problems
handling string vectors from user functions or interacting with the file
system.)

That conversion is done in input.cc in function "file_reader::get_input".
Can we differentiate, at that point, between .m files from core or from
packages (which are probably always UTF-8) and user-created .m files (which
could have any encoding)? Does the lexer keep track of this?

What about texinfo settings such as "@documentencoding UTF-8"? Should we parse
for them and do the conversion only conditionally?
Should we skip the help text entirely during the conversion? In that case, we
might have to move the conversion elsewhere (to the lexer?).
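A minimal sketch of such a scan, in Python; the function name is hypothetical, and it assumes the directive appears verbatim in the help text:

```python
import re

def detect_documentencoding(help_text):
    """Hypothetical helper: return the encoding declared by a Texinfo
    @documentencoding directive in a help block, or None if absent."""
    m = re.search(r"@documentencoding\s+(\S+)", help_text)
    return m.group(1) if m else None

print(detect_documentencoding("-*- texinfo -*-\n@documentencoding UTF-8\n"))
# UTF-8
print(detect_documentencoding("## plain help comment"))
# None
```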

Alternatively, we could revert the conversion in "help.m" (perhaps only when
we discover an "@documentencoding" command?) for functions from core or from
packages.
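Reverting amounts to undoing the wrong CP1252-to-UTF-8 step: re-encode the scrambled text as CP1252 to recover the original bytes, then decode those as UTF-8. A Python sketch with a made-up sample string:

```python
# Help text after the wrong conversion (UTF-8 bytes decoded as CP1252).
damaged = "MÃ¼ller"

# Undo: encode back to the original bytes, then decode as UTF-8.
# Caveat: this is not always possible -- a few byte values (e.g. 0x81)
# are undefined in CP1252, so not every string round-trips.
repaired = damaged.encode("cp1252").decode("utf-8")

print(repaired)  # Müller
```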

But text in strings in functions from core Octave or from packages is
probably encoded in UTF-8 as well (independent of the current
mfile_encoding). So we should not convert functions from core or packages at
all, and should apply the codepage conversion only to user functions.

This might also affect how we should open function files from core Octave (or
from packages) in the embedded editor.





    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?55195>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



