chicken-users

Re: cannot open file if name contains accented characters


From: Mátyás Seress
Subject: Re: cannot open file if name contains accented characters
Date: Tue, 26 Dec 2023 16:40:52 +0100

Thanks for the responses. I tried what I could, but it still doesn't work. I wrote some code to test if I can open and close a file. The C code works but the Chicken code doesn't.

---------- test.c ----------

#include <stdio.h>
#include <wchar.h>

// this file needs to be saved with explicit BOM (byte order mark) otherwise it won't work
int main()
{
    FILE* fileHandle = _wfopen(L"c:\\temp\\íűőúöüóéá.txt", L"r");
    printf("File has been opened at: %p\n", fileHandle);
    printf("Closing file\n");
    fclose(fileHandle);
    return 0;
}

---------- test.scm ---------- 

(import (chicken foreign))

(foreign-declare "#include <stdio.h>")
(foreign-declare "#include <wchar.h>")

(define chicken_wfopen
    (foreign-lambda (c-pointer "FILE") "_wfopen" (c-pointer "wchar_t") (c-pointer "wchar_t")))
(define chicken_fclose
    (foreign-lambda int "fclose" (c-pointer "FILE")))

(let ([file-handle (chicken_wfopen "c:\\temp\\íűőúöüóéá.txt" "r")])
    (print "File has been opened at: " (number->string file-handle))
    (print "Closing file.")
    (chicken_fclose file-handle))

The Chicken code fails with the error message: Error: unbound variable: 
It doesn't say the name of the variable, and it returns the error code 70.
Note that in C the string needs to be prefixed with an L to make it wide character. In Chicken I don't know how to do that.
Does anybody have a clue which variable is unbound?
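
One possible way around the wide-string question is to keep the Chicken side on ordinary byte strings and do the widening in a small C helper. The sketch below assumes the Scheme string arrives as NUL-terminated UTF-8 bytes (which in turn assumes the .scm file is saved as UTF-8). The name open_utf8_path is hypothetical and not part of CHICKEN or the C runtime. On Windows it widens both arguments with MultiByteToWideChar and calls _wfopen; elsewhere it simply falls back to fopen.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#include <wchar.h>
#endif

/* Hypothetical helper (not part of CHICKEN or the CRT):
   open a file whose path and mode are given as UTF-8 byte strings.
   Returns NULL on failure, like fopen. */
FILE *open_utf8_path(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);
#ifdef _WIN32
    wchar_t wpath[MAX_PATH];
    wchar_t wmode[16];
    /* A length of -1 means "input is NUL-terminated"; the converted
       string then includes the terminating NUL. 0 signals failure. */
    if (MultiByteToWideChar(CP_UTF8, 0, path, -1, wpath, MAX_PATH) == 0)
        return NULL;
    if (MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) == 0)
        return NULL;
    return _wfopen(wpath, wmode);
#else
    /* Assumption: non-Windows systems accept UTF-8 paths directly. */
    return fopen(path, mode);
#endif
}
```

If the helper is made visible to the compiled Scheme file (for example through foreign-declare), it could presumably be bound with something like (foreign-lambda (c-pointer "FILE") "open_utf8_path" c-string c-string), so that plain Scheme strings can be passed. The result should be checked for NULL before it is handed to fclose.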

On Mon, 25 Dec 2023 at 20:19, John Cowan <cowan@ccil.org> wrote:


On Mon, Dec 25, 2023 at 6:07 AM <felix.winkelmann@bevuta.com> wrote:
 
I'm not too familiar with the way Windows handles non-ASCII characters
in operating system calls, but I assume that what gets passed to the C
library runtime functions like fopen(3), etc. assumes a particular encoding.

Basically, there are two modes, one that assumes a particular encoding, as you say (that's the default) and one that assumes wchar_t, which is always UTF-16LE.  Which encoding is used in the first mode depends on the locale setting.

From a quick glance at the Windows docs[1] it seems one needs to use
"_fwopen" with a wchar_t string argument to pass extended characters.

Indeed, except that it's _wfopen, not _fwopen. Note that fopen can involve 8-bit, 16-bit, or 8/16-bit mode depending on the encoding.

Sorry if this is not overly helpful. We are currently in the process of improving
the Unicode support for the next major version of CHICKEN.

This makes me realize that posixwin needs to be changed in C6 so that it always uses the second mode.  A simple way to do this is to use a UTF-8 to UTF-16LE converter (and vice versa for things like dirread) right before calling _wfopen.
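
To make the conversion step concrete, here is a minimal sketch of a UTF-8 to UTF-16 converter in portable C. It is not the posixwin implementation; the name utf8_to_utf16 and its error convention are assumptions made for illustration. It produces 16-bit code units in host byte order, which on a little-endian Windows machine would match the UTF-16LE layout that _wfopen expects. It rejects bad lead and continuation bytes but does not check for overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert NUL-terminated UTF-8 into UTF-16 code units.
   cap is the capacity of dst in units, including room for the terminator.
   Returns the number of units written (excluding the terminating 0),
   or (size_t)-1 on malformed input or insufficient space. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return (size_t)-1;

    while (*s) {
        uint32_t cp;
        int extra;

        /* Decode the lead byte and learn how many continuation bytes follow. */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;       /* invalid lead byte */
        s++;

        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)  /* also catches a truncated sequence */
                return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }

        if (cp >= 0x10000) {
            /* Outside the BMP: encode as a surrogate pair. */
            if (cp > 0x10FFFF || n + 2 >= cap)
                return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap)
                return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        }
    }
    dst[n] = 0;
    return n;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting buffer could be passed to _wfopen after a cast to const wchar_t *. The reverse direction (for directory listings) would need the matching UTF-16 to UTF-8 routine, which is not shown here.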



felix


