On SBCL:
(length "⍳⍳⍳")
3
In the LispWorks GUI:
(length "⍳⍳⍳")
3
On the LispWorks console:
(length "⍳⍳⍳")
9
What is the proper way to set the LispWorks console's default external format to UTF-8, as it is in its GUI?
Note: "⍳" is the iota symbol (U+2373, which takes three bytes in UTF-8). This is LispWorks on Linux, running in xterm; setting LC_CTYPE=en_US.UTF-8 doesn't change the behavior.
/dev/stdin utf-8
Re: /dev/stdin utf-8
I've never used lispworks, but I was bored, so here I go!
It's probably worth confirming that your string is getting raw UTF-8 bytes. According to babel:
Code: Select all
cl-user> (babel:string-to-octets "⍳")
#(226 141 179)
So, with s bound to your string, try:
Code: Select all
(map 'vector #'char-code s)
You may be able to use your init file[1] to change the default external format[2].
There is also FLI:SET-LOCALE[3].
And if all else fails, I'd just use babel's OCTETS-TO-STRING.
[1] http://www.lispworks.com/documentation/ ... fId-890282
[2] http://www.lispworks.com/documentation/ ... fId-889817
[3] http://www.lispworks.com/documentation/ ... fId-888827
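To make that check concrete: if the char-code vector comes back as #(226 141 179), the string is holding the three raw UTF-8 bytes of one character rather than the character itself. Below is a minimal sketch, in portable Common Lisp, of decoding such codes by hand. DECODE-UTF8-CODES is a hypothetical helper written for illustration (it is not part of babel or LispWorks), and it assumes well-formed input made of 1- to 3-byte sequences only.

```lisp
(defun decode-utf8-codes (codes)
  "Decode CODES, a list of octets, into a list of Unicode code points.
Sketch only: handles 1- to 3-byte sequences and does no validation of
continuation bytes."
  (loop while codes
        collect (let ((b (pop codes)))
                  (cond
                    ;; 0xxxxxxx: plain ASCII, one byte.
                    ((< b #x80) b)
                    ;; 110xxxxx 10xxxxxx: two-byte sequence.
                    ((= (logand b #xE0) #xC0)
                     (let ((b2 (pop codes)))
                       (logior (ash (logand b #x1F) 6)
                               (logand b2 #x3F))))
                    ;; 1110xxxx 10xxxxxx 10xxxxxx: three-byte sequence.
                    ((= (logand b #xF0) #xE0)
                     (let* ((b2 (pop codes))
                            (b3 (pop codes)))
                       (logior (ash (logand b #x0F) 12)
                               (ash (logand b2 #x3F) 6)
                               (logand b3 #x3F))))
                    (t (error "Unsupported or invalid lead byte: ~D" b))))))

;; (decode-utf8-codes '(226 141 179)) => (9075), i.e. (#x2373)
```

With a string s in hand, something like (decode-utf8-codes (map 'list #'char-code s)) would show whether the three characters collapse into a single code point.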
Re: /dev/stdin utf-8
I understand, but here is the behavior on the LispWorks console vs SBCL:
LW:
(ql:quickload :babel)
[...output omitted...]
(fli:set-locale)
"en_US.UTF-8"
(babel:string-to-octets "⍳")
#(195 162 194 141 194 179)
(length (babel:string-to-octets "⍳"))
6
(describe "⍳")
"⍳" is a SIMPLE-BASE-STRING
0 #�
1 #U+008D
2 #�
SBCL:
(ql:quickload :babel)
[...output omitted...]
(babel:string-to-octets "⍳")
#(226 141 179)
(length (babel:string-to-octets "⍳"))
3
(describe "⍳")
"⍳"
[simple-string]
Element-type: CHARACTER
Length: 1
How can I make LW give 3 here as well?
Re: /dev/stdin utf-8
I apologise for taking so long to reply, but you are getting the support you paid for.
Firstly, I must point out that you didn't do what I asked, which was to post the result of:
Code: Select all
(map 'vector #'char-code s)
But what you did post mostly confirmed my suspicions:
- A SIMPLE-BASE-STRING is a BASE-STRING
- A BASE-STRING contains BASE-CHARs
- In LispWorks, a BASE-CHAR is 8-bit
- -> a code point that takes several bytes in UTF-8 can't fit in one BASE-CHAR, so each byte ends up stored as a separate character
Code: Select all
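(defun lispworks-is-dumb (s)
  ;; A BASE-STRING here holds one character per raw UTF-8 byte, so turn the
  ;; char codes back into octets and decode them properly.
  (if (typep s 'base-string)
      ;; babel expects a (vector (unsigned-byte 8)), not a general vector.
      (babel:octets-to-string (map '(vector (unsigned-byte 8)) #'char-code s)
                              :encoding :utf-8)
      s))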
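The six octets in the LW transcript follow from the same reasoning: each of the three byte-valued characters (codes 226, 141 and 179) is above 127, so encoding it as UTF-8 produces two octets. A rough sketch in portable Common Lisp that reproduces the numbers; UTF8-ENCODE-CODE-POINT and UTF8-ENCODE-CODES are hypothetical helpers for illustration only, and they assume code points below #x10000.

```lisp
(defun utf8-encode-code-point (code)
  "Return the UTF-8 octets for the code point CODE as a fresh list.
Sketch only: assumes CODE is below #x10000."
  (cond
    ;; One byte: 0xxxxxxx
    ((< code #x80) (list code))
    ;; Two bytes: 110xxxxx 10xxxxxx
    ((< code #x800)
     (list (logior #xC0 (ash code -6))
           (logior #x80 (logand code #x3F))))
    ;; Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    (t
     (list (logior #xE0 (ash code -12))
           (logior #x80 (logand (ash code -6) #x3F))
           (logior #x80 (logand code #x3F))))))

(defun utf8-encode-codes (codes)
  "Concatenate the UTF-8 octets of every code point in the list CODES."
  (loop for code in codes
        append (utf8-encode-code-point code)))

;; One real character, as SBCL sees it:
;; (utf8-encode-codes '(#x2373))      => (226 141 179)
;; Three byte-valued characters, as the LW console sees it:
;; (utf8-encode-codes '(226 141 179)) => (195 162 194 141 194 179)
```

So the 6 is the 3-byte character read as three separate characters and then encoded a second time, which is why decoding the char codes as octets recovers the original string.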