
/dev/stdin utf-8

Posted: Sun Jan 14, 2018 11:21 pm
by genera
On sbcl:
(length "⍳⍳⍳")
3

On lispworks' UI:
(length "⍳⍳⍳")
3

On lispworks' console:
(length "⍳⍳⍳")
9

What is the proper way to set the default external format of lispworks' console to utf-8, as it already is in its GUI?

Note: "⍳" is the APL iota symbol (U+2373), which takes three bytes in UTF-8. Lispworks on Linux, using xterm; setting LC_CTYPE=en_US.UTF-8 doesn't change the behavior.

Re: /dev/stdin utf-8

Posted: Wed Jan 17, 2018 4:36 am
by pjstirling
I've never used lispworks, but I was bored, so here I go!

It's probably worth confirming that your string really is holding raw UTF-8 bytes. According to babel, the correct encoding is:

Code: Select all
cl-user>(babel:string-to-octets "⍳")
#(226 141 179)


Then try this in the lispworks console, with s bound to your string:

Code: Select all
(map 'vector #'char-code s)
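
For reference, here is a rough sketch of the two results you might see. It builds the strings from character codes instead of typing the character, so it doesn't depend on how the console reads input; it assumes a Lisp whose CODE-CHAR accepts code points above 255.

```lisp
;; Case 1: the character was decoded properly, so there is one code point.
(map 'vector #'char-code (string (code-char #x2373)))
;; => #(9075)

;; Case 2: the UTF-8 bytes were kept as-is, one 8-bit character per byte.
(map 'vector #'char-code (map 'string #'code-char '(226 141 179)))
;; => #(226 141 179)
```

If you get the second result, the console is not decoding UTF-8 at all.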


You may be able to use your init file[1] to change the default external format[2].

There is also FLI:SET-LOCALE[3].

And if all else fails, I'd just use babel's OCTETS-TO-STRING.

[1] http://www.lispworks.com/documentation/ ... fId-890282
[2] http://www.lispworks.com/documentation/ ... fId-889817
[3] http://www.lispworks.com/documentation/ ... fId-888827

Re: /dev/stdin utf-8

Posted: Wed Jan 17, 2018 12:51 pm
by genera
I understand, but here is the behavior on lispworks's console vs sbcl:

LW:
(ql:quickload :babel)
[...output omitted...]
(fli:set-locale)
"en_US.UTF-8"
(babel:string-to-octets "⍳")
#(195 162 194 141 194 179)
(length (babel:string-to-octets "⍳"))
6
(describe "⍳")
"⍳" is a SIMPLE-BASE-STRING
0 #�
1 #U+008D
2 #�

sbcl:
(ql:quickload :babel)
[...output omitted...]
(babel:string-to-octets "⍳")
#(226 141 179)
(length (babel:string-to-octets "⍳"))
3
(describe "⍳")
"⍳"
[simple-string]
Element-type: CHARACTER
Length: 1


How could I possibly make LW output 3?

Re: /dev/stdin utf-8

Posted: Mon Jan 22, 2018 5:22 am
by pjstirling
I apologise for taking so long to reply, but you are getting the support you paid for :)

Firstly, I must point out that you didn't do what I asked, which was to post the result of:

Code: Select all
(map 'vector #'char-code s)


But what you did post mostly confirmed my suspicions:

  • A SIMPLE-BASE-STRING is a BASE-STRING
  • A BASE-STRING contains BASE-CHARs
  • A BASE-CHAR is 8-bit (in lispworks)
  • -> a code point above 255 can't fit in one BASE-CHAR, so each UTF-8 byte becomes its own character
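
That would also explain the six octets babel printed. Here is a minimal sketch of the arithmetic, assuming each UTF-8 byte was read as its own 8-bit character and then encoded again. UTF-8-ENCODE-CODE-POINT is a made-up helper for illustration, not something from babel or lispworks, and it only handles code points below #x10000.

```lisp
;; Hypothetical helper: the UTF-8 octets for one code point (below #x10000).
(defun utf-8-encode-code-point (code)
  (cond ((< code #x80)
         (list code))
        ((< code #x800)
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        (t
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

;; The real character, U+2373, gives the three octets sbcl reports.
(utf-8-encode-code-point #x2373)
;; => (226 141 179)

;; Treat those three octets as three separate characters and encode again.
(mapcan #'utf-8-encode-code-point '(226 141 179))
;; => (195 162 194 141 194 179)
```

The second result matches what string-to-octets returned in the lispworks console.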

The last thing the char-code output would confirm is that lispworks is keeping your string as raw UTF-8, one byte per character. If that's true, then you can do the following (modulo lispworks having completely neutered Unicode support somehow):

Code: Select all
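(defun lispworks-is-dumb (s)
  ;; A BASE-STRING is assumed to hold raw UTF-8 bytes, one per character.
  (if (typep s 'base-string)
      ;; Build a proper octet vector from the character codes before decoding.
      (babel:octets-to-string (map '(vector (unsigned-byte 8)) #'char-code s)
                              :encoding :utf-8)
      s))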
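
If you'd rather not depend on babel for this, the decoding step can be sketched by hand. This is only an illustration: DECODE-UTF-8-OCTETS and REPAIR-BYTE-STRING are hypothetical names, there is no error checking, and it assumes the input is well-formed UTF-8 and that CODE-CHAR accepts code points above 255.

```lisp
;; Hypothetical helper: decode a list of well-formed UTF-8 octets to a string.
(defun decode-utf-8-octets (octets)
  (let ((chars '()))
    (loop while octets
          do (let* ((lead (pop octets))
                    ;; Continuation octets implied by the lead octet.
                    (extra (cond ((< lead #x80) 0)
                                 ((< lead #xE0) 1)
                                 ((< lead #xF0) 2)
                                 (t 3)))
                    ;; Payload bits carried by the lead octet itself.
                    (code (if (zerop extra)
                              lead
                              (logand lead (ash #x3F (- extra))))))
               ;; Each continuation octet contributes its low six bits.
               (loop repeat extra
                     do (setf code (logior (ash code 6)
                                           (logand (pop octets) #x3F))))
               (push (code-char code) chars)))
    (coerce (nreverse chars) 'string)))

;; Hypothetical helper: treat each character code of S as one UTF-8 octet.
(defun repair-byte-string (s)
  (decode-utf-8-octets (map 'list #'char-code s)))

;; Three byte-characters collapse back into the single character U+2373.
(length (repair-byte-string (map 'string #'code-char '(226 141 179))))
;; => 1
```

Unlike the babel version, this doesn't signal an error on invalid sequences, so treat it as a way to see the mechanism, not as a replacement.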