This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: i18n; wide characters; Guile


   Date: Sat, 18 Oct 1997 20:30:59 -0400
   From: Jim Blandy <jimb@red-bean.com>
   Cc: Guile Discussion <guile@cygnus.com>

   Thus, my current inclinations:
   - Use 16-bit characters in strings throughout.

That sounds clean.

   - Prescribe the use of Unicode throughout.
   - Provide functions to convert between Unicode character strings and
     all other widely-used formats: UTF-8, UTF-7, Latin-1, and the JIS
     variants, as well as anything else people would like to contribute.
   - Provide a separate "byte array" type, for applications which
     genuinely want this.
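As a rough illustration of one conversion from the list above, the following sketch encodes 16-bit Unicode code points as UTF-8 byte values.  The procedure names are hypothetical (they are not part of Guile), strings are modeled as plain lists of integer code points, and only the range 0 to #xFFFF is handled, matching the 16-bit proposal.  Surrogate code points are rejected, and only standard arithmetic (quotient, modulo) is used, so no bit operations are assumed.

```scheme
;; Hypothetical helper: encode one Unicode code point in the 16-bit
;; range (0 .. #xFFFF) as a list of UTF-8 byte values.
;; Surrogates (#xD800 .. #xDFFF) are rejected: they are not valid
;; scalar values on their own.
(define (code-point->utf-8 cp)
  (cond ((or (< cp 0) (> cp #xFFFF))
         (error "code point outside 16-bit range:" cp))
        ((and (>= cp #xD800) (<= cp #xDFFF))
         (error "surrogate code point:" cp))
        ((< cp #x80)                  ; 1 byte:  0xxxxxxx
         (list cp))
        ((< cp #x800)                 ; 2 bytes: 110xxxxx 10xxxxxx
         (list (+ #xC0 (quotient cp 64))
               (+ #x80 (modulo cp 64))))
        (else                         ; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
         (list (+ #xE0 (quotient cp 4096))
               (+ #x80 (modulo (quotient cp 64) 64))
               (+ #x80 (modulo cp 64))))))

;; Encode a whole list of code points (a stand-in for a 16-bit string)
;; as one flat list of UTF-8 bytes.
(define (code-points->utf-8 cps)
  (apply append (map code-point->utf-8 cps)))
```

For example, (code-points->utf-8 '(65 #xE9)) yields (65 195 169), that is, `A' followed by the two-byte encoding of e-acute.  Decoding, and the UTF-7, Latin-1, and JIS conversions, are not shown.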

Byte arrays should be supported.  In fact, I have implemented `byte'
(Scheme source appended) as a datatype for those of my projects that
need byte-sized data.  The byte type avoids the inconvenience of
constantly converting between integers and characters.

The need for byte-sized data is surprisingly widespread.  It is a
common size for hardware registers.  Bytes are the primary datatype
in the WB b-tree implementation (Aubrey Jaffer, Jonathan Finger, and
Roland Zito-Wolf).  I even use bytes in JACAL to set the precedence
of variable names and to speed the sorting of complicated
expressions.

Going from a range of 256 values to 65536 is too large a jump for
many uses.  In the b-trees, 16-bit data would not work well: every data
packet is preceded by a byte giving its length, and it is important that
packet sizes be much smaller than the four-kilobyte disk blocks.
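To make the length-prefix constraint concrete, here is a minimal sketch assuming a packet is a single length byte followed by that many payload bytes.  The layout and the names make-packet and read-packets are hypothetical, not the actual WB on-disk format, and bytes are modeled as lists of integers 0 to 255.

```scheme
;; Hypothetical layout: one length byte, then that many payload bytes.
;; A one-byte length caps the payload at 255 bytes.
(define max-packet-payload 255)

;; payload: list of integers 0..255; returns the length byte consed on.
(define (make-packet payload)
  (let ((len (length payload)))
    (if (> len max-packet-payload)
        (error "payload too long for a one-byte length:" len))
    (cons len payload)))

;; Split a flat list of bytes (e.g. the contents of one block) back
;; into a list of payloads.  Assumes well-formed input: each length
;; byte is followed by at least that many bytes.
(define (read-packets block)
  (if (null? block)
      '()
      (let ((len (car block)))
        (let loop ((i 0) (rest (cdr block)) (acc '()))
          (if (= i len)
              (cons (reverse acc) (read-packets rest))
              (loop (+ i 1) (cdr rest) (cons (car rest) acc)))))))
```

A maximal packet occupies 1 + 255 = 256 bytes, so at least 16 fit in a 4096-byte block.  If the payload units were 16 bits wide instead, a maximal packet would take roughly twice as many bytes, and only about half as many would fit.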

   We may implement the 16-bit character strings in odd ways that save
   space when the upper bytes of all the characters are zero, but that's
   a separate issue.
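One possible reading of the space-saving idea is a per-string width flag: store one byte per character when every upper byte is zero, two bytes otherwise.  The sketch below, with hypothetical names and strings modeled as lists of 16-bit integer code units, only decides which width a string would need; it is an assumption about the approach, not a description of how Guile stores strings.

```scheme
;; True when every 16-bit code unit has a zero upper byte (< 256),
;; so the string could be stored with one byte per character.
(define (all-narrow? units)
  (or (null? units)
      (and (< (car units) 256)
           (all-narrow? (cdr units)))))

;; Bytes needed for the character data under the width-flag scheme:
;; 1 byte per unit if all are narrow, otherwise 2 bytes per unit.
(define (storage-bytes units)
  (* (length units) (if (all-narrow? units) 1 2)))
```

A single wide character forces the whole string to the two-byte width, which is the price of keeping per-character access uniform.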

Text buffers are objects distinct from strings, so buffers can take
advantage of 8-bit characters without hairing up the string code.  I
think it reasonable to make this the only exception.

			       -=-=-=-

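;; A byte array is an ordinary string whose characters each hold one
;; byte value (0 to 255); this relies on 8-bit characters.
(define (byte-ref str ind) (char->integer (string-ref str ind)))
(define (byte-set! str ind val) (string-set! str ind (integer->char val)))
;; (make-bytes len [fill]): fill is an optional byte value.
(define (make-bytes len . opt)
  (if (null? opt) (make-string len)
      (make-string len (integer->char (car opt)))))
;; The optional argument is a port, passed through to write-char/read-char.
(define (write-byte byt . opt) (apply write-char (integer->char byt) opt))
;; Returns the end-of-file object unchanged when input is exhausted.
(define (read-byte . opt)
  (let ((c (apply read-char opt)))
    (if (eof-object? c) c (char->integer c))))
;; Conversions between byte arrays and lists of integers.
(define (bytes . args) (list->bytes args))
(define (bytes->list bts) (map char->integer (string->list bts)))
(define (list->bytes lst) (list->string (map integer->char lst)))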