This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Date: Sat, 18 Oct 1997 20:30:59 -0400
From: Jim Blandy <jimb@red-bean.com>
Cc: Guile Discussion <guile@cygnus.com>
Sender: owner-guile@cygnus.com
Precedence: bulk
X-UIDL: 4b452add811891c323b850fab666f543

Thus, my current inclinations:
- Use 16-bit characters in strings throughout. That sounds clean.
- Prescribe the use of Unicode throughout.
- Provide functions to convert between Unicode character strings and all other widely-used formats: UTF-8, UTF-7, Latin-1, and the JIS variants, as well as anything else people would like to contribute.
- Provide a separate "byte array" type, for applications which genuinely want this.

Byte arrays should be supported. In fact, I have implemented `byte' (Scheme source appended) as a datatype for my projects that need data objects of this size. The BYTE-type skirts the inconvenience of constantly converting between integers and chars.

The need for byte-sized data is surprisingly widespread. It is a popular size for hardware registers. Bytes are the primary datatype in the WB b-tree implementation (Aubrey Jaffer, Jonathan Finger, and Roland Zito-Wolf). I even use bytes in JACAL in order to set the precedence of variable-names and to speed the sorting of complicated expressions.

Going from a range of 256 to 65536 is too large a ratio to scale for many uses. In the b-trees, 16-bit data would not work well; all data packets have a byte length preceding them, and it is important that the packet sizes be much smaller than the four-kilobyte disk blocks.

We may implement the 16-bit character strings in odd ways that save space when the upper bytes of all the characters are zero, but that's a separate issue.

Text buffers are objects different from strings; so buffers can take advantage of 8-bit characters without hairing up the string code. I think it reasonable to make this the only exception.

-=-=-=-

;; Bytes are stored as the characters of an ordinary string.
(define (byte-ref str ind) (char->integer (string-ref str ind)))

(define (byte-set! str ind val) (string-set! str ind (integer->char val)))

;; Optional second argument: the initial value of every byte.
(define (make-bytes len . opt)
  (if (null? opt)
      (make-string len)
      (make-string len (integer->char (car opt)))))

;; Optional argument: the port, as for write-char and read-char.
(define (write-byte byt . opt) (apply write-char (integer->char byt) opt))

(define (read-byte . opt)
  (let ((c (apply read-char opt)))
    (if (eof-object? c) c (char->integer c))))

(define (bytes . args) (list->bytes args))

(define (bytes->list bts) (map char->integer (string->list bts)))

(define (list->bytes lst) (list->string (map integer->char lst)))
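
To illustrate the remark about data packets that have a byte length preceding them, here is a minimal sketch built on the definitions above. The names `packet-set!' and `packet-ref' are hypothetical helpers made up for this example; they are not taken from WB, whose actual packet layout is not shown here. The sketch assumes a Scheme in which `integer->char' accepts every value from 0 to 255 and in which `error' is available (it is not part of R4RS). The three byte definitions it needs are repeated so that it stands alone.

```scheme
;; Repeated from the appended source so this sketch is self-contained.
(define (byte-ref str ind) (char->integer (string-ref str ind)))
(define (byte-set! str ind val) (string-set! str ind (integer->char val)))
(define (make-bytes len . opt)
  (if (null? opt)
      (make-string len)
      (make-string len (integer->char (car opt)))))

;; Hypothetical helper: store PAYLOAD (a list of byte values) in BTS
;; starting at index START, preceded by a one-byte length.
;; Returns the index just past the packet.
(define (packet-set! bts start payload)
  (let ((len (length payload)))
    ;; A one-byte length field limits a packet to 255 payload bytes.
    (if (> len 255)
        (error "payload does not fit a one-byte length:" len))
    (byte-set! bts start len)
    (let loop ((i (+ start 1)) (rest payload))
      (if (null? rest)
          i
          (begin
            (byte-set! bts i (car rest))
            (loop (+ i 1) (cdr rest)))))))

;; Hypothetical helper: read the packet at index START back as a list.
;; Walks from the last payload byte down to the first, consing as it goes.
(define (packet-ref bts start)
  (let ((len (byte-ref bts start)))
    (let loop ((i (+ start len)) (acc '()))
      (if (= i start)
          acc
          (loop (- i 1) (cons (byte-ref bts i) acc))))))

;; Usage: a 16-byte block holding one 3-byte packet at offset 0.
(define block (make-bytes 16 0))
(define next-free (packet-set! block 0 '(10 20 30)))
(define payload (packet-ref block 0))
```

After these definitions, `next-free' is 4 and `payload' is (10 20 30). With a one-byte length field every packet is at most 256 bytes including its length, comfortably smaller than a four-kilobyte block; a 16-bit length field would allow packets far larger than the block that has to hold them.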