This is the mail archive of the guile@cygnus.com mailing list for the Guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

New shared substring implementation


Lately, i was wondering why there was the need for (explicitely)
shared substrings at all.
As hinted in guile-ref, a copy-on-write strategy might make
make-shared-substring useless, since normal substring could
return a shared substring which will be copied on the first
write. This would add a simple check on each string mutation
function, but could increase performance drastically otherwise
and retain a very simple interface.

I liked the idea and wanted to implement it in guile. Taking a
look at strings.[ch] (and symbol.h), i noticed that the length of
a string is encoded in it's SCM object, while the SCM obj's CDR
has to point to the char* directly.

If i wanted to change this, i had to change the implementation of
symbols as well (and probably take in alot of incompatibilities
with other parts of guile that use strings)

I assume that such pointer plays - negative indices to store
information, etc. - are common in guile. One question arises:
why?

Is (SCM_CAR(val)>>8) that much faster than SCM_CDR(val)->length?
Is SCM_CDR(val)[-1] that much faster than SCM_CDR(val)->something?

Well, my thoughts about the copy-on-write idea stumbled on one
problem.
(define str1 "foo")		   ; a string is allocated as usual
(define str2 (substring str1 0 1)) ; str2 is now a shared
			           ; substring of str1
(string-set! str1 0 #\b)	   ; since str1 is not a shared
				   ; substring, it's not copied,
				   ; and thus str2 => "b"

A solution to this problem would be to have (substring) mark the
string it's applied to to be a "substring" as well. The problem
then is that:
(define str1 "foo")
(define str2 (substring str1 2 3)) ; str1 and str2 are marked as
			           ; shared substrings
(string-set! str1 0 #\b)	   ; str1 now points to a newly
				   ; malloc'd copy of the string,
				   ; and str2 points to the
				   ; original string[2]
(string-set! str2 0 #\b)	   ; str2 is now a copy of the
				   ; string as well, and there's
				   ; no pointer to the original
				   ; string left :]

Does anyone have an idea on how to solve this elegantly?
(one *could* use a reference count there. I don't see any
possibility of how a circular dependency might come up there)
        -forcer

-- 
((email . "forcer@mindless.com")       (www . "http://forcix.cx/")
 (irc   . "forcer@#linux.de (IRCnet)") (gpg . "/other/forcer.gpg"))

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]