This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 08/18] malloc: Add specialized dynarray for C strings
On 17/08/2017 07:12, Florian Weimer wrote:
> On 08/11/2017 04:50 PM, Adhemerval Zanella wrote:
>> This patch adds an specialized dynarray to manage C strings using the
>> dynarray internal implementation. It uses some private fields from
>> dynarray and thus it provided specific files to access and manage
>> the internal string buffer.
>
> There is a lot of complexity in this code to maintain the invariant that
> the stored array is NUL-terminated. It also means that the dynarray
> functions must not be used directly.
Not really, however some won't follow the NUL-terminated semantic (I use
the provided free function for instance). Unfortunately we do not have
access modifiers as for c++ to avoid issues calling these functions, but
since it is an internal API I think we can live with it.
>
> std::string in C++ has a c_str() method which adds the null termination
> on demand. This means that char_array_str could fail due to memory
> allocation, but I think it would simplify the code in general. I tried
> to write the dynarray functions in such a way that errors are sticky, so
> one error check (char_array_str returns NULL) could cover all the
> previous manipulations.
That is not my understanding checking on libstd++v3 code:
include/bits/basic_string.h:
2222 const _CharT*
2223 c_str() const _GLIBCXX_NOEXCEPT
2224 { return _M_data(); }
158 pointer
159 _M_data() const
160 { return _M_dataplus._M_p; }
AFAIK current std::string does not add the NUL on demand, but rather keep
the internal buffer NUL-terminated. Some basic checking with gdb does
hold it true:
(gdb) l
3
4 int main ()
5 {
6 std::string str("test");
7
8 printf ("%s\n", str.c_str());
9 return 0;
10 }
(gdb) p str
$10 = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7fffffffd6d0 "test"},
_M_string_length = 4, {_M_local_buf = "test", '\000' <repeats 11 times>, _M_allocated_capacity = 1953719668}}
(gdb) x/6c str._M_dataplus._M_p
0x7fffffffd6d0: 116 't' 101 'e' 115 's' 116 't' 0 '\000' 0 '\000'
Also, c++ does not try to use the same API for both containers and string
because they do differ exactly regarding NUL-termination and C-string
handling. And I think this make sense to provide different ABI to
different data types and hiding all the required internal handling.
Also, it will still require some char_array wrappers to hiding the C-string
manipulation (append/prepend/initialization) and it will add complications
if char_array_str adds an emplace element at the end. I assume it will
add the element in place instead of returning a copy (since the idea is
to actually have inplace data for char_array manipulation), so all other
functions will need to actually check if the char_array buffer has the
NULL-byte before manipulating. It only adds complexity and more possible
code patch for internal buffer manipulation.