This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 08/18] malloc: Add specialized dynarray for C strings



On 17/08/2017 07:12, Florian Weimer wrote:
> On 08/11/2017 04:50 PM, Adhemerval Zanella wrote:
>> This patch adds an specialized dynarray to manage C strings using the
>> dynarray internal implementation.  It uses some private fields from
>> dynarray and thus it provided specific files to access and manage
>> the internal string buffer.
> 
> There is a lot of complexity in this code to maintain the invariant that
> the stored array is NUL-terminated.  It also means that the dynarray
> functions must not be used directly.

Not really, however some won't follow the NUL-terminated semantic (I use
the provided free function for instance).  Unfortunately we do not have
access modifiers as for c++ to avoid issues calling these functions, but
since it is an internal API I think we can live with it. 

> 
> std::string in C++ has a c_str() method which adds the null termination
> on demand.  This means that char_array_str could fail due to memory
> allocation, but I think it would simplify the code in general.  I tried
> to write the dynarray functions in such a way that errors are sticky, so
> one error check (char_array_str returns NULL) could cover all the
> previous manipulations.

That is not my understanding checking on libstd++v3 code:

include/bits/basic_string.h:

2222       const _CharT*
2223       c_str() const _GLIBCXX_NOEXCEPT
2224       { return _M_data(); }

 158       pointer
 159       _M_data() const
 160       { return _M_dataplus._M_p; }

AFAIK current std::string does not add the NUL on demand, but rather keep
the internal buffer NUL-terminated.  Some basic checking with gdb does
hold it true:

(gdb) l
3
4       int main ()
5       {
6         std::string str("test");
7
8         printf ("%s\n", str.c_str());
9         return 0;
10      }
(gdb) p str
$10 = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7fffffffd6d0 "test"}, 
  _M_string_length = 4, {_M_local_buf = "test", '\000' <repeats 11 times>, _M_allocated_capacity = 1953719668}}
(gdb) x/6c str._M_dataplus._M_p
0x7fffffffd6d0: 116 't' 101 'e' 115 's' 116 't' 0 '\000'        0 '\000'

Also, c++ does not try to use the same API for both containers and string
because they do differ exactly regarding NUL-termination and C-string
handling.  And I think this make sense to provide different ABI to 
different data types and hiding all the required internal handling.

Also, it will still require some char_array wrappers to hiding the C-string
manipulation (append/prepend/initialization) and it will add complications
if char_array_str adds an emplace element at the end. I assume it will 
add the element in place instead of returning a copy (since the idea is 
to actually have inplace data for char_array manipulation), so all other 
functions will need to actually check if the char_array buffer has the 
NULL-byte before manipulating. It only adds complexity and more possible
code patch for internal buffer manipulation.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]