Bug 22908 - iconv(3) fails on utf8->ascii with ILSEQ on valid seq
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: locale
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2018-03-01 18:17 UTC by Steffen Nurpmeso
Modified: 2018-03-31 12:32 UTC

fweimer: security-


Description Steffen Nurpmeso 2018-03-01 18:17:44 UTC

    
Comment 1 Steffen Nurpmeso 2018-03-01 18:21:06 UTC
Oh, sorry, I hit RETURN to get rid of all the autocompletions.
The following program fails with

 $/tmp/zt
 Converting 4 <aäc>
 Fail <Invalid or incomplete multibyte or wide character>

whereas musl gives

 $./zt
 Converting 4 <aäc>
 GOT <a*c>

I.e., I would expect replacement to happen, but instead iconv() returns (size_t)-1.

#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

int main(void){
   char inb[16], oub[16], *inbp, *oubp;
   iconv_t id;
   size_t inl, oul;

   /* "a", U+00E4 as the two UTF-8 bytes C3 A4, "c", and the NUL */
   memcpy(inbp = inb, "a\303\244c", sizeof("a\303\244c"));
   inl = sizeof("a\303\244c") -1;
   oul = sizeof(oub) -1; /* leave room for the terminating NUL */
   oubp = oub;

   if((id = iconv_open("ascii", "utf8")) == (iconv_t)-1)
      return 1;
   fprintf(stderr, "Converting %lu <%s>\n", (unsigned long)inl, inbp);
   if(iconv(id, &inbp, &inl, &oubp, &oul) == (size_t)-1){
      fprintf(stderr, "Fail <%s>\n", strerror(errno));
      iconv_close(id);
      return 2;
   }
   *oubp = '\0'; /* iconv(3) does not NUL-terminate its output */
   fprintf(stderr, "GOT <%s>\n", oub);
   iconv_close(id);
   return 0;
}
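
For comparison, the replacement seen with musl can be approximated in the caller, independent of what the libc does. The following is only a sketch: the helper name utf8_to_ascii_repl and its parameters are made up for illustration, and it assumes that iconv(3) reports an unconvertible character with EILSEQ and leaves the input pointer at the start of that character.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of any libc): convert UTF-8 to ASCII and
 * write `repl` for every character that iconv(3) rejects with EILSEQ.
 * `out` is always NUL-terminated.  Returns 0 on success, -1 on any other
 * failure, with errno set. */
int utf8_to_ascii_repl(const char *in, size_t inlen,
      char *out, size_t outsz, char repl){
   iconv_t id;
   char *inp = (char*)in;   /* iconv(3) takes char **; input is not modified */
   char *outp = out;
   size_t outleft;
   int rv = 0, e = 0;

   if(outsz == 0){
      errno = E2BIG;
      return -1;
   }
   outleft = outsz - 1;     /* reserve room for the terminating NUL */

   if((id = iconv_open("ascii", "utf8")) == (iconv_t)-1)
      return -1;

   while(inlen > 0 &&
         iconv(id, &inp, &inlen, &outp, &outleft) == (size_t)-1){
      if(errno != EILSEQ || outleft == 0){
         /* Out of space, incomplete input, or another hard error */
         e = (errno == EILSEQ) ? E2BIG : errno;
         rv = -1;
         break;
      }
      /* Emit the replacement, then skip one UTF-8 character: the lead
       * byte plus any continuation bytes (bit pattern 10xxxxxx). */
      *outp++ = repl;
      --outleft;
      do{
         ++inp;
         --inlen;
      }while(inlen > 0 && ((unsigned char)*inp & 0xC0) == 0x80);
   }

   *outp = '\0';
   iconv_close(id);
   if(rv != 0)
      errno = e;
   return rv;
}
```

Called as utf8_to_ascii_repl("a\303\244c", 4, oub, sizeof oub, '*'), this mirrors the test program above. Skipping by lead and continuation bytes is a simplification that also steps over malformed input one sequence at a time.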
Comment 2 Florian Weimer 2018-03-03 16:32:32 UTC
Have you tried to convert using //TRANSLIT?
Comment 3 Steffen Nurpmeso 2018-03-03 21:27:08 UTC
No.  (Why?  The input is valid and complete, so no error should occur.  But
you are likely looking at the actual libc code...)

I am in fact anything but a fan of these // modifiers, because iconv(3) is already a black box, and therefore programs have no chance to perform cleanup on character set names; in effect they pass anything found in all kinds of charset="" parameters "as-is" when calling iconv_open() (sic!).
Protection against this is still to be implemented one day: search for a slash and terminate the name there, or so.

iconv should instead offer interfaces for character set name normalization, offer access to the charsets actually used by the successfully iconv_open()ed object, replacement character configuration, etc. etc., all via an explicit function.  But it cannot be helped.
I wish you a nice Sunday.
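
The "search for slash and terminate" protection mentioned above could look roughly like the following. This is a minimal sketch, not code from any existing program; the function name charset_strip_suffix and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `name` into `out`, cutting it at the first
 * '/' so that suffixes such as "//TRANSLIT" or "//IGNORE" taken from
 * untrusted input never reach iconv_open().
 * Returns 0 on success, -1 if the result would be empty or does not
 * fit into `out` (including the terminating NUL). */
int charset_strip_suffix(const char *name, char *out, size_t outsz){
   size_t len = strcspn(name, "/"); /* bytes before first slash or end */

   if(len == 0 || len >= outsz)
      return -1;
   memcpy(out, name, len);
   out[len] = '\0';
   return 0;
}
```

A program would run every externally supplied charset name through such a function before handing it to iconv_open(), and append its own modifier afterwards if it wants one.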
Comment 4 Florian Weimer 2018-03-05 09:22:04 UTC
With

   setlocale( LC_ALL, "");

and

   if((id = iconv_open("ascii//TRANSLIT", "utf8")) == (iconv_t)-1)

transliteration occurs, as expected.

(Note that transliteration is locale-dependent, so you need to configure a locale.)