Missing file from cygwin's catdoc
Reini Urban
rurban@x-ray.at
Thu Oct 2 11:50:00 GMT 2008
2008/10/1 Reini Urban:
> 2008/10/1 Bernt Røskar Brenna:
>> I believe that the Cygwin package catdoc is missing the file /etc/catdocrc
>>
>> Without the settings in the file, running catdoc against Word
>> documents with Norwegian characters produces strange results.
>>
>> I have used the following /etc/catdocrc:
>> charset_path=/usr/share/catdoc
>> map_path=/usr/share/catdoc
>> source_charset=cp1252
>> target_charset=8859-1
>> unknown_char='?'
>
> Thanks for this info.
>
>> There is another quite strange matter:
>> 'catdoc test_catdoc.doc' and 'catdoc -d8859-1 test_catdoc.doc'
>> produce different results (with the config file above, which has
>> 8859-1 as default). How is that possible?
>
> Because the defaults are insane.
> in: cp1251 out: koi8-r
>
> ----- version 0.94.2-2 -----
> * Added --with-input=cp1252 --with-output=8859-1
> Was cp1251 to koi8-r as default
> * Added /etc/catdocrc (thanks to Bernt Røskar Brenna)
>
>> $ catdoc test_catdoc.doc
>> aeoa
>>
>> $ catdoc -d8859-1 test_catdoc.doc
>> æøå
I was wrong before. The source charset is almost always Unicode,
and the target charset is wrongly detected on Cygwin as US-ASCII.
You get "aeoa" because "æøå" transliterated to US-ASCII is "aeoa".
You can override this by putting
use_locale=no
in /etc/catdocrc.
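
To illustrate why a US-ASCII target turns "æøå" into "aeoa", here is a
minimal Python sketch of that kind of substitution. The REPLACEMENTS
table and the to_ascii() helper are hypothetical names made up for this
example; they are not taken from catdoc's own map files, and catdoc's
real replacement tables may differ.

```python
# Hypothetical replacement table, for illustration only.
# It is NOT copied from catdoc's map files.
REPLACEMENTS = {"æ": "ae", "ø": "o", "å": "a"}


def to_ascii(text, unknown_char="?"):
    """Replace every non-ASCII character with an ASCII stand-in.

    Characters without an entry in REPLACEMENTS become unknown_char,
    similar in spirit to the unknown_char setting in catdocrc.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            out.append(REPLACEMENTS.get(ch, unknown_char))
    return "".join(out)


print(to_ascii("æøå"))  # aeoa
```

With an ISO 8859-1 target no such substitution is needed, because all
three characters exist in that charset.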
> I'm just having problems with the new cygport or autoreconfig,
> so it doesn't build yet.
> I hope I can fix it soon.
I also found the build problem and fixed it.
The new release will come this evening.
--
Reini Urban
http://phpwiki.org/ http://murbreak.at/