I think wcscmp must also work correctly when its arguments are not 4-byte aligned. For example: char *a; wchar_t *ptr = (wchar_t *)(a + 1); The new assembler implementation I will submit supports these cases.
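
To make the unaligned case concrete, here is a minimal C sketch of a reference comparison that could be used to check such inputs. It is not the assembler implementation mentioned above; the names load_wchar, wcscmp_unaligned and place_unaligned are hypothetical and exist only for this illustration. It reads each wide character with memcpy, because dereferencing a misaligned wchar_t * directly is undefined behaviour in portable C.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: read one wchar_t from an address that may be
   misaligned. memcpy avoids a direct misaligned dereference. */
static wchar_t load_wchar(const unsigned char *p)
{
    wchar_t c;
    memcpy(&c, p, sizeof c);
    return c;
}

/* Hypothetical reference comparison with wcscmp semantics for wide
   strings stored at arbitrary byte addresses.
   Returns <0, 0 or >0, comparing elements as wchar_t values. */
static int wcscmp_unaligned(const void *s1, const void *s2)
{
    const unsigned char *p1 = s1;
    const unsigned char *p2 = s2;

    for (;;) {
        wchar_t c1 = load_wchar(p1);
        wchar_t c2 = load_wchar(p2);

        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (c1 == L'\0')
            return 0;

        p1 += sizeof(wchar_t);
        p2 += sizeof(wchar_t);
    }
}

/* Hypothetical helper: copy a wide string (with terminator) to buf + 1.
   If buf is aligned for wchar_t, the copy is misaligned by one byte,
   mirroring the (wchar_t *)(a + 1) case. The caller must supply room
   for the string, its terminator and one extra byte. */
static void *place_unaligned(unsigned char *buf, const wchar_t *s)
{
    memcpy(buf + 1, s, (wcslen(s) + 1) * sizeof(wchar_t));
    return buf + 1;
}
```

A test for the new implementation could place strings at offsets 1 to 3 within a wchar_t-aligned buffer in this way and check that the sign of the result matches this reference.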