The attached patch ensures that non-Unicode characters are replaced with U+FFFD REPLACEMENT CHARACTER, rather than a space, when converting to UTF-16. (By all evidence, the space is a historical accident.) This change is required for one possible solution to bug#42904. We can do without this patch, but it fixes a clear bug in its own right. For some reason, unpaired surrogates aren't affected even though they cannot be encoded in UTF-16; that is another bug, but not one addressed here.
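
To illustrate the intended behavior (this is not the patch itself, only a minimal Python sketch), the following shows a UTF-16 encoder that substitutes U+FFFD for any code point outside the Unicode range. The function name encode_utf16_units and the sample value 0x3FFF80 are hypothetical, chosen only for illustration. The sketch deliberately leaves surrogate code points untouched, mirroring the separate behavior mentioned above.

```python
REPLACEMENT_CHARACTER = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER
MAX_UNICODE = 0x10FFFF          # highest valid Unicode code point


def encode_utf16_units(code_points):
    """Return a list of UTF-16 code units for the given code points.

    Hypothetical helper: code points outside 0..0x10FFFF are replaced
    with U+FFFD (not with a space). Surrogate code points are passed
    through unchanged; handling them is out of scope for this sketch.
    """
    units = []
    for cp in code_points:
        if not 0 <= cp <= MAX_UNICODE:
            # Non-Unicode character: substitute U+FFFD.
            cp = REPLACEMENT_CHARACTER
        if cp < 0x10000:
            # BMP code points fit in a single 16-bit unit.
            units.append(cp)
        else:
            # Supplementary code points become a surrogate pair.
            offset = cp - 0x10000
            units.append(0xD800 | (offset >> 10))
            units.append(0xDC00 | (offset & 0x3FF))
    return units


# 0x3FFF80 is an arbitrary example of a value above U+10FFFF.
print([hex(u) for u in encode_utf16_units([0x41, 0x3FFF80, 0x1F600])])
```

With the replacement set to a space (0x20) instead, the out-of-range value would silently become ordinary text, which is why U+FFFD is the better signal that a character could not be represented.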