TransWikia.com

Same locale, different distrib, inconsistent behavior

Unix & Linux Asked by ssssteffff on November 21, 2021

I am migrating a system from a CentOS 6.9 VM to a Debian 10 Docker container, and I can’t explain why the thousands separator differs. Same locale (fr_FR.UTF-8), same revision of the locale definition, yet a different separator:

CentOS 6.9 VM:

[user@host ~]$ cat /etc/redhat-release
CentOS release 6.9 (Final)

[user@host ~]$ locale -v -a fr_FR.UTF-8 | grep -A10 fr_FR.utf8
locale: fr_FR.utf8      archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
    title | French locale for France
   source | RAP
  contact | Traduc.org
    email | [email protected]
 language | French
territory | France
 revision | 1.0
     date | 2008-03-15
  codeset | UTF-8

[user@host ~]$ yum list installed | grep libc
glibc.x86_64                         2.12-1.209.el6_9.2                @updates 
glibc-common.x86_64                  2.12-1.209.el6_9.2                @updates 
glibc-devel.x86_64                   2.12-1.209.el6_9.2                @updates 
glibc-headers.x86_64                 2.12-1.209.el6_9.2                @updates
[...]

[user@host ~]$ grep "thousands_sep" /usr/share/i18n/locales/fr_FR
mon_thousands_sep         "<U0020>"
thousands_sep             "<U0020>"

[user@host ~]$ LC_NUMERIC="fr_FR" printf "%'.f\n" 1234 | hexdump -C
00000000  31 20 32 33 34 0a                                 |1 234.|
00000006

Debian 10 container:

root@240c7f7ca3a1:~# cat /etc/issue.net
Debian GNU/Linux 10

root@240c7f7ca3a1:~# locale -v -a fr_FR.UTF-8 | grep -A10 fr_FR.utf8
locale: fr_FR.utf8      archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
    title | French locale for France
   source | RAP
  contact | Traduc.org
    email | [email protected]
 language | French
territory | France
 revision | 1.0
     date | 2008-03-15
  codeset | UTF-8

root@240c7f7ca3a1:~# apt list --installed | grep libc    
libc-bin/stable,now 2.28-10 amd64  [installed, automatic]
libc-l10n/stable,now 2.28-10 all  [installed, automatic]
libc6/stable,now 2.28-10 amd64  [installed]
[...]

root@240c7f7ca3a1:~# grep "thousands_sep" /usr/share/i18n/locales/fr_FR
mon_thousands_sep         "<U202F>"
thousands_sep             "<U202F>"

root@240c7f7ca3a1:~# LC_NUMERIC="fr_FR" printf "%'.f\n" 1234 | hexdump -C
00000000  31 e2 80 af 32 33 34 0a                           |1...234.|
00000008

As you can see, in the first case I get a normal space (<U0020>, byte 20), while in the second I get a narrow no-break space, NNBSP (<U202F>, bytes e2 80 af).
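
The separator can also be inspected without going through printf. The following is only a sketch: it assumes the glibc `locale` utility and `od` are available, and the helper names `sep_hex` and `sep_name` are made up for illustration.

```shell
# Print the bytes of a locale's numeric thousands separator as hex.
# $1: locale name, e.g. fr_FR.UTF-8 (assumed to be installed).
sep_hex() {
    LC_ALL="$1" locale thousands_sep | tr -d '\n' | od -An -tx1 | tr -d ' \n'
}

# Map a hex byte string to a readable name for the separators discussed here.
sep_name() {
    case "$1" in
        20)     echo "SPACE (U+0020)" ;;
        c2a0)   echo "NO-BREAK SPACE (U+00A0)" ;;
        e280af) echo "NARROW NO-BREAK SPACE (U+202F)" ;;
        "")     echo "none" ;;
        *)      echo "other ($1)" ;;
    esac
}

# Example call (result depends on the glibc version installed):
#   sep_name "$(sep_hex fr_FR.UTF-8)"
```

Running the example call on both systems should make the difference visible in one line each, provided the locale is actually installed; otherwise `locale` falls back to the C locale, which has no separator.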

I understand that NNBSP is the correct character for the French locale (according to several sources, including Wikipedia), but it changes my application’s behavior when PDF reports are generated, because not every font contains this character.

I have seen a lot of debate on the GNU, glibc and JDK mailing lists about which character it should be, but I can’t find where it was changed in the glibc changelog.

I could simply replace every NNBSP with a standard space (or a plain NBSP) directly in my code to fix the application, but that seems a bit messy to me.
I guess I could modify the locale source file and recompile it?
Is there a better solution?
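
For reference, the two workarounds mentioned above might look roughly like this. It is a sketch under assumptions, not a tested recipe: the function names, the `./locales` directory, the `fr_FR_space` file name and `./my_app` are all hypothetical, and the `localedef` step is left as a comment because its warnings and exit status vary between glibc versions.

```shell
# Option 1: post-process output, replacing NARROW NO-BREAK SPACE (U+202F)
# with a plain space. The character is built from its UTF-8 bytes in octal
# so the script itself stays ASCII-only.
nnbsp_to_space() {
    nnbsp=$(printf '\342\200\257')
    sed "s/${nnbsp}/ /g"
}

# Option 2: patch a copy of the locale source, to be compiled privately.
# Reads a locale source on stdin, writes the patched source to stdout.
# The address matches both thousands_sep and mon_thousands_sep lines.
patch_thousands_sep() {
    sed '/thousands_sep/s/<U202F>/<U0020>/'
}

# Hypothetical usage for option 2 (all paths are illustrative):
#   mkdir -p locales
#   patch_thousands_sep < /usr/share/i18n/locales/fr_FR > locales/fr_FR_space
#   localedef -i locales/fr_FR_space -f UTF-8 ./locales/fr_FR.UTF-8
#   LOCPATH="$PWD/locales" LC_NUMERIC=fr_FR.UTF-8 ./my_app
```

Option 1 only touches the output of one program, while option 2 changes what every program started with that `LOCPATH` sees, so the choice depends on how widely the old separator is needed.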
