Declare Character Encoding for files exported by TXM portals
Description
Currently, concordances and other result data files exported by TXM portals are encoded in UTF-8 without a BOM. Most web browsers do not detect the encoding automatically, so non-ASCII characters are displayed incorrectly.
The problem has been observed in Firefox (125.0), Chrome (124) and Edge (124).
The Firefox (125.0) inspector shows the following message (translated from French):
The document's encoding was not declared and was therefore guessed from the content. The character encoding must be declared in the Content-Type HTTP header or by using a byte order mark (BOM).
To Reproduce
Files tested with Firefox (125.0):
- broken - No BOM: https://txm-bfm.huma-num.fr/txm/files/test-no-bom.txt
- OK - with BOM: https://txm-bfm.huma-num.fr/txm/files/test-bom.txt
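
The difference between the two test files can be illustrated with a small Python sketch. It checks a byte string for the UTF-8 BOM and shows what the text looks like when a viewer guesses a legacy single-byte encoding. The function name `has_utf8_bom`, the sample word, and the windows-1252 fallback are assumptions made for illustration; the encoding a given browser actually guesses may differ.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Hypothetical sample content containing non-ASCII characters.
sample = "déclaré"

# Same text, exported without and with a BOM.
without_bom = sample.encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

# What a viewer displays if it wrongly assumes windows-1252 (assumed fallback).
misread = without_bom.decode("cp1252")

# Decoding with "utf-8-sig" accepts the BOM and strips it from the text.
roundtrip = with_bom.decode("utf-8-sig")
```

The same check could be applied to the bytes downloaded from the two test URLs above.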
Hypothesis
Adding a BOM to the exported files could solve the issue.
Solutions
- a) declare the character encoding in the Content-Type HTTP header
  - e.g. Content-Type: text/plain; charset=UTF-8 (for .txt exports)
- b) use a meta tag (not possible in a .txt)
- c) use a byte order mark (BOM: https://en.wikipedia.org/wiki/Byte_order_mark)
  - EF BB BF for UTF-8
  - FE FF for UTF-16 (big-endian)
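
Solutions a) and c) can be sketched as follows. This is a minimal Python illustration, not the TXM portal implementation: the function names `export_headers` and `encode_export` are hypothetical, and the `text/plain` media type assumes the export is a .txt file.

```python
import codecs


def export_headers() -> dict:
    """Solution a): declare the encoding in the Content-Type header.

    Assumes a plain-text export; other formats need their own media type.
    """
    return {"Content-Type": "text/plain; charset=UTF-8"}


def encode_export(text: str, add_bom: bool = True) -> bytes:
    """Solution c): encode the export as UTF-8, optionally prefixed by a BOM.

    Any U+FEFF already at the start of the text is dropped first so that
    the output never contains a doubled BOM.
    """
    body = text.lstrip("\ufeff").encode("utf-8")
    return codecs.BOM_UTF8 + body if add_bom else body


# Hypothetical export content.
payload = encode_export("é")
```

The two approaches are independent and can be combined: the header covers files served over HTTP, while the BOM also helps once the file has been saved to disk and opened in another application.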