General Linux question about file & directory names

  • johne53
Posted: Sat, 09/06/2008 - 07:30
I've seen several Linux programmes (mostly written in C or C++) which offer multi-language support through the use of UTF-8. Therefore dialog messages etc can be displayed in any supported language. However, there often seems to be an inbuilt assumption that file system names (i.e. files and folders) will only ever use the standard English character set. I just wondered if there's an actual 'rule' about this? Can people in Germany or Greece or Japan use file and folder names with non-English characters?


  • johne53
  • 10/07/07
  • Sat, 09/06/2008 - 13:59
Thanks for the links gabrbedd, but they don't entirely answer my question (mainly because I didn't phrase it very well). Let me phrase it a bit better... Unicode encodings (such as UTF-8) use either a fixed or a variable number of bytes to represent a character. For example, the encoding Microsoft uses (UTF-16) takes 2 bytes for most characters and 4 for the rest. Under UTF-8, some characters are represented by just one byte, some by two, some by three and some by four. However, whenever I've seen code for reading file names, there seems to be an implicit assumption that the names will have one byte per character. I was wrong to call this the "English" character set because, of course, different locales can use whatever character set they like. Still, for file name handling there seems to be an assumption that one character will be represented by one byte. I just wondered whether this is a universal rule, or am I just looking at badly written programs?
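
To make the byte-versus-character distinction concrete, here is a minimal Python sketch. The sample names and the helper `utf8_lengths` are made up purely for illustration; the point is only that the character count and the UTF-8 byte count of a name diverge as soon as it leaves plain ASCII.

```python
# Sample names are hypothetical; they were picked to include characters
# that need 1, 2 and 3 bytes each in UTF-8.
samples = ["readme", "Größe", "Ελλάδα", "日本語"]


def utf8_lengths(name):
    """Return (number of characters, number of UTF-8 bytes) for a name."""
    return len(name), len(name.encode("utf-8"))


for name in samples:
    chars, nbytes = utf8_lengths(name)
    print(f"{name!r}: {chars} characters, {nbytes} bytes in UTF-8")
```

Code that copies, compares or concatenates such names as opaque byte strings keeps working; only code that indexes or truncates by "character" has to care about the difference.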


  • gabrbedd
  • 06/29/08
  • Sat, 09/06/2008 - 12:07
You can have non-ASCII characters in file names. Some filesystems don't support them, but ext2 and ext3 (the usual defaults on Linux) accept any byte in a name except the NUL character and the forward slash (/).
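
As a rough illustration of that rule, the Python sketch below treats a file name as raw bytes. Both helpers, `is_valid_component` and `roundtrip`, are hypothetical names invented for this example. It assumes a Linux filesystem such as ext2/ext3 that stores names byte for byte, and it ignores other limits such as the maximum name length and the reserved names "." and "..".

```python
import os
import tempfile


def is_valid_component(raw: bytes) -> bool:
    """Apply the rule quoted above to one path component:
    any byte is allowed except NUL (0x00) and '/' (0x2F)."""
    return len(raw) > 0 and b"\x00" not in raw and b"/" not in raw


def roundtrip(raw: bytes) -> bytes:
    """Create an empty file named `raw` in a fresh temporary directory
    and return the name exactly as the directory listing reports it."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        # Passing bytes paths makes Python hand the name to the OS unchanged.
        open(os.path.join(tmp_bytes, raw), "wb").close()
        (entry,) = os.listdir(tmp_bytes)  # bytes in, bytes out
        return entry


name = "日本語.txt".encode("utf-8")
print(is_valid_component(name))   # the rule check on a multi-byte name
print(roundtrip(name) == name)    # whether the bytes came back unchanged
```

On a filesystem that behaves as assumed, the listing returns the same bytes that were written, whatever characters they happen to encode.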


  • johne53
  • 10/07/07
  • Sat, 09/06/2008 - 07:31
BTW, I realise this isn't the right place to ask this sort of question but it's something I often wonder about and I figured there might be somebody here who's got some insight into it.