






Console Character Encoding

Information on character encodings in command line windows.


Symptoms

In the output of a command line program (e.g. KPScript), some special characters are displayed correctly and some are not.

When redirecting the output to a TXT file using the '>' console operator and opening the TXT file in a text editor, special characters appear to be garbled.

These problems usually appear only under Windows, not Linux (because Linux usually uses UTF-8 encoding).


Cause

The Windows command line window by default uses OEM code pages. For general information about code pages, see Wikipedia: Code page. In the US, code page 437 is used; in Western Europe, code page 850 is used; etc. A detailed list can be found here: Code Page Identifiers.

These OEM code pages do not support all characters. Each one covers only a limited set: besides the characters of its target region, it includes a small subset of foreign characters (e.g. the US code page 437 includes some Greek characters), so some special characters display properly in a command line window. Characters outside the active code page cannot be encoded at all.
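As a rough illustration of this limit, the following Python sketch checks which characters code page 437 can represent. The helper name `can_encode` is made up for this example, and Python's `cp437` codec serves as a stand-in for the console's behavior.

```python
def can_encode(ch: str, code_page: str = "cp437") -> bool:
    """Return True if `ch` has a representation in the given code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# Greek alpha is part of code page 437; Cyrillic zhe and the euro sign are not.
for ch in ("α", "ж", "€"):
    # Print the code point instead of the character itself, so the example
    # does not depend on the encoding of the window it runs in.
    print(f"U+{ord(ch):04X}", can_encode(ch))
```

Passing a different codec name (for example `"cp850"`) as the second argument checks another code page in the same way.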

When redirecting the output to a TXT file using the '>' console operator, the TXT file uses the same encoding as the command line window. Text editors usually do not expect OEM code pages and thus render special characters improperly (even the ones that display properly in the command line window).
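The garbling can be reproduced in a few lines of Python. This is only a sketch: it assumes that the console uses code page 437 and that the text editor guesses Windows-1252 (a common default on US and Western European systems). Neither choice is taken from a specific setup.

```python
text = "é"

# What the redirected TXT file contains: the OEM byte for the character.
file_bytes = text.encode("cp437")

# What an editor shows when it assumes Windows-1252 instead of the OEM code page.
misread = file_bytes.decode("cp1252")

# What it shows once it is told the correct code page.
correct = file_bytes.decode("cp437")

# ascii() keeps the printout independent of the window's own encoding.
print(file_bytes, ascii(misread), ascii(correct))
```

The byte in the file is the same in both cases; only the code page used to interpret it differs.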

Finding your code page. You can find out which code page your command line window is using by typing 'Chcp' (without any parameters).


Weak solution: Good text editor

The simplest solution is to tell the text editor which code page the TXT file is using. All characters supported by the code page are then loaded/displayed properly.

The disadvantage of this solution is that characters outside the console code page are lost.

Most advanced text editors support selecting the code page; some examples:

  • PSPad. In the 'Format' menu, click 'OEM', then open the TXT file (it is important to select 'OEM' before opening the TXT file).
  • Notepad2. Open the TXT file and choose the code page under 'File' → 'Encoding' → 'Recode'.
  • Notepad++. Open the TXT file and choose the code page under 'Encoding' → 'Character Sets'.
  • Microsoft Visual Studio. Go to 'File' → 'Open' → 'File', select the TXT file, click the drop-down arrow right of the 'Open' button in the file selection dialog, choose 'Open With', select 'Source Code (Text) Editor With Encoding', choose the correct code page and click 'OK'.
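The same idea applies when the file is read by a script rather than an editor: the reader has to be told the code page. Below is a minimal Python sketch, assuming the console used code page 850. The file name `output.txt` is hypothetical, and the file is created in a temporary directory only to keep the example self-contained.

```python
import os
import tempfile

# Simulate a redirected console output file in code page 850 (an assumption).
path = os.path.join(tempfile.mkdtemp(), "output.txt")  # hypothetical file name
with open(path, "wb") as f:
    f.write("Müller, Søren".encode("cp850"))

# Naming the code page explicitly restores every character it supports.
with open(path, encoding="cp850") as f:
    content = f.read()

print(ascii(content))
```

Characters that the console could not encode in the first place are not in the file, so no choice of code page brings them back.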

Recommended solution: Change console code page

The console character encoding can be changed to UTF-8, which is identified by code page 65001 (on Windows systems). UTF-8 allows encoding all Unicode characters, i.e. special characters of all languages are supported.

In order to change the code page to UTF-8, run the following command:

Chcp 65001

This works fine under Windows 7 and higher. Older operating systems might not support it.

The command must be executed in the command line window before running the command that redirects the output to the TXT file. Windows does not save the chosen code page, so the code page change command must be executed in every command line window separately.

The output TXT file will be encoded using UTF-8. This encoding is supported by almost every text editor. UTF-8 is usually detected automatically, i.e. you do not have to select the encoding / code page manually; you can "just open" the file.
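One common way to recognize UTF-8 automatically is to try a strict UTF-8 decode and fall back to another code page if it fails. The Python sketch below shows that heuristic; it is not a description of how any particular editor works, and the function name `decode_console_output` and the fallback code page `cp437` are assumptions for illustration.

```python
def decode_console_output(data: bytes, fallback: str = "cp437") -> str:
    """Decode as UTF-8 if the bytes are valid UTF-8, else use the fallback."""
    try:
        # utf-8-sig also strips a leading byte order mark, if present.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode(fallback)


utf8_bytes = "Grüße, мир, 日本".encode("utf-8")
oem_bytes = "Grüße".encode("cp437")

print(ascii(decode_console_output(utf8_bytes)))
print(ascii(decode_console_output(oem_bytes)))
```

The heuristic works because text with special characters in an OEM code page is rarely a valid UTF-8 byte sequence, while pure ASCII text decodes identically either way.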

After changing the encoding to UTF-8, special characters might be displayed improperly in the command line window (but are written correctly to the TXT file), because the default raster font does not support the characters. In this case, select a different font (by clicking on the command line window's icon → 'Properties'), e.g. 'Consolas' or 'Lucida Console'.


PowerShell

When using Windows PowerShell instead of the standard command line window, the TXT file will always be encoded using UTF-16 LE, independent of which console code page is selected. PowerShell automatically converts the output of a command line program from the currently active console code page to the UTF-16 LE representation. PowerShell therefore does not magically preserve special characters: the command line program can only output characters that can be encoded using the currently active console code page. Thus, it is recommended to use the Chcp 65001 solution above for PowerShell, too.
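The following Python sketch models that conversion chain. It assumes that the active code page is 437 and that the program replaces unencodable characters with '?' (actual programs may behave differently); the variable names are illustrative only.

```python
original = "α ж"  # Greek alpha is in code page 437, Cyrillic zhe is not

# Step 1: the program writes bytes in the active console code page.
# Assumption: characters outside the code page are replaced by '?'.
console_bytes = original.encode("cp437", errors="replace")

# Step 2: PowerShell decodes those bytes and re-encodes them as UTF-16 LE.
seen_by_powershell = console_bytes.decode("cp437")
file_bytes = seen_by_powershell.encode("utf-16-le")

# With UTF-8 as the console encoding, nothing is lost in step 1.
utf8_round_trip = original.encode("utf-8").decode("utf-8")

print(ascii(seen_by_powershell), ascii(utf8_round_trip))
```

The loss happens in step 1, before PowerShell sees the data, which is why the wider UTF-16 LE encoding of the file cannot repair it.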






