EUC-JP — LLMpedia

EUC-JP
Name	EUC-JP
Classification	Multibyte character encoding
Bits	8/16

Contents

Overview
History and Development
Technical Details
Variants and Extensions
Usage and Compatibility
Implementations and Support

EUC-JP (Extended Unix Code for Japanese) is a multibyte character encoding used for representing Japanese text on Unix-like systems. It was widely adopted in academic, commercial, and governmental computing environments in Japan, and was implemented in software from organizations such as AT&T, NEC, Fujitsu, Panasonic, and Sony. EUC-JP influenced and interacted with other encodings developed by Microsoft, Apple Inc., IBM, and standards bodies like ISO/IEC JTC1/SC2 and W3C.

Overview

EUC-JP encodes Japanese language characters by combining single-byte and multibyte sequences to represent ASCII-compatible text, JIS X 0201-derived kana, and JIS X 0208 kanji; implementations often coexist with ISO/IEC 646-derived locales, POSIX utilities, and toolchains from projects such as GNU Project, Debian, Red Hat, FreeBSD, and NetBSD. It provided interoperability for document formats used by TeX, Emacs, vi, vim, Perl, and Python prior to widespread UTF-8 adoption. EUC-JP's design facilitated conversions with encodings like Shift_JIS and standards such as ISO-2022-JP, as well as compatibility layers in servers run by operators like Nippon Telegraph and Telephone, SoftBank, NTT DoCoMo, and KDDI.
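The conversions mentioned above can be sketched with Python's built-in codecs, which ship with the names `euc_jp`, `shift_jis`, and `iso2022_jp`. The helper `transcode` is a hypothetical name introduced here for illustration; it decodes to Unicode and re-encodes, which is one common way to convert between legacy encodings, not the only one.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from one encoding and re-encode them in another."""
    return data.decode(source).encode(target)


text = "日本語"
euc = text.encode("euc_jp")
sjis = transcode(euc, "euc_jp", "shift_jis")
jis = transcode(euc, "euc_jp", "iso2022_jp")

print(euc.hex(" "))   # c6 fc cb dc b8 ec
print(sjis.hex(" "))  # same text, Shift_JIS bytes
print(jis)            # same text, wrapped in ISO-2022-JP escape sequences
```

Going through Unicode keeps the sketch short, but characters that exist in one encoding and not in the other will raise `UnicodeEncodeError` unless an `errors` policy is chosen.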

History and Development

EUC-JP emerged in the context of 1970s–1990s efforts to standardize Japanese text processing involving bodies like Japanese Industrial Standards Committee, Ministry of International Trade and Industry (Japan), and the Joint Computer Graphics Project. Early precursors included work by Mitsubishi Electric, Hitachi, and academic research at University of Tokyo and Kyoto University. The encoding saw adoption across Unix vendors including Sun Microsystems, Digital Equipment Corporation, and IBM AIX, and became a default in distributions and products from SUSE, Oracle Corporation, HP, and telecommunications firms. Interaction with global standards efforts (ISO/IEC 10646, Unicode Consortium, and IETF) shaped migration strategies used by corporations like Microsoft during Windows Japan localization and by browser vendors such as Netscape Communications Corporation, Mozilla Foundation, Google, and Apple Inc.

Technical Details

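EUC-JP requires an 8-bit clean transport and encodes each character in one to three bytes: a single byte in the range 0x00 to 0x7F for ASCII (as in US-ASCII); two bytes, each in the range 0xA1 to 0xFE, for JIS X 0208 characters; two bytes for JIS X 0201 half-width katakana, consisting of the single-shift byte 0x8E followed by a byte in the range 0xA1 to 0xDF; and three bytes for JIS X 0212 characters, consisting of the single-shift byte 0x8F followed by two bytes in the range 0xA1 to 0xFE. Every byte of a multibyte sequence has its high bit set, so ASCII bytes never appear inside a multibyte character, and the lead byte alone determines the length of the sequence. Unlike ISO-2022-JP, which switches character sets with escape sequences, EUC-JP is stateless, which simplifies transformation routines in libraries such as glibc, libiconv, ICU, and language runtimes like Java Virtual Machine and .NET Framework, which identify the encoding by the charset name registered with IANA. Conversion algorithms handle mappings between EUC-JP and Unicode code points as defined by the Unicode Consortium and mirrored in tables provided by W3C and IETF specifications. Handling of invalid or truncated byte sequences has implications for SMTP, HTTP, and NNTP servers that carry EUC-JP text, and for software that displays such text under the X Window System and Wayland compositors.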
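The byte structure can be illustrated with a small scanner. This is a minimal sketch assuming the commonly documented EUC-JP layout: ASCII below 0x80, lead and trail bytes 0xA1 to 0xFE for JIS X 0208, the prefix 0x8E for half-width katakana (two bytes in total), and the prefix 0x8F for JIS X 0212 (three bytes in total). The function name `euc_jp_char_lengths` is hypothetical, and the sketch only checks structure; it does not verify that a sequence maps to an assigned character.

```python
def euc_jp_char_lengths(data: bytes) -> list[int]:
    """Return the byte length (1, 2, or 3) of each character in EUC-JP data."""
    lengths = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                      # ASCII
            size, trail_lo, trail_hi = 1, 0, 0
        elif lead == 0x8E:                   # half-width katakana (JIS X 0201)
            size, trail_lo, trail_hi = 2, 0xA1, 0xDF
        elif lead == 0x8F:                   # JIS X 0212
            size, trail_lo, trail_hi = 3, 0xA1, 0xFE
        elif 0xA1 <= lead <= 0xFE:           # JIS X 0208
            size, trail_lo, trail_hi = 2, 0xA1, 0xFE
        else:                                # simplification: reject all other bytes
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")

        trail = data[i + 1 : i + size]
        if len(trail) != size - 1:
            raise ValueError(f"truncated sequence at offset {i}")
        if any(not trail_lo <= b <= trail_hi for b in trail):
            raise ValueError(f"invalid trail byte at offset {i}")

        lengths.append(size)
        i += size
    return lengths


# 'A', hiragana 'あ' (A4 A2), half-width katakana 'ｱ' (8E B1)
print(euc_jp_char_lengths(b"A\xa4\xa2\x8e\xb1"))  # [1, 2, 2]
```

Because the lead byte determines the sequence length and no state carries over between characters, the scanner needs only a single forward pass.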

Variants and Extensions

Several variants and vendor extensions evolved, often adding characters from JIS X 0212 and vendor-specific kanji sets used by NEC, IBM, and Microsoft. These variants were supported in enterprise products from Oracle Corporation and SAP SE (formerly SAP AG), and in desktop software such as Microsoft Office Japan editions and LibreOffice. Mobile carriers like NTT DoCoMo, au by KDDI, and SoftBank historically maintained proprietary emoji sets mapped to legacy encodings. Efforts to represent extended repertoires intersected with ISO/IEC 10646 and influenced conversion strategies in libraries like OpenSSL and GnuTLS when handling internationalized certificates and identifiers.

Usage and Compatibility

EUC-JP remained common in archives, legacy documents, and mailings from institutions including Bank of Japan, Ministry of Finance (Japan), and universities such as Osaka University, Keio University, and Waseda University. System administrators working with servers from Canonical, Red Hat, and Amazon Web Services have to account for legacy EUC-JP-encoded logs and data when migrating to UTF-8-centric environments. Web browsers by Microsoft, Google, Mozilla Foundation, and Apple Inc. implemented heuristics for content negotiation and charset sniffing for pages declared as EUC-JP, and content management systems like WordPress, Drupal, and Joomla! historically offered plugins for conversion. Interoperation with databases such as MySQL, PostgreSQL, and Oracle Database required correct collations and character set definitions to preserve sorting and indexing behavior.
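A migration of legacy data to UTF-8 might start from a trial-decoding heuristic like the following sketch. The function names `decode_legacy` and `to_utf8` and the candidate order are assumptions made for illustration; trial decoding is a rough heuristic, not the detection method of any particular browser or tool, and short inputs can be valid in more than one encoding.

```python
def decode_legacy(data: bytes, candidates=("utf-8", "euc_jp", "shift_jis")):
    """Return (text, encoding) for the first candidate that decodes strictly."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("none of the candidate encodings matched")


def to_utf8(data: bytes) -> bytes:
    """Re-encode legacy bytes as UTF-8 using the heuristic above."""
    text, _ = decode_legacy(data)
    return text.encode("utf-8")


legacy_log = "日本語のログ".encode("euc_jp")
text, detected = decode_legacy(legacy_log)
print(detected)  # euc_jp
```

UTF-8 is tried first because its strict structure rarely accepts EUC-JP or Shift_JIS text by accident; when the source encoding is known in advance, passing it explicitly is safer than guessing.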

Implementations and Support

Support for EUC-JP is provided by operating systems including Linux distributions, macOS, Windows NT, and BSD derivatives; language runtimes and libraries such as Python, Ruby, Perl, PHP, Java SE, and .NET Core include codecs or bindings to conversion libraries. Utilities such as iconv, nkf, enca, and recode offer conversion and detection capabilities; mail tools such as imapd and fetchmail may also encounter EUC-JP-encoded messages. Major software projects like Apache HTTP Server, nginx, Postfix, and Sendmail include configuration options for handling legacy encodings. Archival and digital preservation initiatives at institutions like the National Diet Library (Japan), Smithsonian Institution, and Library of Congress maintain workflows for converting EUC-JP materials to Unicode for long-term access.
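As one example of runtime support, Python's standard `codecs` module exposes an incremental decoder for `euc_jp`, which is useful when data arrives in chunks that may split a multibyte character. The helper name `decode_chunks` is hypothetical; this is a sketch of one way to use the standard API, not a recommended architecture.

```python
import codecs


def decode_chunks(chunks, encoding="euc_jp"):
    """Decode an iterable of byte chunks, tolerating splits inside a character."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the decoder and reports any incomplete trailing sequence.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


data = "日本語".encode("euc_jp")
# Split in the middle of the first and second characters.
print(decode_chunks([data[:1], data[1:3], data[3:]]))  # 日本語
```

Decoding each chunk independently with `bytes.decode` would fail at the split points, which is why the decoder object that buffers partial sequences is used here.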
