Shift JIS — LLMpedia

Shift JIS
Name	Shift JIS
Alias	SJIS
Developed by	Microsoft, ASCII Corporation, Japan Industrial Standards Committee
Introduced	1986
Classification	Variable-width character encoding
Based on	JIS X 0208, JIS X 0201
Used for	Japanese text
Mime	text/plain; charset=Shift_JIS

Contents

History
Encoding Structure
Variants and Extensions
Compatibility and Interoperability
Implementation and Usage
Security and Issues

Shift JIS is a character encoding for Japanese that maps JIS X 0208 and JIS X 0201 character sets into a byte-oriented form used on many Microsoft platforms, Apple systems, and embedded devices. It became widespread in consumer software, NTT, and personal computing during the late 1980s and 1990s, interacting with standards like ISO/IEC 2022 and implementations from vendors such as IBM and Sun Microsystems. Shift JIS influenced web content, file formats, and interchange among systems from Microsoft Windows 95 to PlayStation consoles, while coexisting with encodings like EUC-JP and UTF-8.

History

Shift JIS emerged from efforts to adapt JIS X 0208 double-byte sets to byte streams used by ASCII-based systems, building on earlier work by JIS X 0201 and corporate implementations at ASCII Corporation and Microsoft. Early adoption traces through Japanese personal computer lines such as NEC PC-9800, Fujitsu FM Towns, and Sharp X68000, and corporate deployments at NTT and Sony. The encoding saw integration into Microsoft Windows Japanese editions and influenced document formats in Adobe Systems products and publishing houses like Kodansha and Shogakukan. Internationalization efforts at W3C and companies like Netscape and Yahoo! Japan documented interoperability with ISO-2022-JP and EUC-JP, while later migration pushed content toward Unicode driven by initiatives from Unicode Consortium and IETF working groups.

Encoding Structure

Shift JIS combines single-byte and double-byte sequences: single-byte ranges derived from JIS X 0201 for Roman and Kana, and double-byte pairs mapping JIS X 0208 rows into lead and trail byte ranges. The mapping uses lead bytes that avoid control ranges used by ASCII and system APIs prevalent in Microsoft Windows API and POSIX environments, enabling backward compatibility with software from IBM and Apple Computer. Implementation details interact with fonts from vendors like Adobe Type Manager and foundries such as Monotype Imaging, affecting glyph assignment for kanji from sources like Kangxi Dictionary-derived standards and corporate extensions by NEC and Windows-31J variations by Microsoft.

Variants and Extensions

Multiple vendor variants arose: IBM created IBM-932 and IBM-943, Microsoft published Windows-31J (often called CP932), and platform-specific forms appeared in Mac OS Japanese encodings and embedded firmware for devices by Sony and Nintendo. Extensions added characters from corporate sets (e.g., NEC selection files), emoji codings later influenced by DoCoMo, KDDI, and SoftBank pictographs, and proprietary mappings used in gaming consoles like PlayStation and Nintendo DS. These variants complicate interchange between systems such as Sun Solaris, HP-UX, and FreeBSD, and during data exchange with internet entities like Google Japan and Rakuten.

Compatibility and Interoperability

Interoperability requires handling differences among Microsoft, IBM, and vendor-specific mappings; email systems using SMTP, web servers like Apache HTTP Server, and browsers such as Mozilla Firefox, Google Chrome, and Internet Explorer negotiate encodings with Content-Type headers and meta tags. Conversion tools in iconv libraries, Perl modules, Python codecs, and Java's charset framework mediate between Shift JIS, EUC-JP, and UTF-8. Historic issues appeared in cross-platform workflows involving Excel, Word, and LaTeX toolchains, and in content management systems by WordPress and enterprise stacks from Oracle and SAP requiring normalization with Unicode Normalization practices advocated by the Unicode Consortium.

Implementation and Usage

Shift JIS was widely used in desktop publishing by companies such as Adobe Systems, in web content for Yahoo! Japan, in email exchanges among corporations like NTT DoCoMo and SoftBank Group, and in software internationalization efforts by Microsoft and Apple Inc.. Game developers at Square Enix and Capcom used Shift JIS in older console titles for PlayStation and PlayStation 2, while embedded systems from Panasonic and Toyota telematics supported variant mappings. Libraries like libiconv and frameworks in Ruby and PHP provided conversion utilities; database systems including MySQL, PostgreSQL, and Oracle Database offered character set options and migration paths to UTF-8.

Security and Issues

Ambiguities in byte sequences and differences among vendor mappings lead to parsing vulnerabilities and misinterpretation in inputs processed by web servers like Apache and application servers such as Tomcat. Attack techniques exploiting multi-byte encodings affected CGI scripts and middleware in Perl and PHP, and influenced recommendations by IETF and the Open Web Application Security Project on input validation and canonicalization. Data corruption occurred during round-trip conversions between Shift JIS and Unicode when characters lacked direct mappings, affecting archival projects at National Diet Library and publishing workflows at Kodansha. Migration to UTF-8 in systems by Google, Microsoft Azure, and Amazon Web Services reduced such issues, while forensic work at institutions like Interpol and NIST documented legacy pitfalls.

Category:Character encoding