ASCII

ASCII stands for American Standard Code for Information Interchange and is a character-encoding scheme used internally by computers since the 1960s.[1] It encodes 128 characters as 7-bit values, leaving the most significant bit of an 8-bit byte unused. Characters with values below decimal 32 (together with 127, the delete character) are control characters: they were used, for example, to control the print heads of output devices or to mark line endings in text files. Control characters are usually not printable or directly visible, but they can have important consequences depending on how particular programs handle them.
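
For illustration, here is a minimal Python sketch (the characters chosen are arbitrary examples) that inspects the numeric values of a few characters and shows how control characters differ from printable ones:

 # Inspect ASCII code values using Python's built-in ord().
 for ch in ["A", "a", "0", " ", "\n"]:
     code = ord(ch)
     kind = "control" if code < 32 or code == 127 else "printable"
     print(f"{ch!r}: decimal {code:3d}, hex 0x{code:02X} ({kind})")
 # Every ASCII value fits in 7 bits, so the high bit of a byte is zero:
 assert "A".encode("ascii") == b"\x41"

Running it shows, for example, that the newline character has value 10 and is a control character, while 'A' has value 65 and is printable.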

Within a few years of its adoption, it was found that 128 characters were not enough, and extended character sets, loosely called Extended ASCII, were devised that used all 8 bits of a byte and could represent 256 characters. These extended sets added punctuation marks, line-drawing symbols, and common accented "foreign" characters. But before long it became apparent that 256 characters were also not enough.
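
The lack of a single agreed-upon extension is easy to demonstrate (a Python sketch using two historical 8-bit code pages that ship with Python's codec library; neither one is "the" Extended ASCII):

 # The same byte above 127 denotes different characters in different
 # 8-bit "extended ASCII" code pages.
 b = bytes([0xE9])
 print(b.decode("latin-1"))  # 'é'  (ISO 8859-1, Western European)
 print(b.decode("cp437"))    # 'Θ'  (original IBM PC code page 437)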

Thus, additional character-encoding standards were devised, including the ISO 8859 family (for example, ISO 8859-1, also known as Latin-1), which remapped the upper 128 values to cover more European-language characters. Eventually, these encodings were largely superseded by Unicode and its three principal encodings (UTF-8, UTF-16 and UTF-32), which can represent every Unicode code point up to U+10FFFF, a value requiring as many as 21 bits.
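
A short Python sketch of UTF-8's variable-width design (the sample characters are arbitrary): an ASCII letter still occupies a single byte, while higher code points take two, three, or four bytes:

 # UTF-8 encodes each code point in 1 to 4 bytes, depending on its value.
 for ch in ["A", "é", "中", "🙂"]:
     encoded = ch.encode("utf-8")
     print(f"U+{ord(ch):04X} {ch!r}: {len(encoded)} byte(s): {encoded.hex(' ')}")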

During the first three decades of electronic computing, a variety of other encodings competed with ASCII for dominance, notably IBM's EBCDIC, but in practice ASCII became so entrenched that no one could afford to do away with its conventions altogether. Thus, each successive character-encoding standard attempted to preserve the original 128 values as mapped by ASCII, so that ASCII-encoded text would continue to work wherever possible. Many files today, although saved in newer (and often wider) character encodings, can still be read with at least partial success by legacy programs that understand only ASCII.
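
That backward compatibility can be demonstrated directly (a Python sketch): text containing only ASCII characters produces exactly the same bytes whether encoded as ASCII or as UTF-8, which is why ASCII-only software can often read UTF-8 files:

 text = "Hello, world!"  # ASCII-only sample text
 assert text.encode("ascii") == text.encode("utf-8")
 # A non-ASCII character breaks the equivalence: it has no 7-bit value.
 try:
     "café".encode("ascii")
 except UnicodeEncodeError as err:
     print("Not representable in ASCII:", err)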

Character encoding is an important and complex topic for the World Wide Web, because programs on both the server and the client side must agree about the encoding in use for information to be transmitted and displayed correctly. Text editors used to prepare web pages and other files shipped across networks always save files in some particular character encoding, and programs at the other end of the network may attempt to open and display the files using a different one. The problem arises because it is not always possible to tell, just by examining a file's raw contents, which encoding was used when it was saved. Thus, standards such as HTTP and HTML provide ways to declare a file's character encoding explicitly, though not always with complete success.
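
The failure mode is easy to reproduce (a Python sketch): UTF-8 bytes interpreted under the wrong encoding decode without any error yet display the wrong characters, which is why the raw bytes alone cannot settle the question and explicit declarations are needed:

 original = "café"
 raw = original.encode("utf-8")   # b'caf\xc3\xa9'
 # A receiver that wrongly assumes ISO 8859-1 gets no error, just garbage:
 print(raw.decode("latin-1"))     # prints 'cafÃ©'
 # Hence HTML pages declare their encoding, e.g. <meta charset="utf-8">,
 # and HTTP responses carry: Content-Type: text/html; charset=utf-8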

References