String (computing): Difference between revisions
imported>Howard C. Berkowitz No edit summary |
imported>Ed Poor (like a "string" of real objects: take your pick of popcorn or pearls) |
Revision as of 10:57, 16 April 2010
In computer programming languages, a string is a data type that consists of characters arranged in sequence (like a string of pearls in a necklace). In some languages, a string is simply a list of characters with some convenient helper methods that make it behave more like a block of text. Because a string is a list, many programming languages let you use array or list processing methods on it: getting the n-th member of the list returns the n-th character in the string.
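As a minimal sketch of the list-like behaviour described above, the following Python fragment treats a string as a sequence of characters. The variable names are illustrative only, and Python counts positions from zero, so the third character sits at index 2.

```python
# A string behaves like a sequence of characters.
word = "pearls"

third = word[2]            # zero-based indexing: index 2 is the third character
length = len(word)         # number of characters in the string
letters = list(word)       # the same characters as an ordinary list
reversed_word = word[::-1] # sequence slicing works on strings too

print(third, length, letters, reversed_word)
```

Other languages expose the same idea through different syntax, and some count positions from one rather than zero.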
With most traditional text encodings, each member of the list is a single character stored in a single byte, so the word 'hello' contains five characters and occupies five bytes. With the introduction of Unicode, many programming languages now support multibyte string encodings, in which some characters occupy a single byte and others occupy several. In the Java programming language (and in many languages that run on the Java platform, such as JRuby, Scala and Groovy), strings can contain Unicode characters and the string methods are Unicode-aware. The Python programming language has historically had a separate Unicode datatype; since Python 3, its standard string type is itself Unicode. The Ruby language supports multibyte string encodings in later versions (1.9 onward) or, in earlier versions, through extra libraries.
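The difference between counting characters and counting bytes can be illustrated with a short sketch. It assumes Python 3 and the UTF-8 encoding; other encodings would give different byte counts.

```python
# Characters versus bytes, assuming Python 3 and UTF-8.
plain = "hello"
accented = "h\u00e9llo"  # the second character is e with an acute accent

plain_chars = len(plain)                       # characters in the string
plain_bytes = len(plain.encode("utf-8"))       # bytes once encoded
accented_chars = len(accented)
accented_bytes = len(accented.encode("utf-8")) # the accented letter needs two bytes

print(plain_chars, plain_bytes)        # 5 5
print(accented_chars, accented_bytes)  # 5 6
```

Both strings hold five characters, but only the plain ASCII one occupies exactly five bytes.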
Strings can be implicitly or explicitly converted into other datatypes depending on the programming language. Consider the following statement:
print "My favourite number is " + 5
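In many languages, the literal 5 represents an integer. Languages that convert implicitly (JavaScript, for example) automatically convert it into the string '5' and append it to the preceding string; stricter languages (Python, for example) reject the statement with a type error. Now consider the following: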
if (10 == "10") { /* ... */ }
In some languages, this conditional will not be satisfied: it compares an integer with a string, and the types do not match. For many uses, though, such strict matching is pedantic and unnecessary. If the string had been converted into an integer, it would be equal to the integer it is being compared with. Similarly, if the integer had been converted into a string, it would be equal to the string it is being compared with.
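Python is one of the languages that does not convert between strings and integers automatically, so it makes a convenient sketch of the explicit alternative. The function name `describe` is hypothetical, chosen only for this illustration.

```python
# Python performs no implicit string/integer conversion,
# so the conversion has to be written out explicitly.

def describe(number):
    """Build the sentence by converting the integer to a string first."""
    return "My favourite number is " + str(number)

# Concatenating a string and an integer directly raises TypeError.
try:
    "My favourite number is " + 5
    implicit_allowed = True
except TypeError:
    implicit_allowed = False

equal_without_conversion = (10 == "10")  # types differ, so not equal
equal_as_integers = (10 == int("10"))    # string converted to integer
equal_as_strings = (str(10) == "10")     # integer converted to string

print(describe(5))
print(implicit_allowed, equal_without_conversion,
      equal_as_integers, equal_as_strings)
```

In a language with implicit conversion, the first concatenation and the unconverted comparison could both succeed without the explicit `str` and `int` calls.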
Conversion
This kind of conversion is called implicit conversion, and some languages (Scala, for instance) let the programmer define how such conversions happen by declaring implicit type conversion functions.
String manipulation
Some languages, such as awk and Snobol, were developed for manipulating strings. String-handling capability will be found in more and more general-purpose programming languages, but especially in "scripting" languages such as Perl, PHP, and Python.
Discussed abstractly, there are a number of common string operations, the details of which vary with the language:
Operation | Parameters | Result |
---|---|---|
Concatenation | string1, string2 | string1 followed immediately by string2 |
Substring | string1, integer1 [, integer2] | integer1 characters of string1, starting at the first character unless integer2 is given as the starting position |
Fields | string1, string2 (or a character) | An array of the strings in string1 that were separated by string2 |
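The three abstract operations in the table can be sketched in Python as follows. The function names `concatenate`, `substring` and `fields` are hypothetical, and the zero-based starting position is an assumption of this sketch, since the table does not say where counting begins.

```python
# Hypothetical helpers mirroring the abstract operations in the table.

def concatenate(string1, string2):
    """Return string1 followed immediately by string2."""
    return string1 + string2

def substring(string1, count, start=0):
    """Return `count` characters of string1, beginning at `start`.

    `start` is zero-based here (an assumption) and defaults to
    the first character.
    """
    return string1[start:start + count]

def fields(string1, separator):
    """Return the pieces of string1 that were separated by `separator`."""
    return string1.split(separator)

print(concatenate("pop", "corn"))    # popcorn
print(substring("necklace", 4))      # neck
print(substring("necklace", 4, 4))   # lace
print(fields("awk,Snobol,Perl", ","))
```

Real languages differ in the details, for example in whether a substring is specified by a length or by an end position.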