Black Hole
Robobunny said:
Hello Black Hole, in the "media mistakes" you posted "38m³" (I just copied and pasted from it). I originally wrote "38m^3" as I couldn't find the superscript option (I even tried copy/paste from MS Word but it pastes plain text). I expect I'm missing the obvious, but where is it?
If you typed "3" into Word with superscript style, the character is still just "3" even though it displays as a superscript 3. All that is happening is that the character is displayed in a smaller font and displaced vertically, and copy&pasting that into a program that is not aware of Word styles removes the styling. There are, however, proper superscript numerals (and many other special characters) available as characters in their own right (but with reservations – see later).
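To see that a styled "3" and a real superscript 3 are different characters, here is a small Python sketch (my own illustration, nothing to do with Word or the forum software). It only uses the standard `unicodedata` module to print each character's code point and official Unicode name.

```python
import unicodedata

plain = "3"             # what Word stores, whatever styling is applied on top
superscript = "\u00b3"  # the superscript 3, a character in its own right

for ch in (plain, superscript):
    # Show the code point in hex alongside the official Unicode name
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The first line printed is for DIGIT THREE (U+0033) and the second for SUPERSCRIPT THREE (U+00B3), so stripping the styling from the first can never turn it into the second.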
In my case, the answer is I mostly use an iPad for general web browsing and the forum, and there is a useful app for accessing "unusual" characters: Character Pad, which also seems to be available for Android.
The equivalent in Windows is the Character Map applet (type that into the Windows search), or in many text programs (including Word) there is a Special Character tool on (usually) the Insert menu. These display most of* the characters available in any particular font, and insert the selected ones into your text... but it can be quite hard to find the character/glyph you want. No doubt similar tools are available for Linux, if not already included in your distro (there is one in Mint).
* For some reason, the Win7 version of Character Map does not provide access to code points 0-32 (0-20 hex) or 127-159 (7F-9F hex), even though there are characters assigned to those positions in the relevant code pages.
There are a number of other options in the iOS/iPadOS world: In Settings >> General >> Keyboards >> Text Replacement, I have (for example) defined the typed string "omega" as generating Ω. Also, holding an on-screen keypad button (not a bluetooth keyboard key) pops up a list of alternative characters – try ' (apostrophe) for example. Ditto Android.
In Windows and Linux there are direct entry methods for typing a code to insert any specific character, but there are a variety of "ifs and buts" which make the whole thing quite complex with no one-size-fits-all solution. If you want to avoid any of that complexity and simply copy&paste "special" characters, skip to the tables below NOW.
The tables include codes for direct entry, with the following notes and reservations:
- Historically, direct entry in Windows is restricted to "Alt Codes", and these only work through the currently-active code pages (without a leading zero for the DOS code page, or with a leading zero for the Windows code page – there is a difference!). The alt codes in the table must be entered as shown (including any leading zero), and are only valid for system code pages 850 and 1252 (ie Western, appropriate to UK). The procedure is: press and hold the Alt key; enter the code on the numeric keypad (not the top row of the keyboard); release Alt. Numlock must be on to avoid random results!
- Historically, alt codes are only valid as decimal numbers in the range 1-255 (ie the extent of the code pages), and numbers greater than 255 wrap around modulo 256 (this applies to Win7). To provide access to supported characters outside the code page, later versions of Windows interpret numbers greater than 255 as Unicode, and even provide a hack for hex entry. Support may vary. For details see this useful Wikipedia article and/or this other Wikipedia article.
- Linux only supports Unicode direct entry (which I find a pain because I have a selection of alt codes memorised, but you can see why the Linux world wouldn't want to go down the code pages rabbit hole!). The procedure is (Linux Mint): press and hold Shift+Ctrl; type "u" and then the required Unicode (in hex); release Shift+Ctrl.
- See the Unicode website for an exhaustive listing of characters available, but you might get on better with a downloadable summary listing in plain text here.
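The classic alt code rules in the notes above can be sketched in Python. This is a simplified model under stated assumptions, not how Windows actually implements it: `alt_code_to_char` is a hypothetical helper of mine, it assumes code pages 850 and 1252, and it ignores the special symbols Windows produces for codes 1-31 (such as the arrows in the table).

```python
import unicodedata

def alt_code_to_char(typed: str) -> str:
    """Hypothetical helper: look up a classic alt code as described above."""
    value = int(typed) % 256  # numbers above 255 wrap around modulo 256
    # A leading zero selects the Windows code page, otherwise the DOS one.
    codec = "cp1252" if typed.startswith("0") else "cp850"
    return bytes([value]).decode(codec)

for typed in ("0179", "252", "179"):
    ch = alt_code_to_char(typed)
    print(f"Alt+{typed}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Alt+0179 and Alt+252 both come out as SUPERSCRIPT THREE, whereas Alt+179 (no leading zero) goes through code page 850 and gives a box-drawing character instead. That is the "there is a difference!" in practice.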
Be aware that the more specialised the character is, the more likely it is some people's systems are not set up to display it the way you see it on your system, and support may depend on a particular installed font.
Neither of the above should apply to content on the Web, because characters get converted to HTML representation.
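As an illustration of what "HTML representation" can look like, the Python sketch below turns a non-ASCII character into a numeric character reference and back again. I am not claiming this is what the forum software does internally; it just shows the idea using the standard library.

```python
import html

text = "38m\u00b3"  # "38m" followed by SUPERSCRIPT THREE

# Replace anything outside plain ASCII with a numeric character reference
html_safe = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(html_safe)

# A browser (or html.unescape) turns the reference back into the character
round_trip = html.unescape(html_safe)
print(round_trip == text)
```

Here `html_safe` is `38m&#179;`, where 179 is simply the decimal form of the Unicode code point B3.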
Once upon a time, when digital electronic communication was in its infancy, there were various telegraphy systems, and for economy it was necessary to represent each character in the message in as few bits as possible. Ignoring encodings such as Morse, where common characters are assigned fewer bits than less common ones, 6 bits is plenty for the English alphabet plus numbers and some punctuation (and all of this was happening in the English-speaking nations). Eventually this expanded to 7-bit ASCII used for teleprinters.
Once computers standardised on an 8-bit byte as the unit of data storage, there was room for 256 characters and control codes in the mapping table. Computers were expanding into foreign language markets, with the need to support accented Latin characters or even non-Latin characters such as Cyrillic, so this was achieved by "code pages" which assigned different character maps according to the system locale, and a means to access them even without a dedicated keyboard button by entering a numeric code ("alt codes").
Thus the actual character obtained by pressing a keyboard key or entering its code would depend on the code page currently in effect, and also whether that code was supported by the installed font (be that on a graphics display or a text-driven VDU).
Various attempts were made to expand the character encoding and disambiguate character mappings, but the one with the most traction now is Unicode. The aim is to represent every single character, punctuation mark, accent, glyph, symbol, emoticon... for any language each with a unique code (not necessarily restricted to 8 or even 16 bits).
The implementation in all major modern operating systems is to map whatever internal encoding is used to the equivalent Unicode, and use the Unicode to access the required character in the font file.
Therefore, if the user has the means to input Unicode directly, they can enter any character in use worldwide... but that does not mean their system supports that character, or that it is included in the current font. More particularly, even if their system displays it, there is no certainty that somebody else viewing the same document will be able to see it. Missing characters may be displayed as a box with an X in it, or some other random character entirely.
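To make the "not necessarily restricted to 8 or even 16 bits" point concrete, this Python sketch prints how many bytes a few characters need when stored as UTF-8 (one common way of storing Unicode, chosen here purely as an example). The sample characters are my own picks.

```python
# Plain 3, superscript 3, superscript 4, tick, and an emoticon
samples = ["3", "\u00b3", "\u2074", "\u2713", "\U0001F600"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    as_hex = " ".join(f"{b:02X}" for b in utf8)
    print(f"U+{ord(ch):04X} takes {len(utf8)} byte(s) in UTF-8: {as_hex}")

byte_counts = [len(ch.encode("utf-8")) for ch in samples]
```

The byte counts come out as 1, 2, 3, 3 and 4 respectively. Whether any of these characters actually displays is a separate matter that depends on the fonts installed, as described above.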
The following tables list (what I regard as) the most useful special characters. It's only a subset – if there are obviously useful omissions let me know.
Superscript and Subscript Numerals
Character | Interpretation | Alt Code | Unicode (hex) | Unicode (decimal) |
---|---|---|---|---|
⁰ | Superscript 0 | | 2070 | 8304 |
¹ | Superscript 1 | 0185 or 251 | B9 | * |
² | Superscript 2 | 0178 or 253 | B2 | * |
³ | Superscript 3 | 0179 or 252 | B3 | * |
⁴ | Superscript 4 | | 2074 | 8308 |
⁵ | Superscript 5 | | 2075 | 8309 |
⁶ | Superscript 6 | | 2076 | 8310 |
⁷ | Superscript 7 | | 2077 | 8311 |
⁸ | Superscript 8 | | 2078 | 8312 |
⁹ | Superscript 9 | | 2079 | 8313 |
₀ | Subscript 0 | | 2080 | 8320 |
₁ | Subscript 1 | | 2081 | 8321 |
₂ | Subscript 2 | | 2082 | 8322 |
₃ | Subscript 3 | | 2083 | 8323 |
₄ | Subscript 4 | | 2084 | 8324 |
₅ | Subscript 5 | | 2085 | 8325 |
₆ | Subscript 6 | | 2086 | 8326 |
₇ | Subscript 7 | | 2087 | 8327 |
₈ | Subscript 8 | | 2088 | 8328 |
₉ | Subscript 9 | | 2089 | 8329 |
Fractions
Character | Interpretation | Alt Code | Unicode (hex) | Unicode (decimal) |
---|---|---|---|---|
½ | One half | 0189 or 171 | BD | * |
⅓ | One third | | 2153 | 8531 |
⅔ | Two thirds | | 2154 | 8532 |
¼ | One quarter | 0188 or 172 | BC | * |
¾ | Three quarters | 0190 or 243 | BE | * |
⅕ | One fifth | | 2155 | 8533 |
⅖ | Two fifths | | 2156 | 8534 |
⅗ | Three fifths | | 2157 | 8535 |
⅘ | Four fifths | | 2158 | 8536 |
⅙ | One sixth | | 2159 | 8537 |
⅚ | Five sixths | | 215A | 8538 |
⅛ | One eighth | | 215B | 8539 |
⅜ | Three eighths | | 215C | 8540 |
⅝ | Five eighths | | 215D | 8541 |
⅞ | Seven eighths | | 215E | 8542 |
Miscellaneous Mathematical & Engineering Symbols
Character | Interpretation | Alt Code | Unicode (hex) | Unicode (decimal) |
---|---|---|---|---|
± | Plus or minus | 0177 or 241 | B1 | * |
× | Multiply | 0215 or 158 | D7 | * |
· | Dot product | 0183 or 250 | B7 | * |
÷ | Divide | 0247 or 246 | F7 | * |
≠ | Not equal | | 2260 | 8800 |
≈ | Roughly equal | | 2248 | 8776 |
≤ | Less than or equal | | 2264 | 8804 |
≥ | Greater than or equal | | 2265 | 8805 |
≡ | Identical | | 2261 | 8801 |
∴ | Therefore | | 2234 | 8756 |
… | Ellipsis | | 2026 | 8230 |
∞ | Infinity | | 221E | 8734 |
√ | Root | | 221A | 8730 |
∫ | Integrate | | 222B | 8747 |
∑ | Sum of series | | 2211 | 8721 |
∆ | Difference | | 2206 | 8710 |
∏ | Product of series | | 220F | 8719 |
° | Degrees | 0176 or 248 | B0 | * |
′ | Minutes / Feet | | 2032 | 8242 |
″ | Seconds / Inches | | 2033 | 8243 |
Ω | Ohms (omega) | | 3A9 | 937 |
μ | Micro (mu) | 0181 or 230 | 3BC | 956 |
π | Pi | | 3C0 | 960 |
✓ | Tick | | 2713 | 10003 |
✘ | Cross | | 2718 | 10008 |
← | Left arrow | 27 | 2190 | 8592 |
→ | Right arrow | 26 | 2192 | 8594 |
↑ | Up arrow | 24 | 2191 | 8593 |
↓ | Down arrow | 25 | 2193 | 8595 |
Typographical Characters
Character | Interpretation | Alt Code | Unicode (hex) | Unicode (decimal) |
---|---|---|---|---|
‘ | Open quote | 0145 | 2018 | 8216 |
’ | Close quote | 0146 | 2019 | 8217 |
“ | Open speech mark | 0147 | 201C | 8220 |
” | Close speech mark | 0148 | 201D | 8221 |
– | En dash | 0150 | 2013 | 8211 |
— | Em dash | 0151 | 2014 | 8212 |
* Although clearly there is a decimal equivalent of the stated hex Unicode, it is of no use because the Alt entry method will interpret the number as to be translated via the respective code page and therefore not as Unicode.
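If you want to check the hex and decimal Unicode columns against each other, or work out the decimal for a character not in the tables, a few lines of Python will do it. This is just a convenience sketch; the four codes are taken from the tables above.

```python
import unicodedata

# Hex Unicode values taken from the tables
hex_codes = ["B3", "2074", "2153", "2713"]

decimals = {}
for hex_code in hex_codes:
    code_point = int(hex_code, 16)  # hex string to decimal number
    decimals[hex_code] = code_point
    print(hex_code, "->", code_point, "->", unicodedata.name(chr(code_point)))
```

B3 comes out as 179 decimal, which is the footnote's point: the number is correct as Unicode, but typed as an alt code it would be looked up in the code page rather than treated as Unicode.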