01. Primitive Types

A value’s type provides the compiler with two pieces of information: first, how much memory to allocate — the size of the value — and second, what that memory represents. In the case of many of the built-in types, size and representation are part of the type’s name.

  • bool - 1 byte of memory (8 bits)

  • byte - alias for uint8

  • int, int8, int16, int32, int64

  • uint, uint8, uint16, uint32, uint64, uintptr

  • string

  • rune - alias for int32, represents a Unicode code point

  • float32, float64

  • complex64, complex128

Regardless of the architecture, int and uint are distinct from the integer types with an explicitly stated size. Even if int is 32 bits wide on the current platform, int is still not the same type as int32. An explicit type conversion is required.
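
A minimal sketch of the rule above. The helper name widen is hypothetical and exists only to make the conversion visible:

```go
package main

import "fmt"

// widen converts an int32 to int; the conversion must be written explicitly.
func widen(n int32) int {
	return int(n)
}

func main() {
	var a int = 10
	var b int32 = 20

	// sum := a + b // compile error: mismatched types int and int32
	sum := a + widen(b)
	fmt.Println(sum) // 30
}
```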

Operator Precedence

Operations on integer operands always produce an integer result. Integer division truncates toward zero, so 5/3 = 1.
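
The behaviour can be checked with a short sketch. The helper quoRem is a hypothetical name used only for illustration:

```go
package main

import "fmt"

// quoRem returns the integer quotient and remainder of a divided by b.
func quoRem(a, b int) (int, int) {
	return a / b, a % b
}

func main() {
	q, r := quoRem(5, 3)
	fmt.Println(q, r) // 1 2

	// Division truncates toward zero, and the remainder keeps the sign of a.
	q, r = quoRem(-5, 3)
	fmt.Println(q, r) // -1 -2

	// Convert an operand to a floating-point type to keep the fraction.
	fmt.Println(float64(5) / 3)
}
```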

String

The Go Blog - Strings, bytes, runes and characters in Go

In Go, a string is in effect a read-only slice of bytes. It's important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes.

  • Strings can contain arbitrary bytes, but when constructed from string literals, those bytes are (almost always) UTF-8.

  • Strings are built from bytes so indexing them yields bytes, not characters. A string might not even hold characters.

  • Go source code is always UTF-8.

  • A string literal, absent byte-level escapes, always holds valid UTF-8 sequences.

  • Those sequences represent Unicode code points, called runes.

  • No guarantee is made in Go that characters in strings are normalised.

  • Strings cannot have nil value.

  • The index operator applied to a string returns a byte value, not a character.

  • Like integer and floating-point values, two values of the same string type can also be compared with the >, <, >= and <= operators. The comparison works on the underlying bytes, byte by byte. If one string is a prefix of the other, the longer string is considered the larger one.

  • The built-in string type has no methods (just like most other built-in types in Go), but we can use functions provided in the strings standard package to do all kinds of string manipulations.

  • Indeed, even though a string is backed by a []byte, converting a []byte into a string requires a copy of the byte slice.
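
A small sketch of two of the points above: indexing yields bytes, and comparison is byte-wise. The helper firstByte is a hypothetical name:

```go
package main

import "fmt"

// firstByte returns the first byte of s; s must be non-empty.
func firstByte(s string) byte {
	return s[0]
}

func main() {
	s := "⌘" // one character, encoded in UTF-8 as the bytes e2 8c 98

	fmt.Println(len(s))                // 3
	fmt.Printf("%x\n", firstByte(s)) // e2: a byte, not the character

	// Comparison is byte-wise; a longer string is larger than its own prefix.
	a, b, c := "abc", "abd", "abcd"
	fmt.Println(a < b) // true
	fmt.Println(a < c) // true
}
```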

String literal that uses the \xNN notation to define a string constant holding some peculiar byte values:

const sample = "\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98"

Usual string:

const placeOfInterest = `⌘`

fmt.Printf("plain string: ")
fmt.Printf("%s \n", placeOfInterest) // ⌘

fmt.Printf("quoted string: ")
fmt.Printf("%+q \n", placeOfInterest) // "\u2318"

fmt.Printf("hex string: ")
fmt.Printf("%x \n", placeOfInterest) // e28c98

String operations

  • len(s) returns the number of bytes in the string, not the number of characters. utf8.RuneCountInString(s) returns the number of runes. The other approach is to convert the string into a slice of runes and take its length or iterate over it (this has a copy overhead).

  • s[i] returns the i-th byte of the string, not the i-th character.

  • The slice expression s[i:j] yields a new substring (without copying).

  • Strings are immutable, so the same bytes can safely be shared between variables. Both copying a string and taking a substring are therefore very cheap: neither allocates new memory.
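
A minimal sketch of these operations on a multi-byte string. The helper counts is a hypothetical name:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the byte length and the rune count of s.
func counts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "日本語" // three characters, three bytes each in UTF-8

	nBytes, nRunes := counts(s)
	fmt.Println(nBytes, nRunes)  // 9 3
	fmt.Println(len([]rune(s))) // 3, but the conversion copies the data

	// Offsets in a slice expression are byte offsets;
	// bytes 3..6 hold the second character.
	fmt.Println(s[3:6]) // 本
}
```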

String comparison

As mentioned above, comparing two strings means comparing their underlying bytes. In general, Go compilers make the following optimisations for string comparisons.

  • For == and != comparisons, if the lengths of the two strings differ, the strings cannot be equal, so there is no need to compare their bytes.

  • If the two strings share the same underlying byte sequence pointer, comparing them reduces to comparing their lengths.

So for two equal strings, the time complexity of the comparison depends on whether their underlying byte sequence pointers are equal. If they are, the time complexity is O(1); otherwise it is O(n), where n is the length of the strings.

So please try to avoid comparing two long strings if they don't share the same underlying byte sequence.

Unicode

Go literals can specify Unicode characters by their numeric code point:

  • \uXXXX - for 16-bit values (four hexadecimal digits)

  • \UXXXXXXXX - for 32-bit values (eight hexadecimal digits)
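
A short sketch comparing the two escape forms with the character typed directly. The helper sameSymbol is a hypothetical name:

```go
package main

import "fmt"

// sameSymbol reports whether the three spellings denote the same string.
func sameSymbol() bool {
	short := "\u2318"    // four hex digits
	long := "\U00002318" // eight hex digits
	literal := "⌘"
	return short == long && long == literal
}

func main() {
	fmt.Println(sameSymbol()) // true

	// Code points above U+FFFF do not fit in four digits and need the \U form.
	fmt.Println(len("\U0001F600")) // 4 (bytes in UTF-8)
}
```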

Raw string literal

A raw string literal is written with backquotes `...` instead of double quotes. Escape sequences are not processed inside such a literal; the content is taken literally, including backslashes and newlines, so a raw literal may span several lines of text.
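
A minimal sketch contrasting raw and interpreted literals. The constant names and the sample path are made up for illustration:

```go
package main

import "fmt"

// rawPath is a raw literal: the backslashes are kept exactly as written.
const rawPath = `C:\temp\new`

// interpretedPath needs every backslash doubled to hold the same bytes.
const interpretedPath = "C:\\temp\\new"

// multiline spans two source lines; the newline is part of the value.
const multiline = `first line
second line`

func main() {
	fmt.Println(rawPath == interpretedPath) // true
	fmt.Println(multiline)
}
```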

Convert string to bytes

A string can be converted to a byte slice:

  • When a string is converted to a byte slice, the result byte slice is just a deep copy of the underlying byte sequence of the string.

  • When a byte slice is converted to a string, the underlying byte sequence of the result string is also just a deep copy of the byte slice.

str2 := "你好世界"
bin := []byte(str2)
fmt.Println("binary cn: ", bin, len(bin))
for idx, val := range bin {
    fmt.Printf("raw binary idx: %d, dec: %d, hex: %x\n", idx, val, val)
}

A string can also be converted to a slice of runes.

str := "你好世界"
runes := []rune(str)

In a conversion from a rune slice to a string, each slice element (a rune value) is UTF-8 encoded as one to four bytes and stored in the result string. If a rune element is outside the range of valid Unicode code points, it is treated as 0xFFFD, the code point of the Unicode replacement character.

When a string is converted to a rune slice, the bytes stored in the string will be viewed as successive UTF-8 encoding byte sequence representations of many Unicode code points. Bad UTF-8 encoding representations will be converted to a rune value 0xFFFD.
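
Both directions of the replacement rule can be sketched as follows. The helpers decode and encode are hypothetical wrappers around the built-in conversions:

```go
package main

import "fmt"

// decode converts s to runes; invalid UTF-8 bytes become 0xFFFD.
func decode(s string) []rune {
	return []rune(s)
}

// encode converts runes to a string; invalid code points become U+FFFD.
func encode(r []rune) string {
	return string(r)
}

func main() {
	bad := "a\xffb" // the byte 0xff never appears in valid UTF-8
	fmt.Printf("%U\n", decode(bad)) // [U+0061 U+FFFD U+0062]

	// 0x110000 is one past the last valid code point, U+10FFFF.
	out := encode([]rune{'a', 0x110000})
	fmt.Println(out == "a\uFFFD") // true
}
```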

Runes and Unicode

The Go language defines the word rune as an alias for the type int32, so programs can be clear when an integer value represents a code point. The value of a rune is the number of its Unicode code point. For example, the rune literal 'a' is in reality the number 97.

  • Rune means exactly the same as "code point", with one interesting addition: in Go, rune is also an alias for the type int32.

  • A rune literal represents a rune constant, an integer value identifying a Unicode code point.

  • Rune literals can be written as single Unicode characters ('a'), 8-bit octal numbers ('\141'), 8-bit hexadecimal numbers ('\x61'), 16-bit hexadecimal numbers ('\u0061'), or 32-bit Unicode numbers ('\U00000061'). There are also several backslash-escaped rune literals, the most useful being newline ('\n'), tab ('\t'), single quote ('\''), double quote ('"'), and backslash ('\\').

var symbol rune = 'a'
var autoSymbol = 'a' // the inferred type is rune
unicodeSymbol := '⌘'
unicodeSymbolByNumber := '\u2318'
println(symbol, autoSymbol, unicodeSymbol, unicodeSymbolByNumber) // 97 97 8984 8984

for range over strings

A for range loop decodes one UTF-8-encoded rune on each iteration. Each time around the loop, the index of the loop is the starting position of the current rune, measured in bytes, and the code point is its value. Here's an example using yet another handy Printf format, %#U, which shows the code point's Unicode value and its printed representation:

const nihongo = "日本語"
for index, runeValue := range nihongo {
    fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
}
// U+65E5 '日' starts at byte position 0
// U+672C '本' starts at byte position 3
// U+8A9E '語' starts at byte position 6

String copy optimisations

The standard Go compiler applies some optimizations, verified to still work as of Go Toolchain 1.16, that avoid the copy in certain special scenarios. These scenarios include:

  • a conversion (from string to byte slice) which follows the range keyword in a for-range loop.

  • a conversion (from byte slice to string) which is used as a map key in map element retrieval indexing syntax.

  • a conversion (from byte slice to string) which is used in a comparison.

  • a conversion (from byte slice to string) which is used in a string concatenation, and at least one of concatenated string values is a non-blank string constant.

package main

import "fmt"

func main() {
	var str = "world"
	// Here, the []byte(str) conversion will
	// not copy the underlying bytes of str.
	for i, b := range []byte(str) {
		fmt.Println(i, ":", b)
	}

	key := []byte{'k', 'e', 'y'}
	m := map[string]string{}
	// The string(key) conversion copies the bytes in key.
	m[string(key)] = "value"

	// Here, this string(key) conversion doesn't copy
	// the bytes in key. The optimization will still be
	// made, even if key is a package-level variable.
	fmt.Println(m[string(key)]) // value (very possible)
}

Taking a substring does not create a new string; it reuses the bytes of the existing one. This can cause memory leaks, because a short substring keeps the entire original string alive:

s1 := "Hello, World!"
s2 := s1[:5] // Hello

As of Go 1.18, the standard library also includes a solution with strings.Clone that returns a fresh copy of a string: uuid := strings.Clone(log[:36]).
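
A minimal sketch of that fix, assuming Go 1.18 or later. The helper extractID and the sample log line are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// extractID returns the first n bytes of line as an independent string,
// so the (possibly large) line can be garbage collected afterwards.
// line must be at least n bytes long.
func extractID(line string, n int) string {
	return strings.Clone(line[:n])
}

func main() {
	line := "0123456789abcdef followed by a long log message"
	id := extractID(line, 16)
	fmt.Println(id) // 0123456789abcdef
}
```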

Pointers

Go has pointers.

  • The type *T is a pointer to a T value. Its zero value is nil.

  • A pointer holds the memory address of a value.

  • The & operator generates a pointer to its operand (reference operator).

  • The * operator denotes the pointer's underlying value (dereference operator).

  • Dereferencing a nil pointer causes a runtime panic.

  • If a container is addressable, then its elements are also addressable.

    • Elements of a map are always unaddressable, even if the map itself is addressable.

    • Elements of a slice are always addressable, even if the slice itself is not addressable.

var p *int // pointer to an int variable
i := 42
p = &i          // get the address of i
fmt.Println(*p) // read i through the pointer p: 42
*p = 21         // set i through the pointer p
fmt.Println(i)  // 21

There are two ways to get a non-nil pointer value.

  1. The built-in new function can be used to allocate memory for a value of any type. new(T) will allocate memory for a T value (an anonymous variable) and return the address of the T value. The allocated value is a zero value of type T. The returned address is viewed as a pointer value of type *T.

  2. We can also take the addresses of values which are addressable in Go. For an addressable value t of type T, we can use the expression &t to take the address of t, where & is the operator to take value addresses. The type of &t is viewed as *T.
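
Both ways can be sketched in a few lines. The helper newCounter is a hypothetical name:

```go
package main

import "fmt"

// newCounter returns a pointer to a fresh int holding the zero value.
func newCounter() *int {
	return new(int)
}

func main() {
	// Way 1: new(T) allocates a zero value and returns its address.
	p := newCounter()
	fmt.Println(*p) // 0
	*p = 5

	// Way 2: take the address of an addressable value with &.
	t := 10
	q := &t
	*q++
	fmt.Println(*p, t) // 5 11
}
```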

Pointer receiver methods

Functions and methods that take a pointer of any type can modify the value it points to through that pointer:

func inc(p *int) int {
    *p++
    return *p
}

v := 1
inc(&v)
println(v) // 2

Pointer class citizens:

  • Not every value has an address, but every variable does.

  • Each component of a variable of composite type (a struct field or an array element) is also a variable and therefore has its own address.

  • The address of a constant cannot be taken.

  • A function can safely return the address of a local variable, which continues to exist after the function has returned.

  • The address of a map element cannot be taken (&m[key] does not compile), because the map may rehash and the element's address could change over time.
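
A short sketch of these rules. The type point and the helper newPoint are hypothetical names:

```go
package main

import "fmt"

type point struct{ x, y int }

// newPoint returns the address of a local variable, which outlives the call.
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

func main() {
	p := newPoint(1, 2)
	px := &p.x // a struct field is a variable, so it has an address
	*px = 10
	fmt.Println(p.x) // 10

	arr := [3]int{1, 2, 3}
	pe := &arr[1] // so is an array element
	*pe = 20
	fmt.Println(arr) // [1 20 3]

	m := map[string]int{"a": 1}
	// _ = &m["a"] // compile error: map elements are not addressable
	v := m["a"] // copy the value out, change it, and store it back instead
	v++
	m["a"] = v
	fmt.Println(m["a"]) // 2
}
```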

Pointer equality

  • The zero value of a pointer of any type is nil.

  • The test p != nil is true if p points to a variable.

  • Pointers can be compared. Two pointers are equal if and only if they point to the same variable or both are nil.
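
These rules can be sketched as follows. The helper same is a hypothetical name:

```go
package main

import "fmt"

// same reports whether p and q point to the same variable or are both nil.
func same(p, q *int) bool {
	return p == q
}

func main() {
	x, y := 1, 1
	var nilPtr *int // zero value of a pointer type

	fmt.Println(same(&x, &x))      // true: same variable
	fmt.Println(same(&x, &y))      // false: equal values, different variables
	fmt.Println(same(nilPtr, nil)) // true: both nil

	p := &x
	fmt.Println(p != nil) // true: p points to a variable
}
```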

Difference from C

  • Unlike C, Go has no pointer arithmetic.

  • Note that, unlike in C, it's perfectly OK to return the address of a local variable; the storage associated with the variable survives after the function returns.

func NewFile(fd int, name string) *File {
    if fd < 0 {
        return nil
    }
    return &File{fd, name, nil, 0}
}

uintptr

There is also an integer type uintptr, whose width is not specified but is large enough to hold all the bits of a pointer value. This type is used mainly for low-level programming.

Copy Cost

Generally speaking, the cost to copy a value is proportional to the size of the value. However, value sizes are not the only factor determining value copy costs. Different CPU architectures may specially optimize value copying for values with specific sizes.

In practice, we can view struct values with less than 5 fields and with sizes not larger than four native words as small-size values. The costs of copying small-size values are small.

To avoid large value copy costs in argument passing and channel value send and receive operations, we should try to avoid using large-size struct and array types as function and method parameter types (including method receiver types) and channel element types. We can use pointer types whose base types are large-size types instead for such scenarios.

On the other hand, we should also consider that too many pointers increase the pressure on the garbage collector at run time. So whether large-size struct and array types or their corresponding pointer types should be used depends on the specific circumstances.

Generally, in practice, we seldom use pointer types whose base types are slice types, map types, channel types, function types, string types and interface types. The costs of copying values of these assumed base types are very small.

We should also try to avoid using the two-iteration-variable forms to iterate array and slice elements if the element types are large-size types, for each element value will be copied to the second iteration variable in the iteration process.

func Benchmark_Range_TwoIterVar(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sumZ = 0
		// BAD: copy of v
		for _, v := range sZ {
			sumZ += v.a
		}
	}
}
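
For contrast, a sketch of the one-iteration-variable form next to the two-variable form. The type big and both function names are hypothetical, and whether the copy is actually measurable depends on the compiler and the element size:

```go
package main

import "fmt"

// big is a hypothetical large element type (about 8 KB on 64-bit platforms).
type big struct {
	a       int
	payload [1024]int
}

// sumByValue copies each element into v on every iteration.
func sumByValue(s []big) int {
	sum := 0
	for _, v := range s {
		sum += v.a
	}
	return sum
}

// sumByIndex reads the field in place through the index.
func sumByIndex(s []big) int {
	sum := 0
	for i := range s {
		sum += s[i].a
	}
	return sum
}

func main() {
	s := make([]big, 100)
	for i := range s {
		s[i].a = i
	}
	fmt.Println(sumByValue(s), sumByIndex(s)) // 4950 4950
}
```

Both functions return the same result; only the amount of data copied per iteration differs.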
