Data Fundamentals (H)

2. Vectorised operations:

Do the same operation on many data items at once

Do many operations to the same data items at once

Do one operation on one number at a time

Do operations without a defined order

Do not work on numbers

3. Arrays have fixed:

shape and dtype

variable names

legs

sums

elements

4. An (8,4) array has:

4 legs, 8 arms

8 rows, 1 column, 1 depth

8 rows, 4 columns

4 rows, 8 columns

2 frames, 4 rows, 4 columns

5. x[::2, 1:5] is a slice which indexes:

Every other row, columns 1-4

Every row, all columns

Every row, columns 0-6

Every row, columns 1-4

Every column, rows 1-6
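As a minimal sketch of how such a slice behaves (the 6x8 array here is hypothetical, chosen only so the selected entries are easy to see), note that the stop index of a slice is excluded:

```python
import numpy as np

# Hypothetical array: 6 rows, 8 columns, filled with 0..47.
x = np.arange(48).reshape(6, 8)

# ::2 takes rows 0, 2, 4; 1:5 takes columns 1, 2, 3, 4 (stop index 5 is excluded).
sub = x[::2, 1:5]
print(sub.shape)  # (3, 4)
```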

6. np.arange(1,5) produces the array:

[1,1,1,1,1]

[0,1,2,3,4]

[[1], [1,2], [1,2,3], [1,2,3,4], [1,2,3,4,5]]

[1,2,3,4]

[0,1,2,3,4,5]

7. x is (15,4). y is (15,). How would I add the values in y to each row of x?

x + y

Impossible, shape mismatch

(x.T + y).T

x[:,0] * y

x + y[:,:]
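A short sketch of the broadcasting rules involved, assuming x is filled with ones and y with 0..14 purely for illustration. Shapes are aligned from the trailing dimension, so (15,4) and (15,) do not match directly:

```python
import numpy as np

x = np.ones((15, 4))      # illustrative values only
y = np.arange(15.0)

# Direct addition fails: trailing dimensions are 4 and 15.
try:
    x + y
    direct_ok = True
except ValueError:
    direct_ok = False

# Transposing makes the trailing dimension 15, so y broadcasts; transpose back after.
via_transpose = (x.T + y).T

# An equivalent form: turn y into a (15, 1) column so it broadcasts across columns.
via_newaxis = x + y[:, None]
```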

8. x is shape (3,3). What does the operation x[:, [1,0,2]] do?

Returns a new array with the columns of x replaced with [1,0,2]

Returns a new array with the first two columns of x exchanged

Exchanges the exponent and mantissa of x

Impossible, undefined operation

Invokes the NumPy self-destruct sequence
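A minimal sketch of indexing with a list of column indices, using a hypothetical 3x3 array. The result is a new array; x itself is not modified:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)   # [[0,1,2],[3,4,5],[6,7,8]]

# Select columns in the order 1, 0, 2.
reordered = x[:, [1, 0, 2]]
print(reordered.tolist())  # [[1, 0, 2], [4, 3, 5], [7, 6, 8]]
```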

9. A strided array data structure means that:

values in arrays are in undefined order

array legs are very long

transpose has time complexity O(1)

transpose has time complexity O(N)

transpose cannot be performed without making a copy.
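One way to see what strides do in practice is to inspect them in NumPy. This sketch assumes a small C-contiguous float64 array; the transpose only swaps the stride entries and shares the same memory:

```python
import numpy as np

x = np.zeros((3, 5), dtype=np.float64)

print(x.strides)    # (40, 8): 40 bytes to the next row, 8 bytes to the next column
print(x.T.strides)  # (8, 40): same buffer, strides swapped

# No data is copied by the transpose.
same_buffer = np.shares_memory(x, x.T)
```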

10. Not a number (NaN) is represented in IEEE 754 as:

a sign of -3

a mantissa that evaluates to 666

an exponent of all 1s

a mantissa equal to the exponent

an exponent of all 0s
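The bit fields of a 64-bit float can be inspected directly. This is a sketch using only the standard library; `float64_fields` is a hypothetical helper name, and the field widths (1 sign bit, 11 exponent bits, 52 mantissa bits) are those of IEEE 754 binary64:

```python
import struct

def float64_fields(value):
    """Split a Python float into (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF          # 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)        # 52 mantissa bits
    return sign, exponent, mantissa

_, nan_exp, nan_man = float64_fields(float("nan"))
_, inf_exp, inf_man = float64_fields(float("inf"))

# Both have an all-ones exponent; NaN has a nonzero mantissa, infinity a zero one.
```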

11. Which of these is not an IEEE 754 floating point exception?

overflow

inexact operation

underflow

invalid operation

incomplete operation

12. x is (10,4) and y is (4,4). Which of these operations results in x and y joined into a (14,4) array?

np.stack([x,y])

np.concatenate([x,y], axis=1)

np.join(x,y)

np.stack([x,y], axis=-1)

np.concatenate([x,y], axis=0)
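A brief sketch of the difference between joining along an existing axis and stacking along a new one, with placeholder arrays of the shapes given in the question:

```python
import numpy as np

x = np.zeros((10, 4))   # placeholder contents
y = np.ones((4, 4))

# Joining along axis 0 only requires the other dimensions (here, 4 columns) to agree.
joined = np.concatenate([x, y], axis=0)
print(joined.shape)  # (14, 4)

# Stacking creates a new axis, so it requires identical shapes and fails here.
try:
    np.stack([x, y])
    stack_ok = True
except ValueError:
    stack_ok = False
```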

13. A line is a:

facet

coord

layer

geom

stat

14. Faceted and layered plots:

faceted: set of small scatterplots; layered: a 3D plot

faceted: many separate views of one dataset; layered: stacks multiple plots on top of each other

faceted: uses human faces to represent data; layered: uses the lasagne model of visualisation

faceted: stacks multiple plots on top of each other; layered: many separate views of one dataset

layered: multiple coords with the same geom; faceted: creates many geoms for same coord

15. Which of these things is missing from this plot?

[Figure: The heart mass of cats, plotted against the cats' body weight]

A scale

Colours

A title

A caption

Axis labels

16. I want to compute

\( z = \sum_i i^2 x_i^3 y_i \)

\(x\) and \(y\) are 1D vectors of the same shape. z will be a scalar. Which of these does that?

i = np.arange(x.shape[0]); z = np.sum(x**3 * y * i**2)

i = np.linspace(0, x.shape[0], 100); z = np.sum(x**3 * y * i**2)

z = np.sum(x**3 * y * x.shape[0]**2)

i = np.ones_like(x); z = np.cumsum(x**3 * y * i**2)

i = np.arange(x.shape[1]); z = np.sum(x**3) * np.sum(y) * np.sum(i**2)
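A vectorised sum of this kind can be checked against an explicit loop. The sketch below assumes the index i starts at 0 and uses small made-up vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # made-up values
y = np.array([4.0, 5.0, 6.0])

# Vectorised: build the index vector, then sum the elementwise product.
i = np.arange(x.shape[0])
z = np.sum(i**2 * x**3 * y)

# Explicit loop over the same terms, for comparison.
z_loop = sum(k**2 * x[k]**3 * y[k] for k in range(len(x)))

print(z)  # 688.0  (0 + 1*8*5 + 4*27*6)
```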

17. x is shape (100,5,5,8). What does np.einsum('ijkl->likj', x) result in?

A (100, 100, 5, 5) tensor

A (1, 100, 5, 5, 8) tensor

A (8, 100, 5, 5) tensor

A (100, 8, 5, 5) tensor

confusion
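The effect of an einsum index permutation on the shape can be checked directly. In this sketch the array is filled with zeros since only the shape matters; the equivalent `np.transpose` call is shown for comparison:

```python
import numpy as np

x = np.zeros((100, 5, 5, 8))   # i=100, j=5, k=5, l=8

# Output axes are ordered l, i, k, j.
out = np.einsum('ijkl->likj', x)
print(out.shape)  # (8, 100, 5, 5)

# The same reordering written as an axis permutation.
same = np.transpose(x, (3, 0, 2, 1))
```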

18. The memory layout of numerical arrays is stored using:

An array of strides specifying byte offsets to move along each dimension

Sentinel NaN values at the end of each row

A linked list data structure, with a hierarchy of pointers

Tiny snakes and ladders.

An array map that stores memory locations in an array of the same size

19. \(\|\vec{x}\|_\infty\) could be computed by which operation?

np.sum(np.abs(x))

np.max(np.abs(x))

np.cumsum(x*x[::-1])

np.min(np.abs(x))

np.sqrt(np.sum(x**2))
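For reference, the common vector norms can be written out by hand and compared with `np.linalg.norm`. The vector here is a made-up example:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])   # made-up values

l1 = np.sum(np.abs(x))           # L1 norm: sum of absolute values
l2 = np.sqrt(np.sum(x**2))       # L2 norm: Euclidean length
linf = np.max(np.abs(x))         # L-infinity norm: largest absolute value

print(l1, linf)  # 8.0 4.0
```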

20. Which of these operations is not defined over vectors in a topological vector space equipped with an inner product?

square root

inner product

length measurement

addition

scalar multiplication

21. \(\vec{x}\bullet \vec{y}\) is zero if and only if:

\(x\) is orthogonal to \(y\)

\(x\) is not equal to \(y\)

\(x\) is a scaled version of \(y\)

\(x\) is nonzero

\(x\) is equal to \(y\)

22. Distances in high dimensions can be counterintuitive because:

Every vector will have a very similar distance to every other vector

Only the \(L_\infty\) norm can be applied.

Distances will span a huge range of possible values

Distances cannot be computed

There are so many different kinds of distances

23. The covariance matrix represents:

the number of non-zero elements in a dataset

the size of the largest element of a dataset

the colour of the dataset

the spread of the dataset around its mean

the cross product of the mean vector with itself

24. When rendering a graph with unsigned scalar values mapped to colours, what property should the colour map have?

Perceptually uniform, monotonically increasing brightness

A diverging hue around zero.

Perceptually cuniform, moronically unceasing colour

Monotonic red-blue separation.

Monotonic, perceptually nonuniform hue-saturation separation

25. Applying the linear map defined by the matrix \(A\) to the column vector \(\vec{x}\) should be written:

\(\vec{x}^TA\vec{x}\)

\(\vec{x}A\vec{x}^T\)

\(\vec{x}A\)

\(A\vec{x}\)

\(A\vec{x}^T\)

26. Repeatedly applying a matrix \(A\) to a random initial vector \(\vec{x}_0\), normalising after each step, will lead to:

zero

the cross eigenvector

infinity

the leading eigenvector

the minor eigenvector
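The repeated apply-and-normalise procedure described here is usually called power iteration. A minimal sketch, assuming a small symmetric matrix chosen for illustration and a fixed iteration count rather than a convergence test:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # illustrative symmetric matrix

v = rng.normal(size=2)            # random initial vector
for _ in range(200):
    v = A @ v                     # apply the matrix
    v = v / np.linalg.norm(v)     # normalise to unit length

# Compare with the eigenvector of largest absolute eigenvalue from a direct solver.
evals, evecs = np.linalg.eigh(A)
leading = evecs[:, np.argmax(np.abs(evals))]

# |v . leading| is 1 when the two unit vectors agree up to sign.
alignment = abs(v @ leading)
```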

27. If \(A\) is orthogonal, then:

\(A^{-1} = A^T\)

\(A = A^{-1}\)

\(A = AA^T\)

\(A^A=T^A\)

\(A^T=A\)

28. The adjacency matrix of an undirected graph is:

Symmetric

Circulant

Non-singular

Self-similar

Adiabatic

29. In a stochastic matrix:

all elements are either 0 or 1

the sum of all elements is -1

the sum of each row is 0

the sum of all elements is 1

the sum of each row is 1

30. The determinant of a matrix is equal to:

The sum of the eigenvalues

The sum of the diagonal

The Frobenius norm of the matrix

The product of the eigenvalues

The product of the rows
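The relationship between eigenvalues, determinant and trace can be verified numerically. This sketch uses a made-up 2x2 matrix whose eigenvalues are 2 and 5:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])        # made-up matrix with eigenvalues 2 and 5

evals = np.linalg.eigvals(A)

det_from_evals = np.prod(evals)   # product of eigenvalues
trace_from_evals = np.sum(evals)  # sum of eigenvalues

print(np.isclose(det_from_evals, np.linalg.det(A)))   # True
print(np.isclose(trace_from_evals, np.trace(A)))      # True
```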

31. I want to find the shape of an object, with constant surface area, that holds the least water. What is the objective function?

The colour of the surface.

The surface area of the object.

None of the above.

The shape of the object.

The amount of water the object holds.

32. A convex constraint is equivalent to a restriction to a portion of the parameter space:

defined by a collection of planes.

where the minima are.

inside an axis-aligned box.

within a torus of fixed radius.

where the parameter vector has a fixed \(L_\infty\) norm.

33. An objective function is nonconvex, iff:

It is incomputable.

It is discontinuous.

It has more than one minimum.

It is partially differentiable.

It has two maxima.

34. The feasible set in an optimisation problem is:

the possible configurations of the parameters

a kind of metaheuristic

the most distant configurations in the parameter space

the possible values of the objective function

the best solutions to the problem

35. In an approximation problem, we'd often have a loss function of the form:

\(L(\theta) = \frac{1}{\theta}\)

\(L(\theta) = \|\theta - \vec{x}\|\)

\(L(\theta) = \|f(\vec{x};\theta)-y\|\)

\(L(\theta) = \theta \vec{x}\)

\(L(\theta) = \frac{\theta}{f(\vec{x}-\vec{\theta})}\)

36. The definition of an eigenvector is:

\(A\lambda = \vec{x}A\)

\(\lambda = \|\vec{x}\|_2\)

\(A^{-1}\vec{x} = A^{+}\lambda\)

\(A\vec{x} = \vec{x}\)

\(A\vec{x} = \lambda \vec{x}\)

37. Simulated annealing uses what metaheuristic to help avoid getting trapped in local minima?

Hill climbing.

Randomised restart.

A temperature schedule.

A population of solutions.

Crossover rules.

38. A hyperparameter of an optimisation algorithm is:

A value that is used to impose constraints on the solution.

The determinant of the Hessian.

A direction in hyperspace.

A measure of how good a solution is.

A value that affects how a solution is searched for.

39. First-order optimisation requires that objective functions be:

monotonic

one-dimensional

\(C^1\) continuous

disconcerting

invertible

40. The gradient vector \(\nabla L(\theta)\) is a vector which, at any given point \(\theta\) will:

be equal to \(\theta\)

point towards the global minimum of \(L(\theta)\)

be zero

point in the direction of steepest ascent

have \(L_2\) norm 1

41. Finite differences is not an effective approach to apply first-order optimisation because:

all of the above

the effect of measurement noise

of numerical roundoff issues.

none of the above

the curse of dimensionality

42. Ant colony optimisation applies which two metaheuristics to improve random local search?

memory and population

thants

temperature and memory

gradient descent and crossover

random restart and hyperdynamics

43. For a multi-objective optimisation, Pareto optimality means that:

Gradient descent is invalid.

Any improvement in any sub-objective function makes at least one other worse.

There is no possible improvement to any sub-objective function.

Every combination of the sub-objective functions has been searched.

All sub-objective functions are zero.

44. What property of a probability distribution always holds true?

The determinant of probabilities is \(\infty\)

Probabilities are equally divided among outcomes

The product of all probabilities is 1

The sum of all probabilities is 0

The sum of all probabilities is 1

45. Bayesians use probability as:

a representation of the long-term average of frequencies of outcomes

a calculus of truth

a prayer book

a calculus of belief

complex angles

46. The conditional probability P(A|B) is defined to be: (\(\land\) means "and" and \(\lor\) means "or")

\(P(A)P(B)\)

\(P(A \land B) / P(B)\)

\(P(A \lor B) + P(B \lor A)\)

\(P(A||B) - B(A||P)\)

\(P(A \land B) P(B)\)

47. If I have a joint distribution over two random variables \(A\) and \(B\), \(P(A,B)\), how can I compute \(P(A)\)?

Sum/integrate \(P(A,B)\) for every value of \(A\) and \(B\).

Divide \(P(A,B)\) by \(P(B)\)

Sum/integrate over \(P(A,B)\) for every value of \(B\)

\(P(A,B) - P(A|B)\)

Sum/integrate over \(P(A,B)\) for every value of \(A\)
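For a discrete joint distribution stored as a table, marginals and conditionals are sums and ratios over axes. A sketch with a made-up table, where rows index values of A and columns index values of B:

```python
import numpy as np

# Made-up joint distribution P(A,B): 2 values of A (rows), 3 values of B (columns).
joint = np.array([[0.1, 0.2, 0.1],
                  [0.3, 0.2, 0.1]])

p_a = joint.sum(axis=1)   # sum over B: [0.4, 0.6]
p_b = joint.sum(axis=0)   # sum over A: [0.4, 0.4, 0.2]

# Conditional P(A|B) = P(A and B) / P(B); each column then sums to 1.
p_a_given_b = joint / p_b
```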

48. In an optimisation problem, a penalty function can be used to:

implement soft constraints

reduce the need for random search

issue red cards

accelerate gradient descent

implement genetic algorithms

49. The entropy of a random variable \(H(X) = \sum_i -\log(P(x_i))P(x_i)\) is:

A measure of expectation

A measure of surprise

A measure of likelihood

A measure of dimension

A measure of validity
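The entropy formula can be evaluated for a few simple distributions. This sketch assumes base-2 logarithms (so the result is in bits) and treats zero-probability outcomes as contributing nothing; `entropy_bits` is a hypothetical helper name:

```python
import numpy as np

def entropy_bits(p):
    """H(X) = sum_i -log2(P(x_i)) P(x_i), skipping zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

fair = entropy_bits([0.5, 0.5])      # 1 bit: outcome is maximally uncertain
biased = entropy_bits([0.9, 0.1])    # about 0.47 bits: outcome is mostly predictable
certain = entropy_bits([1.0, 0.0])   # 0 bits: outcome is known in advance
```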

50. Bayes' Rule is:

P(A|B) = P(B|A)P(A)P(B)

P(A|B) = P(B|B)P(A|A)

P(A|B) = P(A)P(B)

P(A|B) = P(B|A)P(A) / P(B)

P(A|B) = P(A) - P(A ^ B)

51. The names for P(B|A) and P(A) in Bayes' Rule are:

posterior and evidence

likelihood and prior

priory and little red riding hood

evidence and prior

likelihood and posterior
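A small numeric sketch of Bayes' Rule, with made-up probabilities. Here P(B) is obtained by summing over the two cases A and not-A:

```python
# Made-up numbers for illustration only.
p_a = 0.01              # P(A)
p_b_given_a = 0.9       # P(B|A)
p_b_given_not_a = 0.05  # P(B|not A)

# P(B) by the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' Rule: P(A|B) = P(B|A) P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 4))  # 0.1538
```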

52. The expectation \(\mathbb{E}[X+1]\) for a discrete random variable \(X\) would be computed as:

\(\sum_x P(X=x)(x+1)\)

\(\sum_x P(X=x+1)(x+1)\)

\(\int_x P(X=x)P(X=1)x\)

\(\sum_x P(X+1=x)(x)\)

\(\sum_x P(X=x+1)\)

53. Which of these is a statistic which is an estimator of a population parameter for a normal distribution?

The sample mean

The bootstrap

The probability

The sample minimum

The entropy

54. Which of these is a nonparametric statistic?

mean

expectation

median

standard deviation

gradient

55. The Nyquist limit \(f_n\) is equal to:

half the sampling rate \(f_s\)

half the amplitude quantization levels

1.0Hz

twice the sampling rate \(f_s\)

twice the amplitude quantization levels

56. Decreasing the number of levels of amplitude quantization will have what effect on the sampled representation of a signal?

Decreased SNR

Decreased Nyquist rate

Frequency shift

No effect

Increased SNR

57. The exponential smooth is often used instead of a moving average because:

it is probabilistic

it requires storing/computing less data

it is more numerically stable

it is nonlinear

it is super-quadratic
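A minimal sketch of exponential smoothing, assuming the common recurrence s[k] = alpha * x[k] + (1 - alpha) * s[k-1] with the state initialised to the first sample; `exponential_smooth` and `alpha` are names chosen for this example. Only one previous value is kept, whereas a moving average needs a window of past samples:

```python
import numpy as np

def exponential_smooth(signal, alpha):
    """Smooth a 1D signal, keeping only a single running state value."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty(len(signal))
    state = signal[0]                       # assumed initialisation
    for k, sample in enumerate(signal):
        state = alpha * sample + (1 - alpha) * state
        out[k] = state
    return out

smoothed = exponential_smooth([0.0, 0.0, 1.0, 1.0], alpha=0.5)
print(smoothed.tolist())  # [0.0, 0.0, 0.5, 0.75]
```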

58. Aliasing is caused by sampling signals with:

frequencies greater than the Nyquist limit

noise levels greater than the maximum SNR

frequencies less than the Nyquist limit

undefined values present

noise levels less than the maximum SNR

59. Along with a way to evaluate the likelihood and prior at any parameter setting \(\theta\), what else does Metropolis-Hastings need to sample from the posterior distribution?

A maximum likelihood estimation procedure.

An integration function \(V(\theta|D)\)

A proposal distribution \(q(\theta^\prime|\theta)\)

The square root of 2.

A way to evaluate the evidence \(P(D)\)
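To illustrate the ingredients of such a sampler, here is a hedged sketch of a random-walk Metropolis sampler (the symmetric-proposal special case of Metropolis-Hastings). The model is an assumption made for this example only: data drawn from Normal(theta, 1) with a Normal(0, 10) prior on theta. The function names, step size and sample counts are likewise hypothetical:

```python
import numpy as np

def log_posterior_unnorm(theta, data):
    # Assumed model: data ~ Normal(theta, 1), prior theta ~ Normal(0, 10).
    log_lik = -0.5 * np.sum((data - theta) ** 2)
    log_prior = -0.5 * (theta / 10.0) ** 2
    return log_lik + log_prior   # unnormalised: the evidence is never computed

def metropolis(data, n_steps=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0
    current = log_posterior_unnorm(theta, data)
    samples = np.empty(n_steps)
    for k in range(n_steps):
        # Symmetric Gaussian proposal q(theta'|theta).
        proposal = theta + step * rng.normal()
        candidate = log_posterior_unnorm(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        samples[k] = theta
    return samples

data = np.random.default_rng(1).normal(3.0, 1.0, size=50)   # synthetic data
samples = metropolis(data)
posterior_mean = samples[1000:].mean()   # discard an assumed burn-in of 1000 steps
```

With this weak prior the posterior mean should sit close to the sample mean of the data.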

60. In a medical device, if you had an initial heart/pulse rate p0 and a 1D vector of changes in pulse rates captured at evenly spaced intervals, delta_p, how would you compute p, the pulse rate at each of these times?

p = p0 + np.cumsum(delta_p)

p = p0 + delta_p[:]

p = np.prod(delta_p) * p0

p = p0 * delta_p

p = np.sum(delta_p) + p0
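A running total of changes can be formed with a cumulative sum. A sketch with made-up values for p0 and delta_p:

```python
import numpy as np

p0 = 60.0                                   # made-up initial pulse rate
delta_p = np.array([2.0, -1.0, 3.0, 0.0])   # made-up changes per interval

# Each entry is p0 plus all changes up to and including that time step.
p = p0 + np.cumsum(delta_p)
print(p.tolist())  # [62.0, 61.0, 64.0, 64.0]
```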