1. Why is Cross Validation important?

Solution

Code

2. Why is Grid Search important?

Solution

Code

3. What are the new Spark DataFrame and the Spark Pipeline? And how can we use the new ML library for Grid Search?

Solution

Code

4. How to deal with categorical features? And what is one-hot encoding?

Solution

Code

5. What are generalized linear models and what is an R Formula?

Solution

Code

6. What are Decision Trees?

Solution

Code

7. What are Ensembles?

Solution

8. What is a Gradient Boosted Tree?

Solution

9. What is a Gradient Boosted Trees Regressor?

Solution

Code

10. Gradient Boosted Trees Classification

Solution

Code

11. What is a Random Forest?

Solution

Code

12. What is the AdaBoost classification algorithm?

Solution

13. What is a recommender system?

Solution

14. What is the collaborative filtering ALS algorithm?

Solution

Code

15. What is the DBSCAN clustering algorithm?

Solution

Code

16. What is Streaming K-Means?

Solution

Code

17. What is Canopy Clustering?

Solution

18. What is Bisecting K-Means?

Solution

19. What is the PCA dimensionality reduction technique?

Solution

Code

20. What is the SVD dimensionality reduction technique?

Solution

Code

21. What is Latent Semantic Analysis (LSA)?

Solution

22. What is Parquet?

Solution

Code

23. What is Isotonic Regression?

Solution

Code

24. What is LARS?

Solution

25. What is GLMNET?

Solution

26. What is SVM with soft margins?

Solution

27. What is the Expectation Maximization Clustering algorithm?

Solution

28. What is a Gaussian Mixture?

Solution

Code

29. What is the Latent Dirichlet Allocation topic model?

Solution

Code

30. What is Association Rule Learning?

Solution

31. What is FP-growth?

Solution

Code

32. How to use the GraphX Library?

Solution

33. What is PageRank? And how to compute it with GraphX?

Solution

Code

Code

34. What is Power Iteration Clustering?

Solution

Code

35. What is a Perceptron?

Solution

36. What is an ANN (Artificial Neural Network)?

Solution

37. What are activation functions?

Solution

38. How many types of Neural Networks are known?

39. How can you train a Neural Network?

Solution

40. What applications do ANNs have?

Solution

41. Can you code a simple ANN in Python?

Solution

Code

42. What support does Spark have for Neural Networks?

Solution

Code

43. What is Deep Learning?

Solution

44. What are autoencoders and stacked autoencoders?

Solution

45. What are convolutional neural networks?

Solution

46. What are Restricted Boltzmann Machines, Deep Belief Networks and Recurrent networks?

Solution

47. What is pre-training?

Solution

48. An example of Deep Learning with the nolearn and Lasagne packages

Solution

Code

Outcome

Code

49. Can you compute an embedding with Word2Vec?

Solution

Code

Code

50. What are Radial Basis Networks?

Solution

Code

51. What are Splines?

Solution

Code

52. What are Self-Organizing Maps (SOMs)?

Solution

Code

53. What is Conjugate Gradient?

Solution

54. What is exploitation-exploration? And what is the multi-armed bandit method?

Solution

55. What is Simulated Annealing?

Solution

Code

56. What is a Monte Carlo experiment?

Solution

Code

57. What is a Markov Chain?

Solution

58. What is Gibbs sampling?

Solution

Code

59. What is Locality Sensitive Hashing (LSH)?

Solution

Code

60. What is MinHash?

Solution

Code

61. What are Bloom Filters?

Solution

Code

62. What are Count-Min Sketches?

Solution

Code

63. How to build a news clustering system?

Solution

64. What is A/B testing?

Solution

65. What is Natural Language Processing?

Solution

Code

Outcome

66. Where to go from here

Appendix A

67. Ultra-Quick introduction to Python

68. Ultra-Quick introduction to Probabilities

69. Ultra-Quick introduction to Matrices and Vectors

70. Ultra-Quick summary of metrics

Classification Metrics

Clustering Metrics

Scoring Metrics

Rank Correlation Metrics

Probability Metrics

Ranking Models

71. Comparison of different machine learning techniques

Linear regression

Logistic regression

Support Vector Machines

Clustering

Decision Trees, Random Forests, and GBTs

Associative Rules

Neural Networks and Deep Learning
