BY OUR STUDENT CONTRIBUTORS
Nitya Nigam

Machine learning is a buzzword that has been plastered across the media for the last few years. With applications in almost any field you can think of, it has been touted as a technology that can revolutionise the way we work and think. To a large extent, this is true: machine learning is used in everything from Covid-19 infection modelling to personalised Netflix recommendations. Many people are under the impression that the way machine learning operates is too complicated for the average Joe to understand. However, the underlying principles of machine learning are actually quite straightforward. In this article, I will outline the intuition underpinning machine learning and explain some of the maths required to make it work. Concepts that may be new to our readers are linked to external resources to help build understanding.
In essence, machine learning is a computerised method of finding models for data. Machine learning algorithms work by:

1. Generating a model (here, a polynomial function) to describe the data.
2. Measuring how well the model fits the data using a cost function.
3. Adjusting the model's parameters to reduce the cost, making the model more accurate.
In step 1, a polynomial function is generated to model the data. Although not all situations can be modelled perfectly by polynomial functions, exponential, logarithmic and trigonometric functions can all be represented as polynomials through their Taylor series representations. Essentially, through a funky bit of calculus (3Blue1Brown does a nice explainer of this on his YouTube channel), we can show that these functions are infinite sums of polynomial terms. Obviously, we can't use an infinite number of terms in our function, but we can use a large number, and this is often more than enough to model data properly within a given domain. The coefficients of the polynomial terms, or parameters, are stored in vectors, with one vector for each of the input variables. These parameter vectors are what we change in order to improve the model, so keep them in mind.

Step 2 makes use of something called a cost function. This measures the average distance between each value in the test data and the value predicted by the model. Usually, the measure used is the mean of the squared differences between predictions and data (closely related to the variance, a statistical measure of dispersion). The reason squared differences are used, rather than the raw differences, is that predictions that are too high and too low both increase the cost, instead of cancelling each other out. The output of the cost function for a given set of model parameters is a single scalar value. When the cost function is high, the model does not fit the data well, so steps must be taken to decrease the cost and increase the accuracy of the model.

Step 3 is where the magic really happens. Through a bit of differential calculus, we find the gradient (also known as the slope, derivative or rate of change) of the cost function, treated as a function of the model's parameter vectors. This tells us how the cost changes as the parameters change.
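Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's API: the data, the cubic model, and the parameter values are all made up for the example, with the Taylor series of sin(x) suggesting sensible coefficients.

```python
import numpy as np

# Hypothetical data: noisy samples from an underlying curve.
rng = np.random.default_rng(0)
x = np.linspace(0, 2, 50)
y = np.sin(x) + rng.normal(scale=0.05, size=x.size)

def predict(params, x):
    """Evaluate a polynomial model: params[0] + params[1]*x + params[2]*x**2 + ..."""
    return sum(p * x**i for i, p in enumerate(params))

def cost(params, x, y):
    """Mean of squared differences between predictions and data.
    Squaring stops over- and under-predictions cancelling out."""
    return np.mean((predict(params, x) - y) ** 2)

# A cubic model has four parameters. The Taylor series of sin(x)
# suggests coefficients close to [0, 1, 0, -1/6] on this domain.
good = cost([0.0, 1.0, 0.0, -1/6], x, y)
bad = cost([0.0, 0.0, 0.0, 0.0], x, y)
print(good < bad)  # the Taylor-like parameters give a much lower cost
```

Notice that the cost function returns a single scalar: one number summarising how badly the whole model fits, which is exactly what we will try to drive down in step 3.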
When the gradient is negative along some direction, the cost function is decreasing in that direction, so taking a step that way brings us to a lower cost and a more accurate model. Taking small steps in the direction of the negative gradient is called gradient descent (the method is explained further in this video), and it eventually lands us at a point where the gradient is zero: a minimum of the function. The values of the parameter vectors at this point should be well optimised for the data we are trying to model.

This is just an overview of what constitutes machine learning - there are several issues not covered here, such as how to avoid local minima and how to choose the step size, but I hope this serves as a helpful introduction to the topic. Let me know in the comments if you want to learn more about the maths in machine learning!
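To close, here is a tiny sketch of the gradient descent loop itself. Everything here is invented for illustration: a one-parameter model y = w·x, hand-made data with a true slope of 2, and a learning rate chosen so the loop converges quickly. For this simple case the derivative of the cost can be written down by hand.

```python
import numpy as np

# Toy data generated with a true slope of 2.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0               # initial parameter guess
learning_rate = 0.05  # step size

for _ in range(200):
    # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
    grad = np.mean(2 * (w * x - y) * x)
    # Step in the direction of the negative gradient.
    w -= learning_rate * grad

print(round(w, 3))  # converges towards the true slope, 2.0
```

When the gradient is large the steps are large, and as we approach the minimum the gradient shrinks, so the steps naturally get smaller - which is why the loop settles rather than overshooting forever (provided the step size is small enough).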