Q-Learning

Considering how last time's post was about how algorithms "learn", let's look at another way in which AI can learn: "Q-learning". Q-learning takes a different approach to teaching an AI compared to neural networks. Instead of many rounds of trial and error in which the AI slowly figures out what's wrong, the AI is rewarded for doing well and punished for doing badly. Try to imagine it like this:

Imagine the AI is a pig that you want to teach to stay in one place. In a neural network, first you would obtain a pig.

[Image: a pig]
Pig obtained.

Then, you would wait to see what it did. Ideally, you would bring in more than just one pig.

[Image: more pigs]
They are coming.

Now, you would wait. If a pig does something good, you let that pig make lots and lots of little baby pigs, and those pigs do the same. If a pig does something bad, you get rid of it and get another.

[Image: "image not found" placeholder]
Good thing, too.

This tries to emulate the idea of "natural selection": well-behaved pigs go on to make more pigs, and badly behaved pigs do not. Eventually, all the pigs know what to do. However, this takes many versions of the pig and makes a lot of excess bacon. In Q-learning, the pig would instead be given a reward every time it stayed put long enough, and punished if it did not. As a result, we get something much closer to how human children learn: through rewards and punishments, the basic principle of reinforcement.

Good Piggy!

Like our pig friends, the AI would be put on an already created playing field where it wouldn't know what to do. It would then slowly try to move around. If it did anything bad, it would get a punishment in the form of a low number, and if it did something good, it would get a reward in the form of a high number. The AI would keep trying out different things, and in the end it would follow the steps that give the best reward, and thus learn to do whatever you want it to.
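To make "rewards as numbers" concrete, here is a minimal sketch (not from the original post) of a playing field: a hypothetical five-cell corridor where the goal is the last cell. The step function and the reward values are made up purely for illustration.

```python
def step(state, action):
    """Take an action (-1 = left, +1 = right) and return (next_state, reward)."""
    next_state = min(max(state + action, 0), 4)  # stay inside the 5-cell corridor
    if next_state == 4:
        return next_state, 10.0   # high number: the AI did something good
    return next_state, -1.0       # low number: every wasted move is punished

print(step(3, +1))   # (4, 10.0) -> reaching the goal is rewarded
print(step(3, -1))   # (2, -1.0) -> walking away is punished
```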

Q-Learning gets its name from the "Q-function", or "quality function". The AI uses this Q-function for every action it takes. The Q-function is essentially modelled like this:

Q[s (state), a (action)]

Here, the function considers the current state of the AI and the action the AI is about to take. It estimates the immediate reward the AI would get for taking that action, plus all the future rewards that the action would help it get later (the function isn't "greedy": it doesn't just look at the immediate reward, it considers the future too). So, while actually working, the process is as follows:

π(s) = argmax_a Q[s, a]

The "π(s)" part represents the "policy" for state "s", or the action we take in state "s". The equation tries out all the possible actions we can take in state "s" and picks the one whose Q-value promises the maximum reward. A table (the "Q-table") is kept of these values for every state and action, and it is constantly updated as the AI performs more and more actions. The AI does this over and over until there is a clear picture of what it should do. Finally, the AI can follow the path with the highest values in the table and do whatever needs to be done.
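To make this concrete, here is a minimal sketch of tabular Q-learning in Python. It reuses the little corridor from the sketch above, and alpha, gamma, and epsilon are the usual learning rate, discount factor, and exploration rate; none of the numbers come from the original post. The line inside the loop is just the standard Q-learning update, Q[s, a] ← Q[s, a] + α(r + γ·max Q[s', a'] − Q[s, a]), written out.

```python
import random

# Hypothetical 5-cell corridor again: states 0..4, goal at 4, actions -1 (left) / +1 (right)
def step(state, action):
    next_state = min(max(state + action, 0), 4)
    reward = 10.0 if next_state == 4 else -1.0
    return next_state, reward

actions = [-1, +1]
Q = {(s, a): 0.0 for s in range(5) for a in actions}   # the Q-table: all values start at zero
alpha, gamma, epsilon = 0.5, 0.9, 0.1                  # learning rate, discount, exploration rate

for episode in range(500):
    state = 0
    while state != 4:
        # mostly take the best known action, but sometimes explore something new
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # the Q-learning update: immediate reward plus the discounted best future value
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# the learned policy pi(s) = argmax_a Q[s, a]
policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in range(4)}
print(policy)   # every state should now point right (+1), towards the goal
```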

[Image: Q-learning diagram]

For example, take a look at this explanation of value functions and a simple approach to eating using Q-learning. Also, check out this video by Siraj Raval for a great explanation of how Q-learning works, and this video by Code Bullet (again) to see a cool car drive around a track with Q-learning. Well, that's pretty much it for now. Next week, we'll look at some other AI algorithms. Until then, good luck.

Resources: freecodecamp.org

Neural Networks

With AIs developing as fast as they are, today let's look at how a well-made AI can make a decision. In the end, the goal of the AI is to take an input, run whatever process it wants on the data, and use it to make a decision. Neural networks try to emulate actual thinking in computers (very Terminator-esque, right?) and can be used to play games or even predict illnesses in hospitals. To do this, the network tries to learn about a task by taking in as much input as possible. Each piece of input (whatever you want to tally, from age, to SAT scores, to handwriting) becomes a "node" in the network. Try to imagine actual human neurons. Meet Nathaniel.

Now, Nathaniel is a special guy. He's got a lot of friends who like to tell him stuff. Whenever he gets a signal from some of the other neurons and the signal is strong enough, he passes it on to his own friends.

These are Nathaniel’s friends. They like to keep in touch.

In the brain, some connections between neurons are weaker or stronger than others, and the stronger the connection between any two neurons, the stronger the signal passed between them. Similarly, neural networks "pass on" strong data and values until they arrive at a result that the user likes.

So let's visualize a fully developed network. In essence, if you've ever tried to make a family tree for yourself, imagine that tree, but on its side. Basically, it is layer after layer of nodes, each connecting to the next layer and so on, until at the end the network can produce an output.

When the network gets any input in the input layer (these are the values that we know), each node uses a value function, called an "activation function", to turn what it receives into a number of its own, which is then passed on through the rest of the layers. The function can be either really simple (kind of a bad AI if it is, though...) or extremely complicated. Each node takes the values coming in, uses the function to get a number, and then passes that number to each connected node in the next layer; on the way, it is multiplied by the "weight" of the connection, or the strength of that particular connection.
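As a rough sketch of that idea (values flowing forward through weighted connections and an activation function), here is one tiny layer-by-layer pass in Python. The sigmoid is a common real activation function, but the inputs and weights are made-up numbers purely for illustration; a real network would learn the weights.

```python
import math

def sigmoid(x):
    # a common activation function: squashes any number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    # each node in the layer sums its weighted inputs, then applies the activation function
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)))
            for node_weights in weights]

# made-up example: 3 inputs (say age, SAT score, a handwriting measure) -> 2 hidden nodes -> 1 output
inputs = [0.2, 0.9, 0.4]
hidden_weights = [[0.5, -0.3, 0.8],   # weights of the connections into hidden node 1
                  [-0.6, 0.7, 0.1]]   # weights of the connections into hidden node 2
output_weights = [[1.2, -0.9]]        # weights of the connections into the single output node

hidden = layer(inputs, hidden_weights)
output = layer(hidden, output_weights)
print(output)   # one number between 0 and 1: the network's "decision"
```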

So, now that we have a design in mind, we can talk about how the network learns. The network basically works backwards and tries to figure out what it got wrong: it checks which connections made the output wrong, and then tries to fix them for the next time it runs. As a result, it may take multiple runs to get the desired output. There also exist multiple types of neural networks. One kind is the "Feed Forward Neural Network" shown here, which passes each value forward through the rest of the network. However, other types, called "Recurrent Neural Networks", pass data back into themselves. With recurrent networks, you can make programs that write sentences or even whole books! (The previous word matters a lot in a sentence, as it helps decide what the next word will be.)
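The "working backwards" part is a whole topic of its own (backpropagation), but the core move can be sketched with a single neuron: compare the output to the answer you wanted, and nudge every weight so the error shrinks on the next run. The tiny dataset and learning rate below are invented just for illustration.

```python
# Minimal sketch of learning from mistakes: one linear neuron, nudged after every wrong answer
weights = [0.0, 0.0]
learning_rate = 0.1

# made-up training examples: inputs -> desired output (roughly "is the second input big?")
data = [([1.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 1.0)]

for _ in range(100):                      # multiple runs, as the post says
    for inputs, target in data:
        output = sum(w * x for w, x in zip(weights, inputs))
        error = output - target           # what the neuron got wrong this time
        # nudge each weight against the part of the error it was responsible for
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]

print(weights)   # the second weight ends up near 1 and the first near 0
```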

Well, that's pretty much a general summary of networks. If you want, check out Brandon Rohrer's video explaining Deep Neural Networks. Also check out Code Bullet on YouTube to see Flash games made easy with AIs. Neural networks let the computer make decisions for itself: the swarms of numbers and values that would be too big for humans to handle are crunched by the computer on its own. So let's see where neural networks take us in the future. Until then, good luck.

Resources: towardsdatascience.com, askabiologist.asu.edu