Q-learning
Q-learning is an interesting concept built on the principle of reward and punishment. It can be seen as an extension of temporal difference learning: it maintains a state-action matrix (the Q-table) and updates its estimate of the expected return for each state-action pair as rewards arrive. The basic objective is to maximize the total (discounted) reward received over time; with a discount factor of zero this reduces to maximizing only the immediate reward. The discount factor, as in temporal difference learning, controls how much importance future rewards carry relative to immediate ones, while a learning rate controls how strongly each new experience overrides past estimates.
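To make the state-action matrix and its update concrete, here is a minimal sketch of tabular Q-learning using the standard update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The environment (a six-cell chain where moving right eventually earns +1) and the values of alpha, gamma and epsilon are illustrative assumptions, not something from this post; the helper names `step` and `q_learning` are hypothetical.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a 1-D chain of
# N_STATES cells. Action 0 moves left, action 1 moves right. Reaching the
# rightmost cell gives reward +1 and ends the episode; every other step costs -0.01.
N_STATES = 6
N_ACTIONS = 2


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, -0.01, False


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # The state-action matrix: one row per state, one column per action.
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # TD target: reward plus discounted best value of the next state
            # (no future value once the episode has ended).
            target = reward if done else reward + gamma * max(q[next_state])
            # Move the old estimate a fraction alpha towards the target.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
greedy_policy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Because the step cost makes every tried action look slightly worse than an untried one, the greedy choice keeps sampling new actions early on; gamma weights the future value and alpha blends new targets into past estimates, exactly the two roles described above.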
The problem is the large state and action spaces found in real applications. Building a table entry for every possible state-action pair quickly becomes a herculean task, and if not every combination can be visited often enough, the system may settle on a suboptimal policy.
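A rough way to see how fast the table grows: if a state is described by k variables, each split into b levels, and there are |A| actions, the Q-table needs one entry per combination. The numbers below (k = 4, b = 10, |A| = 5) are purely illustrative assumptions.

```latex
|Q| = |S| \times |A| = b^{k} \times |A|,
\qquad
b = 10,\; k = 4,\; |A| = 5 \;\Rightarrow\; |Q| = 10^{4} \times 5 = 50\,000
```

Each extra state variable multiplies the table size by b, so the number of entries that must be visited and updated grows exponentially with k.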
I am currently trying to discretize my state space into sets of definable parameters and apply Q-learning to it. I hope to succeed, at least to some extent, in this endeavor.
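One common way to do this kind of discretization is uniform binning: clip each continuous variable to an assumed range, split the range into equal-width bins, and combine the bin numbers into a single row index for a Q-table. This is a minimal sketch under assumptions: the two state variables (a position in [-1, 1] and a velocity in [-2, 2]), the 10 bins per variable, and the names `bin_index` and `discretize` are hypothetical, chosen only for illustration.

```python
# Hypothetical state description (an assumption): (position, velocity),
# each with an assumed range and the same number of equal-width bins.
BOUNDS = [(-1.0, 1.0), (-2.0, 2.0)]
N_BINS = 10
N_DISCRETE_STATES = N_BINS ** len(BOUNDS)


def bin_index(value, low, high, n_bins):
    """Map a continuous value to an integer bin in [0, n_bins - 1].

    Values outside [low, high] are clipped into the first or last bin.
    """
    if value <= low:
        return 0
    if value >= high:
        return n_bins - 1
    # min() guards against floating-point rounding just below `high`.
    return min(n_bins - 1, int((value - low) / (high - low) * n_bins))


def discretize(state):
    """Turn a continuous state tuple into one integer row index of the Q-table."""
    index = 0
    for value, (low, high) in zip(state, BOUNDS):
        # Mixed-radix encoding: each variable contributes one "digit" in base N_BINS.
        index = index * N_BINS + bin_index(value, low, high, N_BINS)
    return index


print(N_DISCRETE_STATES, discretize((0.0, 0.0)), discretize((1.5, -3.0)))
```

A tabular Q-learning update can then use `discretize(s)` as the row index, with a table of `N_DISCRETE_STATES` rows. The trade-off is the one described above: coarser bins keep the table small enough to visit thoroughly, while finer bins distinguish more situations at the cost of far more entries.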