This paper is published in Volume-7, Issue-4, 2021
Area
Computer Science
Author
Narilala Daniel Ranaivoson, Soloniaina Rakotomiraho, Rivo Randriamaroson
Org/Univ
ED-STII, Antananarivo, Madagascar
Pub. Date
30 August, 2021
Paper ID
V7I4-1884
Publisher
Keywords
Robot Modeling, Robot Manipulator, ROS, Reinforcement Learning, Trust Region Policy Optimization, Actor-Critic using Kronecker-Factored Trust Region, Robot Control Reinforcement Learning, Robot-Reinforcement Learning Interface, Training, Use Case, Robot Environment, Task Environment, Reward Curve, Joint Values Curve, Error Curve.

Citations

IEEE
Narilala Daniel Ranaivoson, Soloniaina Rakotomiraho, Rivo Randriamaroson, "Training of a robot manipulator model by three reinforcement learning algorithms (TRPO, PPO2, and ACKTR), use cases, and comparative studies," International Journal of Advance Research, Ideas and Innovations in Technology, vol. 7, no. 4, 2021, www.IJARIIT.com.

APA
Narilala Daniel Ranaivoson, Soloniaina Rakotomiraho, Rivo Randriamaroson (2021). Training of a robot manipulator model by three reinforcement learning algorithms (TRPO, PPO2, and ACKTR), use cases, and comparative studies. International Journal of Advance Research, Ideas and Innovations in Technology, 7(4). www.IJARIIT.com.

MLA
Narilala Daniel Ranaivoson, Soloniaina Rakotomiraho, Rivo Randriamaroson. "Training of a robot manipulator model by three reinforcement learning algorithms (TRPO, PPO2, and ACKTR), use cases, and comparative studies." International Journal of Advance Research, Ideas and Innovations in Technology 7.4 (2021). www.IJARIIT.com.

Abstract

A model of a manipulator robot, presented in [1], is available; it has the properties required for integration into a reinforcement learning loop. Three reinforcement learning algorithms are applied: TRPO, PPO2, and ACKTR. The first two learn well, whereas for the third the reward curve never shows an increasing trend and instead varies erratically. Use cases in which the learned policy functions control the robot were then carried out. With TRPO and PPO2 the robot moves and reaches the presented targets after a number of iterations. With ACKTR there is no full convergence; the robot instead oscillates around a fixed position.
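The paper does not reproduce its training code here, but the algorithm names used (TRPO, PPO2, ACKTR) match those in the Stable Baselines 2 library. The following is a minimal sketch of how such a training setup could look, not the authors' implementation; the Gym environment id "RobotManipulatorEnv-v0", the timestep budget, and the save paths are assumptions for illustration.

```python
# Minimal sketch (assumed setup, not the authors' code): train the manipulator
# task environment with the three algorithms named in the paper using
# Stable Baselines 2, then save each learned policy for the use-case experiments.
import gym
from stable_baselines import TRPO, PPO2, ACKTR

# Hypothetical Gym environment wrapping the ROS robot model and task definition.
env = gym.make("RobotManipulatorEnv-v0")

for name, algo in [("TRPO", TRPO), ("PPO2", PPO2), ("ACKTR", ACKTR)]:
    model = algo("MlpPolicy", env, verbose=1)   # actor-critic MLP policy
    model.learn(total_timesteps=200_000)        # reward curve is logged during training
    model.save("manipulator_" + name.lower())   # policy reused when controlling the robot
```

Under this kind of setup, the reward, joint-value, and error curves discussed in the paper would be obtained by logging the environment's reward signal and joint states during training and during the subsequent control runs.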