Hello!

I am a second-year research master's student at the Robotics Institute in the CMU School of Computer Science. I am part of the M.S. in Robotics program, a fully funded research master's, which gives me the opportunity to continue my research on machine learning with Prof. Katia Sycara. My interests lie at the intersection of robotics, generative models, reinforcement learning, and computer vision. Prior to joining CMU, I completed a double major in Computer Science and Electrical Engineering at BITS-Pilani, Goa, India, in 2020.

During my undergraduate studies, I completed my thesis on hierarchical reinforcement learning under Prof. Shalabh Bhatnagar at IISc, Bangalore. I also worked on using reinforcement learning to dynamically enrich cluster trees under Prof. Bharat Deshpande at BITS-Pilani, Goa, and developed hidden Markov models for image segmentation during my internship at CEERI, Pilani, with Dr. Jagdish Lal Raheja. For more details, check out my CV or send me an email.

Publications

Self-Supervised Multi-Agent Imitation Learning
Akshay Dharmavaram, Tejus Gupta, Jiachen Li, Katia Sycara

(Under Submission)

arXiv (Pre-Print)

Hierarchical Average Reward Policy Gradient Algorithms
Akshay Dharmavaram, Matthew Riemer, Shalabh Bhatnagar

AAAI 2020 [Extended Abstract]

pdf

Research Experience

CMU Robotics Institute
Aug 2020 - Present

Graduate Research Assistant | Advisor: Prof. Katia Sycara

  • Master's Thesis: Generative Modeling using Self-Supervision
    • Self-Supervised Multi-Agent Imitation Learning (Under Submission)
      • Developed a novel Self-Supervised Imitation Learning loss that reduces the number of training epochs by 100x
      • Created a graph-based Actor-Critic algorithm that reduces compounding errors by 10x
      • Designed a curriculum-training algorithm that reduces the mean and standard deviation of the testing loss by 10x and 5x, respectively
    • Realistic Image Synthesis using Self-Supervision on Sketches (In Progress)
      • Designed a latent mapping architecture that interfaces user-drawn sketches and latent image representations
      • Created a self-supervised Discriminator loss to regress between latent representations and user-drawn sketches
    • Reconstruction using Self-Supervision on Partially-Observed Joint Angles (In Progress)
      • Developed a self-supervised algorithm that reproduces a robot's joint angles from partially observed input
      • Designed a self-attention-based graph convolution operation that incorporates neighborhood information into inference
Indian Institute of Science, Bangalore
May 2019 - May 2020

Visiting Undergraduate Researcher | Advisor: Prof. Shalabh Bhatnagar

  • CS thesis: Convergence of Hierarchical Policy Gradients
    • Proved, using the ODE-based approach, the convergence of hierarchical average reward policy gradient algorithms, which learn long-term temporal abstractions for achieving the globally optimal sequence of rewards
    • Increased sample efficiency by 2.8x by incorporating natural gradients using K-FAC
    • Created 3 specialized environments with "traps" to illustrate our framework's enhanced credit-assignment capabilities
    • Increased throughput by over 100x by parallelizing the sampling procedure over 128 processors
  • EEE thesis: Continuous Control using Continuous Option Spaces
    • Derived the deterministic hierarchical policy gradient for continuous action spaces and discrete option spaces
    • Developed a novel framework that alleviates the option-collapse issue by using continuous option spaces
Birla Institute of Technology and Science Pilani, Goa
May 2018 - May 2019

Undergraduate Research Assistant | Advisor: Prof. Bharat Deshpande

  • Formulated a novel mapping from the policy space of Reinforcement Learning (RL) algorithms to the space of hierarchical cluster trees, in order to learn a clustering policy that dynamically adapts to an influx of new data points
  • Created a customized environment that can interface any RL algorithm with any clustering dataset
  • Used the DDPG algorithm to obtain, for the first time, the ground-truth clustering strategy for an adapting synthetic dataset
  • 1st Project: Real-time human detection and distance regression
    • Reduced the time taken for human detection by 16x by employing support vector machines, histograms of oriented gradients, and epipolar geometry
  • 2nd Project: Image segmentation using Hidden Markov Models (HMM)
    • Employed Markov chain Monte Carlo algorithms to learn an HMM for image segmentation
    • Integrated the HMM into the main image-stitching codebase, which was used for image stitching in VR