WORK

AWS SageMaker Data Wrangler

Brief:

Amazon SageMaker Data Wrangler is a tool designed to simplify data preparation and feature engineering for machine learning. It accelerates the process by offering functionalities to clean, explore, and visualize data from various sources.

I worked at Amazon during the summer of 2019 and again from 2020 until early 2023, when I chose to leave to explore startups and design-centric opportunities.

Team:

As a software engineer on the SageMaker Data Wrangler team, I collaborated with engineers on my own team and on other AWS teams. I also worked with non-technical stakeholders, such as PMs and designers, to build features that met customer and team needs.

Work:

I worked on a number of features and enhancements.

  • SaaS Integration as Data Sources: Collaborated with multiple AWS teams to enable the aggregation of data from over 40 external SaaS applications, including Salesforce, SAP, and Google Analytics. This integration, powered by Amazon AppFlow, automates data cataloging in AWS Glue Data Catalog. It allows users to seamlessly access SaaS data within SageMaker Data Wrangler’s interface, streamlining the process of importing, transforming, and preparing data for machine learning. More details on this feature can be found here.

  • SQL Explorer Enhancement: Revamped Data Wrangler’s SQL Explorer by introducing features such as syntax highlighting, error handling, and interactive table views. These improvements have significantly enriched the data exploration and analysis experience for users.

  • CI/CD Pipeline Optimization: In collaboration with a team member, addressed a bottleneck in the CI/CD pipeline by parallelizing our test suite. This change reduced the run time from four hours to just under 30 minutes, speeding up our development cycle.

  • Onboarding Tool Development: Designed and developed tools to streamline the onboarding process for new team members, ensuring they could efficiently set up their workspaces and contribute to our projects.

Toolset:

  • FE: TypeScript with React and several internal packages.
  • BE: Python with Spark and various internal packages.
  • Internal tools: Bash and Python.