- 3+ years of Data Engineering experience
- Experience with data modeling, warehousing and building ETL pipelines
- Bachelor's degree
- Knowledge of distributed systems as they pertain to data storage and computing
- Knowledge of professional software engineering and best practices for the full software development life cycle, including coding standards, software architectures, code reviews, source control management, continuous deployments, testing, and operational excellence
Key job responsibilities
- Engage in collaborative efforts with cross-functional teams, including data scientists and business intelligence engineers, to architect a state-of-the-art data analytics platform on AWS, employing the AWS Cloud Development Kit (CDK).
- Construct resilient and scalable data pipelines using SQL/PySpark/Airflow to effectively ingest, process, and transform substantial data volumes from diverse sources into a structured format, ensuring data quality and integrity.
- Devise and implement an efficient, scalable data warehousing solution on AWS, utilizing appropriate NoSQL/SQL storage and database technologies for both structured and unstructured data.
- Automate ETL/ELT processes to streamline data integration from diverse sources, enhancing the platform's reliability and efficiency.
- Develop data models to support business intelligence, delivering actionable insights and interactive reports to end-users.
- Enable advanced analytics and machine learning capabilities within the platform, extracting predictive and prescriptive insights through tools like EMR/SageMaker Notebooks.
- Continuously monitor and optimize the performance of data pipelines, databases, and applications, ensuring low-latency data access for analytics and machine learning tasks.
- Implement robust security measures and ensure data compliance with internal requirements, industry standards, and regulations to safeguard sensitive information.
- Collaborate closely with data scientists and business intelligence engineers to comprehend their requirements and work together on data-related projects.
- Generate comprehensive technical documentation covering the platform's architecture, data models, and APIs, promoting knowledge sharing and ease of maintainability.
Embark on an exhilarating journey with the DISCO team at Amazon Music! We're seeking a dynamic Data Engineer to be a vital part of propelling our success story. In this role, you'll be instrumental in crafting extraordinary Analytics & Science infrastructure for DISCO teams. Join us in championing a culture of inclusivity and a data-driven mindset, where every team member at Amazon Music is empowered to make informed decisions and measure their impact. Be the driving force behind delivering top-notch, accessible data and democratizing data access through user-friendly self-service tools. Join us on this exciting quest to redefine how we approach insights and decision-making for all!
We are open to hiring candidates to work out of one of the following locations:
Bangalore, KA, IND
- Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, Firehose, Lambda, and IAM roles and permissions
- Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)
- Knowledge of batch and streaming data architectures and technologies such as Kafka, Kinesis, Flink, Storm, and Beam