
Uday Kumar

Skills

Python
Java
Spring Boot
R
Scala
Terraform
HTML5
CSS3
XML
Perl
SQL
MS SQL
MySQL
Oracle
MongoDB
HBase
Cassandra
Tableau
SSRS
CloudHealth
SDLC
Waterfall
Agile
Eclipse
Tomcat
NetBeans
JUnit
SVN
Log4j
SoapUI
Ant
Maven
Alteryx
Visio
Jenkins
Jira
AWS
Azure

Work experience

Data Engineer
09.2023 - present | Sky IT Developers
PySpark, SQL, ETL, HDFS, Kafka, DevOps
● Worked on clustered Hadoop for Windows Azure using HDInsight and the Hortonworks Data Platform for Windows.
● Wrote PySpark and Spark SQL transformations in Azure Databricks to implement complex business rules (see the sketch after this list).
● Involved in big data requirements analysis and in designing and developing solutions for ETL and Business Intelligence platforms.
● Designed 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
● Worked in a Snowflake environment to remove redundancy and loaded real-time data from various data sources into HDFS using Kafka.
● Developed a data warehouse model in Snowflake for over 100 datasets.
● Implemented a fully operational, production-grade, large-scale data solution on Snowflake Data Warehouse.
● Employed IBM Streams to design, develop, and maintain scalable data streaming pipelines.
● Applied expertise in Apache Kafka, Streams Processing Language (SPL), and related technologies to meet business requirements.
● Worked with Spark to improve the performance of and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, PySpark, Impala, Tealeaf, pair RDDs, DevOps, and Spark on YARN.
● Took proof-of-concept project ideas from the business and led, developed, and created production pipelines that deliver business value using Azure Data Factory.
● Created Spark vectorized pandas user-defined functions for data manipulation and wrangling.
● Transferred data in logical stages from systems of record to the raw, refined, and produce zones for easy translation and denormalization.
● Set up Azure infrastructure such as storage accounts, integration runtimes, service principal IDs, and app registrations to enable scalable, optimized support for business users' analytical requirements in Azure.
● Configured Spark Streaming to receive ongoing data from Kafka and store the stream to HDFS.
● Extracted and updated data in HDFS using Sqoop import and export.
● Developed Hive UDFs to incorporate external business logic into Hive scripts and developed dataset join scripts using Hive join operations.
● Involved in creating an HDInsight cluster in the Microsoft Azure Portal, and also created Event Hubs and Azure SQL databases.
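A minimal sketch of the kind of PySpark/Spark SQL transformation described above, as it might run in Azure Databricks; the paths, column names, and business rule are hypothetical, and Delta storage is assumed.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: paths, columns, and the business rule are illustrative only.
spark = SparkSession.builder.appName("order-enrichment").getOrCreate()

# Read raw orders landed in the raw zone (Delta format assumed, as is typical in Databricks).
orders = spark.read.format("delta").load("/mnt/raw/orders")

# Example business rule: flag high-value orders and normalize country codes.
enriched = (
    orders
    .withColumn("is_high_value", F.col("order_amount") > 10000)
    .withColumn("country_code", F.upper(F.trim(F.col("country_code"))))
)

# Write the result to the refined zone for downstream consumption.
enriched.write.format("delta").mode("overwrite").save("/mnt/refined/orders_enriched")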
Data Engineer
03.2020 - 07.2022 | TCS
● Developed efficient MapReduce programs for filtering out unstructured data and developed multiple MapReduce jobs to perform data cleaning and pre-processing on Hortonworks.
● Used Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that receives data from Kafka in near real time and persists it to Cassandra (see the sketch after this list).
● Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
● Collected data with Spark Streaming from an AWS S3 bucket in near real time and performed the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in HDFS.
● Wrote BigQuery queries for data wrangling with the help of Dataflow on GCP.
● Responsible for developing Spark Cassandra connector jobs to load data from flat files into Cassandra for analysis.
● Implemented and optimized real-time data processing solutions using IBM streaming technologies, including Apache Kafka and Streams Processing Language (SPL).
● Designed and developed end-to-end data pipelines for seamless integration of diverse data sources, ensuring high performance and meeting service-level agreements (SLAs).
● Worked with data analysts to understand business logic and user requirements.
● Worked closely with cross-functional data warehouse team members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
● Created SQL queries to simplify migration progress reports and analyses.
● Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation, depending on the data needed from various relational customer databases.
● Used Hortonworks Apache Falcon for data management and pipeline processing in the Hadoop cluster.
● Used Impala to query data in the publish layer, where other teams and business users can access it for faster processing.
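A minimal sketch of the Kafka-to-Cassandra flow described above, using Spark Structured Streaming; the broker, topic, schema, keyspace, and table names are hypothetical, and the spark-sql-kafka and spark-cassandra-connector packages are assumed to be on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Hypothetical example: broker, topic, schema, keyspace, and table are illustrative only.
spark = (
    SparkSession.builder.appName("learner-events-stream")
    .config("spark.cassandra.connection.host", "cassandra-host")
    .getOrCreate()
)

# Assumed JSON payload of the incoming learner events.
event_schema = StructType([
    StructField("learner_id", StringType()),
    StructField("course_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Read events from Kafka in near real time.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "learner-events")
    .load()
)

# Parse the JSON value and keep only the modelled columns.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Persist each micro-batch to Cassandra via the Spark Cassandra connector.
def write_to_cassandra(batch_df, batch_id):
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="learning", table="learner_events")
        .mode("append")
        .save())

query = (
    events.writeStream
    .foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "/tmp/checkpoints/learner_events")
    .start()
)
query.awaitTermination()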

Languages

English: Upper Intermediate