44F1A130 – Rui Qin – BigData/Hadoop Developer

Resume posted by Glady in BigData Developer.
Desired Rate: $70.00/hr
Desired position type: C2C
Current Location: Plano, Texas, United States

gcorreya@compunnel.com
Tel: 609-779-1361

Summary

• Professional IT work experience in developing Big Data applications for clients in multiple industries. Comprehensive experience in working with Apache Hadoop Eco-components.
• Hands-on experience in the Big Data ecosystem with Hadoop, HDFS, MapReduce, Hive, HBase, and Sqoop
• Experienced in YARN tuning and dynamic resource allocation
• Strong proficiency in implementing and optimizing MapReduce programs to support big data ETL procedures
• Experience in analyzing and cleansing raw data using HiveQL
• Proficient in writing and optimizing HiveQL queries to achieve data manipulation
• Experience handling Spark DataFrames using PySpark
• Experience with large-scale batch data processing on Hive using Spark (Scala) and Spark SQL
• Working experience with Database including MySQL 5.x, PostgreSQL 9.x, HBase 0.98+, MongoDB 2.4
• Implemented MLlib functions for training and building classification models using Spark Streaming, Spark SQL and Machine Learning APIs.
• Expertise in analyzing data using Hive and writing custom UDFs in Java to extend Hive core functionality.
• Deep working knowledge of functional programming, MVC design patterns, Spring MVC, Java mock frameworks, Quality Assurance
• Experience in descriptive and predictive analysis with Microsoft Excel, Tableau and Python using libraries like Pandas, Matplotlib, SciPy and Seaborn
• Experience working with AWS S3 and AWS EC2, AWS EMR for data storage and cloud computing, as well as Redshift for data warehouse
• Selected appropriate AWS services to design and deploy an application based on given requirements.
• Experience using PySpark and MLlib for classification and regression analysis
• Worked on ingesting, reconciling, compacting and purging base-table and incremental-table data using Hive and HBase, with job scheduling through Oozie
• Solid understanding of predictive analysis, statistical analysis and modeling and regression
• Experience in Agile, Waterfall and TDD (Test-Driven Development) methodologies
• Good experience working with Tableau and Spotfire, enabling JDBC/ODBC data connectivity from those tools to Hive tables
• Practical experience in implementing multi-threading and concurrency framework
• Self-driven goal-getter with excellent communication skills in collaborative teams and the motivation to take on independent responsibility.

TECHNICAL SKILLS:

Hadoop Ecosystem: Hadoop 2.*, Spark 2.4+, MapReduce 2.0, Hive 0.13+, Sqoop 1.4.5, Flume 1.4+, Kafka 0.8.2+, HBase, YARN
Cloud Platform: AWS, Heroku, Sina App
Database: HBase 0.98, MySQL 5.x, MongoDB 2.4+
IDE: Visual Studio, Sublime, Atom, WebStorm, Eclipse, PyCharm, Notepad++
Data Analysis & Viz: MATLAB, Tableau, MS Excel, Matplotlib, D3.js, SciPy, scikit-learn, Pandas, Seaborn
Version Control: Git

Education

• MS in Mathematical Finance, Lehigh University, 2018
• Bachelor of Science, Huazhong University of Science and Technology, 2016

Experience

Blue Cross and Blue Shield, Richardson, TX October 2018 to March 2020
Client: Health Care Service Corporation
Role: Hadoop Developer
Project: Finance/Actuarial Data Solutions Consumption Team

Blue Cross Blue Shield Association (BCBSA) is a federation of 36 separate United States health insurance companies that together provide health insurance to more than 106 million people. This project is carried out mainly by the BCBSIL team in Chicago and the BCBSTX team in Richardson.

Description:
The Finance/Actuarial Data Solutions project serves as part of the FSD 20/20 Program for the company. FSD 20/20 is a multi-year program to ensure HCSC has accurate, timely and standardized financial data. Data capabilities developed in this project will enable a more comprehensive and improved user experience through consistency and simplicity across data architecture components.
The development of the unified data repository with increased granularity, availability and data quality will support improved and quicker business decisions.

Responsibilities:
• Hands-on experience building data pipelines for files with different formats and delimiters
• Migrated data from MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS
• Developed Spark jobs and Hive jobs to summarize and transform data
• Hands-on experience optimizing Hive queries for best performance and efficiency
• Solved small-file issues using compaction through Hive (see the sketch after this list)
• Worked on Hive to expose data for further analysis and to transform files from different analytical formats into ORC files
• Analyzed the large amount of data sets to determine optimal way to aggregate and report on it
• Handled importing of data from various data sources and performed transformations using Hive
• Managed data de-identification daily activity for the team
• Performed data validation to support ETL procedures when data is moved between layers of the data lake
• Performed thorough testing in Functional and UAT environments for multiple releases
• Conducted In-depth analysis of data quality/data quality threshold and solved data quality issues
• Worked with business analysts to implement the designed join logic in HQL for creating tables for data storage
• Developed Spark code in Scala and used Spark SQL for faster data processing
• Implemented ASG Zena for automated workflow and job scheduling
• Used Jenkins for CI/CD
• Worked on Python Scripts to allow for process automation
• Involved in gathering product business and functional requirements with the team, updating user comments in JIRA and documentation in SharePoint
• Validated financial accounting balance records for daily/monthly data extraction through Infogix
• Collaborated with business stakeholders to move data extractions into business production and deliver them to end users
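
The small-file compaction and ORC conversion above can be illustrated with a minimal PySpark sketch. The project itself used Scala and HiveQL, and the database, table and column names below are hypothetical placeholders; the idea is simply to read a staging table, lightly cleanse it, coalesce the output into a few files and rewrite it as a partitioned ORC table that downstream Hive queries can scan efficiently.

    # Hedged sketch: compacting a Hive staging table into partitioned ORC output.
    # Database, table, column and partition names are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("claims-compaction")
             .enableHiveSupport()
             .getOrCreate())

    # Read the raw staging table (many small files after incremental loads).
    staging = spark.table("finance_stg.claims_daily")

    # Basic cleansing before the rewrite.
    cleaned = (staging.dropDuplicates(["claim_id"])
                      .filter("claim_amount IS NOT NULL"))

    # Coalesce to a small, fixed number of files per run and write ORC,
    # partitioned by load date, so downstream Hive queries scan fewer files.
    (cleaned.coalesce(8)
            .write
            .mode("overwrite")
            .format("orc")
            .partitionBy("load_dt")
            .saveAsTable("finance.claims_compacted"))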

Environment:
Java 7, Hadoop 2.4.0, Linux, HBase, Unix Shell, Python 2.7, Sqoop 1.4.5, Hive 0.13, Talend, Teradata, Infogix, Zena, MapReduce, Git, Jenkins, JIRA, QTest, Teradata SQL Assistant, HP ALM, Agile/Scrum, Spark 2.2, Scala

Marlabs Inc, Piscataway, NJ May 2018 to October 2018
Client: TravelClick
Role: Big Data Developer
Project: Hotel Booking Recommendation Engine
TravelClick offers innovative, cloud-based and data-driven solutions for hotels around the globe. Headquartered in New York, TravelClick operates in 176 countries, with local experts in 39 countries and 14 offices.

Description:
The project mainly focused on building a recommendation engine for the hotel booking system. With this recommendation engine, the website filters the data using different algorithms and recommends the most relevant hotels to users. It first captures the past behavior of customers from specific regions and travel purposes and, based on all these factors, recommends hotels the user is likely to book.

Responsibilities:
• Involved in designing and developing Hadoop MapReduce jobs using the Java Runtime Environment for batch processing to search and match the scores.
• Performed data analytics in Hive and then exported these metrics back to a relational database using Sqoop.
• Hands-on experience on the AWS platform with EC2, S3 & EMR.
• Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
• Created data partitions on large data sets in S3 and DDL on partitioned data.
• Involved in developing Hadoop MapReduce jobs for merging and appending data.
• Involved in Data model sessions to develop models for HIVE tables.
• Used NumPy and Pandas for building the collaborative filtering model (see the sketch after this list)
• Used JUnit and MRUnit for unit testing
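
The NumPy/Pandas collaborative-filtering step referenced above can be sketched as item-based cosine similarity over a user-by-hotel rating matrix. The column names and the tiny data set below are made up for illustration, not the project's actual schema.

    # Hedged sketch: item-based collaborative filtering with Pandas/NumPy.
    # The columns (user_id, hotel_id, rating) and the data are hypothetical.
    import numpy as np
    import pandas as pd

    ratings = pd.DataFrame({
        "user_id":  [1, 1, 2, 2, 3, 3],
        "hotel_id": ["A", "B", "A", "C", "B", "C"],
        "rating":   [5, 3, 4, 2, 4, 5],
    })

    # Pivot to a user x hotel matrix; missing ratings become 0 for the similarity step.
    matrix = ratings.pivot_table(index="user_id", columns="hotel_id",
                                 values="rating", fill_value=0)

    # Cosine similarity between hotel columns.
    norms = np.linalg.norm(matrix.values, axis=0)
    sim = (matrix.values.T @ matrix.values) / np.outer(norms, norms)
    sim_df = pd.DataFrame(sim, index=matrix.columns, columns=matrix.columns)

    # Score unseen hotels for a user as a similarity-weighted sum of their ratings.
    def recommend(user_id, top_n=2):
        user_ratings = matrix.loc[user_id]
        scores = sim_df.dot(user_ratings)
        scores = scores[user_ratings == 0]   # drop hotels the user already rated
        return scores.sort_values(ascending=False).head(top_n)

    print(recommend(1))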

Environment:
Java, Python, Numpy, Pandas, Hadoop, Sqoop, Hive, Git, MRUnit, MapReduce, Eclipse, AWS

Project: Data Repository for Anti Financial Crime Purposes Oct 2017 to May 2018
Bethlehem, PA
Role: Hadoop Developer

Description:
This project involves setting up a data repository and using it for search processing for analytical and research purposes, to prevent financial crimes and to help with the regulatory reporting process. This application provides the capability for both real-time and large batch processing using Hadoop MapReduce jobs/Spark and the JRE, as well as real-time search capabilities.
Responsibilities:
• Worked on administering and tuning the MapR Converged Data Platform with Agile development methodology.
• Implemented and supported in big data ETL procedure using Java, Kafka, Spark Streaming and Big Data APIs.
• Hands-on experience in setting up an HBase column-based storage repository for archiving and retro data.
• Utilized Oozie workflows to run Hive jobs; extracted files through Sqoop, placed them in HDFS and processed them.
• Worked on developing Hadoop MapReduce jobs for merging and appending the repository data.
• Handled importing of data from various data sources and performed transformations using Hive.
• Created multiple Hive tables with partitioning and bucketing for efficient data access (see the sketch after this list)
• Developed and optimized HiveQL to perform data enrichment, transformation and wrangling
• Used Sqoop to ingest data from various data sources to MapR-DB NoSQL database/ HBase
• Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats into text files.
• Worked with analytics team to build statistical model with MLlib and Python/Spark
• Used JUnit and MRUnit for unit testing.
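
The partitioned and bucketed Hive tables referenced above follow a pattern along these lines. The sketch below issues the DDL through Spark SQL with Hive support; the database, table and column names are hypothetical placeholders, and the actual loads were handled through Hive/Sqoop as described in the bullets.

    # Hedged sketch: a partitioned, bucketed Hive table for the repository layer.
    # Database, table and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("afc-repository-ddl")
             .enableHiveSupport()
             .getOrCreate())

    # Partitioning gives partition pruning on the transaction date; bucketing on
    # account_id keeps related rows together for joins and sampling.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS afc.transactions (
            txn_id       STRING,
            account_id   STRING,
            amount       DECIMAL(18,2),
            counterparty STRING
        )
        PARTITIONED BY (txn_date STRING)
        CLUSTERED BY (account_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Downstream enrichment reads only the partitions it needs.
    recent = spark.sql("""
        SELECT account_id, SUM(amount) AS total_amount
        FROM afc.transactions
        WHERE txn_date >= '2018-01-01'
        GROUP BY account_id
    """)
    recent.show()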

Environment:
Java 7, Pig 0.10.1, Hadoop 2.3, Kafka 0.9, Spark 1.6.0, Linux, Unix Shell, Scala 2.10, Sqoop 1.4.5, Hive 0.13, HBase 0.98, Impala, Lucene, JUnit, Oozie, MapReduce, MRUnit, Git, JDBC

Project: Recommendation engine based on user behavior Nov 2016 to Sep 2017
Bethlehem, PA
Role: Hadoop Developer

Description:
The scope of the project is to use customer transaction data, in store and online, over a period of time to recommend items to customers that they will find engaging depending on their age, gender and shopping habits. Another concept is a recommendation engine that introduces customers to new products they might not have come across before. In order to achieve these goals, we perform data exploration to learn about user behavior and demographics. We perform feature engineering to create new features from existing features that truly reflect the signals in the data and avoid noise.

Responsibilities:
• Developed and implemented API services using Scala in Spark
• Extensively implemented POCs on migrating to Spark Streaming to process the live data.
• Hands-on experience working with the in-memory processing framework Apache Spark for ETL transformations
• Used Flume to transfer log source files to HDFS
• Performed data transformation, cleaning and filtering using Hive
• Set up Oozie workflow/sub-workflow jobs for Hive/Sqoop/HDFS actions.
• Used Spark SQL and Spark Streaming for data streaming and analysis (see the sketch after this list)
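
The Spark Streaming usage referenced above can be sketched as a DStream job that picks up new log files as Flume lands them in HDFS and counts view events per item. This is a minimal illustration against the Spark 1.x DStream API; the HDFS path and log format are assumptions, not the project's actual layout.

    # Hedged sketch: a Spark Streaming (DStream) job over the HDFS directory
    # that the Flume sink writes into. Path and log format are hypothetical.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="clickstream-item-counts")
    ssc = StreamingContext(sc, 60)  # one micro-batch per minute

    # Monitor the directory that the Flume HDFS sink writes into.
    lines = ssc.textFileStream("hdfs:///data/flume/clickstream/")

    # Assume tab-separated logs: user_id \t item_id \t action
    item_counts = (lines.map(lambda line: line.split("\t"))
                        .filter(lambda f: len(f) == 3 and f[2] == "view")
                        .map(lambda f: (f[1], 1))
                        .reduceByKey(lambda a, b: a + b))

    # Print each batch's counts; a real job would write to HDFS/HBase instead.
    item_counts.pprint()

    ssc.start()
    ssc.awaitTermination()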

Environment:
Java 7, Pig, CDH 5.7, Hadoop 2.4.0, Linux, Unix Shell, Python 2.7, Flume 1.4, Sqoop 1.4.5, Kafka 0.8.2, Hive 0.13, Tableau, Oozie, MapReduce, Git

Project: Medical School Official Website System July 2015 to Aug 2016
Chengdu, China
Role: Software Developer

Description:
The work included creating a new extensible architecture for the official website system, implementing internal management tools, and supporting customized services for patients, students and instructors, such as an appointment system for patients and a resource-sharing system.

Responsibilities:
• Worked on Struts 2 Framework with Agile methodology
• Designed and developed customized services using HTML, JSP, JavaScript, CSS
• Integrated Spring DAO for data access using Hibernate
• Used HQL/SQL for performing queries against MySQL databases
• Involved in implementing the user interface of the official website with HTML, CSS, JavaScript
• Wrote unit test with JUnit

Environment:
Java, Struts 2.0, MySQL, JSP, HTML, CSS, JavaScript, JDBC, JUnit and Agile

Skills

  • Apache Spark, API, CSS, D3.js, Data Analytics, Data Model, Data Quality, Data Sources, Data Transformation, Data Validation, Databases, De-identification, Eclipse, ETL, Git, Hadoop MapReduce, HDFS, Hibernate, HTML, Impala, Java Runtime Environment (JRE), JavaScript, JDBC, JIRA, Job Scheduling, JSP, Lucene, MapReduce, Matplotlib, Pandas, Real-Time, Recommendation Engine, Relational Database, SharePoint, Tableau, Transaction Data, UAT, Unit Testing, Unix Shell, User Interface, Version Control, Visual Studio, Workflow, ZooKeeper, Information Technology

Groups & Associations

    H1B
