- Understanding the business case
- Data assessment
- Capacity planning to meet the end-goals
- Strategies to improve the time to market
- Understanding how data processes work and defining data storage practices
- Data integration with the current solutions
- Configuring Hadoop clusters
- Big Data Application Integration Services
- Integration with existing Enterprise Data Warehouse and Data Sources
- Building real-time data pipelines
- Migration from Relational DB to NoSQL
- Collecting and processing very large volumes of complex data
- Trend Analysis, Pattern Identification, Payroll and Billing Systems, Weather Forecasting
- Real-time processing of collected business data to drive insights
- Stock Market Analytics, Real-time taxi booking
- Serverless ETL Process Definition
- Easier scaling by processing high-velocity, high-volume transactions and handling events more efficiently
- Orchestration using Airflow
- Machine Learning/Data Mining
- Data Modelling
- Predictive and Prescriptive Analytics
- Analytics Optimization
- Analytics, Dashboarding & Alerting
- Informed decision making by leveraging BI
- Intelligent and personalized insights about gaps and opportunities for improvement in processes
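As an illustration of the "Migration from Relational DB to NoSQL" item above, the following is a minimal sketch of the usual first step: folding normalized rows into nested documents. The `customers` and `orders` tables, their columns, and the helper names are hypothetical and chosen only for this example; an in-memory SQLite database stands in for the relational source, and the resulting dictionaries would still need to be written to a document store with that store's own client.

```python
import sqlite3
from collections import defaultdict


def build_sample_db():
    """Create a small in-memory relational source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.5), (12, 2, 9.99);
        """
    )
    return conn


def rows_to_documents(conn):
    """Denormalize: embed each customer's orders inside one customer document."""
    orders_by_customer = defaultdict(list)
    for order_id, customer_id, total in conn.execute(
        "SELECT id, customer_id, total FROM orders ORDER BY id"
    ):
        orders_by_customer[customer_id].append({"order_id": order_id, "total": total})

    # One document per customer; customers without orders get an empty list.
    return [
        {"_id": cid, "name": name, "orders": orders_by_customer.get(cid, [])}
        for cid, name in conn.execute("SELECT id, name FROM customers ORDER BY id")
    ]


if __name__ == "__main__":
    for doc in rows_to_documents(build_sample_db()):
        print(doc)
```

Embedding child rows in the parent document is only one possible modelling choice; whether to embed or reference depends on how the data is read and how large the nested lists can grow.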
Technology Stack
Data Engineering
- Spark
- HDFS
- Hive
- HBase
- Presto
- Scribe
- Pig
- MapReduce
- Sqoop
- Flume
- Talend
- Informatica
- Pentaho
- Airflow
- Apache Kafka
Data Warehouses
- Impala
- HP Vertica
- Amazon Redshift
- Azure SQL DW
Data Modelling Tools
- ERwin
- ER/Studio
- OmniGraffle
- IDEF1X
- DDL
- DML
- UML
Data Analytics
- Tableau
- Power BI
- Spotfire
- SSRS
- MicroStrategy
- Looker
- Filebeat
- Grafana
- Amazon QuickSight
- Azure Stream Analytics
- Azure Data Lake Analytics
- Azure Analysis Services
Monitoring Tools
- Icinga
- Nagios
- Ganglia
- Graphite
- Prometheus
NoSQL DB
- MongoDB
- Cassandra
Development Languages
- Python
- Scala
- Java
- R
Log Management
- Splunk
- Logstash