As highlighted in the previous post, in my 15-year-long career, I've enjoyed enabling and supporting data scientists and analysts. For the first nine years or so, I was a business analyst trying to understand the problems and pain points in financial operations and envisioning data-driven solutions to address them. Now, I'd like to channel my experience and interests into providing business and technology advisory as well as practical design and development services around Data Architecture, Management, and Governance.
Broadly, the business analyst work entailed understanding a process and depicting it, often as a visual aid such as a process map, then brainstorming with the analysts to identify and enumerate their problems. These were then prioritised and added to the roadmap. Items from the roadmap were then picked up for further analysis and worked through with the tech team to devise a solution. Typically, as most of us follow an agile or incremental methodology, an item goes onto the product backlog and is scheduled into a sprint backlog; it is then coded, tested, verified by the user, and released during the sprint.
The typical traceability is as follows:
User Story ➔ Use Case ➔ Functional Requirement ➔ Technical Specification ➔ Programmed Artefact ➔ Test Case ➔ Final Product Feature
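To make the chain concrete, here is a minimal sketch of how such traceability could be recorded so that any released feature can be walked back to the user story that motivated it. It is only an illustration, not a description of any particular tool: the `Artefact` class, the `trace_back` helper, and the example titles are all hypothetical names I've made up for this post.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stages of the traceability chain, in the order listed above.
STAGES = [
    "User Story",
    "Use Case",
    "Functional Requirement",
    "Technical Specification",
    "Programmed Artefact",
    "Test Case",
    "Final Product Feature",
]


@dataclass
class Artefact:
    """One item in the chain (hypothetical model, for illustration only)."""
    stage: str
    title: str
    parent: Optional["Artefact"] = None  # the upstream item this one derives from

    def __post_init__(self) -> None:
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage}")
        position = STAGES.index(self.stage)
        if self.parent is None:
            # Only the first stage may exist without an upstream item.
            if position != 0:
                raise ValueError(f"{self.stage} needs a parent")
        elif self.parent.stage != STAGES[position - 1]:
            raise ValueError(
                f"{self.stage} must derive from {STAGES[position - 1]}"
            )


def trace_back(artefact: Artefact) -> List[Artefact]:
    """Return the chain from the originating user story down to `artefact`."""
    chain = []
    node: Optional[Artefact] = artefact
    while node is not None:
        chain.append(node)
        node = node.parent
    return list(reversed(chain))


def build_example_chain() -> Artefact:
    """Build one full chain with made-up titles and return its last item."""
    titles = [
        "Analyst wants a daily reconciliation report",
        "Generate reconciliation report",
        "Report lists unmatched transactions",
        "Nightly batch job writes a report table",
        "reconciliation_report job",
        "Unmatched transactions appear in the report",
        "Daily reconciliation report",
    ]
    node: Optional[Artefact] = None
    for stage, title in zip(STAGES, titles):
        node = Artefact(stage, title, parent=node)
    return node


if __name__ == "__main__":
    for item in trace_back(build_example_chain()):
        print(f"{item.stage}: {item.title}")
```

The validation in `__post_init__` is a deliberate simplification: it insists on a strict one-to-one chain, whereas in practice one user story usually fans out into several use cases, requirements, and test cases.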
This is the process I still follow, but after my Master's, I got more into designing data-intensive applications (I borrowed that phrase from the title of a must-read book for anyone who works with data) and making sure data is available to data scientists, analysts and users at the right time (e.g. real-time, batch), in the right place (e.g. cloud, on-prem), and in the right form (e.g. tables, files). To this end, I build and maintain data warehouses and information retrieval applications, most recently with Greenplum, a massively parallel processing (MPP) database based on PostgreSQL. I also build business intelligence pipelines and implement what I call the PQRS framework of Data Governance: Privacy, Quality, Regulatory compliance, and Security.
In terms of technology, I've worked with:
- Data Warehousing: Data Vault, Star Schema
- Databases: Greenplum (MPP), PostgreSQL, Oracle, MS SQL Server, Sybase, Neo4j (graphs), MongoDB (documents)
- Languages: SQL, Python
- ETL: Informatica, Airflow
- Big Data stack: Hadoop, Hive, Spark, Containers, Sqoop, Kafka
- Environments: Cloud, On-Prem, Hybrid
- Enterprise Architecture: TOGAF, SOA
- Enterprise Integration Patterns: MFT, MQ, PubSub, SOAP, REST
- Project Management: SDLC (agile, incremental, waterfall)
- BI & Reporting Tools: Cognos, QlikView
- Business Analysis: Requirements Engineering
- Business Process / Data / UML Modelling
- IT Security controls: Access controls, encryption, DLP
Please reach me at email@example.com to discuss your Data Management or Wrangling needs.