Alexander Günsche

Managing a Data Lake on AWS by Alexander Günsche

Transactional infrastructure is not well suited to processing large amounts of data for analytics. In this talk, participants will learn data architecture fundamentals and gain in-depth insight into building an enterprise-grade data lake with a business intelligence frontend on AWS, using AWS analytics services such as Glue, Athena, Lake Formation, Kinesis and QuickSight.

(While the presentation is based on AWS, the fundamental concepts are transferable to other environments.)

Talk Questions

  • Question 785
    How can we perform data drift detection using AWS?
  • Question 782
    What is the difference between data lake and data warehouse?
  • Question 783
    How can we scale, monitor usage and control costs on data lakes?
  • Question 784
    What is the reason for using S3 as a database? Isn't it insecure and prone to data duplication? Couldn't you read directly from Snowflake?
  • Question 786
    If we need to generate new data lakes that match the source environments in format and volume but are anonymized (for example, to perform load testing in new environments), would you consider Glue transformations an appropriate method for achieving data anonymization?
  • Question 787
    How do "row-based" and "column-based" permissions affect the speed of the process?
  • Question 788
    I do not understand why you say that we cannot make range queries in REST. Could you clarify why SQL as an API is the best option? Thanks.
  • Question 789
    If you have data at the country-brand level, could you provide anomaly detection granularity for each country-brand pair using QuickSight? Could it connect directly to a database?
  • Question 790
    How much does it cost to run the DIY example?