Big Data Demystified
Stream processing with Redpanda and Apache Flink [English]
Join us for a comprehensive discussion on stream processing with Redpanda and Apache Flink. This talk is designed for those who have some working knowledge of Kafka/Redpanda and are looking to learn more about stream processing.
The first part of our talk introduces stream processing with Flink. We begin with stateless operations, then dive into how Flink handles state and time, covering stateful aggregations, window processing, and joins. We also explore some advanced topics, such as fault tolerance, state checkpointing, and event time processing, and explain why they matter in stream processing.
The second half of the talk transitions from theory to practice, and will be more hands-on. We will guide you through the process of building a Flink application in Java. This segment is designed to provide a practical understanding of how to write, build, and deploy an application to a Flink cluster. We will demonstrate how this application is set up to ingest data from a Kafka topic, deploy it to a Flink cluster, and monitor how it operates under various conditions.
This talk is an excellent opportunity to enhance your understanding of these powerful tools. It will equip you with the knowledge and skills needed to build robust stream processing applications with Redpanda and Apache Flink.
Lecturer: Dunith Danushka
Short biography:
Dunith avidly enjoys designing, building, and operating large-scale, real-time, event-driven architectures. He has 10+ years of experience doing so and loves to share his learnings through blogging, videos, and public speaking.
Dunith works at Redpanda as a Senior Developer Advocate, where he spends much of his time educating developers about building event-driven applications with Redpanda.
big-data-demystified.ninja/2024/07/29/stream-processing-with-redpanda-and-apache-flink/
Views: 60

Videos

Filtering and Denormalizing Data in NoSQL Databases [English]
140 views, 1 month ago
This talk covers the different options for denormalizing data in wide-column NoSQL databases. We explore Materialized Views, Indexing, and Allow Filtering, and discuss the pros and cons of each option and when to choose it. Lecturer: Guy Shtub, Head of Training at ScyllaDB big-data-demystified.ninja/2024/07/24/filtering-and-denormalizing-data-in-nosql-databases/
Building a management layer to your data lake for structured/unstructured data
53 views, 2 months ago
Intro: The challenges in managing a data lake for structured and unstructured data. Achieving manageability: 1. The components of the architecture and their roles: open table formats, catalogs, and data version control systems. 2. How it all fits together: an example using Databricks technologies, an example using Apache Iceberg, and an example using AWS technologies. 3. Discussion. Language: English. About the lecturer: E...
NoSQL Core Concepts [English]
40 views, 3 months ago
Description: Learn the basics of NoSQL using ScyllaDB. Assess if it’s a good fit for your use case, and see what’s involved in spinning up your first cluster and running some basic queries. Covers an Intro to NoSQL, Basic Concepts and Architecture, a Hands-on Demo, and Basic Data Modeling. Lecturer: Guy Shtub, Head of Training at ScyllaDB About the lecturer: Guy Shtub is Head of Training at Scy...
Advanced Data Modeling in NoSQL Databases [English]
65 views, 4 months ago
Lecturer: Guy Shtub, Head of Training at ScyllaDB. Language: English. Description: Intermediate to advanced topics in NoSQL data modeling. Topics include: choosing a primary key; collections, counters, UDT, and TTL; large partitions and collections; hot partitions, cardinality, and tombstones; common pitfalls and how to diagnose them. About the lecturer: Guy Shtub is Head of Training at ScyllaDB and holds a B...
Fixing small files performance issues in Apache Spark, using DataFlint [English]
200 views, 4 months ago
One of the big challenges in big data is interacting with the storage layer, especially in a data lake, where we are the ones who manage the files and partitions. One of the most common performance problems in data lakes is working with small files. In this lecture we will learn about: * Why it's important to read and write files at a best-practice size * How Apache Spark under the hood interact...
Embrace the Unchanging: The Counterintuitive Way to Thrive in a World in Chaos
23 views, 5 months ago
Target audience: creative problem-solvers who struggle with ill-defined problems; startup executives who are responsible for creating value and communicating it; curious minds who wish to learn how to see the world anew. Lecturer: Omer Yarkowich ⵙ Positive Constraint. What will the audience learn? “It’s the first time in human history that nobody has any idea how the world will look like in 20 years...
NoSQL Basics and Getting Started
184 views, 7 months ago
Learn about NoSQL vs. SQL, basic architecture for high availability and performance, NoSQL data modeling, and how to get started with running basic queries. About the lecturer: Guy Shtub is Head of Training at ScyllaDB and holds a B.Sc. degree in Software Engineering from Ben Gurion University. He co-founded two start-ups and is experienced in creating products that people love. big-data-demyst...
Why you should build an LLM benchmark [English]
1.9K views, 7 months ago
📊 Dive Deep into the World of LLM Benchmarks! 📊 Objective: By the end of this session, you should have a good understanding of how to select and maintain your own LLM benchmark. Agenda: 🔬 Demo! 🔍 Discover what ARC, HellaSwag, and MMLU are exactly 🧫 Learn how to select the right benchmark 🧪 Methods to test LLMs tailored to your unique use case 🧱 Q&A Speaker: J. Yarkoni, ex-Google AI/ML Specialist (...
Innovative NLP Breakthroughs in Hebrew and Arabic at IAHLT [English]
141 views, 8 months ago
In this presentation, we introduce IAHLT, detailing our organization's motivations, business model, and focus on linguistic annotation for NLP applications. We'll present our team and methodology, explaining how we manage multiple projects and handle large datasets. The journey of a document through various linguistic annotation pipelines, ensuring data integrity, will be illustrated. We'll hig...
Semantic Layer vs. Metric Layer in Business Intelligence [English]
970 views, 9 months ago
Abstract Metrics are an emerging type of BI artifact that packages data into a format optimized for consumption by non-technical business users. Over the last few years two types of technologies, the semantic layer and metric layer, have arisen independently. They both support the creation of metrics. In this talk, you will learn: 👉 Which needs do metric and semantic layers address - for data t...
My First Petabyte Scale Architecture- Part4 [English]
41 views, 11 months ago
In this lecture, I will teach you the principles of how to build a big data architecture that stores and processes 1 PB worth of data using different technologies (Redshift, Databricks, BigQuery, SQream). What will you learn? 1. Challenges with such a big data scale. 2. Consideration set of the use case. 3. The architecture of each technology, relevant features for the use case, and pro/cons an...
Software 3.0: Software Development in the AI era [English]
125 views, 11 months ago
The landscape of software development is evolving faster than ever before. Join us as we explore the future of software development in the era of AI coding assistants. In this talk, we'll discover why technical specifications are gaining increasing relevance in 2023 and how AI agents are reshaping the software development life cycle. Language: English Speaker: Dan Leshem, CTO @ Fine. A seasoned...
My First Petabyte Scale Architecture- Part3 [English]
41 views, 11 months ago
In this lecture, I will teach you the principles of how to build a big data architecture that stores and processes 1 PB worth of data using different technologies (Redshift, Databricks, BigQuery, SQream). What will you learn? 1. Challenges with such a big data scale. 2. Consideration set of the use case. 3. The architecture of each technology, relevant features for the use case, and pro/cons an...
My First Petabyte Scale Architecture Part2 [English]
65 views, 11 months ago
In this lecture, I will teach you the principles of how to build a big data architecture that stores and processes 1 PB worth of data using different technologies (Redshift, Databricks, BigQuery, SQream). What will you learn? 1. Challenges with such a big data scale. 2. Consideration set of the use case. 3. The architecture of each technology, relevant features for the use case, and pro/cons an...
Rise, Fall and re-Rise of the Semantic Layer [English]
395 views, 11 months ago
My First Petabyte Scale Architecture- Part1 [English]
344 views, 1 year ago
Architectural Evolution of Amazon internal Data Platform [English]
183 views, 1 year ago
The Semantic Cure for Your GenAI Hangover [English]
57 views, 1 year ago
Data Modeling for NoSQL Databases [English]
155 views, 1 year ago
NoSQL Data Modeling and the LSM Tree data structure [English]
488 views, 1 year ago
How to design ML Observability for high-risk AI use cases [English]
38 views, 1 year ago
Data integration, ETL, ELT...challenges, and complexities [English]
156 views, 1 year ago
High Performance, Low Latency Database Architecture [English]
179 views, 1 year ago
Data Mesh: Experimentation to Industrialisation [English]
84 views, 1 year ago
Keep your data encrypted in BigQuery [English]
680 views, 1 year ago
Next Generation Databases: Critical Innovations for Performance at Scale [English]
57 views, 1 year ago
AWS EMR Demystified Part4 [English]
50 views, 1 year ago
Airflow Distributed workloads, AWS Lambda, Multi Threaded Python Script [Hebrew]
112 views, 1 year ago
AWS EMR Demystified Part 3 [English]
49 views, 1 year ago

Comments

  • @jazzvids, 2 months ago

    Thank you for this valuable talk! I am currently writing my master's thesis in NLP and this is very helpful.

  • @davidrecheni, 11 months ago

    The sound quality is very bad

  • @24baby60, 1 year ago

    thank you

  • @adamo1262, 2 years ago

    I understand you didn't mention Azure in this video. I'm trying to transition into DE and for some reason I like Azure. Is there any reason why you haven't mentioned it at all? Thanks, and I thought the content was first class, keep it up!

    • @bigdatademystified3192, 2 years ago

      No reason, No Clouds bring value to the table given the right use case. I didn't speak about Azure, as the lecture is only about AWS. :) Glad you liked the content. Much appreciated.

  • @ASMRaphael, 2 years ago

    Whoa, so superb and so absolutely wonderful :) I enjoyed the video to bits my dear friend! :)

  • @sebastiencrepel5032, 2 years ago

    Hello. Very interesting. Many thanks !

  • @bvbc87, 2 years ago

    This lecture was very informative.. Thank you.. ☺

  • @AnnieCushing, 3 years ago

    Wow, this was fantastic. Well done.

  • @9webberi92, 4 years ago

    Hi, thanks for the great content, keep it up!

  • @palanit4191, 4 years ago

    We have a star schema and a dimension which gets refreshed on a daily basis, so is it advisable to denormalize or keep the star schema as it is in BigQuery? Thanks.

    • @omidvahdaty7636, 4 years ago

      As a rule of thumb, yes, denormalize (assuming an analytics use case). There may be some exceptions. Contact me via LinkedIn ("Omid Vahdaty") and I will be happy to answer in more detail.

  • @omidvahdaty7636, 4 years ago

    Lecture slides: big-data-demystified.ninja/2019/10/27/80-cost-reduction-in-google-cloud-bigquery-tips-and-tricks-big-query-demystified/

  • @omidvd, 4 years ago

    Lecture slides: big-data-demystified.ninja/2019/10/27/80-cost-reduction-in-google-cloud-bigquery-tips-and-tricks-big-query-demystified/

  • @bigdatademystified3192, 5 years ago

    Lecture slides and bio: big-data-demystified.ninja/2019/01/28/big-data-demystified-nature-writes-the-best-algorithms/

  • @bigdatademystified3192, 5 years ago

    Omid Vahdaty: big-data-demystified.ninja/ Join our meetups and subscribe to our YouTube channels: www.meetup.com/AWS-Big-Data-Demystified/ and www.meetup.com/Big-Data-Demystified/ Lecture slides: amazon-aws-big-data-demystified.ninja/2019/01/10/big-data-demystified-data-engineering-demystified/