Hendrik Makait - Geoscience at Massive Scale | PyData Paris 2024

  • Published on 30 Nov 2024
  • When scaling geoscience workloads to large datasets, many scientists and developers reach for Dask, a library for distributed computing that plugs seamlessly into Xarray and offers an Array API that wraps NumPy. Featuring a distributed environment capable of running your workload on large clusters, Dask promises to make it easy to scale from prototyping on your laptop to analyzing petabyte-scale datasets.
    Dask has been the de-facto standard for scaling geoscience, but it hasn’t entirely lived up to its promise of operating effortlessly at massive scale. This comes up in a few ways:
    • Correctly chunking your dataset has a significant impact on Dask's ability to scale.
    • Workers accidentally run out of memory due to:
        • data being loaded too eagerly
        • rechunking
        • unmanaged memory
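    To make the chunking point concrete: the memory a single task touches grows with its chunk shape, while the number of tasks grows with the number of chunks. The following is a minimal sketch in plain Python (it does not call Dask). The 365 x 721 x 1440 grid and the float64 (8-byte) element size are hypothetical values chosen only for illustration.

```python
import math

def chunk_nbytes(chunk_shape, itemsize=8):
    """Bytes held by one chunk; itemsize=8 assumes float64 data."""
    return math.prod(chunk_shape) * itemsize

def num_chunks(array_shape, chunk_shape):
    """Total number of chunks; the last chunk along an axis may be partial."""
    return math.prod(math.ceil(n / c) for n, c in zip(array_shape, chunk_shape))

# Hypothetical daily global grid: time x lat x lon, float64.
shape = (365, 721, 1440)

# Two candidate layouts: one time step per chunk vs. full time series in spatial tiles.
for chunks in [(1, 721, 1440), (365, 100, 100)]:
    mib = chunk_nbytes(chunks) / 2**20
    print(f"{chunks}: {mib:.1f} MiB per chunk, {num_chunks(shape, chunks)} chunks")
```

    Chunking along time suits per-timestep operations, while spatial tiles keep each full time series together for time-series analysis. Moving data from one layout to the other is the kind of rechunking listed above as a source of memory pressure.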
    Over the last few months, Dask has addressed many of those pains and continues to do so through:
    • Improvements to its scheduling algorithms
    • A faster and more memory-stable method for rechunking
    • A first-of-its-kind logical optimization layer for a distributed array framework (ongoing)
    Join us as we dive into real-world geoscience workloads, exploring how Dask empowers scientists and developers to run their analyses at massive scale. Discover the impact of improvements made to Dask, ongoing challenges, and future plans for making it truly effortless to scale from your laptop to the cloud.
    www.pydata.org
    PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
    PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
    00:00 Welcome!
    00:10 Help us add time stamps or captions to this video! See the description for details.
    Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: github.com/num...
  • Science & Technology
