Finding Logic Bugs in Database Management Systems (Manuel Rigger, ETH SQLancer)

  • Published 21 Nov 2024

Comments • 6

  • @monfera · 4 years ago

    On randomness: maybe it just wasn't explained, but there's a lot of depth to it. For example: uniform vs. normal distributions; heavily skewed distributions; sparsity and Poisson distributions for categorical values; uncorrelated vs. correlated random variables; generating realistic data by first creating randomized functional dependencies (candidate superkey and candidate key compositions); creating autocorrelated, time-series-like data; etc. This matters because database management systems often keep track of statistics on distribution, cardinality, skew, etc., and pick different optimizations and physical operations based on them.
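    As a rough illustration of the distributions listed above, the sketch below generates rows that mix uniform, normal, heavy-tailed (Pareto) and strongly correlated columns plus a sparse categorical column, and separately an autocorrelated AR(1) series. It uses only Python's standard `random` module; the function names, column layout and parameter values are assumptions, and Zipf-like weights are swapped in for the skewed categorical case instead of a Poisson model.

```python
import random


def generate_rows(n, seed=0):
    """Hypothetical test-data generator mixing several distributions.

    Each row is (uniform_int, normal_float, skewed_int, correlated_float, category).
    """
    rng = random.Random(seed)  # seeded so a failing case can be reproduced
    categories = ["a", "b", "c", "d", "e"]
    # Zipf-like weights (assumption): "a" dominates, later categories are sparse
    weights = [1 / (k + 1) ** 2 for k in range(len(categories))]
    rows = []
    for _ in range(n):
        u = rng.randint(0, 100)            # uniform integer in [0, 100]
        x = rng.gauss(0.0, 1.0)            # standard normal
        s = int(rng.paretovariate(1.5))    # heavy-tailed / skewed, always >= 1
        y = 2.0 * x + rng.gauss(0.0, 0.1)  # strongly correlated with x
        c = rng.choices(categories, weights=weights)[0]
        rows.append((u, x, s, y, c))
    return rows


def ar1_series(n, phi=0.9, seed=0):
    """AR(1) series: each value depends on the previous one (autocorrelated)."""
    rng = random.Random(seed)
    value, series = 0.0, []
    for _ in range(n):
        value = phi * value + rng.gauss(0.0, 1.0)
        series.append(value)
    return series
```

    Loading such rows into a table gives the engine's statistics (cardinality, skew) something non-trivial to react to, which is the point the comment makes about optimizers choosing different plans.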

  • @monfera · 4 years ago

    Why doesn't the UNION ALL here return the expected (one-tuple) answer despite the highlighted MySQL equality bug? If 0 doesn't equal -0 (contrary to expectation), the tuple just goes through the wrong gate (the 2nd one), and the UNION ALL result is still the same. Maybe the bug is the wrong equality check AND something else, as apparently all three gates yielded no match, but this was not mentioned. So I'm not sure this illustration hints at the power of partitioning; sure, it captures certain classes of errors, but it wouldn't have caught the 0 = -0 error if that were the only error here.
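    The partitioning idea discussed here (Ternary Logic Partitioning, TLP, one of the SQLancer test oracles) can be sketched as follows, using Python's built-in `sqlite3` as a stand-in engine rather than MySQL; the function name, table and predicates are illustrative assumptions. The check reports a bug only when the three partitions (p, NOT p, p IS NULL) fail to add up to the unfiltered result, e.g. a row that lands in no partition or in two. In this basic form, a predicate that is evaluated wrongly but consistently just moves a row to another partition and passes, which matches the reasoning in the comment above.

```python
import sqlite3
from collections import Counter


def tlp_where_check(conn, table, predicate):
    """Basic TLP check for a WHERE predicate (sketch).

    A correct engine puts every row in exactly one of the partitions
    p IS TRUE, NOT p, p IS NULL, so their UNION ALL must equal the
    unfiltered table as a multiset. `table` and `predicate` are interpolated
    directly: this assumes they come from the test generator, not from
    untrusted input.
    """
    original = conn.execute(f"SELECT * FROM {table}").fetchall()
    partitioned = conn.execute(
        f"SELECT * FROM {table} WHERE ({predicate}) "
        f"UNION ALL SELECT * FROM {table} WHERE NOT ({predicate}) "
        f"UNION ALL SELECT * FROM {table} WHERE ({predicate}) IS NULL"
    ).fetchall()
    # Compare as multisets: without ORDER BY, row order is unspecified
    return Counter(original) == Counter(partitioned)


# Hypothetical demo table in the spirit of the 0 / -0 example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0(c0 REAL)")
conn.executemany("INSERT INTO t0 VALUES (?)", [(0.0,), (-0.0,), (None,)])
consistent = tlp_where_check(conn, "t0", "c0 = -0.0")
```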

  • @AnhadJaiSingh · 4 years ago +1

    Did this video really just open with "Xplosive" off of Chronic 2001?! Wow

    • @carlineng · 4 years ago

      th-cam.com/video/O8PN4v-Lud0/w-d-xo.html

  • @monfera · 4 years ago

    The partitioning approach along the lines of three-valued logic seems rewarding, as it clearly finds bugs. But since SQL has denotational semantics (in fact, that is the central idea behind relational algebra, RA), why not automatically perform the full range of query rewrites? For example, compare the result of a CTE with a series of views that build on one another, a subselect with a CTE, or a query with its selections pushed down, if needed by materializing interim results so that the query optimizer itself doesn't rewrite two differently posed queries into the same one, which would obviously yield identical results (even that is useful, as it tests whether the optimizer respects equational semantics). There's plenty of work in the space of query rewrites, as the optimizers of these very databases already do this, and there are also systems without their own storage, e.g. Apache Calcite.
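    A minimal sketch of the differential idea in this comment, again using Python's `sqlite3` as a stand-in engine: one query is written as a chain of CTEs, a nested subselect, a chain of views, a version with the selections pushed down, and one that reads from a materialized temporary table, and any formulation whose multiset of rows differs from the first is reported. The table, views and query texts are hypothetical and hand-written here; a real tool would presumably generate such rewrites automatically.

```python
import sqlite3
from collections import Counter

# Hypothetical schema; v1/v2 and m1 hold the same interim results as c1/c2
SETUP = """
CREATE TABLE t(a INTEGER, b INTEGER);
INSERT INTO t VALUES (1, 10), (2, 20), (3, NULL), (4, 40);
CREATE VIEW v1 AS SELECT a, b FROM t WHERE b IS NOT NULL;
CREATE VIEW v2 AS SELECT a, b FROM v1 WHERE a > 1;
CREATE TEMP TABLE m1 AS SELECT a, b FROM t WHERE b IS NOT NULL;
"""

# Equivalent formulations: a correct engine returns the same rows for each
REWRITES = {
    "cte": "WITH c1 AS (SELECT a, b FROM t WHERE b IS NOT NULL), "
           "c2 AS (SELECT a, b FROM c1 WHERE a > 1) SELECT a, b FROM c2",
    "subselect": "SELECT a, b FROM (SELECT a, b FROM t WHERE b IS NOT NULL) AS s "
                 "WHERE a > 1",
    "views": "SELECT a, b FROM v2",
    "pushed_down": "SELECT a, b FROM t WHERE b IS NOT NULL AND a > 1",
    # Reading from a materialized table keeps the optimizer from merging steps
    "materialized": "SELECT a, b FROM m1 WHERE a > 1",
}


def differential_check(conn, queries):
    """Return the names of formulations whose row multiset differs from the first."""
    results = {name: Counter(conn.execute(sql).fetchall())
               for name, sql in queries.items()}
    baseline = next(iter(results.values()))
    return [name for name, rows in results.items() if rows != baseline]


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
mismatches = differential_check(conn, REWRITES)
```

    An empty `mismatches` list means all formulations agreed on this data; a non-empty list names the rewrites to inspect.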

  • @tom.zhangmingfnegtom.zhang6163 · 3 years ago

    great job!