code::dive conference 2015 - Andrei Alexandrescu - Writing Fast Code II

  • Published on 26 Aug 2024
  • code::dive conference 2015 - Nokia Wrocław
    codedive.pl/

Comments • 8

  • @iEarthos
    @iEarthos 1 year ago +1

    Andrei Alexandrescu reps "D" perfectly. He captures and expounds the essence of "OurCodes" in a profoundly delightful manner.

  • @chriswatch582
    @chriswatch582 6 years ago +1

    I modified the first version of the program. For the unrolling test, I replaced the "for" loop with an "if" statement that checks whether the test string has 4, 8, or 16 digits and then runs a fully unrolled sequence of 4, 8, or 16 steps. In the resulting output, the routine that does not rely on parallelism was always the fastest, unrolling was always second, and the routine based on parallelism was always the slowest.
    Output from program:
    TEST 1: Convert string to number: 1234.
    Not Use Parallelism Times: min = 0 max = 3520 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234
    Use Parallelism Times: min = 0 max = 5156 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234
    Use Unrolling Times: min = 0 max = 4933 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234
    TEST 2: Convert string to number: 12345678.
    Not Use Parallelism Times: min = 0 max = 5161 Min Iter = 10000 Max Iter = 100000000
    RESULT: 12345678
    Use Parallelism Times: min = 1 max = 8520 Min Iter = 10000 Max Iter = 100000000
    RESULT: 12345678
    Use Unrolling Times: min = 1 max = 8295 Min Iter = 10000 Max Iter = 100000000
    RESULT: 12345678
    TEST 3: Convert string to number: 1234567890123456.
    Not Use Parallelism Times: min = 1 max = 9065 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234567890123456
    Use Parallelism Times: min = 1 max = 15273 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234567890123456
    Use Unrolling Times: min = 2 max = 15128 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234567890123456
    ---------------------------------------------------------------------------------------------------------------------------------
    #include "stdafx.h"
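    #include <ctime>     // std::clock_t
    #include <cstring>   // assumed: string length computation
    #include <iostream>  // assumed: program output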
    using namespace std;
    // 3 ROUTINES THAT CREATE AN INTEGER FROM A STRING REPRESENTATION OF THAT NUMBER.
    // THE SLOWEST IS SUPPOSED TO BE THE ONE THAT DOES NOT RELY ON PARALLELISM.
    // ROUTINE FROM TALK: slide 59
    // Stops when the character at b equals the sentinel character at e (here ".").
    unsigned long long atoui_notuse_parallelism(const char* b, const char* e)
    {
        unsigned long long result = 0;
        for (; *b != *e; ++b)
        {
            result = result * 10 + (*b - '0');
        }
        return result;
    }
    // pow10[k] == 10^k. 10^19 does not fit in a signed long long, so the table is unsigned.
    static const unsigned long long pow10[20] = {
        1ULL,
        10ULL,
        100ULL,
        1000ULL,
        10000ULL,
        100000ULL,
        1000000ULL,
        10000000ULL,
        100000000ULL,
        1000000000ULL,
        10000000000ULL,
        100000000000ULL,
        1000000000000ULL,
        10000000000000ULL,
        100000000000000ULL,
        1000000000000000ULL,
        10000000000000000ULL,
        100000000000000000ULL,
        1000000000000000000ULL,
        10000000000000000000ULL
    };
    // IDEA FOR ROUTINE CAME FROM: slide 63
    // i is the index of the most significant digit (number of digits - 1).
    unsigned long long atoui_use_parallelism(const char* b, const char* e, short i)
    {
        unsigned long long result = 0;
        for (; *b != *e; ++b)
        {
            result += pow10[i--] * (*b - '0');
        }
        return result;
    }
    //IDEA FOR ROUTINE CAME FROM UNROLLING SLIDE: slide
    // j selects the fully unrolled variant: 1 = 4 digits, 2 = 8 digits, 3 = 16 digits.
    unsigned long long atoui_use_unrolling (const char* b, const char* e, short i, short j)
    {
        unsigned long long result = 0;
        if (j == 1)
        {
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0');
        }
        else if (j == 2)
        {
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0');
        }
        else if (j == 3)
        {
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0');
        }
        return result;
    }
    int main(int argc, char* argv[])
    {
        int elapsed_Time;
        int max, min;
        int max_num_iterations, min_num_iterations;
        long long result;
        std::clock_t start; // Holds starting time
        __int64 test_Number; // MSVC-specific 64-bit integer type
        char test[30]; // 3 TEST VALUES, each terminated by the "." sentinel
        char test2[9] = "1234.";
        char test3[11] = "12345678.";
        char test4[18] = "1234567890123456.";
        const char* b;
        const char* e = "."; // sentinel compared by value inside the routines
        int strlength;
        for (auto j = 1; j
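
A minimal, self-contained sketch of the three approaches compared in this comment, written against a plain [b, e) pointer range instead of the "." sentinel. The names `kPow10`, `atoui_serial`, `atoui_table`, `atoui_unrolled`, and `agree` are made up for illustration and are not taken from the talk or from the program above. The sketch assumes the input holds only ASCII digits, at most 20 of them, with a value that fits in 64 bits. It only checks that the variants agree; it says nothing about which one is faster.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// kPow10[k] == 10^k for k in [0, 19]; 10^19 still fits in an unsigned 64-bit value.
static const std::uint64_t kPow10[20] = {
    1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL,
    100000ULL, 1000000ULL, 10000000ULL, 100000000ULL, 1000000000ULL,
    10000000000ULL, 100000000000ULL, 1000000000000ULL, 10000000000000ULL,
    100000000000000ULL, 1000000000000000ULL, 10000000000000000ULL,
    100000000000000000ULL, 1000000000000000000ULL, 10000000000000000000ULL};

// Converts one ASCII digit to its numeric value.
inline std::uint64_t digit(char c) { return static_cast<std::uint64_t>(c - '0'); }

// Serial version: each step multiplies the previous result, so steps form one chain.
std::uint64_t atoui_serial(const char* b, const char* e) {
    std::uint64_t result = 0;
    for (; b != e; ++b) result = result * 10 + digit(*b);
    return result;
}

// Table version: each product depends only on its own digit; only the sum is chained.
std::uint64_t atoui_table(const char* b, const char* e) {
    std::uint64_t result = 0;
    std::size_t i = static_cast<std::size_t>(e - b);  // digits still to consume
    for (; b != e; ++b) result += kPow10[--i] * digit(*b);
    return result;
}

// Table version unrolled four digits at a time, with a scalar loop for the remainder.
std::uint64_t atoui_unrolled(const char* b, const char* e) {
    std::uint64_t result = 0;
    std::size_t i = static_cast<std::size_t>(e - b);
    for (; e - b >= 4; b += 4, i -= 4) {
        result += kPow10[i - 1] * digit(b[0]) + kPow10[i - 2] * digit(b[1]) +
                  kPow10[i - 3] * digit(b[2]) + kPow10[i - 4] * digit(b[3]);
    }
    for (; b != e; ++b) result += kPow10[--i] * digit(*b);
    return result;
}

// Returns true when all three routines produce `expected` for the digit string `s`.
bool agree(const char* s, std::uint64_t expected) {
    const char* e = s + std::strlen(s);
    return atoui_serial(s, e) == expected && atoui_table(s, e) == expected &&
           atoui_unrolled(s, e) == expected;
}
```

Calling `agree("12345678", 12345678ULL)` exercises all three routines on one of the test values from the comment. Unlike the fixed 4/8/16 variants above, the unrolled sketch accepts any digit count because of its remainder loop.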

  • @jaredmulconry
    @jaredmulconry 7 years ago +1

    That final comment wasn't far off the mark, considering constexpr_if and the like are likely to appear in the next standard. Perhaps not with identical semantics (I haven't examined the two proposals closely), but I've already heard talk of STL implementers excited to apply it to their code bases. Removing classic tag dispatch patterns is one example use.
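
As a rough illustration of the tag-dispatch use mentioned here: the feature was standardized in C++17 as `if constexpr`. The sketch below contrasts the two styles for an iterator-advance helper. The names `advance_impl`, `advance_tagged`, `advance_constexpr`, and `same_result` are hypothetical and chosen for this example only; this is not how any particular standard library implements `std::advance`.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Classic tag dispatch: the overload is picked by the iterator category tag.
template <class It>
void advance_impl(It& it, int n, std::random_access_iterator_tag) {
    it += n;  // constant-time jump
}

template <class It>
void advance_impl(It& it, int n, std::input_iterator_tag) {
    while (n-- > 0) ++it;  // step one element at a time
}

template <class It>
void advance_tagged(It& it, int n) {
    advance_impl(it, n, typename std::iterator_traits<It>::iterator_category{});
}

// C++17 alternative: one function, with the branch chosen at compile time.
// The discarded branch is not instantiated, so `it += n` is never compiled
// for iterators that do not support it.
template <class It>
void advance_constexpr(It& it, int n) {
    using Category = typename std::iterator_traits<It>::iterator_category;
    if constexpr (std::is_base_of_v<std::random_access_iterator_tag, Category>) {
        it += n;
    } else {
        while (n-- > 0) ++it;
    }
}

// Returns true when both styles land on the same elements.
bool same_result() {
    std::vector<int> v{10, 20, 30, 40};
    std::list<int> l{10, 20, 30, 40};
    auto a = v.begin();
    auto b = v.begin();
    advance_tagged(a, 2);
    advance_constexpr(b, 2);
    auto c = l.begin();
    auto d = l.begin();
    advance_tagged(c, 3);
    advance_constexpr(d, 3);
    return *a == 30 && *b == 30 && *c == 40 && *d == 40;
}
```

The `if constexpr` version keeps both strategies in one place and needs no helper overloads, which is the kind of simplification the comment refers to.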

  • @chriswatch582
    @chriswatch582 7 years ago +1

    Using Visual Studio 2015 on Windows 7 Professional with 2 cores, I ran a test version of the non-parallel, parallel, and unrolled routines, and the non-parallel routine was always the fastest. I even tried moving the large array and the string-length computation into the routine, but it was still slower than the non-parallel computation. Relying on parallelism was slightly slower than unrolling in the last 2 tests.
    Output from program:
    TEST 1: Convert string to number: 1234.
    Not Use Parallelism Times: min = 0 max = 5307 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234
    Use Parallelism Times: min = 0 max = 6310 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234
    Use Unrolling Times: min = 0 max = 6621 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234
    TEST 2: Convert string to number: 12345678.
    Not Use Parallelism Times: min = 1 max = 9573 Min Iter = 10000 Max Iter = 100000000
    RESULT: 12345678
    Use Parallelism Times: min = 1 max = 10227 Min Iter = 10000 Max Iter = 100000000
    RESULT: 12345678
    Use Unrolling Times: min = 1 max = 9799 Min Iter = 10000 Max Iter = 100000000
    RESULT: 12345678
    TEST 3: Convert string to number: 1234567890123456.
    Not Use Parallelism Times: min = 1 max = 12625 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234567890123456
    Use Parallelism Times: min = 2 max = 17928 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234567890123456
    Use Unrolling Times: min = 2 max = 17227 Min Iter = 10000 Max Iter = 100000000
    RESULT: 1234567890123456
    ---------------------------------------------------------------------------------------------------------------------------------
    #include "stdafx.h"
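    #include <ctime>     // std::clock_t
    #include <cstring>   // assumed: string length computation
    #include <iostream>  // assumed: program output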
    using namespace std;
    // 3 ROUTINES THAT CREATE AN INTEGER FROM A STRING REPRESENTATION OF THAT NUMBER.
    // THE SLOWEST IS SUPPOSED TO BE THE ONE THAT DOES NOT RELY ON PARALLELISM.
    // ROUTINE FROM TALK: slide 59
    // Stops when the character at b equals the sentinel character at e (here ".").
    unsigned long long atoui_notuse_parallelism(const char* b, const char* e)
    {
        unsigned long long result = 0;
        for (; *b != *e; ++b)
        {
            result = result * 10 + (*b - '0');
        }
        return result;
    }
    // pow10[k] == 10^k. 10^19 does not fit in a signed long long, so the table is unsigned.
    static const unsigned long long pow10[20] = {
        1ULL,
        10ULL,
        100ULL,
        1000ULL,
        10000ULL,
        100000ULL,
        1000000ULL,
        10000000ULL,
        100000000ULL,
        1000000000ULL,
        10000000000ULL,
        100000000000ULL,
        1000000000000ULL,
        10000000000000ULL,
        100000000000000ULL,
        1000000000000000ULL,
        10000000000000000ULL,
        100000000000000000ULL,
        1000000000000000000ULL,
        10000000000000000000ULL
    };
    // IDEA FOR ROUTINE CAME FROM: slide 63
    // i is the index of the most significant digit (number of digits - 1).
    unsigned long long atoui_use_parallelism(const char* b, const char* e, short i)
    {
        unsigned long long result = 0;
        for (; *b != *e; ++b)
        {
            result += pow10[i--] * (*b - '0');
        }
        return result;
    }
    //IDEA FOR ROUTINE CAME FROM UNROLLING SLIDE: slide
    // Consumes 4 digits per iteration and only tests for the sentinel between groups,
    // so the digit count must be a multiple of 4 (true for the 4, 8, and 16 digit tests).
    unsigned long long atoui_use_unrolling (const char* b, const char* e, short i)
    {
        unsigned long long result = 0;
        for (; *b != *e;)
        {
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
            result += pow10[i--] * (*b - '0'); b++;
        }
        return result;
    }
    int main(int argc, char* argv[])
    {
        int elapsed_Time;
        int max, min;
        int max_num_iterations, min_num_iterations;
        long long result;
        std::clock_t start; // Holds starting time
        __int64 test_Number; // MSVC-specific 64-bit integer type
        char test[30]; // 3 TEST VALUES, each terminated by the "." sentinel
        char test2[9] = "1234.";
        char test3[11] = "12345678.";
        char test4[18] = "1234567890123456.";
        const char* b;
        const char* e = "."; // sentinel compared by value inside the routines
        int strlength;
        for (auto j = 1; j

  • @chriswatch582
    @chriswatch582 6 years ago +1

    So, all the interesting things Andrei said about parallelism (and unrolling) do not work with the program that I developed on Visual Studio 2015. I believe those things should work elsewhere though!
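
For anyone repeating the measurement, here is a minimal timing sketch using `std::chrono::steady_clock`. It is an assumption-laden illustration, not the harness used in the talk or in the program above: `ns_per_call` and `parse_digits` are hypothetical names, and reading the input pointer through a `volatile` and storing the result into a `volatile` is just one common way to discourage the compiler from hoisting or dropping the call. Results still depend heavily on compiler flags and hardware.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Stand-in routine to time: serial digit parsing over the range [b, e).
std::uint64_t parse_digits(const char* b, const char* e) {
    std::uint64_t result = 0;
    for (; b != e; ++b) result = result * 10 + static_cast<std::uint64_t>(*b - '0');
    return result;
}

// Times `iterations` calls of `parse` on the first `len` characters of `text`
// and returns the average cost of one call in nanoseconds.
template <class F>
double ns_per_call(F parse, const char* text, std::size_t len, std::uint64_t iterations) {
    assert(iterations > 0);
    const char* volatile input = text;  // re-read every iteration
    volatile std::uint64_t sink = 0;    // every result is stored, so calls stay observable
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t k = 0; k < iterations; ++k) {
        const char* b = input;
        sink = parse(b, b + len);
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

A call such as `ns_per_call(parse_digits, "12345678", 8, 100000000)` reports an average per call, which is easier to compare across routines than raw totals at a single iteration count.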

  • @QuentinUK
    @QuentinUK 8 years ago +1

    First comment! The first talk had loads of views and comments.

  • @rehash3d
    @rehash3d 4 years ago +1

    Damn, the boost version is slow as heck, someone must have put in the extra effort to make it that slow

  • @user-ov5nd1fb7s
    @user-ov5nd1fb7s 1 year ago +1

    None of this works in Go because Go shits bounds checks all over the pow10 table.