code::dive conference 2015 - Andrei Alexandrescu - Writing Fast Code II

NOKIA Technology Center Wrocław

11,593 views


Published on 26 Aug 2024
code::dive conference 2015 - Nokia Wrocław
codedive.pl/

Comments • 8

@iEarthos 1 year ago
Andrei Alexandrescu reps "D" perfectly... he captures and expounds the essence of "OurCodes" in a profoundly delightful manner.
@chriswatch582 6 years ago
I modified the first version of the program. For the unrolling test, I replaced the "for" loop with an "if" statement that checks whether the test string has 4, 8, or 16 digits; for each case the routine is unrolled 4, 8, or 16 times. In the output, the routine without parallelism was always the fastest, unrolling was always second, and the routine based on parallelism was always the slowest.
Output from program:
TEST 1: Convert string to number: 1234.
Not Use Parallelism Times: min = 0 max = 3520 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234
Use Parallelism Times: min = 0 max = 5156 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234
Use Unrolling Times: min = 0 max = 4933 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234
TEST 2: Convert string to number: 12345678.
Not Use Parallelism Times: min = 0 max = 5161 Min Iter = 10000 Max Iter = 100000000
RESULT: 12345678
Use Parallelism Times: min = 1 max = 8520 Min Iter = 10000 Max Iter = 100000000
RESULT: 12345678
Use Unrolling Times: min = 1 max = 8295 Min Iter = 10000 Max Iter = 100000000
RESULT: 12345678
TEST 3: Convert string to number: 1234567890123456.
Not Use Parallelism Times: min = 1 max = 9065 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234567890123456
Use Parallelism Times: min = 1 max = 15273 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234567890123456
Use Unrolling Times: min = 2 max = 15128 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234567890123456
---------------------------------------------------------------------------------------------------------------------------------
#include "stdafx.h"
#include
#include
#include
using namespace std;
// 2 ROUTINES THAT CREATE AN INTEGER FROM A STRING REPRESENTATION OF THAT NUMBER.
//THE SLOWEST IS SUPPOSED TO RELY ON NON-PARALLELISM.
//ROUTINE FROM TALK: slide 59
unsigned long long atoui_notuse_parallelism(const char* b, const char* e)
{
unsigned long long result = 0;
// e points to the string ".", so the loop stops at the '.' terminator in b
for (; *b != *e; ++b)
{
result = result * 10 + (*b - '0');
}
return result;
}
// unsigned: the last entry, 10^19, does not fit in a signed 64-bit long long
static const unsigned long long pow10[20] = {
1ULL,
10ULL,
100ULL,
1000ULL,
10000ULL,
100000ULL,
1000000ULL,
10000000ULL,
100000000ULL,
1000000000ULL,
10000000000ULL,
100000000000ULL,
1000000000000ULL,
10000000000000ULL,
100000000000000ULL,
1000000000000000ULL,
10000000000000000ULL,
100000000000000000ULL,
1000000000000000000ULL,
10000000000000000000ULL
};
//IDEA FOR ROUTINE CAME FROM: slide 63
// i should be the pow10 index of the first (most significant) digit, i.e. digit count - 1
unsigned long long atoui_use_parallelism(const char* b, const char* e, short i)
{
unsigned long long result = 0;
for (; *b != *e; ++b)
{
result += pow10[i--] * (*b - '0');
}
return result;
}
//IDEA FOR ROUTINE CAME FROM UNROLLING SLIDE: slide
// j selects the unroll factor: 1 = 4 digits, 2 = 8 digits, 3 = 16 digits
unsigned long long atoui_use_unrolling(const char* b, const char* e, short i, short j)
{
unsigned long long result = 0;
if (j == 1)
{
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0');
}
else if (j == 2)
{
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0');
}
else if (j == 3)
{
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0');
}
return result;
}
int main(int argc, char* argv[])
{
int elapsed_Time;
int max, min;
int max_num_iterations, min_num_iterations;
long long result;
std::clock_t start; //Holds starting time
__int64 test_Number;
char test[30]; //3 TEST VALUES
char test2[9] = "1234.";
char test3[11] = "12345678.";
char test4[18] = "1234567890123456.";
const char* b;
const char* e = ".";
int strlength;
for (auto j = 1; j
@jaredmulconry 7 years ago
That final comment wasn't far off the mark, considering constexpr_if and the like are likely to appear in the next standard. Perhaps not with identical semantics (I haven't examined the two proposals closely), but I've already heard STL implementers talking excitedly about applying it to their code bases, for example to remove classic tag-dispatch patterns.
@chriswatch582 7 years ago
On Visual Studio 2015 under Windows 7 Professional with 2 cores, I ran test versions of the non-parallel, parallel, and unrolled computations, and the non-parallel routine was always the fastest. I even tried moving the large array and the string-length computation into the routine, but the parallel routine was still slower than the non-parallel one. Relying on parallelism was slightly slower than unrolling in the last 2 tests.
Output from program:
TEST 1: Convert string to number: 1234.
Not Use Parallelism Times: min = 0 max = 5307 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234
Use Parallelism Times: min = 0 max = 6310 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234
Use Unrolling Times: min = 0 max = 6621 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234
TEST 2: Convert string to number: 12345678.
Not Use Parallelism Times: min = 1 max = 9573 Min Iter = 10000 Max Iter = 100000000
RESULT: 12345678
Use Parallelism Times: min = 1 max = 10227 Min Iter = 10000 Max Iter = 100000000
RESULT: 12345678
Use Unrolling Times: min = 1 max = 9799 Min Iter = 10000 Max Iter = 100000000
RESULT: 12345678
TEST 3: Convert string to number: 1234567890123456.
Not Use Parallelism Times: min = 1 max = 12625 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234567890123456
Use Parallelism Times: min = 2 max = 17928 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234567890123456
Use Unrolling Times: min = 2 max = 17227 Min Iter = 10000 Max Iter = 100000000
RESULT: 1234567890123456
---------------------------------------------------------------------------------------------------------------------------------
#include "stdafx.h"
#include
#include
#include
using namespace std;
// 2 ROUTINES THAT CREATE AN INTEGER FROM A STRING REPRESENTATION OF THAT NUMBER.
//THE SLOWEST IS SUPPOSED TO RELY ON NON-PARALLELISM.
//ROUTINE FROM TALK: slide 59
unsigned long long atoui_notuse_parallelism(const char* b, const char* e)
{
unsigned long long result = 0;
// e points to the string ".", so the loop stops at the '.' terminator in b
for (; *b != *e; ++b)
{
result = result * 10 + (*b - '0');
}
return result;
}
// unsigned: the last entry, 10^19, does not fit in a signed 64-bit long long
static const unsigned long long pow10[20] = {
1ULL,
10ULL,
100ULL,
1000ULL,
10000ULL,
100000ULL,
1000000ULL,
10000000ULL,
100000000ULL,
1000000000ULL,
10000000000ULL,
100000000000ULL,
1000000000000ULL,
10000000000000ULL,
100000000000000ULL,
1000000000000000ULL,
10000000000000000ULL,
100000000000000000ULL,
1000000000000000000ULL,
10000000000000000000ULL
};
//IDEA FOR ROUTINE CAME FROM: slide 63
// i should be the pow10 index of the first (most significant) digit, i.e. digit count - 1
unsigned long long atoui_use_parallelism(const char* b, const char* e, short i)
{
unsigned long long result = 0;
for (; *b != *e; ++b)
{
result += pow10[i--] * (*b - '0');
}
return result;
}
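//IDEA FOR ROUTINE CAME FROM UNROLLING SLIDE: slide
// NOTE: the '.' terminator is checked only once per 4 digits, so this assumes
// the digit count is a multiple of 4 (true for all three test strings)
unsigned long long atoui_use_unrolling(const char* b, const char* e, short i)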
{
unsigned long long result = 0;
for (; *b != *e;)
{
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
result += pow10[i--] * (*b - '0'); b++;
}
return result;
}
int main(int argc, char* argv[])
{
int elapsed_Time;
int max, min;
int max_num_iterations, min_num_iterations;
long long result;
std::clock_t start; //Holds starting time
__int64 test_Number;
char test[30]; //3 TEST VALUES
char test2[9] = "1234.";
char test3[11] = "12345678.";
char test4[18] = "1234567890123456.";
const char* b;
const char* e = ".";
int strlength;
for (auto j = 1; j
@chriswatch582 6 years ago
So all the interesting things Andrei said about parallelism (and unrolling) do not work with the program I developed on Visual Studio 2015. I believe they should work elsewhere, though!
@QuentinUK 8 years ago
First comment! The first talk had loads of views and comments.
@rehash3d 4 years ago
Damn, the boost version is slow as heck, someone must have put in the extra effort to make it that slow
@user-ov5nd1fb7s 1 year ago
None of this works in Go because Go shits bounds checks all over the pow10 table.
