Pārlūkot izejas kodu

qemu_img: is_not_zero() optimization

I run qemu-img under profiler and realized, that most of CPU time is
consumed by is_not_zero() function. I had made a couple of optimizations
on it and got the following output for `time qemu-img convert -O qcow2
volume.qcow2 snapshot.qcow2`:

Original qemu-img:
real 0m56.159s
user 0m34.670s
sys  0m12.079s

Patched qemu-img:
real 0m34.805s
user 0m18.445s
sys  0m12.552s

Signed-off-by: Dmitry Konishchev <konishchev@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Dmitry Konishchev 14 gadi atpakaļ
vecāks
revīzija
f6a00aa150
1 mainītis faili ar 26 papildinājumiem un 3 dzēšanām
  1. 26 3
      qemu-img.c

+ 26 - 3
qemu-img.c

@@ -496,14 +496,37 @@ static int img_commit(int argc, char **argv)
     return 0;
     return 0;
 }
 }
 
 
+/*
+ * Checks whether the sector is not a zero sector.
+ *
+ * Attention! The len must be a multiple of 4 * sizeof(long) due to
+ * restriction of optimizations in this function.
+ */
 static int is_not_zero(const uint8_t *sector, int len)
 static int is_not_zero(const uint8_t *sector, int len)
 {
 {
+    /*
+     * Use long as the biggest available internal data type that fits into the
+     * CPU register and unroll the loop to smooth out the effect of memory
+     * latency.
+     */
+
     int i;
     int i;
-    len >>= 2;
-    for(i = 0;i < len; i++) {
-        if (((uint32_t *)sector)[i] != 0)
+    long d0, d1, d2, d3;
+    const long * const data = (const long *) sector;
+
+    len /= sizeof(long);
+
+    for(i = 0; i < len; i += 4) {
+        d0 = data[i + 0];
+        d1 = data[i + 1];
+        d2 = data[i + 2];
+        d3 = data[i + 3];
+
+        if (d0 || d1 || d2 || d3) {
             return 1;
             return 1;
+        }
     }
     }
+
     return 0;
     return 0;
 }
 }