Posted 2019-11-27 20:02:15 GMT
昨日リリースされたSBCL 1.5.9にcaseのジャンプテーブル最適化が入ったようなので早速どんなものか試してみたいと思います。
とりあえず若干わざとらしいものを試してみます。
case
のキーに0から511までの数値をシャッフルしたものを指定して分岐し、さらに二段目のcase
で元に戻すのを5回繰り返すのを1000繰り返してみます。
(defconstant nbranch 512);; alexandria
(defun shuffle (sequence &key (start 0) end)
"Returns a random permutation of SEQUENCE bounded by START and END.
Original sequece may be destructively modified, and share storage with
the original one. Signals an error if SEQUENCE is not a proper
sequence."
(declare (type fixnum start)
(type (or fixnum null) end))
(etypecase sequence
(list
(let* ((end (or end (length sequence)))
(n (- end start)))
(do ((tail (nthcdr start sequence) (cdr tail)))
((zerop n))
(rotatef (car tail) (car (nthcdr (random n) tail)))
(decf n))))
(vector
(let ((end (or end (length sequence))))
(loop for i from start below end
do (rotatef (aref sequence i)
(aref sequence (+ i (random (- end i))))))))
(sequence
(let ((end (or end (length sequence))))
(loop for i from (- end 1) downto start
do (rotatef (elt sequence i)
(elt sequence (+ i (random (- end i)))))))))
sequence)
(defmacro casetabletest (x)
(let ((xy (loop :for x :across (shuffle
(let ((vec (make-sequence 'vector nbranch)))
(dotimes (i nbranch vec)
(setf (elt vec i) i))))
:for i :from 0 :collect (list i x))))
`(case (case ,x
,@xy
(otherwise -1))
,@(mapcar #'reverse xy)
(otherwise -1))))
(defun casetest (&aux (n 0))
(dotimes (i nbranch n)
(incf n (casetabletest
(casetabletest
(casetabletest
(casetabletest
(casetabletest i))))))))
(compile 'casetest)
(time (dotimes (i 1000)
(casetest)))
t% /l/sbcl/1.5.8/bin/sbcl --no-sysinit --no-userinit --load /tmp/case.lisp --quit
This is SBCL 1.5.8, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
Evaluation took:
1.986 seconds of real time
1.990000 seconds of total run time (1.990000 user, 0.000000 system)
100.20% CPU
6,537,459,720 processor cycles
0 bytes consed
t% /l/sbcl/1.5.9/bin/sbcl --no-sysinit --no-userinit --load /tmp/case.lisp --quit
This is SBCL 1.5.9, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
Evaluation took:
0.056 seconds of real time
0.060000 seconds of total run time (0.060000 user, 0.000000 system)
107.14% CPU
184,341,012 processor cycles
0 bytes consed
この極端な例では35倍も速くなっています。
まあこんなことはそうそうないですが!
ちなみに類似の最適化を実施するClozure CLでも同じ位のスピードが出るようです。
t% /l/ccl/1.11.5/lx86cl64 -n -l /tmp/case.lisp -e '(quit)'
(DOTIMES (I 1000) (CASETEST))
took 55,783 microseconds (0.055783 seconds) to run.
During that period, and with 8 available CPU cores,
60,000 microseconds (0.060000 seconds) were spent in user mode
0 microseconds (0.000000 seconds) were spent in system mode
上記の例では最適化が発動しましたが、case
のジャンプテーブル化ではそんなに大きなテーブルは作らないことがほとんどなので、SBCLではどういう縛りがあるか確認してみます。
発動ルールは、src/compiler/ir2opt.lisp
のshould-use-jump-table-p
の中に記述されているようで、
のようです。
(defun should-use-jump-table-p (chain &aux (choices (car chain)))
;; Dup keys could exist. REMOVE-DUPLICATES from-end can handle that:
;; "the one occurring earlier in sequence is discarded, unless from-end
;; is true, in which case the one later in sequence is discarded."
(let ((choices (remove-duplicates choices :key #'car :from-end t)))
;; Convert to multiway only if at least 4 key comparisons would be needed.
(unless (>= (length choices) 4)
(return-from should-use-jump-table-p nil))
(let ((values (mapcar #'car choices)))
(cond ((every #'fixnump values)) ; ok
((every #'characterp values)
(setq values (mapcar #'sb-xc:char-code values)))
(t
(return-from should-use-jump-table-p nil)))
(let* ((min (reduce #'min values))
(max (reduce #'max values))
(table-size (1+ (- max min )))
(size-limit (* (length values) 2)))
;; Don't waste too much space, e.g. {5,6,10,20} would require 16 words
;; for 4 entries, which is excessive.
(when (and (<= table-size size-limit)
(can-encode-jump-table-p min max))
;; Return the new choices
(cons choices (cdr chain)))))))
上記ルールからすると、一つ置きで配置された整数のキーは最適化されますが、二つ置きだとルールから外れるので最適化されないことが分かります。
一応試してみましょう。
(defun foo2 (x)
(declare (type fixnum x))
#.`(case x
,@(loop :for i :from 0 :by 2 :repeat 10
:collect (list i i))
(otherwise -1)))
; disassembly for FOO2
; Size: 110 bytes. Origin: #x52DF52DA ; FOO2
; 2DA: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; 2DE: 488945F8 MOV [RBP-8], RAX
; 2E2: 4C8BDB MOV R11, RBX
; 2E5: 4983FB24 CMP R11, 36
; 2E9: 774E JNBE L10
; 2EB: 488D0526FFFFFF LEA RAX, [RIP-218] ; = #x52DF5218
; 2F2: 42FF2498 JMP QWORD PTR [RAX+R11*4]
; 2F6: L0: BA04000000 MOV EDX, 4
; 2FB: L1: 488BE5 MOV RSP, RBP
; 2FE: F8 CLC
; 2FF: 5D POP RBP
; 300: C3 RET
; 301: L2: BA08000000 MOV EDX, #x8 ; is_lisp_thread
; 306: EBF3 JMP L1
; 308: L3: BA0C000000 MOV EDX, 12
; 30D: EBEC JMP L1
; 30F: L4: BA10000000 MOV EDX, 16
; 314: EBE5 JMP L1
; 316: L5: BA14000000 MOV EDX, 20
; 31B: EBDE JMP L1
; 31D: L6: BA18000000 MOV EDX, 24
; 322: EBD7 JMP L1
; 324: L7: BA1C000000 MOV EDX, 28
; 329: EBD0 JMP L1
; 32B: L8: BA20000000 MOV EDX, 32
; 330: EBC9 JMP L1
; 332: L9: BA24000000 MOV EDX, 36
; 337: EBC2 JMP L1
; 339: L10: 48C7C2FEFFFFFF MOV RDX, -2
; 340: EBB9 JMP L1
; 342: L11: 31D2 XOR EDX, EDX
; 344: EBB5 JMP L1
; 346: CC10 INT3 16 ; Invalid argument count trap
(defun foo3 (x)
(declare (type fixnum x))
#.`(case x
,@(loop :for i :from 0 :by 3 :repeat 10
:collect (list i i))
(otherwise -1)))
; disassembly for FOO3
; Size: 154 bytes. Origin: #x52DF53CE ; FOO3
; 3CE: 498B5D10 MOV RBX, [R13+16] ; thread.binding-stack-pointer
; 3D2: 48895DF8 MOV [RBP-8], RBX
; 3D6: 4885C0 TEST RAX, RAX
; 3D9: 0F8483000000 JEQ L9
; 3DF: 4883F806 CMP RAX, 6
; 3E3: 750B JNE L1
; 3E5: BA06000000 MOV EDX, 6
; 3EA: L0: 488BE5 MOV RSP, RBP
; 3ED: F8 CLC
; 3EE: 5D POP RBP
; 3EF: C3 RET
; 3F0: L1: 4883F80C CMP RAX, 12
; 3F4: 7507 JNE L2
; 3F6: BA0C000000 MOV EDX, 12
; 3FB: EBED JMP L0
; 3FD: L2: 4883F812 CMP RAX, 18
; 401: 7507 JNE L3
; 403: BA12000000 MOV EDX, 18
; 408: EBE0 JMP L0
; 40A: L3: 4883F818 CMP RAX, 24
; 40E: 7507 JNE L4
; 410: BA18000000 MOV EDX, 24
; 415: EBD3 JMP L0
; 417: L4: 4883F81E CMP RAX, 30
; 41B: 7507 JNE L5
; 41D: BA1E000000 MOV EDX, 30
; 422: EBC6 JMP L0
; 424: L5: 4883F824 CMP RAX, 36
; 428: 7507 JNE L6
; 42A: BA24000000 MOV EDX, 36
; 42F: EBB9 JMP L0
; 431: L6: 4883F82A CMP RAX, 42
; 435: 7507 JNE L7
; 437: BA2A000000 MOV EDX, 42
; 43C: EBAC JMP L0
; 43E: L7: 4883F830 CMP RAX, 48
; 442: 7507 JNE L8
; 444: BA30000000 MOV EDX, 48
; 449: EB9F JMP L0
; 44B: L8: 4883F836 CMP RAX, 54
; 44F: 48C7C2FEFFFFFF MOV RDX, -2
; 456: 41BB36000000 MOV R11D, 54
; 45C: 490F44D3 CMOVEQ RDX, R11
; 460: EB88 JMP L0
; 462: L9: 31D2 XOR EDX, EDX
; 464: EB84 JMP L0
; 466: CC10 INT3 16 ; Invalid argument count trap
SBCLのcase
のジャンプテーブル化は、キーをそこそこ密に配置する必要がある様子。
ちなみに、case
の最適化と本記事では書いてきましたが、Clozure CLと同じく、コンパイラが最適化で実施するので、Lispのレベルではif
の組み合わせが最適化のルールに合致していれば発動します。
SBCLには最近細かい最適化が入ってきていますが今後も地味に速くなって行きそうです。
■
HTML generated by 3bmd in LispWorks 7.0.0