Debugging and Profiling
A golden rule in programming is that code does not do what you expect it to do, but what you tell it to do. Bridging that gap can sometimes be quite difficult. In this lecture we cover useful techniques for dealing with buggy and resource hungry code: debugging and profiling.
Debugging
Printf debugging and Logging
“The most effective debugging tool is still careful thought, coupled with judiciously placed print statements” — Brian Kernighan, Unix for Beginners.
A first approach to debugging a problem is to add print statements around where you have detected it, and keep iterating until you have extracted enough information to understand what is responsible for the issue.
A second approach is to use logging in your program, instead of ad hoc print statements. Logging is better than regular print statements for several reasons:
- You can log to files, sockets or even remote servers instead of only standard output.
- Logging supports severity levels (such as INFO, DEBUG, WARN, ERROR), which allow you to filter the output accordingly.
- For new issues, there is a fair chance that your logs already contain enough information to detect what is going wrong.
Here is some example code that logs messages:
$ python logger.py
# Raw output as with just prints
$ python logger.py log
# Log formatted output
$ python logger.py log ERROR
# Print only ERROR levels and above
$ python logger.py color
# Color formatted output
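The logger.py script itself is not reproduced here, but a minimal sketch of the kind of setup it demonstrates (the format string and messages below are illustrative, not the course's exact code) might look like:

```python
# Minimal logging setup (illustrative sketch, not the course's logger.py)
import logging
import sys

logging.basicConfig(
    stream=sys.stderr,
    level=logging.DEBUG,  # raise to logging.ERROR to keep only errors and above
    format="%(asctime)s [%(levelname)s] %(message)s",
)

log = logging.getLogger(__name__)
log.debug("starting work")
log.warning("retrying request")
log.error("something went wrong")
```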
One of my favorite tips for making logs more readable is to color code them. By now you have probably realized that your terminal already uses colors to make things more readable, but how does it do it?

Programs like `ls` or `grep` use [ANSI escape codes](https://en.wikipedia.org/wiki/ANSI_escape_code), special sequences of characters that tell your shell to change the color of the output. For example, executing `echo -e "\e[38;2;255;0;0mThis is red\e[0m"` prints the message "This is red" in red on your terminal. The following script shows how to print many RGB colors into your terminal.
#!/usr/bin/env bash
for R in $(seq 0 20 255); do
for G in $(seq 0 20 255); do
for B in $(seq 0 20 255); do
printf "\e[38;2;${R};${G};${B}m█\e[0m";
done
done
done
Third party logs
As you start building larger software systems you will most probably run into dependencies that run as separate programs. Web servers, databases or message brokers are common examples of this kind of dependencies. When interacting with these systems it is often necessary to read their logs, since client side error messages might not suffice.

Luckily, most programs write their own logs somewhere in your system.
In UNIX systems, it is commonplace for programs to write their logs under `/var/log`. For instance, the NGINX web server places its logs under `/var/log/nginx`. More recently, systems have started using a system log, which is increasingly where all of your log messages go. Most (but not all) Linux systems use `systemd`, a system daemon that controls many things in your system, such as which services are enabled and running. `systemd` places the logs under `/var/log/journal` in a specialized format and you can use the `journalctl` command to display them. Similarly, on most UNIX systems you can also access the kernel log using `dmesg`.

For logging under the system logs you can use the `logger` shell program. Here is an example of using `logger`, along with how to check that the entry made it to the system logs. Moreover, most programming languages have bindings for logging to the system log.
logger "Hello Logs"
# On macOS
log show --last 1m | grep Hello
# On Linux
journalctl --since "1m ago" | grep Hello
As we saw in the data wrangling lecture, logs can be quite verbose and they require some level of processing and filtering to get the information you want. If you find yourself heavily filtering through `journalctl` and `log show`, consider using their built-in flags, which can perform a first pass of filtering of their output. There are also tools like `lnav`, which provide an improved presentation of and navigation through log files.
Debuggers
When printf debugging is not enough you should use a debugger. Debuggers are programs that let you interact with the execution of a program, allowing the following:

- Halt execution of the program when it reaches a certain line.
- Step through the program one instruction at a time.
- Inspect values of variables after the program crashed.
- Conditionally halt execution when a given condition is met.
- And many more advanced features.
Many programming languages come with some form of debugger. In Python this is the Python Debugger `pdb`.

Here is a brief description of some of the commands `pdb` supports:

- l(ist) - Displays 11 lines around the current line or continues the previous listing.
- s(tep) - Executes the current line, stopping at the first possible occasion.
- n(ext) - Continues execution until the next line in the current function is reached or it returns.
- b(reak) - Sets a breakpoint (depending on the argument provided).
- p(rint) - Evaluates the expression in the current context and prints its value. There is also a similar command, pp, which uses `pprint` to display the result.
- r(eturn) - Continues execution until the current function returns.
- q(uit) - Quits the debugger.
Let's go through an example of using `pdb` to fix the following buggy Python code (see the lecture video):
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(n):
            if arr[j] > arr[j+1]:
                arr[j] = arr[j+1]
                arr[j+1] = arr[j]
    return arr

print(bubble_sort([4, 2, 1, 8, 7, 6]))
Note that since Python is an interpreted language, we can use the `pdb` shell to execute commands and instructions.
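For instance, a hypothetical session on the buggy code above (assuming it is saved as bubble.py; line numbers and output will vary) could list the source, break on the swap line, continue to the breakpoint, inspect arr and j, and then step from there:

```
$ python -m pdb bubble.py
(Pdb) l
(Pdb) b 6
(Pdb) c
(Pdb) p arr, j
(Pdb) s
(Pdb) q
```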
`ipdb` is an improved `pdb` that uses the IPython REPL, enabling tab completion, syntax highlighting, better tracebacks and better introspection, while retaining the same interface as the `pdb` module.
For lower-level programming languages you probably want to look into `gdb` (and its quality-of-life modification `pwndbg`) and `lldb`. They are optimized for debugging C-like languages and let you probe pretty much any process and get its current machine state: registers, stack, program counter, and more.
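As an illustration (the binary name ./program and the variable name are placeholders), a typical gdb session sets a breakpoint, runs the program, and then inspects its state:

```
$ gdb --args ./program arg1 arg2
(gdb) break main
(gdb) run
(gdb) next
(gdb) print some_variable
(gdb) backtrace
(gdb) info registers
```

These commands loosely mirror pdb's b, n and p, with backtrace and info registers exposing the stack and registers mentioned above.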
Specialized Tools
Even if what you are trying to debug is a black box binary, there are tools that can help. Whenever programs need to perform actions that only the kernel can, they use system calls. There are commands that let you trace the syscalls your program makes. In Linux there is `strace`, and macOS and BSD have `dtrace`. `dtrace` can be tricky to use because it uses its own `D` language, but there is a wrapper called `dtruss` that provides an interface more similar to `strace` (more details here).

Below are some examples of using `strace` or `dtruss` to show `stat` syscall traces for an execution of `ls`. For a deeper dive into `strace`, this article is worth reading.
# On Linux
sudo strace -e lstat ls -l > /dev/null
# On macOS
sudo dtruss -t lstat64_extended ls -l > /dev/null
Under some circumstances, you may need to look at the network packets to figure out the issue in your program. Tools like `tcpdump` and Wireshark are network packet analyzers that let you read the contents of network packets and filter them based on different criteria.
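As a rough example (the host and port below are only illustrative), tcpdump can capture traffic directly or save it for later inspection in Wireshark:

```bash
# Show packets to/from a given host on any interface
sudo tcpdump -i any host missing.csail.mit.edu
# Or write them to a file that Wireshark can open later
sudo tcpdump -i any -w capture.pcap port 443
```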
For web development, the Chrome and Firefox developer tools are quite handy. They feature a large number of tools, including:

- Source code - Inspect the HTML/CSS/JS source code of any website.
- Live HTML, CSS, JS modification - Change the website content, styles and behavior to test (which is also why website screenshots are not valid proofs).
- Javascript shell - Execute commands in the JS REPL.
- Network - Analyze the requests timeline.
- Storage - Look into the Cookies and local application storage.
Static Analysis
For some issues you do not need to run any code at all. For example, just by carefully looking at a piece of code you could realize that your loop variable is shadowing an already existing variable or function name, or that a program reads a variable before defining it. This is where static analysis tools come into play. Static analysis programs take source code as input and analyze it using coding rules to reason about its correctness.
In the following Python code there are a few bugs. First, our loop variable `foo` shadows the earlier definition of the function `foo`. We also wrote `baz` instead of `bar` in the last line, so the program will crash after completing the `sleep` call (which takes about one minute).
import time

def foo():
    return 42

for foo in range(5):
    print(foo)

bar = 1
bar *= 0.2
time.sleep(60)
print(baz)
Static analysis tools can identify these kinds of issues. When we run `pyflakes` on the code, we get errors for both bugs. `mypy` is another tool that can detect type-checking issues; here, `mypy` will warn us that `bar` is initially an `int` and is then coerced to a `float`. Again, note that all of these issues were detected without having to run the code. In the shell lecture we covered `shellcheck`, which is a similar tool for shell scripts.
$ pyflakes foobar.py
foobar.py:6: redefinition of unused 'foo' from line 3
foobar.py:11: undefined name 'baz'
$ mypy foobar.py
foobar.py:6: error: Incompatible types in assignment (expression has type "int", variable has type "Callable[[], Any]")
foobar.py:9: error: Incompatible types in assignment (expression has type "float", variable has type "int")
foobar.py:11: error: Name 'baz' is not defined
Found 3 errors in 1 file (checked 1 source file)
Most editors and IDEs support displaying the output of these tools within the editor itself, highlighting the locations of warnings and errors. This is often called code linting, and it can also be used to display other types of issues such as stylistic violations or insecure constructs.
In Vim, the plugins `ale` or `syntastic` will let you do that. For Python, `pylint` and `pep8` are examples of stylistic linters, and `bandit` is a tool designed to find common security issues.
For other languages, people have compiled comprehensive lists of useful static analysis tools, such as Awesome Static Analysis (you may want to take a look at the Writing section), and for linters there is Awesome Linters.
A complementary tool to stylistic linting is a code formatter, such as `black` for Python, `gofmt` for Go, `rustfmt` for Rust, or `prettier` for JavaScript, HTML and CSS. These tools autoformat your code so that it is consistent with common stylistic patterns for the given programming language. Although you might be reluctant to give up stylistic control over your code, standardizing the format will help other people read your code and will make you better at reading other people's (stylistically standardized) code.
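Running them is typically a one-liner; for example (file names are placeholders):

```bash
black script.py              # Python
gofmt -w main.go             # Go
rustfmt main.rs              # Rust
prettier --write index.html  # JS/HTML/CSS
```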
Profiling
Even if your code functionally behaves as you would expect, that might not be good enough if it takes all your CPU or memory in the process. Algorithms classes often teach big O notation but not how to find hot spots in your programs. Since premature optimization is the root of all evil, you should learn about profilers and monitoring tools. They will help you understand which parts of your program are taking most of the time and/or resources, so you can focus on optimizing those parts.
Timing
Similarly to the debugging case, in many scenarios it can be enough to just print the time it took your code to run between two points. Here is an example in Python using the `time` module.
import time, random
n = random.randint(1, 10) * 100
# Get current time
start = time.time()
# Do some work
print("Sleeping for {} ms".format(n))
time.sleep(n/1000)
# Compute time between start and now
print(time.time() - start)
# Output
# Sleeping for 500 ms
# 0.5713930130004883
However, wall clock time can be misleading since your computer might be running other processes at the same time or waiting for events to happen. It is common for tools to make a distinction between Real, User and Sys time. In general, User + Sys tells you how much time your process actually spent in the CPU (more detailed explanation here).
- Real - Wall clock elapsed time from start to finish of the program, including the time taken by other processes and time taken while blocked (e.g. waiting for I/O or network)
- User - Amount of time spent in the CPU running user code
- Sys - Amount of time spent in the CPU running kernel code
For example, try running a command that performs an HTTP request and prefixing it with `time`. Under a slow connection you might get an output like the one below. Here it took over 2 seconds for the request to complete but the process only took 15ms of CPU user time and 12ms of kernel CPU time.
$ time curl https://missing.csail.mit.edu &> /dev/null
real 0m2.561s
user 0m0.015s
sys 0m0.012s
Profilers
CPU
Most of the time when people refer to profilers they actually mean CPU profilers, which are the most common. There are two main types of CPU profilers: tracing and sampling profilers. Tracing profilers keep a record of every function call your program makes whereas sampling profilers probe your program periodically (commonly every millisecond) and record the program’s stack. They use these records to present aggregate statistics of what your program spent the most time doing. Here is a good intro article if you want more detail on this topic.
Most programming languages have some sort of command line profiler that you can use to analyze your code. They often integrate with full fledged IDEs but for this lecture we are going to focus on the command line tools themselves.
In Python we can use the `cProfile` module to profile time per function call. Here is a simple example that implements a rudimentary grep in Python:
#!/usr/bin/env python
import sys, re

def grep(pattern, file):
    with open(file, 'r') as f:
        print(file)
        for i, line in enumerate(f.readlines()):
            pattern = re.compile(pattern)
            match = pattern.search(line)
            if match is not None:
                print("{}: {}".format(i, line), end="")

if __name__ == '__main__':
    times = int(sys.argv[1])
    pattern = sys.argv[2]
    for i in range(times):
        for file in sys.argv[3:]:
            grep(pattern, file)
We can profile this code using the following command. Analyzing the output we can see that IO is taking most of the time and that compiling the regex takes a fair amount of time as well. Since the regex only needs to be compiled once, we can factor it out of the for loop (a sketch of that fix follows the profile output below).
$ python -m cProfile -s tottime grep.py 1000 '^(import|\s*def)[^,]*$' *.py
[omitted program output]
ncalls tottime percall cumtime percall filename:lineno(function)
8000 0.266 0.000 0.292 0.000 {built-in method io.open}
8000 0.153 0.000 0.894 0.000 grep.py:5(grep)
17000 0.101 0.000 0.101 0.000 {built-in method builtins.print}
8000 0.100 0.000 0.129 0.000 {method 'readlines' of '_io._IOBase' objects}
93000 0.097 0.000 0.111 0.000 re.py:286(_compile)
93000 0.069 0.000 0.069 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
93000 0.030 0.000 0.141 0.000 re.py:231(compile)
17000 0.019 0.000 0.029 0.000 codecs.py:318(decode)
1 0.017 0.017 0.911 0.911 grep.py:3(<module>)
[omitted lines]
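One way to apply that fix, as a sketch rather than necessarily the lecture's exact solution, is to compile the pattern once per `grep` call instead of once per line:

```python
def grep(pattern, file):
    compiled = re.compile(pattern)  # compile once, outside the per-line loop
    with open(file, 'r') as f:
        print(file)
        for i, line in enumerate(f.readlines()):
            if compiled.search(line) is not None:
                print("{}: {}".format(i, line), end="")
```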
A caveat of Python's `cProfile` profiler (and many profilers, for that matter) is that they display time per function call. That can become unintuitive really fast, especially if you are using third party libraries in your code, since internal function calls are also accounted for.
A more intuitive way of displaying profiling information is to include the time taken per line of code, which is what line profilers do.
For instance, the following piece of Python code performs a request to the class website and parses the response to get all URLs in the page:
#!/usr/bin/env python
import requests
from bs4 import BeautifulSoup

# This is a decorator that tells line_profiler
# that we want to analyze this function
@profile
def get_urls():
    response = requests.get('https://missing.csail.mit.edu')
    s = BeautifulSoup(response.content, 'lxml')
    urls = []
    for url in s.find_all('a'):
        urls.append(url['href'])

if __name__ == '__main__':
    get_urls()
If we used Python's `cProfile` profiler we'd get over 2500 lines of output, and even with sorting it'd be hard to understand where the time is being spent. A quick run with `line_profiler` shows the time taken per line:
$ kernprof -l -v a.py
Wrote profile results to urls.py.lprof
Timer unit: 1e-06 s
Total time: 0.636188 s
File: a.py
Function: get_urls at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 @profile
6 def get_urls():
7 1 613909.0 613909.0 96.5 response = requests.get('https://missing.csail.mit.edu')
8 1 21559.0 21559.0 3.4 s = BeautifulSoup(response.content, 'lxml')
9 1 2.0 2.0 0.0 urls = []
10 25 685.0 27.4 0.1 for url in s.find_all('a'):
11 24 33.0 1.4 0.0 urls.append(url['href'])
Memory
In languages like C or C++ memory leaks can cause your program to never release memory that it doesn’t need anymore. To help in the process of memory debugging you can use tools like Valgrind that will help you identify memory leaks.
In garbage collected languages like Python it is still useful to use a memory profiler because as long as you have pointers to objects in memory they won’t be garbage collected.
Here's an example program and its associated output when running it with `memory_profiler` (note the decorator, just like with `line_profiler`).
@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    my_func()
$ python -m memory_profiler example.py
Line # Mem usage Increment Line Contents
==============================================
3 @profile
4 5.97 MB 0.00 MB def my_func():
5 13.61 MB 7.64 MB a = [1] * (10 ** 6)
6 166.20 MB 152.59 MB b = [2] * (2 * 10 ** 7)
7 13.61 MB -152.59 MB del b
8 13.61 MB 0.00 MB return a
Event Profiling
As was the case for `strace` in debugging, you might want to ignore the specifics of the code that you are running and treat it like a black box when profiling. The `perf` command abstracts CPU differences away and does not report time or memory; instead, it reports system events related to your programs. For example, `perf` can easily report poor cache locality, high amounts of page faults or livelocks. Here is an overview of the command:
- `perf list` - List the events that can be traced with perf.
- `perf stat COMMAND ARG1 ARG2` - Gets counts of different events related to a process or command.
- `perf record COMMAND ARG1 ARG2` - Records the run of a command and saves the statistical data into a file called `perf.data`.
- `perf report` - Formats and prints the data collected in `perf.data`.
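An illustrative sequence (./my_program is a placeholder for your own binary) could be:

```bash
# Count events such as cycles, instructions and page faults for one run
perf stat ls -R /usr > /dev/null
# Sample the program, then browse where time was spent
perf record ./my_program
perf report
```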
Visualization
Profiler output for real world programs will contain large amounts of information because of the inherent complexity of software projects. Humans are visual creatures and are quite terrible at reading large amounts of numbers and making sense of them. Thus there are many tools for displaying profiler’s output in an easier to parse way.
One common way to display CPU profiling information for sampling profilers is to use a Flame Graph, which will display a hierarchy of function calls across the Y axis and time taken proportional to the X axis. They are also interactive, letting you zoom into specific parts of the program and get their stack traces (try clicking in the image below).
Call graphs or control flow graphs display the relationships between subroutines within a program by including functions as nodes and functions calls between them as directed edges. When coupled with profiling information such as the number of calls and time taken, call graphs can be quite useful for interpreting the flow of a program.
In Python you can use the `pycallgraph` library to generate them.
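As used in the exercises below, its command line interface can generate a graph image directly (the script name is a placeholder):

```bash
# Produces pycallgraph.png in the current directory
pycallgraph graphviz -- ./fib.py
```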
Resource Monitoring
Sometimes, the first step towards analyzing the performance of your program is to understand what its actual resource consumption is. Programs often run slowly when they are resource constrained, e.g. without enough memory or on a slow network connection. There are a myriad of command line tools for probing and displaying different system resources like CPU usage, memory usage, network, disk usage and so on.
- General Monitoring - Probably the most popular is `htop`, which is an improved version of `top`. `htop` presents various statistics for the currently running processes on the system. `htop` has a myriad of options and keybinds; some useful ones are `<F6>` to sort processes, `t` to show tree hierarchy and `h` to toggle threads. See also `glances` for a similar implementation with a great UI. For getting aggregate measures across all processes, `dstat` is another nifty tool that computes real-time resource metrics for lots of different subsystems like I/O, networking, CPU utilization, context switches, &c.
- I/O operations - `iotop` displays live I/O usage information and is handy to check if a process is doing heavy I/O disk operations.
- Disk Usage - `df` displays metrics per partition and `du` displays disk usage per file for the current directory. In these tools the `-h` flag tells the program to print in human readable format. A more interactive version of `du` is `ncdu`, which lets you navigate folders and delete files and folders as you navigate.
- Memory Usage - `free` displays the total amount of free and used memory in the system. Memory is also displayed in tools like `htop`.
- Open Files - `lsof` lists file information about files opened by processes. It can be quite useful for checking which process has opened a specific file.
- Network Connections and Config - `ss` lets you monitor incoming and outgoing network packet statistics as well as interface statistics. A common use case of `ss` is figuring out what process is using a given port on a machine. For displaying routing, network devices and interfaces you can use `ip`. Note that `netstat` and `ifconfig` have been deprecated in favor of the former tools respectively.
- Network Usage - `nethogs` and `iftop` are good interactive CLI tools for monitoring network usage.
If you want to test these tools, you can also artificially impose loads on the machine using the `stress` command.
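For example (flags may vary slightly between stress versions):

```bash
# Spin up 2 CPU workers for 30 seconds and watch them in htop
stress -c 2 -t 30
# Spawn a worker that repeatedly allocates and frees memory
stress -m 1 -t 30
```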
Specialized tools
Sometimes, black box benchmarking is all you need to determine what software to use. Tools like `hyperfine` let you quickly benchmark command line programs. For instance, in the shell tools and scripting lecture we recommended `fd` over `find`. We can use `hyperfine` to compare them in tasks we run often. In the example below, `fd` was about 20x faster than `find` on my machine.
$ hyperfine --warmup 3 'fd -e jpg' 'find . -iname "*.jpg"'
Benchmark #1: fd -e jpg
Time (mean ± σ): 51.4 ms ± 2.9 ms [User: 121.0 ms, System: 160.5 ms]
Range (min … max): 44.2 ms … 60.1 ms 56 runs
Benchmark #2: find . -iname "*.jpg"
Time (mean ± σ): 1.126 s ± 0.101 s [User: 141.1 ms, System: 956.1 ms]
Range (min … max): 0.975 s … 1.287 s 10 runs
Summary
'fd -e jpg' ran
21.89 ± 2.33 times faster than 'find . -iname "*.jpg"'
As it was the case for debugging, browsers also come with a fantastic set of tools for profiling webpage loading, letting you figure out where time is being spent (loading, rendering, scripting, &c). More info for Firefox and Chrome.
Exercises
Debugging
- Use `journalctl` on Linux or `log show` on macOS to get the super user accesses and commands in the last day. If there aren't any you can execute some harmless commands such as `sudo ls` and check again.

- Do this hands-on `pdb` tutorial to familiarize yourself with the commands. For a more in-depth tutorial read this.

- Install `shellcheck` and try checking the following script. What is wrong with the code? Fix it. Install a linter plugin in your editor so you can get your warnings automatically.

#!/bin/sh
## Example: a typical script with several problems
for f in $(ls *.m3u)
do
  grep -qi hq.*mp3 $f \
    && echo -e 'Playlist $f contains a HQ file in mp3 format'
done

- (Advanced) Read about reversible debugging and get a simple example working using `rr` or `RevPDB`.

Profiling
- Here are some sorting algorithm implementations. Use `cProfile` and `line_profiler` to compare the runtime of insertion sort and quicksort. What is the bottleneck of each algorithm? Then use `memory_profiler` to check the memory consumption: why is insertion sort better? Now check the in-place version of quicksort. Challenge: use `perf` to look at the cycle counts and cache hits and misses of each algorithm.
Here’s some (arguably convoluted) Python code for computing Fibonacci numbers using a function for each number.
#!/usr/bin/env python def fib0(): return 0 def fib1(): return 1 s = """def fib{}(): return fib{}() + fib{}()""" if __name__ == '__main__': for n in range(2, 10): exec(s.format(n, n-1, n-2)) # from functools import lru_cache # for n in range(10): # exec("fib{} = lru_cache(1)(fib{})".format(n, n)) print(eval("fib9()"))
Put the code into a file and make it executable. Install
pycallgraph
. Run the code as is withpycallgraph graphviz -- ./fib.py
and check thepycallgraph.png
file. How many times isfib0
called?. We can do better than that by memoizing the functions. Uncomment the commented lines and regenerate the images. How many times are we calling eachfibN
function now? -
A common issue is that a port you want to listen on is already taken by another process. Let’s learn how to discover that process pid. First execute
python -m http.server 4444
to start a minimal web server listening on port4444
. On a separate terminal runlsof | grep LISTEN
to print all listening processes and ports. Find that process pid and terminate it by runningkill <PID>
. -
Limiting processes resources can be another handy tool in your toolbox. Try running
stress -c 3
and visualize the CPU consumption withhtop
. Now, executetaskset --cpu-list 0,2 stress -c 3
and visualize it. Isstress
taking three CPUs? Why not? Readman taskset
. Challenge: achieve the same usingcgroups
. Try limiting the memory consumption ofstress -m
- (Advanced) The command `curl ipinfo.io` performs an HTTP request and fetches information about your public IP. Open Wireshark and try to sniff the request and reply packets that `curl` sent and received. (Hint: use the `http` filter to watch only HTTP packets).
Licensed under CC BY-NC-SA.