Using Tshark To View Raw Socket Streams
Why do all packet capture tools do things you never ask them to do? It took me a while to figure out how to extract clean streams from pcap files using just tshark. Here's the script:
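#!/bin/bash
# tshark_streams.sh: dump every TCP stream in a pcap file as .txt (ASCII) and .bin (raw bytes)
if [ "$#" -lt 1 ]; then
    echo "Usage: tshark_streams.sh <pcap file> [display filter]" >&2
    exit 1
fi
# Collect the unique TCP stream indices, optionally restricted by a display filter.
# (-Y applies a single-pass display filter; newer tshark versions reject -R unless -2 is also given.)
if [ -n "$2" ]; then
    STREAMS=$(tshark -r "$1" -Y "$2" -T fields -e tcp.stream | sort -n | uniq)
else
    STREAMS=$(tshark -r "$1" -T fields -e tcp.stream | sort -n | uniq)
fi
for i in $STREAMS
do
    INDEX=$(printf '%05d' "$i")
    echo "Processing stream $INDEX ..."
    # Raw mode prints the payload as hex: skip the 6-line header, strip the '='
    # separator lines, line breaks and tabs, then convert the hex back into bytes.
    tshark -r "$1" -T fields -e data -qz follow,tcp,raw,"$i" | tail -n +7 | tr -d '=\r\n\t' | xxd -r -p > "$1"_stream-"$INDEX".bin
    # ASCII mode: human-readable, but with a header and per-block byte counts mixed in.
    tshark -r "$1" -qz follow,tcp,ascii,"$i" > "$1"_stream-"$INDEX".txt
done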
This takes a pcap file generated by Wireshark, tshark, tcpdump (or anything else that writes libpcap files) and creates two files for each TCP stream. The first is a .txt file containing an ASCII representation of the stream, with non-printable characters replaced. Unfortunately, tshark also mixes in the payload byte count before each block of data (indented by a tab for the second node), and there seems to be no way to disable this. So the script also generates a .bin file that stores just the raw stream and nothing else. If you've captured binary data, don't dump the .bin files to the console; use xxd, hexdump, hexedit, or even something like vim.
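To see what the .bin pipeline actually does, here's a small sketch that feeds a made-up follow,tcp,raw listing through the same tail and tr steps the script uses. The sample layout (a blank line plus five header lines, then hex lines, with the second node's lines tab-indented) is an assumption inferred from the script's tail -n +7, and the addresses and payload are invented, so check it against your own tshark version. It also shows a caveat: both directions end up concatenated in capture order, with nothing marking where one side stops and the other starts.

```shell
#!/bin/bash
# Hypothetical sample of `tshark -qz follow,tcp,raw,N` output, written to match
# the layout the script assumes. Addresses and payload are made up.
sample_follow_raw() {
    printf '\n'
    printf '===================================================================\n'
    printf 'Follow: tcp,raw\n'
    printf 'Filter: tcp.stream eq 0\n'
    printf 'Node 0: 10.0.0.1:40000\n'
    printf 'Node 1: 10.0.0.2:80\n'
    printf '474554202f\n'      # "GET /" sent by node 0
    printf '\t485454502f\n'    # "HTTP/" sent by node 1 (tab-indented)
    printf '===================================================================\n'
}

# The same header-skipping and joining steps the script runs before xxd.
follow_raw_to_hex() {
    tail -n +7 | tr -d '=\r\n\t'
}

hex=$(sample_follow_raw | follow_raw_to_hex)
echo "$hex"    # one continuous hex string, both directions back to back

# xxd ships with vim; only decode if it is installed.
if command -v xxd >/dev/null 2>&1; then
    printf '%s' "$hex" | xxd -r -p
    echo
fi
```

Running it prints 474554202f485454502f and, if xxd is available, GET /HTTP/, with the request and response bytes running straight into each other.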
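There is also an optional second argument, a filter: running ./tshark_streams.sh capture.pcap "http" dumps only the streams that contain HTTP traffic. The filter uses Wireshark display filter syntax (for example http, tcp.port == 443 or ip.addr == 10.9.0.26), not capture filter (BPF) syntax, so see the Wireshark display filter reference rather than the capture filter examples for what you can use.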
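Note that the filter only decides which stream indices get collected; each matching stream is then dumped in full by the follow commands. The collection step relies on tshark -T fields -e tcp.stream printing one line per packet, with an empty line for packets that aren't TCP; the script's sort -n | uniq collapses the duplicates and the shell's word splitting ignores the leftover empty line. This sketch makes that step explicit, using a hypothetical helper named unique_streams and made-up input:

```shell
#!/bin/bash
# Hypothetical helper: turn per-packet tcp.stream values into a sorted,
# de-duplicated list of stream indices, dropping the empty lines that
# non-TCP packets produce.
unique_streams() {
    grep -v '^$' | sort -nu
}

# Made-up per-packet field output: streams 0, 1 and 2 plus one non-TCP packet.
printf '0\n0\n\n1\n0\n2\n1\n' | unique_streams
```

This prints 0, 1 and 2, one per line, which is the list the for loop walks over.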
Here's an example run on a capture of wget fetching this blog, along with a cat of each generated file:
$ ./tshark_streams.sh example.pcap "http"
Processing stream 00000 ...
$ cat example.pcap_stream-00000.txt
===================================================================
Follow: tcp,ascii
Filter: tcp.stream eq 0
Node 0: 10.9.0.26:40667
Node 1: 174.136.97.90:80
94
GET / HTTP/1.1
User-Agent: wget
Accept: */*
Host: heapspray.net
Connection: Keep-Alive

	1324
HTTP/1.1 200 OK
Server: Anonymous
Date: Mon, 19 May 2014 06:41:40 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
[...]
$ cat example.pcap_stream-00000.bin | head -n 20
GET / HTTP/1.1
User-Agent: wget
Accept: */*
Host: heapspray.net
Connection: Keep-Alive

HTTP/1.1 200 OK
Server: Anonymous
Date: Mon, 19 May 2014 06:41:40 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding

3fa0
As you can see, the .txt file has a header and byte counts mixed in throughout, whereas the .bin file contains purely the raw stream. Since wget uses HTTP/1.1 by default, the server responds with chunked transfer encoding, so hexadecimal numbers such as 3fa0 in the .bin file are chunk-size lines that belong to the HTTP stream itself. The decimal numbers in the .txt file (the "94" and the tab-indented "1324"), on the other hand, are added by tshark and give the payload size in bytes of each block of data sent by each side. They're useful for debugging your networking applications, but they get in the way if you're just trying to analyze the raw stream, which is why the script generates the raw .bin file as well.
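The payload-size reading of those numbers is easy to check by hand: HTTP header lines end in CRLF and the header block ends with an empty CRLF line, so rebuilding the request from the example gives exactly 94 bytes. This sketch does that arithmetic and also converts the 3fa0 chunk size to decimal. It assumes bash (for the 16#... base syntax) and that the request lines are exactly as displayed in the capture above.

```shell
#!/bin/bash
# Rebuild the wget request from the example with CRLF line endings
# and a final empty line, then count its bytes.
request='GET / HTTP/1.1\r\nUser-Agent: wget\r\nAccept: */*\r\nHost: heapspray.net\r\nConnection: Keep-Alive\r\n\r\n'
# %b expands the \r\n escapes; $(( )) strips any padding wc -c adds.
request_len=$(( $(printf '%b' "$request" | wc -c) ))
echo "request payload: $request_len bytes"

# Chunk-size lines are hexadecimal: 3fa0 announces the length of the next chunk.
chunk_len=$((16#3fa0))
echo "first chunk: $chunk_len bytes"
```

It prints "request payload: 94 bytes", matching the first number in the .txt file, and "first chunk: 16288 bytes".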
For completeness, the command I used to capture the wget request was:
$ tshark -i tun0 -w example.pcap
Capturing on tun0
53 ^C
Simply press Ctrl+C when you're finished capturing; tshark ends the session gracefully. The 53 is tshark's running count of captured packets. The -i option specifies the interface. On most systems it will be either eth0 if you use a wired Ethernet connection or wlan0 if you use a wireless one, although newer Linux distributions often use names like enp0s3 or wlp2s0 instead. Use ifconfig or tshark -D to check. I use a VPN, so my device is tun0, for network TUNnel.
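If ifconfig isn't installed, here's a Linux-only sketch (an assumption: it relies on the kernel exposing one entry per network interface under /sys/class/net) that lists the names you could pass to tshark -i. The helper name list_interfaces is hypothetical; tshark -D remains the authoritative list of what tshark itself can capture on.

```shell
#!/bin/bash
# Linux-only: every network interface the kernel knows about has an
# entry under /sys/class/net, so print each entry's name.
list_interfaces() {
    local dev
    for dev in /sys/class/net/*; do
        [ -e "$dev" ] || continue    # the glob matched nothing
        basename "$dev"
    done
}

list_interfaces
```

On a machine set up like mine you would expect tun0 to appear in the list alongside lo and the physical adapter.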