Stats from packet captures

One of the most common questions that comes up when investigating an application blip is: what else was going on at the same time? Generally we focus our packet capture analysis on the host experiencing the problem, but I have often wanted a better view of what else is going on in terms of network traffic around the same time. For example, you might wonder if there was a spike in TCP resets or ARPs, something that might give a clue to what's going on.

To get that insight I recently wrote a program to read capture files and generate summary info on the traffic in a given time interval. The program generates a CSV file that I import into Excel (Power Query –> PowerPivot –> PivotChart) to view. This turned out to be quite insightful, as I was quickly able to filter the view down to the spikes you see in the clip below. From there I could go back to the capture files and find that they were the result of a router loop.
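
The program itself isn't shown here, but the idea is simple enough to sketch. Here is a minimal Python version of the per-interval summary, assuming the capture has already been dumped to CSV with tshark using -e frame.time_epoch -e ip.proto (the file names are placeholders):

import csv
from collections import Counter

# Count packets per second, per IP protocol number (1=ICMP, 6=TCP, 17=UDP, ...).
counts = Counter()
with open("capture.csv", newline="") as f:
    for row in csv.reader(f):
        if len(row) < 2 or not row[0]:
            continue
        second = int(float(row[0]))     # frame.time_epoch truncated to the second
        proto = row[1] or "unknown"     # ip.proto
        counts[(second, proto)] += 1

with open("per_second_summary.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["epoch_second", "ip_proto", "packets"])
    for (second, proto), n in sorted(counts.items()):
        w.writerow([second, proto, n])

The resulting summary CSV loads into Power Query and pivots the same way.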

This chart has packets/s on the Y axis and time of day on the X axis. It's filtered to ICMP only. What you see is a ping to two IPs every 5 minutes. These pings fall into a router loop and we get a spike of 7K packets in one second. The capture is from the core, between the firewall and the router.

ICMP

TCP Performance Inside an ESP Tunnel

This problem took a little more time to analyze as the user's TCP session was inside an ESP tunnel from the office to the data center. I didn't have much experience with ESP, but with some work I was able to develop a spreadsheet to visualize the problem. Of course I did this using Power Query and PowerPivot.

The first problem was reordering of the ESP packets. I knew we were getting reordering because the Checkpoint firewall was often dropping packets due to "replay attack" detection. It seems the replay window on the FW was 64, that is, a packet can be early or late by up to 64 packets before a replay attack is declared. We fixed this by whitelisting the ESP traffic in the IPS. That fixed the replay attack messages, but the file transfer speed did not improve. I needed a way to confirm packet reordering was not still occurring, so I developed these charts.

The top chart is the sum of IP length per second over time of day. The bottom chart is the reordering count (early and late) per second over time of day.

ESPReordering

Using this I could see reordering was still occurring, but to a lesser degree. Another call to the IPS vendor, another whitelisting of the ESP traffic, this time at layer 2 (the previous one was just in the inspection engine), and the reordering was gone and TCP performance shot up to 30 MBytes/s. All good, but occasionally the file transfer speed would drop from 30 back down to 3. In those cases we were losing ESP packets.

Now the bottom chart shows ESP packet loss. Each step is the number of packets lost in a given second, which is basically the gaps in the ESP sequence numbers. The top chart is the sum of IP length per second.

ESPPacketLoss

Here are the steps to get your own chart.

  1. Get a packet capture of a file transfer. You are capturing the ESP packets.
  2. Convert the capture to CSV: "C:\Program Files\Wireshark\tshark.exe" -r <fileName>.cap -T fields -E separator=, -e frame.number -e frame.time -e ip.src -e ip.dst -e ip.proto -e ip.len -e esp.spi -e esp.sequence > <fileName>.csv
  3. Import into Excel using PowerQuery (PQ) and parse the fields.
  4. Push the parsed CSV into PowerPivot (PP) using PQ. Now you can add a PivotTable from PP: on Rows put esp.spi and in Values put Count of esp.spi. The largest esp.spi will most likely be your file copy.
  5. Now go back and filter your capture to the big esp.spi, save, and convert to CSV; this is the data for the charts. Notice that an esp.spi is only used in one direction; you want the direction from the sender.
  6. Load this CSV into PQ and parse out the fields. Also convert the time to hours, minutes and seconds. Your frames should start at one, increasing in order; this reflects the order the frames were received. Do a little math on the ESP sequence number so it starts at 1 too. Now we have the frame # and the ESP seq # aligned. Push this into PP and plot the max and min of the ESP sequence on a PivotChart and you are done. (A scripted alternative is sketched below.)
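
If you would rather script the reordering and gap counts than eyeball the chart, a rough Python equivalent of steps 5 and 6 looks like this. It assumes the CSV from step 5, with esp.sequence as the last column as in the tshark command above; the file name is a placeholder:

import csv

seqs = []
with open("esp_filtered.csv", newline="") as f:
    for row in csv.reader(f):
        if row and row[-1].isdigit():
            seqs.append(int(row[-1]))       # esp.sequence, in arrival order

highest = None
late = 0            # frames that arrived after a higher sequence number (reordered)
seen = set()
for seq in seqs:
    if highest is not None and seq < highest:
        late += 1
    highest = seq if highest is None else max(highest, seq)
    seen.add(seq)

if seqs:
    missing = (max(seen) - min(seen) + 1) - len(seen)   # sequence numbers never captured
    print(f"frames: {len(seqs)}  reordered: {late}  missing: {missing}")

Binning the late and missing counts by second gives the same per-second series the charts show.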

I hope you find this useful.

-Wes


Capturing with Netmon on 10G interfaces

Here are my tips for capturing at 10G with netmon. I have tried many combinations to get a good capture without losing frames.

  • Use nmcap of course
  • Use profile 0, pure capture, this means no filtering at all
  • Keep the max frame length low, 80 works well
  • Save to large files, I use 1024M

With the above on a busy interface you are creating many files and using up disk space fast. Typically I am only interested in a certain IP in my 10G capture, so while I am capturing I start a series of jobs to filter the capture files. As the jobs run I can delete the 1G files to start recovering disk space.

Here is an example. Say my capture is creating a series of files named net1_capture_v1(0).cap, net1_capture_v1(1).cap and so on. To filter these I use a cmd batch for loop. The steps are like this:

  1. Get the list of files to process; in this case that's dir /b *v1* > filesv1.txt
  2. Open filesv1.txt and check the file names look right.
  3. Run your filter job (looking for IP 10.35.69.103, which is 0x0a234567 in hex, in this example):
    1. for /f %i in (filesv1.txt) do nmcap /useprofile 1 /disableconversations /inputcapture %i /capture Blob(FrameData, 26, 4) == 0x0a234567 OR Blob(FrameData, 30, 4) == 0x0a234567 /file %i.filtered.cap:2G

Now the blob filter above is fast, but it will still take a while to run through a 1G file, so you may want to get a couple of filtering jobs running in parallel. I find the easiest way to do that is to stop the capture after it has created 20 or so files and immediately restart it with a new file name of v2, v3 and so on. Then I create a for loop for each series of files.
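
The offsets in that Blob filter are simply the IPv4 source and destination addresses in an untagged Ethernet frame: 14 bytes of Ethernet header plus 12 or 16 bytes into the IP header gives 26 and 30. The comparison value is just the address written as four hex bytes, which a couple of lines of Python will produce (illustrative only):

def ip_to_blob_hex(ip):
    # Four octets -> eight hex digits. Add 4 to the frame offsets (26/30) if the
    # capture contains 802.1Q VLAN tags, since the tag shifts the IP header.
    return "0x" + "".join(f"{int(octet):02x}" for octet in ip.split("."))

print(ip_to_blob_hex("10.35.69.103"))   # -> 0x0a234567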

Working with large Event Log files – evtx

If you ever have to work with large evtx files you will love what Message Analyzer can do for you.

Recently I looked into a busy print server that was generating thousands of security audit events per second. The sysadmin provided me with a 4GB evtx file (3.2 million events) to look at. Now, if you have ever tried filtering a file this big in the MMC you know it's a problem. Even trying to export to XML or CSV from the MMC doesn't seem to work well.

My task was to identify the top event IDs and then, for the top events, find out what they were about. I used LogParser to get a count of events by ID, but the really hard part was getting at the EventData, which is unique to a particular event ID. In my case I knew the top events were object access audit events, but I really needed to know by whom (the SubjectUserName) and what they were touching. Both these fields are in EventData, and I needed to export that info so I could get it into PowerPivot to summarize. This is where Message Analyzer came to my rescue.

Using Message Analyzer 1.3.1 I was able to load all of the evtx relatively quickly, filter to a specific event ID (fast), and then right-click on the EventData fields I wanted and add them as columns (fast). Just like that I had the data I needed on the screen. From there it's a simple export to CSV and load into PowerPivot, and quickly I could see that 90% of the events were from a program touching dll and log files. We can now adjust our audit policy to fix this.

I hope you find this helpful.

GeoIP for packet captures

Here is something I have been using lately as another interesting way to look at packet captures. It started with reports of intermittent connection drops. In this case we have a central Citrix farm used by hundreds of people from many different office locations across a fairly large geographic region (hundreds of miles). I thought it would be interesting to look at network errors by location, for example grouping packets with the TCP reset flag by region over time. The idea is that a surge in resets at one location would indicate a site-specific problem. I have also used this with firewall data to look at DoS attacks by region.

In a nutshell, we need the starting and ending IP address (as an integer) of each location's range. We will use Power Query (PQ) and the MaxMind data in a CSV file for this. Similarly, we will take a netmon capture, export it to CSV and get the Src or Dst IP represented as an integer. This too is done with Power Query. Once these two steps are done you have the data you need in Excel PowerPivot and can easily add a calculated column for the location.

Let's get started.

First, let's get the MaxMind data into PowerPivot (PP). The MaxMind file (GeoLite2-Country-Blocks-IPv4.csv) looks like this:

network,geoname_id,registered_country_geoname_id,represented_country_geoname_id,is_anonymous_proxy,is_satellite_provider
1.0.0.0/24,2077456,2077456,,0,0

We need to take 1.0.0.0/24 and convert it to a starting and ending IP. I do that with these steps in PQ:

  1. split the IP from the CIDR
  2. split the IP into its 4 octets
  3. IP as an integer is = ( (1stOct * 16777216) + (2ndOct * 65536) + (3rdOct * 256) + 4thOct ) this is the StartIPAsInt
  4. EndIPAsInt is = the start of the block + 2 to the power of (32 – CIDR) – 1, i.e. the last address in the block
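
If you want to sanity-check that arithmetic outside of Excel, Python's ipaddress module gives the same numbers. This is just a quick check, not part of the PQ workflow:

import ipaddress

def block_range(cidr):
    net = ipaddress.ip_network(cidr)
    start = int(net.network_address)              # same as (1stOct*16777216)+(2ndOct*65536)+(3rdOct*256)+4thOct
    end = start + 2 ** (32 - net.prefixlen) - 1   # last address in the block
    return start, end

print(block_range("1.0.0.0/24"))   # -> (16777216, 16777471)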

My PQ steps look like this (I call this query IPv4ToRegions):

let
    Source = Csv.Document(File.Contents("C:\Users\wes\Documents\NetSleuth\GeoIP\MaxMind\GeoLite2-Country-Blocks-IPv4.csv"),null,",",null,1252),
    #"First Row as Header" = Table.PromoteHeaders(Source),
    #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"network", type text}, {"geoname_id", Int64.Type}, {"registered_country_geoname_id", Int64.Type}, {"represented_country_geoname_id", type text}, {"is_anonymous_proxy", Int64.Type}, {"is_satellite_provider", Int64.Type}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type","network",Splitter.SplitTextByDelimiter("/"),{"network.1", "network.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"network.1", type text}, {"network.2", Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"network.1", "network"}, {"network.2", "cidr"}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Renamed Columns", "network", "Copy of network"),
    #"Renamed Columns1" = Table.RenameColumns(#"Duplicated Column",{{"Copy of network", "net"}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Renamed Columns1","net",Splitter.SplitTextByDelimiter("."),{"net.1", "net.2", "net.3", "net.4"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"net.1", Int64.Type}, {"net.2", Int64.Type}, {"net.3", Int64.Type}, {"net.4", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type2", "StartIPAsInt", each ([net.1]*16777216)+([net.2]*65536)+([net.3]*256)+[net.4] +1),
    #"Changed Type3" = Table.TransformColumnTypes(#"Added Custom",{{"StartIPAsInt", Int64.Type}}),
    #"Added Custom1" = Table.AddColumn(#"Changed Type3", "EndIPAsInt", each [StartIPAsInt]+Number.Power(2,(32-[cidr])) -2),
    #"Changed Type4" = Table.TransformColumnTypes(#"Added Custom1",{{"EndIPAsInt", Int64.Type}})
in
    #"Changed Type4"

It’s important that the start and end IP are of whole number type.

The next step is to get your capture into a good CSV format. Lately I have been using the Wireshark GUI to do this; it's fast. I created this column profile in Wireshark so I get the columns I want in the CSV.

Displayed Title: Field Type
Time: UTC time
DateTimePST: Absolute date, as YYYY-MM-DD, and time
RelativeTime: Relative time
Source: Src addr (unresolved)
Destination: Dest addr (unresolved)
Protocol: Protocol
Length: Packet length (bytes)
SrcPort: Src port (unresolved)
DstPort: Dest port (unresolved)
TcpSYN: Custom (tcp.flags.syn)
TcpRST: Custom (tcp.flags.reset)
TcpACK: Custom (tcp.flags.ack)

Again, bring this into Excel using PQ and do the splits and math to get the IPs as integers with type Whole Number. My PQ code to do this was as follows (I called this query parseWiresharkExport; I only did the DstIP in this example):

let
    Source = Csv.Document(File.Contents("C:\Users\wes\Documents\NetSleuth\GeoIP\BlogbridgeCapture_CSV.csv"),null,",",null,1252),
    #"First Row as Header" = Table.PromoteHeaders(Source),
    #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",{{"DateTimePST", type datetime}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Changed Type", "DateTimePST", "Copy of Time"),
    #"Duplicated Column1" = Table.DuplicateColumn(#"Duplicated Column", "DateTimePST", "Copy of Time.1"),
    #"Renamed Columns" = Table.RenameColumns(#"Duplicated Column1",{{"Copy of Time", "Minute"}, {"Copy of Time.1", "Second"}}),
    #"To Minute" = Table.TransformColumns(#"Renamed Columns",{{"Minute", Time.Minute}}),
    #"To Second" = Table.TransformColumns(#"To Minute",{{"Second", Time.Second}}),
    Rounding = Table.TransformColumns(#"To Second",{{"Second", each Number.Round(_, 0)}}),
    #"Duplicated Column2" = Table.DuplicateColumn(Rounding, "Destination", "Copy of Destination"),
    #"Renamed Columns1" = Table.RenameColumns(#"Duplicated Column2",{{"Copy of Destination", "DstIP"}}),
    #"Filtered Rows" = Table.SelectRows(#"Renamed Columns1", each ([Protocol] = "HTTP" or [Protocol] = "TCP" or [Protocol] = "TLSv1.2")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows","DstIP",Splitter.SplitTextByDelimiter("."),{"DstIP.1", "DstIP.2", "DstIP.3", "DstIP.4"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"DstIP.1", Int64.Type}, {"DstIP.2", Int64.Type}, {"DstIP.3", Int64.Type}, {"DstIP.4", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type1", "DstIPAsInt", each ([DstIP.1]*16777216) + ([DstIP.2]*65536) + ([DstIP.3]*256) + [DstIP.4]),
    #"Changed Type2" = Table.TransformColumnTypes(#"Added Custom",{{"DstIPAsInt", Int64.Type}}),
    #"Duplicated Column4" = Table.DuplicateColumn(#"Changed Type2", "DateTimePST", "Copy of Time"),
    #"Renamed Columns2" = Table.RenameColumns(#"Duplicated Column4",{{"Copy of Time", "Hour"}}),
    #"To Hour" = Table.TransformColumns(#"Renamed Columns2",{{"Hour", Time.Hour}}),
    #"Reordered Columns" = Table.ReorderColumns(#"To Hour",{"Time", "DateTimePST", "RelativeTime", "Source", "Destination", "Protocol", "Length", "SrcPort", "DstPort", "TcpSYN", "TcpRST", "TcpACK", "Hour", "Minute", "Second", "DstIP.1", "DstIP.2", "DstIP.3", "DstIP.4", "DstIPAsInt"})
in
    #"Reordered Columns"

Be sure to load your PQ to the Data Model.

I also brought into Excel the MaxMind GeoLite2-Country-Locations-en file and created a relationship so the region name is displayed in English.

Now we are ready to add a calculated column in PowerPivot to the parseWiresharkExport table. The formula is:

=CALCULATE(
    VALUES(RegionName_English[country_name]),
    FILTER(
        IPv4ToRegions,
        parseWiresharkExport[DstIPAsInt] >= IPv4ToRegions[StartIPAsInt]
            && parseWiresharkExport[DstIPAsInt] <= IPv4ToRegions[EndIPAsInt]
    )
)
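
To spot-check a few addresses against what the DAX column returns, the same lookup is easy to do in Python. The single row in blocks below is illustrative only; in practice you would load the rows from the MaxMind CSV, sorted by StartIPAsInt:

import bisect

blocks = [(16777216, 16777471, "ExampleCountry")]   # (StartIPAsInt, EndIPAsInt, country_name)
starts = [b[0] for b in blocks]

def lookup(ip_as_int):
    i = bisect.bisect_right(starts, ip_as_int) - 1   # last block starting at or before the IP
    if i >= 0 and blocks[i][0] <= ip_as_int <= blocks[i][1]:
        return blocks[i][2]
    return None

print(lookup(16777300))   # -> ExampleCountry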

And that's pretty much it. Now I have a PivotTable like the one below, showing a count of DstIPs for each location. You can easily imagine taking the time from the capture, converting it to day, hour, minute and second, and adding that to the X axis of a PivotChart with a sum of TCP resets by location. I usually end up making a few PivotCharts looking at the data in different ways. I often create slicers for the IP octets, which makes it easy to filter down the view.

Location    Count of Location
United States    1851
Switzerland    20
Canada    18
Netherlands    11
United Kingdom    6
Germany    5
Singapore    2
Grand Total    1913

Some screen shots in case you got lost along the way. Or maybe not, I can’t seem to get the Blog editor to upload images.

Insights with a netsh trace

The other day I had a customer experiencing a problem where TCP sessions couldn't be established. We checked the firewall and the core and it all looked fine. We could see the source host sending a TCP SYN and the destination immediately responding with a Reset. Based on the IP TTL, the Reset was coming from the destination host itself, so I asked for a packet capture taken on the destination. In this case they didn't have netmon or wireshark installed on the destination, so I suggested using the built-in capability in 08R2 and newer – netsh trace scenario…

Once I got the capture (an etl file) and loaded it into netmon with the latest parsers, I filtered on description where it contains the IP address. I found the SYN and the Reset, but the next two messages were the surprising bonus.

netsh_trace

Notice the text at the end of the message:

Wscore_MicrosoftWindowsWinsockAFD:datagram dropped: 2 (0x2): Process 0xFFFFFA8007E3E060, Endpoint 0xFFFFFA8008149620, Buffer 0xFFFFFA80060E06D0, Length 40 (0x28), Address <IP Address>:0, Seq 10001 (0x2711), Reason Insufficient local buffer space

Given the low buffer space message, the decision was made to reboot, and that cleared the problem. Of course there is now the task of determining what caused the memory issue, but that's another story.

Convert a netmon capture to CSV for data analysis

I find myself using Power Query all the time now when I need to visualize the flow of bytes between hosts. I had been using copy/paste to get the data (timestamp and TCP payload length), but this doesn't work when you have a lot of packets. What you really need is a CSV version of your netmon capture to be the source of your Power Query.

I found that tshark can convert a cap file to CSV nicely. I use this command:

"C:\Program Files\Wireshark\tshark.exe" -r "in.cap" -T fields -E separator=, -e frame.time -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport -e tcp.len > "out.csv"

I then bring that into Power Query and plot using PowerPivot as described in my earlier post. I end up with a chart like the one below, and using slicers I can easily look at the data in different ways.
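
The same bytes-per-second numbers can also be computed straight from that CSV with a short script if Excel isn't handy. A small sketch, assuming frame.time_epoch is exported instead of frame.time so the timestamp parses as a plain number:

import csv
from collections import defaultdict

bytes_per_sec = defaultdict(int)   # (second, src, dst) -> sum of tcp.len
with open("out.csv", newline="") as f:
    for row in csv.reader(f):
        if len(row) < 6 or not row[5]:
            continue
        second = int(float(row[0]))                      # frame.time_epoch truncated to the second
        bytes_per_sec[(second, row[1], row[2])] += int(row[5])

for (second, src, dst), total in sorted(bytes_per_sec.items()):
    print(second, src, dst, total)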

 

BytePerSec_PQ

Max TCP retransmits – how many do you need?

I spent a lot of time trying to figure out why our NetBackup connections were failing. We knew we had a lot of packet loss but could never get to the bottom of that. We found that disabling SACK on the Win03 systems fixed it, but that wasn't an option for the Win08 hosts. So, in an effort to just get it fixed and forget about root cause, we upped TcpMaxDataRetransmissions to 25 to see what would happen. Well, the backup job ran for 4 days but finally finished. We rolled out this change to 4 more servers and they all finished their backups.

In the end we set TcpMaxDataRetransmissions to 10 on the servers that consistently had failing backups. The default is 5, but in our case it was being set to 3 by Group Policy; this is a common security hardening change.

Traffic analysis with Power Query

I have become quite the fan of Power Query, the add-in for Excel from Microsoft. Over the last few days we have had concerns about the number of DNS queries our event log collection system was making. Our ArcSight connector appliance tries to enrich the events it collects by resolving the IP addresses it finds in the security events. The two ConnApps we focused on were making 600 DNS queries per second. The problem was that the traffic had to go through a firewall to get to the DNS server, and all these queries were creating a lot of entries in the firewall connection table.

To analyze the problem we got a packet capture on one of the ArcSight ConnApps. I loaded the pcap file into netmon, did a select all in the Frame Summary pane and pasted it into an Excel file. I then created a new Excel file and built a Power Query using the previous Excel file as the data source. Using the split function you can easily get the field you need (the DNS question) into its own column. What we were after in this case was which DNS questions were being asked and how often. Save the query to the data model (install the Power Pivot add-in too) and create a PowerPivot PivotTable. Now you can easily summarize. In our case we had two IPs that were queried 4000 times in the two-minute capture.
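
For what it's worth, the same "which question, how often" count can be scripted instead of pasted into Excel. A small sketch, assuming the DNS query names have been exported one per line with tshark:

# Assumed export, one query name per line:
#   tshark -r dns.cap -Y "dns.flags.response == 0" -T fields -e dns.qry.name > queries.txt
from collections import Counter

with open("queries.txt") as f:
    counts = Counter(line.strip() for line in f if line.strip())

for name, n in counts.most_common(10):
    print(n, name)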

What was the fix? Easy. We enabled DNS negative caching. They are now running at 20 queries per second.

Here are some screen shots of the process. I flushed the DNS cache on my laptop to get some data in my capture.

The capture

Image

Pasted in to Excel

Image

Open a Power Query from a blank Excel doc.

Image

Loaded in query editor

Image

After a bunch of splits, filters and column renames I got it down to this.

Image

Now we have what we need in the PowerPivot data model. Create your PivotTable (in PowerPivot) and you get this.

Image

 

Problems with SACK

I have been working on a problem where backup jobs fail with NetBackup Error 24. These tend to be intermittent failures, with some hosts failing more than others. Recently I took a second look at a host that was frequently failing, and this time I got lucky.

Using the long-running capture techniques described below, I was able to track down the connection reset. What I found after studying the trace was that SACK (Selective Acknowledgement) was handling the communication about what data needed to be retransmitted, and it didn't work. I don't know SACK well enough to tell which side failed to respond correctly, but it looks like the Win03 sender should have provided the missing data (which should still be in its buffer, I think) to the receiver (Solaris). In any case, we disabled SACK on the Win03 host and haven't had a failure since.

Here is the trace with my comments.

Image