The German Tank Problem

In World War II, Germany’s Panzer Mark V tank was technologically more advanced than the tanks of the Allies. It was strategically important to determine how many of these tanks Germany had, and from that figure estimates of monthly production could also be made. Allied intelligence exploited the manufacturing practice of assigning sequential, ascending serial numbers to tank components. Examining the parts of tanks captured in battle provided a sample of the total population of Panzer Mark Vs in service.

Image attribution: Bundesarchiv, Bild 183-H26258 / CC-BY-SA 3.0

Based on their sample of captured tanks, the intelligence officers used the following formula to estimate the total Panzer Mark V population:

$$\hat{N} = \frac{k + 1}{k}\, m - 1$$

where $m$ is the largest serial number in the sample and $k$ is the sample size.

The code that follows tests the validity of this formula. For demonstration purposes, we assume a population of 3,000 tanks, represented by the integers from 1 to 3,000. We draw 1,000 samples of size 100 from this population and compute the unbiased estimate from each sample using this formula.
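As a quick worked example of the estimate computed in the program, ((k + 1)/k) * m - 1, here is a minimal sketch. The function name `estimate_population` and the four serial numbers are hypothetical, chosen only to make the arithmetic easy to follow: with k = 4 tanks and a largest serial number of m = 60, the estimate is (5/4) * 60 - 1 = 74.

```python
def estimate_population(serials):
    """Return ((k + 1) / k) * m - 1 for a sample of serial numbers."""
    k = len(serials)  # sample size
    m = max(serials)  # largest serial number observed
    return (k + 1) / k * m - 1


# Hypothetical sample of four captured serial numbers (illustration only).
captured = [19, 40, 42, 60]
estimate = estimate_population(captured)
print(estimate)  # 74.0
```

The estimate is always at least as large as the biggest serial number seen, which makes sense: the population cannot be smaller than the highest-numbered tank captured.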

The results of the following program are written to a file, estimates.csv.

import csv

import numpy as np

# Field names
fields = ['TankEstimate']
filename = "estimates.csv"

population = 3000   # serial numbers run from 1 to 3,000
samples = 1000      # number of samples to draw
samplesize = 100    # tanks per sample

# These parameters "cure" double lines, CRLF issues
csvfile = open(filename, 'w', newline='\n', encoding='utf-8')
csvwriter = csv.writer(csvfile, delimiter=',')
csvwriter.writerow(fields)

x = 1
while x <= samples:
    # Draw serial numbers from 1..population without replacement.
    # (np.random.choice(3000, ...) would draw from 0..2999 instead.)
    tanksample = np.random.choice(np.arange(1, population + 1), samplesize, replace=False)
    tankmax = np.amax(tanksample)
    # Unbiased estimate: ((k + 1) / k) * m - 1
    estimate = ((samplesize + 1) / samplesize) * tankmax - 1
    csvwriter.writerow([estimate])
    x += 1

csvfile.close()

from matplotlib import pyplot as plt

import numpy as np

# Load the estimates written by the previous program. This step is assumed:
# the original listing never defines `tanks`. skiprows=1 skips the header row.
tanks = np.loadtxt("estimates.csv", skiprows=1)

#bins = plt.hist(tanks, bins = 'auto')
bins = plt.hist(tanks, bins=[2825, 2850, 2875, 2900, 2925, 2950, 2975, 3000, 3025, 3050])
plt.ylabel("Frequency")
plt.xlabel("Unbiased Estimates - Number of Panzer V's")
plt.title("Tank Population Estimate")
plt.show()

print("Maximum Estimate = ", np.amax(tanks))
print("Minimum Estimate = ", np.amin(tanks))
print("Actual Tank Population = 3000")
#print(tanks)
#print(bins)

Maximum Estimate =  3,027.99

Minimum Estimate =  2,597.73

Actual Tank Population = 3,000
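The maximum and minimum show the spread of the estimates, but the claim that the formula is unbiased is about their average. The following is a minimal sketch of that check, not part of the original program: it repeats the simulation in memory and compares the mean of the estimates with the mean of the raw sample maxima. The seed value and the variable names are assumptions made for repeatability and readability.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, assumed for repeatability
population = np.arange(1, 3001)      # serial numbers 1..3000
samples, samplesize = 1000, 100

# Largest serial number seen in each of the 1,000 samples of 100 tanks.
maxima = np.array([
    rng.choice(population, size=samplesize, replace=False).max()
    for _ in range(samples)
])

# Apply ((k + 1) / k) * m - 1 to every sample maximum at once.
estimates = (samplesize + 1) / samplesize * maxima - 1

mean_estimate = estimates.mean()
mean_maximum = maxima.mean()
print("Mean of unbiased estimates:", round(mean_estimate, 1))
print("Mean of raw sample maxima: ", round(mean_maximum, 1))
```

If the formula behaves as intended, the mean of the estimates should land close to 3,000, while the mean of the raw maxima should fall short of it, since a sample maximum can never exceed the true population size.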