Family identification Python script

This post presents a script that was used to identify families in a database containing the names of 6116 victims of the Holocaust in Italy and, for most of them, the names of their mother, father, and spouse.

For any two individuals, the script compares the names of their mothers, fathers, and spouses, and assigns the same family ID to individuals who share the same two parents (siblings), are married to each other (spouses), or are parent and child (one individual is named as a parent of the other, and that parent’s spouse matches the child’s second parent). In all of these cases, if one of the matched individuals already has a family ID, that ID is assigned to the other individual. When the match between two individuals is incomplete, for instance if an individual’s listed spouse names someone else as his or her own spouse, or if a name is missing, the script writes the inconsistency to the log file as a possible issue. The script also creates a table of social network relations, using code numbers to specify the relationship between any two individuals.
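The matching rules described above can be sketched roughly as follows. This is a minimal illustration and not the script itself: the record fields (`id`, `name`, `mother`, `father`, `spouse`), the relation code numbers, the function names, and the placeholder records are all assumptions made for the example, and names are compared as exact strings. The step that merges two families when both individuals already carry different IDs is also an assumption, since that case is not covered in the description.

```python
from itertools import combinations

# Hypothetical relation codes; the code numbers used by the script are not given.
SIBLING, SPOUSE, PARENT_CHILD = 1, 2, 3

def relation(a, b):
    """Return a relation code for two records, or None if no rule matches."""
    # Siblings: both parents are known and identical in the two records.
    if a["mother"] and a["father"] and \
            (a["mother"], a["father"]) == (b["mother"], b["father"]):
        return SIBLING
    # Spouses: each record names the other as spouse.
    if a["spouse"] == b["name"] and b["spouse"] == a["name"]:
        return SPOUSE
    # Parent and child: the parent and the parent's spouse are the child's two parents.
    for parent, child in ((a, b), (b, a)):
        parents = {child["mother"], child["father"]}
        if parent["spouse"] and parent["name"] in parents \
                and parent["spouse"] in parents:
            return PARENT_CHILD
    return None

def assign_families(people):
    family = {}     # person id -> family id
    relations = []  # rows of (id_a, id_b, relation code)
    issues = []     # possible inconsistencies, as they might appear in a log
    next_id = 1
    for a, b in combinations(people, 2):
        code = relation(a, b)
        if code is None:
            # Incomplete match: only one of the two names the other as spouse.
            if a["spouse"] == b["name"] or b["spouse"] == a["name"]:
                issues.append((a["id"], b["id"], "one-sided spouse match"))
            continue
        relations.append((a["id"], b["id"], code))
        fa, fb = family.get(a["id"]), family.get(b["id"])
        if fa is None and fb is None:
            family[a["id"]] = family[b["id"]] = next_id
            next_id += 1
        elif fa is None:
            family[a["id"]] = fb   # reuse the existing family ID
        elif fb is None:
            family[b["id"]] = fa   # reuse the existing family ID
        elif fa != fb:
            # Assumed behaviour: merge the two families under one ID.
            for pid in [p for p, f in family.items() if f == fb]:
                family[pid] = fa
    return family, relations, issues

# Placeholder records, invented for illustration only.
people = [
    {"id": 1, "name": "Father X", "mother": None, "father": None, "spouse": "Mother X"},
    {"id": 2, "name": "Mother X", "mother": None, "father": None, "spouse": "Father X"},
    {"id": 3, "name": "Child X1", "mother": "Mother X", "father": "Father X", "spouse": None},
    {"id": 4, "name": "Child X2", "mother": "Mother X", "father": "Father X", "spouse": "Person Y"},
    {"id": 5, "name": "Person Y", "mother": None, "father": None, "spouse": "Someone Else"},
]
family, relations, issues = assign_families(people)
```

With these placeholder records, the first four individuals end up in one family, while the pair formed by records 4 and 5 is reported as a possible issue because only one of them names the other as spouse. Comparing every pair is quadratic in the number of records, which remains manageable for a few thousand individuals.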

The script is written in Python and was originally designed to be used as a tool in ESRI ArcCatalog software. Using it outside of ArcCatalog would only require changing the first few lines of code.
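As an illustration of what such a change might look like, the sketch below assumes that the opening lines read the tool inputs with `arcpy.GetParameterAsText`, which is the usual way for an ArcGIS script tool to receive its parameters. The function name `read_inputs`, the two parameters (an input table and a log file path), and their order are hypothetical.

```python
import sys

def read_inputs(argv):
    """Return (input_table, log_path) from ArcGIS tool parameters or from argv."""
    try:
        import arcpy  # only available inside an ArcGIS Python environment
    except ImportError:
        # Outside ArcGIS: take the two paths from the command line instead.
        return argv[1], argv[2]
    # Inside ArcGIS: read the parameters defined in the script tool dialog.
    return arcpy.GetParameterAsText(0), arcpy.GetParameterAsText(1)

if __name__ == "__main__" and len(sys.argv) >= 3:
    input_table, log_path = read_inputs(sys.argv)
    print(input_table, log_path)
```

Run outside ArcGIS, the script would then be called with the two paths as command-line arguments, and the rest of the code could stay unchanged as long as it relies only on those two values.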

This script is based on the work of Ryan Schuerman, a fellow doctoral student at Texas State University, whom I wish to thank for providing me with an early version of the code.
