Creating a Custom Report using Ruby

This tutorial describes a report that reads the data in your file and calculates the average age at which males and females were married and the average age of fathers and mothers when their children were born. It reproduces the AppleScript report using the Ruby language. The output is written to a built-in GEDitCOM II report. When you have finished this tutorial, you should be able to create your own custom reports using Ruby by changing the type of data collected and the format of the output report.

Main Script

Listing 1
#!/usr/bin/ruby
# Generational Ages Report Script
# 20 JUN 2010, by John A. Nairn
#	
# This script generates a report of average ages of all spouses
# when they got married and when their children were born.
# The report can be for all spouses in the file or just for
# spouses in the currently selected family records.

# Prepare to use Apple's Scripting Bridge for Ruby
require "osx/cocoa"
include OSX
OSX.require_framework 'ScriptingBridge'

# Define the script name in a variable
scriptName="Generation Ages"

################### Subroutines (see below)

################### Main Script

# fetch application object
gedit = OSX::SBApplication.applicationWithBundleIdentifier_(\
"com.geditcom.GEDitCOMII")

# verify document is open and version is acceptable
if CheckAvailable(gedit,scriptName,1.5)==0
	exit
end

# reference to the front document
gdoc=gedit.documents[0]

# choose all or currently selected family records
whichOnes = gdoc.userOptionTitle_buttons_message_(\
"Get report for All or just Selected family records",\
["All", "Cancel", "Selected"],nil)
if whichOnes=="Cancel"
    exit
end

# Get a list of the chosen family records
if whichOnes=="All"
    fams = gdoc.families()

else
    selRecs = gdoc.selectedRecords()
    fams = []
    selRecs.each do |fam|
        if fam.recordType()=="FAM"
            fams.push(fam)
        end
    end
end

# No report if no family records were found
if fams.length==0
    puts "No family records were selected"
    exit
end

# Collect all report data in a subroutine
CollectAges(gdoc,fams)

# write to report and then done
WriteToReport(gdoc,gedit)

Listing 1 shows the entire main script, although crucial components of the script are done in subroutines that are given below. This section describes the logic of the main script.

The script starts with comment lines beginning with "#". It is a good idea to start all scripts with comments. If you share your scripts with other GEDitCOM II users or revisit a script written a while ago, these comments document how the script is used.

The "Prepare to use" section must start all Ruby scripts. These commands load the modules that allow Ruby scripts to interact with GEDitCOM II. The scriptName variable holds the name of the script. Any place you need to refer to the script by name, use this variable rather than the literal text of the name. This approach makes parts of your script more reusable in other scripts.

The first step is to verify it makes sense to run this script. All the work is done in the CheckAvailable() subroutine (see utility subroutines). The subroutine returns 1 if it is OK to proceed or 0 to exit. This script, for example, requires a document to be open and requires version 1.5 or newer of GEDitCOM II (because it uses some commands first defined in version 1.5).

You will often want to run reports on your entire file, but it can be helpful to focus a report on a subset of your file. To achieve this goal, many scripts have an option to run on the entire file or on just the selected records. To run a report on a subset of the file, the user selects the records first and then runs the script. The next three sections let the user choose the report target. First, the user option command displays a box with three buttons for "All", "Cancel", or "Selected" to report on the entire file, to abort the script, or to report on the currently selected records, respectively. The "All" option, which is first, is the default option (the user can press Return to use that option). The command is sent to the desired document using the gdoc reference defined earlier in the script.

Once the user decides which records to use, the next section compiles all needed records into a list variable (fams). This report reads ages of fathers and mothers and thus only needs to look at family records. If the user selects "All", the list is found by reading families() from the front document (gdoc). If "Selected" is chosen instead, the script fetches the selectedRecords() of the front document. The list of currently selected records is a standard property of GEDitCOM II documents. This list may have any number of records (including none) and may have any type of record. Because this report only cares about family records, the each loop goes through the list of selected records and adds only the family records to the fams list variable.
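The filtering step can be tried outside GEDitCOM II. The following is a minimal sketch, assuming plain Ruby arrays and a hypothetical FakeRecord stand-in for real records (only the recordType() name comes from Listing 1). It uses select, which builds the same list as the each loop:

```ruby
# FakeRecord is a hypothetical stand-in; real records come from GEDitCOM II.
FakeRecord = Struct.new(:id, :type) do
  def recordType
    type
  end
end

# A mixed selection: individuals, families, and a note
selRecs = [
  FakeRecord.new("@I1@", "INDI"),
  FakeRecord.new("@F1@", "FAM"),
  FakeRecord.new("@N1@", "NOTE"),
  FakeRecord.new("@F2@", "FAM")
]

# Keep only the family records, as the each loop in Listing 1 does
fams = selRecs.select { |rec| rec.recordType == "FAM" }
fam_ids = fams.map(&:id)
puts fam_ids.inspect
```

Whether select behaves the same way on the list object returned through the Scripting Bridge is not covered here, which is one reason to keep the explicit each loop in the actual script.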

Finally, once all family records are in the fams list variable, the length of that list is checked (fams.length). If it has no elements, there is no need to proceed and the script exits with a message that "No family records were selected". Otherwise the script continues.

The final section is the main part of the script, but all work is done in two subroutines. First, the CollectAges() subroutine extracts all needed age information from the provided list of family records and stores the results in global variables. Next, the WriteToReport() subroutine formats the report for output to the user, and the script is done.


CollectAges() Subroutine

Listing 2
# Collect data for the generation ages report
def CollectAges(gdoc,famList)
    # initialize global counters
    $numHusbAge=$sumHusbAge=$numFathAge=$sumFathAge=0
    $numWifeAge=$sumWifeAge=$numMothAge=$sumMothAge=0
    
    # progress reporting interval
    fractionStepSize=nextFraction=0.01
    numFams=famList.length
    i=0

    famList.each do |fam|
        # read family record information
        husbRef = fam.husband()
        wifeRef = fam.wife()
        chilList = fam.children()
        mdate = fam.marriageSDN()
        
        # read parent birth dates
        hbdate = wbdate = 0
        if husbRef != ""
            hbdate = husbRef.birthSDN()
        end
        if wifeRef != ""
            wbdate = wifeRef.birthSDN()
        end
        
        # spouse ages at marriage
        if mdate>0
            if hbdate>0
                $sumHusbAge = $sumHusbAge + GetAgeSpan(hbdate,mdate)
                $numHusbAge = $numHusbAge+1
            end
           
            if wbdate>0
                $sumWifeAge = $sumWifeAge + GetAgeSpan(wbdate,mdate)
                $numWifeAge = $numWifeAge+1
            end
        end
                
        # spouse ages when children were born
        if hbdate > 0 or wbdate > 0
            chilList.each do |chilRef|
                cbdate = chilRef.birthSDN()
                if cbdate > 0 and hbdate > 0
                    $sumFathAge = $sumFathAge+GetAgeSpan(hbdate,cbdate)
                    $numFathAge = $numFathAge + 1
                end
                if cbdate > 0 and wbdate > 0
                    $sumMothAge = $sumMothAge+GetAgeSpan(wbdate,cbdate)
                    $numMothAge = $numMothAge + 1
                end
            end
        end
                    
        # time for progress
        i = i+1
        fractionDone = Float(i)/Float(numFams)
        if fractionDone > nextFraction
            gdoc.notifyProgressFraction_message_(fractionDone,nil)
            nextFraction = nextFraction+fractionStepSize
        end
    end
end

This subroutine (see Listing 2) collects all data on ages from the information in your file. It is where most of the work of this script is done; the work is done by interaction with your data through GEDitCOM II's scripting objects and their properties.

The first section initializes global variables. These variables are accessed elsewhere in the script to format the report, which is why they need to be global variables. The variables fractionStepSize, nextFraction, numFams, and i are local variables used for tracking progress of the script and are discussed more below.

The each loop runs over all family records passed to this subroutine. The loop starts by reading data from the family record: references to the husband and wife records (in husbRef and wifeRef), a list of all child records (in chilList), and the marriage date (in mdate). The marriage date, like all dates in this script, is read as a serial day number (using built-in SDN properties), which is a day number starting with 1 back around 4000 B.C. Serial day numbers are ideal for date calculations such as finding years between dates. These SDN properties return the serial day number for a date, or 0 if the date is unknown or the date in the file is invalid.
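To see why serial day numbers make date arithmetic easy, here is a minimal sketch using Ruby's standard Date class, whose jd method returns a comparable day count (its numbering is not necessarily the same as GEDitCOM II's SDN values). The age_span_years method is a hypothetical stand-in for GetAgeSpan(), which is not listed in this tutorial; dividing by 365.25 days per year is an assumption:

```ruby
require "date"

# Hypothetical stand-in for GetAgeSpan(): years between two day numbers,
# assuming an average year length of 365.25 days.
def age_span_years(begin_day, end_day)
  (end_day - begin_day) / 365.25
end

birth = Date.new(1850, 6, 15).jd      # day number for the birth date
marriage = Date.new(1875, 6, 15).jd   # day number for the marriage date

days = marriage - birth               # simple integer subtraction
age = age_span_years(birth, marriage)
puts "#{days} days, or about #{'%.2f' % age} years"
```

Because each date is a single integer, the time between two events is one subtraction, with no need to handle months, leap years, or calendar rules in the script.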

The next section reads the parents' birth dates. As read above, husbRef and wifeRef are references to the parents in this family, or either could be an empty string, meaning the record does not have that spouse. For each spouse in the family record, this section reads the birth serial day number using a property of the individual record; otherwise the date remains zero.

The next two sections do the date calculations for this script. First are the calculations for the age of each spouse at the time of marriage. This calculation can only be done if both the spouse's birth date and the family's marriage date are known. Thus, if both serial day numbers are greater than zero, the age is calculated (using a utility method called GetAgeSpan()). The global variables $numHusbAge and $numWifeAge count the number of age calculations done. The $sumHusbAge and $sumWifeAge variables hold a sum of all ages. When this subroutine is done, the sum variable divided by the num variable will be the average age.
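The sum-and-count pattern can be illustrated with made-up data. In this minimal sketch the day-number pairs are hypothetical, 0 means an unknown date (as with the SDN properties), and the division by 365.25 is an assumed replacement for GetAgeSpan():

```ruby
# Hypothetical [birth, marriage] serial day pairs; 0 means "not known"
pairs = [
  [2396900, 2406031],   # both dates known
  [0,       2406500],   # birth date unknown, skipped
  [2397000, 0],         # marriage date unknown, skipped
  [2398000, 2405305]    # both dates known
]

sum_age = 0.0
num_age = 0
pairs.each do |bdate, mdate|
  next unless bdate > 0 && mdate > 0    # need both dates for an age
  sum_age += (mdate - bdate) / 365.25   # assumed age formula
  num_age += 1
end

# The average is only defined when at least one age was found
average = num_age > 0 ? sum_age / num_age : nil
puts "#{num_age} ages, average #{average ? '%.2f' % average : '-'}"
```

Keeping a running sum and a count, rather than a list of every age, means the subroutine needs only two numbers per result no matter how large the file is.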

The age at child birth section is similar. It contains a loop over all children in the family. For each child, it looks for their birth date. If a birth date is found, the ages of each parent with a known birth date are added to global variables analogous to the num and sum variables in the previous section. This entire section is enclosed in a conditional that says to do these calculations only if at least one parent birthdate is known.

The last section of the loop informs the user of the script progress using the notifyProgressFraction_message_() command.
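The progress logic limits how often the application is notified. A minimal sketch of the same throttling, with a hypothetical notify lambda standing in for notifyProgressFraction_message_() and a larger step size than Listing 2 so the effect is easy to see:

```ruby
# Hypothetical stand-in for gdoc.notifyProgressFraction_message_()
notified = []
notify = lambda { |fraction| notified.push(fraction) }

items = (1..1000).to_a
step = 0.1              # Listing 2 uses 0.01
next_fraction = step
processed = 0

items.each_with_index do |_item, index|
  processed += 1        # the real work on each record would go here

  # notify only when progress has passed the next threshold
  fraction_done = Float(index + 1) / Float(items.length)
  if fraction_done > next_fraction
    notify.call(fraction_done)
    next_fraction += step
  end
end

puts "#{processed} items processed, #{notified.length} progress updates"
```

The notification is sent only a handful of times rather than once per record, which keeps the cost of progress reporting small for large files.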

When the each loop is done, the global variables (e.g., $numHusbAge, $sumHusbAge, etc.) will contain all data needed to output the report. The subroutine ends and returns control to the main script. The next section explains formatting of the output report.


WriteToReport() Subroutine

Listing 3
# Write the results now in the global variables to a
# GEDitCOM II report
def WriteToReport(gdoc,gedit)
    # build report using <html> elements beginning with <div>
    rpt = ["<div>\n"]
    
    # begin report with <h1> for title
    fname = gdoc.name()
    rpt.push("<h1>Generational Age Analysis in " + fname + "</h1>\n")

    # start <table> and give it a <caption>
    rpt.push("<table>\n<caption>\n")
    rpt.push("Summary of spouse ages when married and when children were born\n")
    rpt.push("</caption>\n")
    
    # column labels in the <thead> section
    rpt.push("<thead><tr>\n")
    rpt.push("<th>Age Item</th><th>Husband</th><th>Wife</th>\n")
    rpt.push("</tr></thead>\n")
    
    # the rows are in the <tbody> element
    rpt.push("<tbody>\n")
    
    # rows for ages when married and when children were born
    rpt.push(InsertRow("Avg. Age at Marriage", $numHusbAge,\
    $sumHusbAge, $numWifeAge, $sumWifeAge))
    rpt.push(InsertRow("Avg. Age at Childbirth", $numFathAge,\
    $sumFathAge, $numMothAge, $sumMothAge))

    # end the <tbody> and <table> elements
    rpt.push("</tbody>\n</table>\n")
    rpt.push("</div>")
    
    # display the report
    theReport = rpt.join
    p = {"name"=>"Generational Ages","body"=>theReport}
    newReport = gedit.classForScriptingClass_("report").\
    alloc().initWithProperties_(p)
    gdoc.reports.addObject_(newReport)
    newReport.showBrowser()
end

Formatting a report for output in GEDitCOM II means formatting the data using html elements, all enclosed within a single div element. You can use any html elements you want. Here the report title is put in an h1 element and all results are placed in a table element. The subroutine to create this report is in Listing 3.

The report is stored in the rpt list variable. The script starts by creating a single-element list variable with the <div> element (and a return character). Each new piece of text needed for the report is added as another element at the end of the list using the push() method. When done, the list is converted to a string variable with the command rpt.join, which combines all elements one after another. An alternative method is to use a string variable. The two approaches, side by side, are:

   rpt = ["text 1"]             rpt = "text 1"
   rpt.push("text 2")           rpt = rpt + "text 2"
     ...                           ...
   theReport = rpt.join

The list version on the left is faster because adding an element to the end of a list is faster than combining a string with itself (e.g., rpt = rpt + "text 2") many times. For this small script the difference would not be noticeable, but it is good practice to use the most efficient methods whenever possible.
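Both approaches produce the same text. A minimal sketch that builds a tiny report each way and compares the results (the report content here is only illustrative):

```ruby
# List approach: collect the pieces, then join once at the end
parts = ["<div>\n"]
parts.push("<h1>Title</h1>\n")
parts.push("</div>")
from_list = parts.join

# String approach: each + creates a new, longer string
text = "<div>\n"
text = text + "<h1>Title</h1>\n"
text = text + "</div>"

puts from_list == text
```

Since the output is identical, the choice between the two comes down to efficiency and to which style you find easier to read.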

The process is straightforward, assuming you understand html elements. A name for the report is put into an h1 element; the name includes the file name. All data is in a three-column table where the first column labels the data and the other two columns give results for husbands and wives. The table starts with a caption for the table. The thead section has header rows to label the three columns. The tbody has two rows to report results for average ages at marriages and average ages when children were born. These rows are formatted using a custom InsertRow() subroutine. Finally, all elements are closed and the report ends with a </div> element.

The final step is to send the report to a GEDitCOM II report and display it to the user. The report is created with an initWithProperties_(p) command; the properties name the report and set the report text to the contents of theReport, which is created by joining all string elements in rpt using rpt.join. Finally, the report is displayed to the user with the showBrowser() command.

InsertRow(rowLabel, numHusb, sumHusb, numWife, sumWife) Subroutine

InsertRow()
# Insert table row with husband and wife results
def InsertRow(rowLabel, numHusb, sumHusb, numWife, sumWife)
    tr = "<tr><td>" + rowLabel + "</td><td align='"
    if numHusb > 0
        tr = tr + "right'>%.2f" % (sumHusb / numHusb)
    else
        tr = tr + "center'>-"
    end
    tr = tr + "</td><td align='"
    if numWife > 0
        tr = tr + "right'>%.2f" % (sumWife / numWife)
    else
        tr = tr + "center'>-"
    end
    tr = tr + "</td></tr>\n"
    return tr
end

This subroutine formats each row of the table. The input parameters are a label for the row and numerical results to be averaged and displayed in the table. The only catch is that numHusb or numWife might be zero if no individuals suitable for averaging were found in the CollectAges() subroutine. Since we do not want to divide by zero, this special case is trapped and the table cell is loaded with "-" rather than a calculated average. Average ages are displayed with two digits after the decimal point by using the format operator (%) of the String class.
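The cell formatting can be checked on its own. The following minimal sketch uses a hypothetical average_cell helper (not part of the tutorial's script) to show the "%.2f" format and the zero-count guard. It also converts the sum to a Float before dividing. That step is a precaution: GetAgeSpan() is not listed here, and if the sums were ever integers, Ruby's integer division would drop the fractional part:

```ruby
# Hypothetical helper: text for one table cell
def average_cell(sum, num)
  return "-" unless num > 0       # avoid dividing by zero
  "%.2f" % (sum.to_f / num)       # force floating-point division
end

with_data = average_cell(55, 2)
no_data = average_cell(0, 0)
truncated = "%.2f" % (55 / 2)     # integer division happens first

puts with_data   # 27.50
puts no_data     # -
puts truncated   # 27.00
```

If the age sums are already floating-point values, the conversion changes nothing, so it is a safe addition either way.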

Another refinement implemented in this subroutine is to select alignment for the table cells. The label is left justified. All averages are right justified. If no data are available, the "-" is centered. When the subroutine ends, it returns the entire text for the row.