Data Quality Check 4

4.  Ensure that offenses coded as occurring at midnight are correct.

Click here to download SPSS code
Download sample data

Does the number of incidents supposedly occurring during the midnight hour seem reasonable given the data set under consideration? The following code creates the variables necessary to perform the data quality check and creates a line graph. When using this code, be sure to insert the path and the filename of the data file to be used. If you need any assistance working with the syntax provided, please contact us.

According to the FBI Data Collection Guidelines, the incident data/hour variable represents the month, day, year, and hourwhen the incident occurred or started, or the beginning of a time period, if appropriate. If the incident date is unknown,thenthe report date is to be used with an indicator of "R". If the incident hour is unknown, then that portionofthe incident date/hour variable is to be left blank. Zero, which is supposed to represent the midnight hour, might beusedmistakenly to report missing or unknown data. The aggregated flat file, in which the existing values for incident hour(INC_HR) were converted to numeric values, is used for this example. The value for the midnight hour after the AGGREGATE command is 0 (zero). See Creating An Incident-Level Aggregate Flat File for more information.

Defining the Midnight Hour and Generating Comparative Line Graphs

This data quality check procedure can be performed against an aggregated flat file of the administrative data segment at the local, state, or national level. For the current example we use the National 1999 Administrative Segment data and focus on only those records for which an actual, as opposed to reported, incident data and time were recorded. Once the data file is retrieved, the first TEMPORARY / SELECT IF excludes those incidents for which the report data and time rather than the actual date and time were recorded. To view the results, a line graph with hour on the x axis is produced. MISSING = EXCLUDE eliminates the records with no data in the time variables.

GET FILE = 'Directory:\Path\Agency Data.sav'.
SELECT IF (rptdate = ' ').

For comparison purposes, a second TEMPORARY / SELECT IF identifies those records for which only incident times greater than zero are reported. This will show incident times other than the midnight hour. A comparable line graph is generated and, again, the records with data in the RPTDATE variable are excluded. This retains only records with the actual date and time reported in the administrative segment.

SELECT IF (rptdate = ' ' and inc_hr gt 0).

Reviewing the Line Graphs

The first line graph below shows that there was a count of 150,629 incidents at the midnight hour, in contrast to 99,019 incidents during the 10 p.m. hour, 91,201 during the 11 p.m. hour, and 57,994 incidents during the 1 a.m. hour. The second line graph eliminates the midnight hour completely and displays the pattern for the remaining time frames.

Graph 1. Incident hours, including zero.

Graph 2. The same incident hours, excluding zero. Result is a close-up of the same trend as above, without the large spike at zero.

Professional judgment must be exercised to determine whether the number of incidents supposedly occurring during the midnight hour seems to be reasonable. Depending on the length of the list of incidents at the zero hour and the resources available to research individual records, a list of the incidents occurring during the midnight hour can be sampled to investigate coding patterns and determine whether zero was used properly and consistently. In this example, the list would show over 150,000 incident records with an actual incident hour of zero. So, it is not included here. If the quality control check is performed on a frequent basis, however, the list will be of a manageable size. The SPSS code to generate the list, which will need to be added to the SPSS code available above, is as follows. The block of code to generate the list would follow directly after the block that generates the first line graph, which shows the counts of incidents at each incident hour, including the midnight hour.

SELECT IF (rptdate = ' ' AND inc_hr = 0).
LIST ori inc_num inc_hr.

If you determine that the result is unreliable and the volume of affected records too large for correction, the simplest way to handle this problem in analysis is to eliminate (and note the elimination of) all incidents with a reported incident time of zero. However, many correct incident records might then be eliminated from any analysis using the variable INCIDENT DATE/HOUR. Other means exist for compensating for this problem and, at the same time, retaining a representation of the zero or midnight hour (e.g., replacing the midnight hour count with an average of the counts between 11 p.m. and 1 a.m.). (See the Tips and Shortcuts section for an example of how to calculate the average.) You can also consult historical data, if available, for trend comparisons.