A Practitioner's Guide to Data Collection

WHY IS DATA COLLECTION SO IMPORTANT?
Data collection can seem tedious, and it is often sidelined in favor of the important work that you do with students, teachers, and families. However, the nature of Federal grant programs means that evaluation is key to receiving funding, improving programs, and spreading awareness of the great work that your program is doing. Your data team consists of you and your colleagues and can also include service providers, school partners, or outside evaluators. Everyone on that team is committed to providing the grant with an accurate and meaningful evaluation. To do this, everyone needs to share as much usable data as possible, including school demographics, participation rosters, and academic records. Below we detail some key points to keep in mind while recording your activity data so that you can best see the impact of your work reflected in evaluations.
WHATEVER YOU DO, BE CONSISTENT!
Whatever system you have created for your data collection, keeping it consistent and organized is the most useful thing you can do as part of your data team. For example, if you have a SharePoint site where you upload rosters for one-time activities, as well as spreadsheets that you use for longer-term services, make sure to use each of them consistently. If you track a one-time activity in your spreadsheet and also submit a roster in SharePoint, it could confuse your data team and lead to double counting that activity or losing track of other activities. Consistency also applies to formatting: the more you use the same format for your dates (01/01/2024 vs 1 Jan, 2024 vs 01.01.24), the same columns for rosters (First Name, Last Name, Student ID vs Name, Email Address, Phone Number), and the same types of documents (Word doc vs Google Sheet vs PDF vs form completion), the more useful information can be pulled from these sources.
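To illustrate why mixed date formats create extra work, here is a minimal Python sketch of the kind of clean-up code a data team might need. The function name `normalize_date` and the list of accepted formats are assumptions for illustration, based only on the three example formats above. Reading 01.01.24 as month first is also an assumption, which is exactly the kind of ambiguity a single agreed format avoids.

```python
from datetime import date, datetime

# Formats taken from the examples in the text. Month-before-day order for the
# numeric formats is an assumption; a real roster may use day-first dates.
KNOWN_FORMATS = ("%m/%d/%Y", "%d %b, %Y", "%m.%d.%y")


def normalize_date(raw: str) -> date:
    """Try each known format in turn and return a single date object."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"Unrecognized date format: {raw!r}")
```

Every additional format in use means another entry in that list and another chance for a date to be read incorrectly, so one format used everywhere is the safer choice.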

KEY DATA ELEMENTS
In order to have complete and accurate information in commonly used databases like SCRIBE and RGI, data needs to show when an activity happened, what category the activity falls into, who attended the activity, and how long the activity lasted.
- The date of the activity may not seem like the most important aspect of the data, but it is crucial for telling the difference between similar activities, knowing which ongoing activities have already been recorded, and making sure each instance of an activity is logged.
- Every time you collect and submit the participation records for an activity, you should be clear about which service category that activity falls under. This allows your evaluation team to do more specific analysis of your activities within categories, e.g., whether students who participated in financial aid activities were more likely to submit the FAFSA.
- Family participation is usually a high priority for GEAR UP grants, so it is important to always indicate when a parent/guardian attends an event with their student or on behalf of their student, or when you communicate with them (in preparation for a student trip, etc.).
It can be tricky when a parent/guardian has multiple children in a cohort because participation is typically tracked at the student level. When a parent attends an event for multiple students, it is important to list each student that an adult represents in order to get the participation for all students. This also means that if multiple parents attend an event for the same student, that will only count as one parent/guardian for the total participation number for that event.
- From your data team’s perspective, the least important piece of information that is needed for most databases is the amount of time spent in an activity. Unless there is a specific analysis happening that is based on how much time students spent in a certain activity or unless an event took up a lot of time such as a college tour, this field might not provide as much information. However, time spent on each activity is a required field to report in the APR and FPR, so you should still provide this information.
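The parent/guardian counting rule described above (participation is tracked per student, so one adult can count for several students, and several adults for one student count once) can be sketched in Python as follows. The function name and the shape of the sign-in data are hypothetical and only meant to illustrate the logic; they are not how any particular database stores this information.

```python
def students_with_family_present(sign_ins):
    """Return the set of Student IDs represented by at least one adult.

    `sign_ins` is a list of (adult_name, student_ids) pairs, where
    `student_ids` lists every student that adult attended for.
    """
    represented = set()
    for _adult_name, student_ids in sign_ins:
        # A set ignores repeats, so a second adult for the same student
        # does not increase the count.
        represented.update(student_ids)
    return represented


# Hypothetical sign-in sheet for one event.
example_sign_ins = [
    ("Parent A", ["1001", "1002"]),  # one adult, two students
    ("Parent B", ["1001"]),          # second adult for student 1001
    ("Parent C", ["1003"]),
]
family_count = len(students_with_family_present(example_sign_ins))
```

In this example `family_count` is 3: students 1001, 1002, and 1003 each had family participation, even though four adult-student pairings were recorded.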
WAYS TO DO THIS
There are many ways to create an effective and efficient data collection system (just make sure they are FERPA-compliant!). Some helpful tools are SharePoint folders, Google Sheets, OneDrive shared folders, and Microsoft Forms. Overall, there are two types of data that you can collect for activities: static and live. SharePoint or OneDrive folders are useful for static data, such as rosters for one-time events and activities, where each file lists only the students who attended that event. Google Sheets or shared Excel files (through SharePoint or OneDrive) are useful for live data because a roster with all students at a particular school can be shared and updated with ongoing event data, such as the dates of students' advising appointments or tutoring meetings. Your evaluation team can review these on a regular basis and add the new data into your database. These are also handy for viewing which students have not participated in certain meetings or appointments since the entire roster is included. Below are some do’s and don’ts for general data collection along with explanations for each item.
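As a rough illustration of why a full live roster is handy, the Python sketch below lists the students whose cell for a given activity is still empty. The column names "Student ID" and "Advising appointment" and the sample rows are hypothetical, and the roster is assumed to have been loaded already as a list of dictionaries (for example with `csv.DictReader`).

```python
def students_missing_activity(roster, column):
    """Return the Student IDs whose cell in `column` is blank or missing."""
    missing = []
    for row in roster:
        cell = (row.get(column) or "").strip()
        if not cell:
            missing.append(row["Student ID"])
    return missing


# Hypothetical live roster with one column for an ongoing service.
example_roster = [
    {"Student ID": "1001", "Advising appointment": "01/15/2024"},
    {"Student ID": "1002", "Advising appointment": ""},
    {"Student ID": "1003"},  # column missing entirely for this row
]
not_yet_advised = students_missing_activity(example_roster, "Advising appointment")
```

This check only works because every student appears on the roster, including those with nothing recorded yet.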
DO’S, DON’TS, AND WHY’S OF FORMATTING YOUR DATA
Below are some rules of thumb to remember when collecting data, each with an explanation.
Why? Identifying unique students that might have the same name by their student ID means that there is less of a chance that the wrong student is logged.
Why? If you are using PDF documents, inserting screenshots of rosters makes them very difficult to use, since text recognition software is not very accurate and screenshots do not tend to be high resolution. Instead, copy and paste the data into the PDF as text.
Why? It is crucial to have accurate and complete names of students so that your data team can be sure which students actually attended specific events.
Why? Comments are difficult to get data from. In spreadsheets or PDFs, there is no easy way to extract all the comments in a column, so that information is effectively not part of the spreadsheet at all. Comments in PDF documents add another layer of difficulty to text recognition because this software does not see images that have been commented into the document. This increases the difficulty of the analytic process.
Why? The more manual work a large roster requires, the more likely human error becomes. If date formats are the same, code can identify dates correctly rather than confusing them for numbers or other characters. Other data cleaning is also often necessary, so if columns are consistent, this can also be done through code rather than manually.
Why? Intermixing text and dates in one cell makes it difficult to separate those two fields. Making sure that dates have their own column when you are making notes means that code can be used instead of manual entry to record every activity.
Why? Especially in spreadsheets, if you add a column that you would like to be entered as an activity, be sure to document what the activity was, when it occurred, and who attended (including if parents/guardians were present). This will allow activities to be entered accurately and promptly and avoid any misunderstandings or confusion about data.
Why? As much as mind reading would make data collection easier, that hasn’t been figured out quite yet. Please clarify as much as you can so that your data team knows exactly how to categorize your data. Even if you have a key somewhere that says what your asterisks and strikethrough text mean, these formatting codes are much more difficult to use than a new column with that information would be.
EXAMPLE STUDENT TRACKER OR SPREADSHEET

For accurate and complete data collection, the most important column here is the Student ID column. Because these IDs are assigned by the district and are unique, there is no chance of confusing two students with the same name, such as John Doe and the other John Doe in this example. If the Student ID column is incorrect or unavailable, it is possible to use the Name column for identification, but any spelling errors or incomplete names make this inaccurate. The example columns in grey, as well as any other demographic information or academic records in Senior Trackers, can be interesting and useful for more in-depth analysis, but they are not typically useful for collecting activity data. The activity data shown in red in this example is useful as long as the date of XYZ event is given somewhere. Empty cells are fine as long as you are consistent, so that every student who did not attend a given event has an empty cell in that column. Note that the Date of contact column and the Tutoring help column have some blank cells and some cells with multiple dates in a single cell. It would be better to have a new column for each meeting or activity, but as long as you consistently separate the dates with only a comma or some similar method, this example will still work fine. It would be very difficult to get meaningful information if the Notes column and the Date of contact column were combined so that dates and text were in the same cells.
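To show why a consistent separator matters, here is a minimal Python sketch that turns cells holding several comma-separated dates into one record per student per date. The function name `expand_dates` and the sample rows are hypothetical; the column names follow the example tracker described above. The sketch assumes the dates themselves contain no commas, so a format like "1 Jan, 2024" would be split incorrectly.

```python
def expand_dates(rows, column="Date of contact"):
    """Return (student_id, date_text) pairs, one per date found in `column`."""
    records = []
    for row in rows:
        cell = (row.get(column) or "").strip()
        if not cell:
            continue  # a blank cell means no contact was recorded
        for piece in cell.split(","):
            piece = piece.strip()
            if piece:
                records.append((row["Student ID"], piece))
    return records


# Hypothetical tracker rows: one cell with two dates, one blank cell.
example_rows = [
    {"Student ID": "1001", "Date of contact": "01/15/2024, 02/03/2024"},
    {"Student ID": "1002", "Date of contact": ""},
]
contact_records = expand_dates(example_rows)
```

If notes and dates shared the same cell, a simple split like this would no longer work, and each cell would need to be read by hand.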

CONCLUSION
Communication, consistency, and clarity are the most important things to keep in mind for any data collection system. Be clear about expectations of where data will be collected, what it will look like, and how often it will be updated. The more consistent your data is, the less manual data cleaning is necessary, which leaves less room for error in the data collection process. If you have any questions about the best way to collect data for an event, it is always easier to ask and explain beforehand than to go back through data and change each individual student’s record. Overall, try to keep your data as simple and logical as possible. When in doubt, always err on the side of too much communication rather than too little, so that everyone is on the same page about expectations as well as what the data means (raw data can sometimes become something of a mystery or puzzle, so explanations help)! If you have any questions about how data collection and management works, or if you would like to connect with the RED team for our data services, please feel free to reach out to ceopeval@ku.edu!