A program to build capacity to teach earth and environmental data science at tribal and Hispanic-serving schools: the Earth and Environmental Data Science Corps (EDSC)

In 2019, I wrote an NSF proposal to create a program that supported students and faculty at schools that serve communities traditionally underrepresented in STEM. The program focused on supporting Hispanic- and tribal-serving colleges in adding earth and environmental data science curriculum to their courses. It also focused on training students to be leaders in this space.

The program was designed as a hybrid in-person / online program. I had been teaching in this type of hybrid environment for years, and had found this way of teaching to be effective because it can support student-directed learning, if done well.

This program pulled together a solid 15 years of my experience working on and creating different education programs that served a suite of different audiences including under-served communities. And I was lucky enough at the time to be working with a wonderful colleague, Jenny Palomino, who also had deep experience in education and training and was excited to help design this program.

It was my first (and only!) NSF proposal. And it was funded for 3 years.

In my mind, this award was not about funding my team (although that was important so we could get the work done). Most of the funding in the proposal went to faculty and students at partner schools that served tribal and Hispanic students, two groups that have been traditionally under-served and underrepresented in both STEM and data science (probably even more underrepresented in data science!).

We called the program the Earth and Environmental Data Science Corps (EDSC).

Round infographic that describes the elements of the Earth Data Science Corps program, including hundreds of people who would be trained, 75 students who would receive internships and training, tens of thousands of people who would use website lessons, and 5 distinct institutions that would participate in the project.
Conceptual infographic that I made showing the core components of the EDSC program. The program elements did change once we actually implemented the program! First we learned a lot working with faculty and students at the schools and adapted to meet their needs. Second, the COVID-19 pandemic hit the United States just weeks before we launched the EDSC. This forced our entire program into a fully online environment.

About the Earth Data Science Corps

The Earth and Environmental Data Science Corps (EDSC) built capacity to teach and to learn open and reproducible earth and environmental data science skills at schools that serve historically under-represented groups in STEM.

The program had several goals:

  1. Train undergraduate students in the technical data skills needed to support a strong career in data-intensive science.
  2. Increase awareness of job potential in the data-intensive science space. This supported students “seeing” a career path potential that could be empowered by technical data science skills.
  3. Support and empower faculty in adding data-intensive curriculum into their existing courses. This would then make data skills available at each institution rather than depending on our team to teach them. This goal was critical: it was the element that would scale the effort and empower others.

IMPORTANT: Students and faculty were financially supported to participate in this program. It is critical in any effort focused on under-served communities to make sure their participation is supported. People in these communities are already overtaxed for so many different reasons. Staff time was of course also covered. Participant time was simply prioritized.

Components of the program

The program components included:

A paid summer internship for students that comprised:

  • Weekly data skills workshops
  • Student-driven project based learning (internship style) to reinforce skills through application.
  • Peer-to-peer mentorship, where students from previous years would return as mentor/interns to mentor students in the current cohort.

Faculty support to:

  • Add technical data skills elements to their courses (train the trainer)


  • Lots of evaluation of participant satisfaction with the program, sentiment around belonging to the science community, and learning.

Evaluation was a critical part of this work (and of the entire program that I built at CU Boulder). As such, I’ll talk about it separately in another blog post. I will also talk more specifically about the elements of this program in a separate post. Each of the above elements was carefully designed with this audience in mind.

What is earth and environmental data science?

It’s important to place this program in the context of the field of earth and environmental data science. I define earth and environmental data science as skills at the intersection of open science and data science including:

  • technical data science skills
  • the ability to work with different types of data
  • communication skills
  • collaboration skills
  • and finally the ability to create workflows using open reproducible approaches.

These skills are critical because there is a huge job-market demand for them today. Traditional data science skills are useful. But understanding how to apply those skills is even more in demand. Further, as science and industry become increasingly collaborative and interdisciplinary, there is a demand for people who can communicate and collaborate in diverse group settings.

All of those skills are what makes up the field of earth and environmental data science.

An image showing green blocks with the 5 components of earth data science including domain science, communication and collaboration, data skills, using diverse data types and reproducible workflows.
This is a graphic that I made showing the various elements of earth and environmental data science as I defined it in our program. It includes: The ability to use different types of data, technical data science skills, communication skills and collaboration skills. The entire program is founded in the idea that all workflows should be open and reproducible.

I started teaching earth and environmental data science skills (minus the open reproducible elements) in my first science-related job at Penn State, and they were core to the program that I built at NEON to support ecologists using NEON data.

Our Partner Institutions

In addition to CU Boulder, we had 3 other partner institutions (actually 4 but one had to drop out after year 1):

Tribal colleges

Hispanic serving

If there is a job market for these skills, why aren’t smaller schools teaching earth and environmental data science already?

You may be wondering: if there is such a demand for these skills, why aren’t all schools, big and small, well-funded and less funded, teaching them to their students?

The answer to that is - it’s complicated :). Even at larger universities, most have traditionally taught data science in isolation from science skills (at least this was the case at the time). Or perhaps they teach the skills together but don’t talk about reproducibility, communication, and collaboration skills.

And at smaller schools, courses offering these skills were even more scarce.

There are many, often complex, reasons for this including:

  1. Lack of faculty and instructors with skills needed to teach earth and environmental data science.
  2. Faculty who might be interested in teaching these skills often don’t have resources to learn them.
  3. Faculty that have the skills don’t necessarily have the time to modify curriculum.
  4. Lack of institutional funding to develop new programs and curriculum.

Plan A: Give schools a course that’s ready to teach (spoiler: this didn’t work)

Originally when starting this program, my vision was that we could provide schools with a well designed entry-level earth and environmental data science focused course and curriculum that they could just teach at their school.

This idea was extremely naive. Adding a new course to the catalog of offerings at these schools was a massive undertaking, even when the curriculum was provided!

  • the course needed to fit into existing curriculum
  • they needed someone with skills that could teach the content year after year
  • finally, and most importantly there needed to be enough students at the school who wanted to take such courses, to warrant resources for such a curriculum!

Students don’t see themselves as data scientists and thus don’t know to pursue such skills

I was surprised at the time that low student demand for technical data science programs at some schools also drove the curriculum gap.

I didn’t consider that if a population of people is not represented in a particular field, they may not see themselves as being able to pursue careers in that space. They may not even want to be a part of that space.

Seeing yourself as a potential part of a community matters

In short, students are often unaware of career paths available to them. They don’t consider pursuing data science because their peers and role models aren’t pursuing data science.

This was especially the case at the tribal colleges. No demand, no courses or investment from the schools.

Yes, we are talking here about how systemic racism actually impacts demand for curriculum that could have huge job potential.

Having a sense of belonging matters as does confidence

To me, the systemic nature of this issue was the most profound. In working with many students from diverse backgrounds, I have found that many often don’t think they are capable of coding.

Frankly, I was one of those students many years ago, and I have the inherent systemic “privilege” that comes with being a white, American female! I lacked confidence. I had imposter syndrome. I still carry some of that with me in my daily work.

I’ve seen this a lot in my teaching.

Lack of confidence also tends to be more common with women compared to men.

But, there are so many well-paying jobs for those with domain specific data science skills!! And I knew that these students in our program were capable. They just needed to be empowered in different ways. And they needed support to commit to the process.

Building earth and environmental data science capacity at smaller schools

Back to the EDSC, our program goal was to build capacity at these schools to teach earth and environmental data intensive science. We also wanted to build student awareness of job potential in this space.

While I think we did the former well, I am not convinced about the latter… more on that in a bit.

Graphic showing how we designed the program to be adaptive and meet institutions at their current data science teaching capacity and then build from there.
Capacity building. Our goal was to meet each institution where they were in terms of capacity to teach data skills. We would then build capacity from that starting place to avoid overwhelming faculty.

How the Earth Data Science Corps (EDSC) worked

Our EDSC program was broken up into three sub-programs.

Program 1. Provide earth and environmental data science training to faculty and students

We led the initial workshops. The idea here was that faculty could learn in year 1 from our training. And then in years 2 and 3, they could slowly build skills to teach some of the curriculum (with our support!). The ultimate goal was that this experience would help faculty build skills and confidence to teach this content in their courses.

This program worked on some levels. While faculty never ended up leading the workshops, they did end up setting up work sessions with the students. And through helping in those sessions, they built confidence and skills.

Some direct thoughts from participants are below

I am taking away some more confidence with python and data science as my department is moving into this direction. As for our tribal enrolled students and their projects, it was good to see that they were able to show the importance of doing a practical verse theoretical summer project.

Program 2. Supporting faculty in adding earth and environmental data science curriculum to their courses

This involved faculty-specific mentoring sessions where, in small groups, they developed curriculum that they could teach in their programs. We provided a lot of curriculum through our online textbooks on https://www.earthdatascience.org.

I aim to build on the data science and coding skills I learned this summer and implement them into my research program at my institution.

And we also provided Jupyter notebooks through Google Colab (in year two) that served as resources to support teaching.

Program 3. Build career awareness

The third activity involved career awareness webinars. The idea was to connect students to people in both industry and academia who were early career and pursuing earth and environmental data science related careers.

This worked for some who said things like:

It was a great sampling of people who have found their way into the field of EDSci, including someone who does freelance database management, which I hadn’t even considered as a possibility. Eye opening.

However, for others it was not enough. Our panelists often had PhDs. Our panelists came from very different backgrounds. However, none were tribal. Nor Hispanic. There were women, and women of color. But it was difficult for some students to connect with those in the webinars.

Why? Because students didn’t see themselves in the panelists! (Note that common theme!)

Finally for job hunting the guest should not all have PhDs and having a person who actually had some difficulty would round the guest out better as well.

There is a certain level of privilege associated with getting a Ph.D. For these students, getting an undergraduate degree was a big deal!

I didn’t think about that. And as such our students felt disconnected and didn’t see themselves in the panelists.

I can admit it, while parts of year 1 were great, that part was not quite as successful as it could have been. Lucky for us we had two more years after that first round of activities to try again!

The importance of evaluation and adaptive program design

Evaluation and adaptive program design are critical for any program’s success. They are particularly important for programs serving groups that have been traditionally under-represented and underserved. The quotes above came from surveys that participants consistently filled out after each training, both informally (using Mentimeter) and formally (using Qualtrics).

Even though I entered into this program design with years of experience, there was (and will always be) a lot to learn. I took notes. Lots of notes.

And we improved iteratively on the fly in year one, as we could. We also modified the program from year to year. This is how all diversity-focused programs need to run.

Be flexible.

Adapt to your audience.

These programs need to adapt through constant learning and asking questions.

I’d design this program much differently if I were to do it again. But I’ll save that topic for another blog post…

Wrapping up part 1 of a blog series on working with tribal and Hispanic-serving colleges

This is part one of a series of posts on the earth and environmental data science corps program. In part two, I’ll share some more details about the program’s core design, with a focus on the infrastructure that we used to support the program.

I’ll talk about what worked, and what didn’t work. And I’ll cover some core lessons learned.

Stay tuned for more…
