Blog

Google Summer of Code 2023

MDAnalysis has been accepted as an organization for Google Summer of Code 2023! If you are interested in working with us this summer and you are new to open source, please read the advice and links below and write to us on the GSoC with MDAnalysis mailing list.

We welcome applications from students and from any new or beginner open source contributors over 18 years old. Projects are scoped as either 175-hour (medium) or 350-hour (long) size. The duration can be extended from the standard 12 weeks to 22 weeks.

The application deadline is April 4, 2023 at 18:00 UTC. As part of the application process you must familiarize yourself with Google Summer of Code 2023.

If you are interested in working with us please read on and contact us on our GSoC with MDAnalysis mailing list. Apply as soon as possible at https://summerofcode.withgoogle.com; the application window opens on March 20, 2023 but potential GSoC Contributors are expected to familiarize themselves with application requirements and mentoring organizations as soon as possible. It’s also never too early to discuss application ideas with us!

Project Ideas

If you have your own idea about a potential project we’d love to work with you to develop this idea; please write to us on the mailing list to discuss it there.

We have also listed several possible projects for you to work on. Our initial list of ideas (see summaries in the table below) contains projects of different scope and with different skill requirements. However, check the ideas page, as we might add more ideas after this post is published.

Our experience shows that having the listed skills increases the chances that a project will be completed successfully, so we use them as part of our decision criteria in choosing GSoC contributors.

| # | Project name | Difficulty | Project size | Description | Skills | Mentors |
|---|---|---|---|---|---|---|
| 1 | Generalise Groups | hard | 350 hours | Generalise concept of groups | Python, NetworkX, Molecular modelling | @fiona-naughton, @richardjgowers, @yuxuanzhuang, @RMeli |
| 2 | Extend MDAnalysis Interoperability | medium | 350 hours | Extend converters module to other relevant packages | Python, Molecular Modelling | @fiona-naughton, @hmacdope, @yuxuanzhuang, @RMeli |
| 3 | Benchmarking and performance optimization | medium/hard | 175 hours | Write benchmarks for automated performance analysis and address performance bottlenecks | Python, Molecular Modelling | @hmacdope, @orbeckst, @RMeli |
| 4 | Transport property calculations | medium | 350 hours | Write analysis code to calculate physical transport properties | Python, Physics/Mathematics | @orionarcher, @hmacdope |
| 5 | Implementation of parallel analysis framework | hard | 350 hours | Implement parallel framework in MDAnalysis | Python, Parallel Programming, Molecular Modelling | @yuxuanzhuang, @orbeckst, @RMeli |

Information for prospective GSoC Contributors

You must meet our own requirements if you want to be a GSoC Contributor with MDAnalysis this year (read all the docs behind these links!). You must also meet the eligibility criteria. Our GSoC FAQ collects common questions from applicants.

The MDAnalysis community values diversity and is committed to providing a productive, harassment-free environment to every member. Our Code of Conduct explains the values that we as a community uphold. Every member (and every GSoC Contributor) agrees to follow the Code of Conduct.

As a start to get familiar with MDAnalysis and open source development you should follow these steps:

Watch the MDAnalysis Trailer

The MDAnalysis Trailer on YouTube is a one minute introduction to MDAnalysis.

Complete the Quick Start Guide

We have a Quick Start Guide explaining the basics of MDAnalysis. You should go through it at least once to understand how MDAnalysis is used. Continue reading the User Guide to learn more.

Introduce yourself to us

Introduce yourself on the mailing list. Tell us your GitHub handle, what you plan to work on during the summer or what you have already done with MDAnalysis.

Close an issue of MDAnalysis

You must have at least one commit in the development branch of MDAnalysis in order to be eligible, i.e., you must demonstrate that you have been seriously engaged with the MDAnalysis project. We have a list of easy bugs and suggested GSOC Starter issues to work on in our issue tracker on GitHub. We only accept one GSOC Starter issue per applicant so that everybody gets a chance. If you want to dive deeper, we encourage you to tackle some of the other issues in our issue tracker.

We also appreciate contributions which add more tests or update/improve our documentation.

Final remarks

We recommend you start your application by working on an issue. It will give you a better understanding of MDAnalysis as a project and improve the quality of your application.

To start developing for MDAnalysis have a look at our guide on contributing to MDAnalysis and write to us on the GSoC with MDAnalysis mailing list if you have more questions about setting up a development environment or how to contribute. We are also happy to chat on our MDAnalysis Discord server in the #gsoc channel (join with the public invitation link).

We look forward to working with you in GSoC 2023!

— MDAnalysis GSoC mentors (GitHub @MDAnalysis/gsoc-mentors, Discord @gsoc-mentor)

GSoC / Outreachy 2022 wrap-up

This year MDAnalysis participated in two programs with wonderful contributors from around the globe. We continued our participation in Google Summer of Code for another year and also participated in Outreachy for the first time. Huge thanks to these organisations for supporting MDAnalysis and the students.

Google Summer of Code

Our two outstanding GSoC contributors, Aya Mohamed Alaa (@aya9aladdin) and Bjarne Feddersen (@BFedder), successfully completed their GSoC projects.

You can read more about their projects and packages in their blog posts:

— GSOC Mentors @jbarnoud @hmacdope @ojeda-e @IAlibay @fiona-naughton @orbeckst @lilyminium @richardjgowers

Outreachy

Uma Kadam (@umak1106) did the very first MDAnalysis Outreachy Internship with her project Improve MDAnalysis by implementing type hinting.

— Outreachy Mentors @jbarnoud @micaela-matta @richardjgowers

Summary

We have immensely enjoyed working with Aya, Bjarne and Uma and look forward to seeing how they continue to contribute to the MDAnalysis community, and wish them luck in their future endeavours.

Outreachy Report 2022: Improve MDAnalysis by implementing type hinting

About Me

I am Uma Kadam, a Computer Science and Engineering undergraduate at the Indian Institute of Information Technology Guwahati. My interests lie primarily in exploring ML and AI, as reflected in my research internship experience in NLP. Participating in numerous hackathons, some of which I won, allowed me to explore new technologies and domains in computer science. My involvement with Outreachy provided me with a first-hand introduction to open source and set me on the path to a successful career in technology.

Outreachy

Outreachy is a 12+ week internship program where contributors work with an open-source organization under the guidance of experienced mentors. By providing opportunities to work with participating organizations, Outreachy supports people from groups underrepresented in the technology sector. By providing mentorship to build technical skills and establishing an inclusive community that has no room for systematic bias or discrimination, it aims to help members of minority groups pave their way into the tech industry.

Motivation

MDAnalysis is an object-oriented Python library for analyzing trajectory data from molecular dynamics (MD) simulations in many popular formats. Python’s dynamic nature enables us to develop with speed, flexibility, and ease of use, but if you are not careful you may trade short-term expedience for a long-term lack of maintainability. Type hints, which are implemented with the help of the typing module, make the expected types explicit in the code.

Type hints and type annotations indicate the appropriate types, but they do not enforce them at runtime. By using type checkers such as mypy, I verified that the types passed between functions are consistent with what the code expects. The annotated function signatures also made the code more readable and its intent easier to communicate.
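As a minimal sketch of this point, the toy function below (`scale` is a hypothetical name, not MDAnalysis code) is called once with the annotated types and once with a string. Python runs both calls without complaint, and it is only a static checker such as mypy that would flag the second one.

```python
def scale(radius: float, factor: float = 2.0) -> float:
    """Return radius multiplied by factor (hypothetical helper)."""
    return radius * factor


# Call that matches the annotations.
ok = scale(1.5)

# Call that violates the annotation on `radius`. Python does not check
# hints at runtime, so this silently performs string repetition instead.
# A static type checker such as mypy would report this argument.
wrong = scale("ab", 2)

print(ok)     # 3.0
print(wrong)  # abab
```

Running a type checker over such a file is what turns the annotations from documentation into something that is actually verified.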

Introducing type annotations and type hints provided a multitude of benefits like:

  • The need to document types in the docstring was eliminated.

  • Data types were clearly defined in the code, removing any potential ambiguity.

  • It helped in catching errors, improving linting and providing a cleaner architecture for MDAnalysis.

  • It made the code more organized and sped up the debugging process.

  • It provided us with optional static typing to leverage the best of both static and dynamic typing.

Contributions made during Outreachy Project

  • Addition of Mypy requirements to CI pipeline: #3705

    Adding mypy to the GitHub Actions workflow lets us run a mypy check on every new pull request and raise appropriate warnings whenever the type hints provided are incorrect. I also added customizations such as running the mypy checks only on prioritized modules. The type checking in CI was made blocking wherever required.

  • Providing Type Hints for lib module and annotations for init file: #3823 #3729

    The first PR deals with the task of providing type annotations for the init file.

    The second PR deals with the task of providing type annotations and type hints for the lib module, using the mypy type checker to ensure that all functions and variables are correctly type hinted. The type hints come from the typing module and the numpy.typing module. In the lib module I provided type hints for NeighbourSearch, PeriodicKDTree, Mathematical_helper_functions and so on.

  • Providing type hints for core module: #3719 Docs

    Dealt with the issue caused by circular imports and provided type hints for the core module.

  • Providing type hints for visualization module, auxiliary module, topology module, analysis module and converters module

    #3781 #3746 #3782 #3744 #3752 #3784

    I used np.ndarray and other type hints from the typing module to annotate the streamlines and streamlines_3D files in the visualization module. I provided type hints for the different parsers in the topology module, used mypy to type check them, and verified that no unit tests were broken as a result. I also provided type hints for the converters, auxiliary and other modules.
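The circular import problem mentioned for the core module is commonly handled with the `TYPE_CHECKING` flag from the typing module. The sketch below illustrates that general pattern and is not necessarily what the PRs above do; `count_atoms` is a hypothetical helper. The guarded import is only seen by static checkers, so the snippet runs even without the imported package installed.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # This branch is skipped at runtime (TYPE_CHECKING is False there),
    # so the import cannot take part in an import cycle. Static type
    # checkers such as mypy do follow it to resolve the name.
    from MDAnalysis.core.groups import AtomGroup


def count_atoms(group: "AtomGroup") -> int:
    """Return the number of atoms in the group (hypothetical helper)."""
    # The annotation is quoted, so Python stores it as a plain string
    # and never needs the real class while the module is being loaded.
    return len(group)


print(TYPE_CHECKING)                         # False
print(count_atoms.__annotations__["group"])  # AtomGroup
```

The trade-off is that the name exists only for the type checker, so it can be used in annotations but not in runtime code such as `isinstance` checks.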

The annotations are not all merged yet, and they do not cover the full module. However, the effort spent on type annotations gave us an idea of the challenges they represent for MDAnalysis and will lead to better code in the future. We have already identified corner cases that could have led to bugs.

Example:

    def search(self, atoms, radius, level='A'):

With the help of type hints this is transformed to:

    def search(self, atoms: AtomGroup, radius: float, level: str = 'A') -> Optional[Union[AtomGroup, ResidueGroup, SegmentGroup]]:

Here the type hints on the input parameters of the function search tell us that atoms is an AtomGroup, radius is a float and level is a string with the default value ‘A’.

The return type of the function can be None, AtomGroup, ResidueGroup or SegmentGroup. Here Optional indicates that the value can be either None or the type enclosed within its brackets, and Union indicates that the value can be any one of the types contained within its brackets.
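To make the meaning of Optional and Union concrete, here is a small self-contained sketch. The three classes are empty stand-ins for the real MDAnalysis groups, and `pick_group` and `SearchResult` are hypothetical names used only for illustration.

```python
from typing import Optional, Union, get_args


class AtomGroup:
    """Empty stand-in for illustration only."""


class ResidueGroup:
    """Empty stand-in for illustration only."""


class SegmentGroup:
    """Empty stand-in for illustration only."""


# Optional[X] is shorthand for Union[X, None].
assert Optional[int] == Union[int, None]

# The return type from the example above, given a readable alias.
SearchResult = Optional[Union[AtomGroup, ResidueGroup, SegmentGroup]]


def pick_group(level: str = "A") -> SearchResult:
    """Return an instance of the group for `level`, or None if unknown."""
    groups = {"A": AtomGroup, "R": ResidueGroup, "S": SegmentGroup}
    cls = groups.get(level)
    return cls() if cls is not None else None


# The nested form flattens into one union of four alternatives.
print(len(get_args(SearchResult)))     # 4
print(type(pick_group("A")).__name__)  # AtomGroup
print(pick_group("X"))                 # None
```

A caller of such a function has to handle the None case before using the result, which is exactly the kind of corner case a type checker points out.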

Conclusion:

In my opinion, Outreachy provided an incredible learning opportunity for a newcomer to open source development like me. There is no doubt that the internship was an exceptional experience and I would like to extend my sincere gratitude to Outreachy, my mentors, and the MDAnalysis community for that experience. During my time working with the MDAnalysis library, I have learned a great deal not only from the technical side of working with the library, but also from experiencing the rich set of policies and guidelines that are created by the MDAnalysis community, and how they shape MDAnalysis’ inner workings.