Scraps from my internship part 1: programming concepts
28 Jan 2017
I am so behind with my blog that I haven’t even gotten round to talking about Enthought yet, where I interned from September last year until a couple of weeks ago. Enthought write scientific software as well as running programming training courses. They do almost everything in Python, and there are some exceptionally skilled Pythonistas working there.
I actually originally learned Python using Enthought’s training videos and exercises, so it was very cool to get to work there. For my internship I worked on adding a new feature to the Canopy Geoscience application, a piece of software that helps geoscience researchers analyse their data.
I tried to keep a list of new concepts/tools as I came across them during the course of the internship. Rather than blog about the project I thought I’d jumble together some things I learned. Enthought very kindly gave me a copy of Fluent Python as a going away gift, so I’ve referenced it a few times here; it’s a very good book.
This was threatening to turn into a monster essay so I’ve decided to break it down into three separate posts:
Read on for part 1!
Part 1: Programming concepts
Navigating a huge codebase
The hardest thing I found about my project was learning how to work with a big codebase. I imagined the codebase like a big and complex mechanism, with lots of interconnecting cogs, cams and pulleys. A bit like the inside of this watch:
(Although unlike a watch, codebases tend to be sprawling and idiosyncratic, more like an organism that evolved over time than an elegant piece of machinery, so they are rarely neat and predictable. Anyway.)
I didn’t need to be intimately au fait with each and every part of the mechanism to be able to add a new feature, but I needed to be able to hold the general architecture in my head, and at the same time zoom in locally to the part I was working on. I found this really challenging and tbh I think it’s probably one of those skills that takes years to hone. Fortunately for would-be tinkerers, the danger of breaking something can be mitigated somewhat if you or someone else has written good …
When I was working on personal software or data science projects, I knew I was supposed to write tests, but it was really more of an aspiration than a necessity. In a grown-up software environment, tests are a necessity if you want to have any kind of safety net. Whenever I wrote a new feature, I got into the habit of writing tests for it using Python’s unittest module. Sometimes writing the tests took longer than writing the code itself.
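As a rough illustration of what a unittest test case looks like, here is a minimal sketch. The word_count function and the test names are made up for this example; they are not from the Enthought codebase.

```python
import unittest


def word_count(text):
    """Count whitespace-separated words (hypothetical function under test)."""
    return len(text.split())


class TestWordCount(unittest.TestCase):
    def test_counts_words(self):
        self.assertEqual(word_count("tests are a safety net"), 5)

    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)


def run_tests():
    """Load and run the test case, returning the unittest TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestWordCount)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project you would normally keep the tests in their own file and run them from the command line with python -m unittest rather than calling a helper like run_tests.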
I also used mocking a bit, which lets you replace parts of the system under test with mock objects. You might want to do this if the real objects are impractical to include in the test, e.g. perhaps you want to test a method that calls another method to open a web page. You don’t want to actually open the page during the test, so you mock the sub-method and check that it got called at the right time. You could do this within a context manager (see below).
NB: You should avoid overzealous mocking. Ideally you should always be testing the behaviour of the original code pattern, rather than mocking your problems under the rug.
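The web page scenario above might look something like this with the standard library's unittest.mock. This is only a sketch: open_docs and the example.com URL are invented for illustration, and webbrowser.open stands in for whatever sub-method you don't want to run for real.

```python
import webbrowser
from unittest import mock


def open_docs(topic):
    """Open a documentation page for a topic (hypothetical function under test)."""
    url = "https://example.com/docs/" + topic
    webbrowser.open(url)
    return url


def check_open_docs_calls_browser():
    # patch() used as a context manager: webbrowser.open is replaced by a
    # mock only inside the with block, so no page is actually opened.
    with mock.patch("webbrowser.open") as fake_open:
        url = open_docs("mocking")
    # The mock records its calls, so we can check it was called correctly.
    fake_open.assert_called_once_with("https://example.com/docs/mocking")
    return url
```

Only the call that touches the outside world is mocked here; the URL-building logic in open_docs still runs for real, which is the behaviour you actually want to test.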
Model/View/Controller (MVC) is a broad programming concept that gets used a lot for designing user interfaces. The idea is that you have three discrete parts: the Model, which is the underlying data or information that you want to display; the View, which is the display itself; and the Controller, which combines the model and the view. In the canonical MVC system, the Model and the View should never know that each other exist; it’s all up to the Controller to manage the two-way data flow between them. In practice, the boundaries can be more fuzzy.
This separation of concerns is supposed to make your code more easily re-usable and testable: you can swap out the model for another and the view shouldn’t care, and vice versa. I used MVC via Enthought’s TraitsUI package. MVC seems to be one of those contentious programming concepts that makes people very cross, as evidenced by the comments on this article (warning to sensitive readers: the word ‘hogwash’ gets thrown around).
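To make the three roles concrete, here is a toy sketch in plain Python. It does not use TraitsUI, and all the class names are made up for this example; it just shows a model and a view that never reference each other, with a controller in between.

```python
class TemperatureModel:
    """Model: holds the data and knows nothing about how it is displayed."""

    def __init__(self, celsius):
        self.celsius = celsius


class TextView:
    """View: displays whatever text it is given and knows nothing about the model."""

    def __init__(self):
        self.lines = []

    def show(self, text):
        self.lines.append(text)
        print(text)


class TemperatureController:
    """Controller: the only part that talks to both the model and the view."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def refresh(self):
        # Data flowing from the model to the view.
        self.view.show("{:.1f} C".format(self.model.celsius))

    def set_temperature(self, celsius):
        # Data flowing from the user back to the model, then re-displayed.
        self.model.celsius = celsius
        self.refresh()
```

Because TextView only receives strings, it could be swapped for a different view without touching TemperatureModel, and vice versa.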
Interfaces and ABCs
Interfaces are another programming concept that you can be using in Python without even realising it. Fluent Python defines interfaces as:
The subset of an object’s public methods that enable it to play a specific role in the system.
To look at it another way, an interface is the essential set of functions or attributes that make your object … objecty.
For example, in Python the only methods that a class needs to be considered a sequence are the __len__ and __getitem__ methods. A class that implements these methods is a sequence, regardless of what classes it inherits from, or whatever other methods it has. This set of methods comprises a sequence’s interface. In this context, where the interface is not formally declared, it is known as a protocol. You might be familiar with language such as ‘file-like object’ or ‘callable’ to describe Python protocols - objects that behave in a certain way in certain contexts.
The process of operating with objects regardless of their types, as long as they implement certain protocols, is known in the programming community as duck typing (never mind if it is a duck – does it quack like one?). This is a central concept in a dynamically-typed language such as Python, where type-checking is done at runtime.
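Here is a small sketch of the sequence protocol and duck typing in action. Bookshelf and describe are hypothetical names invented for this example; the point is that Bookshelf inherits from nothing special, yet behaves like a sequence because it has __len__ and __getitem__.

```python
class Bookshelf:
    """Acts like a sequence without inheriting from any sequence class."""

    def __init__(self, titles):
        self._titles = list(titles)

    def __len__(self):
        return len(self._titles)

    def __getitem__(self, index):
        # Delegating to the list means out-of-range indexes raise IndexError,
        # which is what Python relies on to stop iterating.
        return self._titles[index]


def describe(sequence):
    """Duck typing: works on anything that supports len() and indexing."""
    return "{} items, first is {!r}".format(len(sequence), sequence[0])
```

With just those two methods, Python can also iterate over a Bookshelf and use the in operator on it, and describe works equally well on a Bookshelf or a plain list.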
There are also more formal ways of implementing interfaces in Python. These require you to explicitly define the interface and register any classes that implement it. When I was at Enthought I did this via the Traits library, which has its own custom syntax for interfaces (scroll down to ‘Implementing an Interface’). Core Python has an equivalent called ‘Abstract Base Classes’ (ABCs), which are nicely introduced in Fluent Python chapter 11.
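For the core Python route, a minimal sketch using the standard abc module might look like this. The Quacker, Duck and Robot classes are invented for illustration, and this shows the standard library's ABCs rather than the Traits interface syntax.

```python
from abc import ABC, abstractmethod


class Quacker(ABC):
    """A formally declared interface: anything that can quack."""

    @abstractmethod
    def quack(self):
        """Return the sound this object makes."""


class Duck(Quacker):
    """Implements the interface by subclassing the ABC."""

    def quack(self):
        return "Quack!"


class Robot:
    """Does not inherit from Quacker, but has the right method."""

    def quack(self):
        return "beep"


# Explicitly register Robot as a 'virtual subclass' of the interface.
Quacker.register(Robot)
```

After this, isinstance checks against Quacker succeed for both Duck and Robot, while trying to instantiate Quacker itself raises a TypeError. Note that register takes your word for it: Python does not check that a registered class really implements the abstract methods.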
Following on from duck typing is operator overloading. This just means that certain operators (such as +, - and ==) can behave differently depending on the types of their operands. So, if I define a custom class Book with an attribute pages, I can + two or more instances of Book by defining methods such as __add__ on my class (example from here).
Now we can happily use the + operator, and with the magic of operator overloading we need never check the type of what we’re adding (it could be a number, a Book or a Duck, we don’t care).
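The original code for this example is not reproduced here, so the following is a sketch of how a Book class along these lines could work. The choice to return a plain page count, and the use of __radd__ alongside __add__, are assumptions made for this illustration.

```python
class Book:
    def __init__(self, pages):
        self.pages = pages

    def __add__(self, other):
        # Called for book + other_book: add the two page counts.
        return self.pages + other.pages

    def __radd__(self, other):
        # Called for number + book, when the number on the left doesn't
        # know how to add a Book. This also lets sum() work, because
        # sum() starts from 0 and then adds each item in turn.
        return self.pages + other
```

Adding two books gives a number, and adding a third book to that number falls through to __radd__, so chains like a + b + c and calls like sum(books) both produce a total page count without any explicit type checks.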
Finally for this post, mixins are classes that offer methods to other classes but are not themselves ever designed to be instantiated. Apparently this is a common design pattern in object-oriented languages. In Python mixins are used via multiple inheritance: a class that needs to use the mixin methods should inherit from the mixin class. However, the subclass should also inherit from another non-mixin class. This is not a formal rule, but if you follow it (and others in chapter 12 of Fluent Python!) you can bring order to the complexity of multiple inheritance.
Here’s an example I wrote involving cute furries. Imagine you have superclasses such as Cat that each have their own attributes that are specific to their species.
Now imagine that we want to make subclasses of superclasses like Cat that have eating behaviour. We can create a mixin, Eat_Mixin, with eating methods.
Now we can use the mixin when creating the subclasses, using multiple inheritance (note that calling super() without arguments only works in Python 3).
Now if we instantiate a subclass such as Manx, it will inherit the same eat methods from Eat_Mixin. We shouldn’t be inheriting from just Eat_Mixin because this was never designed to be used as a concrete class.
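The original code blocks for this example are not reproduced here, so below is a sketch of how the pieces could fit together. Cat, Manx and Eat_Mixin are the names used above; the Dog and Labrador classes, and all the attributes and method bodies, are assumptions made up for this illustration.

```python
class Cat:
    """Superclass with species-specific attributes."""

    def __init__(self, name):
        self.name = name
        self.sound = "meow"


class Dog:
    """A second superclass (hypothetical, added for illustration)."""

    def __init__(self, name):
        self.name = name
        self.sound = "woof"


class Eat_Mixin:
    """Offers eating behaviour; never meant to be instantiated on its own."""

    def eat(self, food):
        # Relies on the concrete class providing self.name.
        return "{} eats the {}".format(self.name, food)


class Manx(Eat_Mixin, Cat):
    def __init__(self, name):
        # Zero-argument super() is Python 3 only. Eat_Mixin defines no
        # __init__, so this call resolves to Cat.__init__.
        super().__init__(name)
        self.has_tail = False


class Labrador(Eat_Mixin, Dog):
    def __init__(self, name):
        super().__init__(name)
        self.likes_water = True
```

Both subclasses pick up the same eat method from the mixin while keeping the attributes of their own species, and each one inherits from a non-mixin class as well as from Eat_Mixin.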
That’s it for my first post in this series. Post 2 will look at some things about Python that I learned.