|
|
Q: For how long have you been a developer?
|
|
|
A: I have been a developer since I was 11, so 40 years.
|
|
|
|
|
|
Q: What percentage of your paid work involves developing software?
|
|
|
A: 100%
|
|
|
|
|
|
Q: Do you develop software on your own time or is it just for work?
|
|
|
A: From time to time I also work on private projects of my own. Right now I am not working on any non-work software development project, but I am thinking about it.
|
|
|
|
|
|
Q: When you work on a software development project, would you be able to describe, in steps, your routine as a free software developer?
|
|
|
A: Much of the development work that I do in my job is maintenance-related, meaning that the software I develop is tooling for automating parts of software maintenance workflows. My work differs from what a lot of other software developers do if they are not involved in maintenance. When I am working on code, the rhythm of the work is as follows: I come up with an idea in my head, and there is a hierarchy of complexity of these ideas. If it's a complicated or high-level idea, then I try to break it down into smaller pieces by analyzing it. Once I get an actionable idea, I go into the git repository of my project and I start a "work-in-progress" branch. I try to give this branch a name that describes the idea I am trying to implement. At the same time, I might also open an "issue" in the project's issue-tracking system to describe the idea in more detail. Whether or not an issue gets created often depends on how far along the project is: if it's a mature project, I typically write up the idea as an issue, and that's where the forge sometimes comes into play, because if there are other people involved in the project, then I want to communicate my idea to them. The forge is useful for that: the people interested in the project can visit the forge, which is a kind of interactive website. If I open an issue there, then other people can see my idea. Getting the idea "out there" also forces me to write it out in words. That helps me to be more disciplined about what the idea is and how I am going to implement it. On the other hand, if it's just me working on the project, then I don't bother with a formal issue-tracking system, and just write my ideas in a text file called "TODO" which I store in the top-level directory of the project repo. But even then I do often write up the idea, especially if I don't plan to implement it immediately.
After the idea write-up and the creation of the "work-in-progress" branch, I start editing source code files in the git repository. I use an editor called "Vim". As I develop the implementation of my idea, I try to turn it into a series of source code changes that can be committed to the public git repository in a way that makes sense. I try to make the commits atomic and write a real description of each commit - a description that makes sense, at least to me. How exactly I write the commit message depends on whether there are other people involved in the project: when writing, I try to always be cognizant of the audience, i.e., the people who might read what I am writing. Once the idea has been turned into a series of commits - it could be as little as one commit or as many as 10 or 20 commits - in my local work-in-progress branch, I push it to the forge and open a "pull/merge request" (GitHub uses the term "pull request", while GitLab uses "merge request" to denote the same thing). Once opened, the pull/merge request lives a life of its own: other people review the code changes I am proposing, and getting the request accepted can be a long and complicated process. If I am working alone on a project, I don't open a pull request: I just merge the work-in-progress branch into the master branch.
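The solo-project variant of the workflow described above can be sketched with a few git commands. This is only an illustration: the repository contents, branch name, and commit messages are invented.

```shell
# Sketch of the "work-in-progress branch" workflow described above.
# All names (the branch, the commit messages) are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.email "dev@example.org"
git config user.name "Dev"
echo "initial" > README
git add README
git commit -q -m "Initial commit"

# Start a work-in-progress branch named after the idea.
git switch -q -c wip-faster-startup

# ...edit files in the editor, then make atomic commits with real
# descriptions (could be 1 commit, could be 10 or 20)...
echo "cache parsed config" >> README
git add README
git commit -q -m "startup: cache config parse results"

# Working alone, skip the pull request and merge straight to master.
git switch -q master
git merge -q --no-ff -m "Merge wip-faster-startup" wip-faster-startup
git log --oneline
```

On a team project, the last two commands would instead be a `git push` to the forge followed by opening a pull/merge request there.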
|
|
|
|
|
|
If the project that I am working on is a complicated program, then while I am writing the code, I will be continually running it through a set of unit tests (which are developed concurrently with the code and are actually part of the code) and a linter. I typically come up with some kind of automated way to lint my code changes locally. For example, if it's a bash script I use a linting tool called shellcheck, which catches syntax errors and other typical coding mistakes that humans make. As a human, I make mistakes, and the computer can flag those for me. If it's a Python program I use pylint, and also unit tests with tox. Each programming language has its own testing frameworks and linting tools. Once the code is built, I also add integration/functional tests which actually deploy the software in a virtual machine and simulate a user interacting with the project. That would be another level of tests. Either those have to be run locally, or the forge might have some kind of automation which triggers the tests on a virtual machine or a container and shows the results of the tests in the pull request.
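The per-language lint dispatch described above could be automated with a small helper like the following. The tool names (shellcheck, pylint) are the real tools mentioned; the dispatcher itself is hypothetical and only echoes the command it would run, so the sketch stays self-contained.

```shell
# Hypothetical helper that picks the right linter by file extension.
# It echoes the command instead of executing it, so it can be run
# without the linters installed.
lint() {
    case "$1" in
        *.sh) echo "would run: shellcheck $1" ;;
        *.py) echo "would run: pylint $1" ;;
        *)    echo "no linter configured for: $1" ;;
    esac
}

lint deploy.sh
lint backport.py
```

In a real project the `echo` lines would be replaced by actual invocations, typically wired into a `make lint` target or a forge-side CI job.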
|
|
|
|
|
|
Q: In the end your change goes into the main branch of the project. Do you consider it part of your work as a developer to release the software to the user, or is that outside the scope of a developer's work?
|
|
|
A: Like I said, I am not a typical developer. A lot of my paid work focuses on releasing/delivering the software to users. I am less involved in writing the software itself and more involved in packaging it and releasing it. Then I maintain stable branches or releases in various build systems/operating systems. So I definitely see the release workflow as something that I as a developer need to be aware of and work on to improve and streamline it.
|
|
|
|
|
|
Q: When you say you're not a typical developer, do you mean that there are few people doing that compared to the higher number of developers?
|
|
|
A: That's the feeling I get, yeah.
|
|
|
|
|
|
Q: You don't say that because you think that it's not part of a developer's work?
|
|
|
A: It depends on how big the project is. There are very big projects - like the Linux kernel, or something like Ceph, OpenStack or Kubernetes - which have tens or hundreds or even thousands of developers working on them. In such a project, you necessarily have developers specializing in various components: maintenance and release are just one component of the project in that case. So a database programmer, or someone whose specialty is optimizing databases that run only in memory, for example - that person would also say "I am not a typical software developer". If I am working on a project where I am the only developer, or occasionally there might be a contributor or two, then I have to do everything: I have to write tests, I have to do the releases, I have to write all the code, I have to know the whole project inside and out. That's a completely different mode of work.
|
|
|
|
|
|
Q: Which forges do you use?
|
|
|
A: I currently use GitHub and a private instance of GitLab.
|
|
|
|
|
|
Q: Do you consider Redmine to be a forge? Why?
|
|
|
A: No, I don't see Redmine as a forge. I wasn't aware that Redmine is able to store source code. I am not aware of any project using it that way.
|
|
|
|
|
|
Q: Is there a project that you have in mind that involves you using more than one forge, either at the same time or sequentially over time?
|
|
|
A: One project that I authored began on SourceForge, and then later moved to GitHub.
|
|
|
|
|
|
Q: Are there other projects where your project lives on a forge and has dependencies that live on other forges?
|
|
|
A: Yes. I am involved in the Ceph project, which makes heavy use of both GitHub and Redmine at the same time.
|
|
|
|
|
|
Q: Can you describe how they fit together?
|
|
|
A: The code is maintained in GitHub; all changes to the code take place using GitHub pull requests. Branches are pushed to GitHub. The main branches are living in GitHub: GitHub is the single source of truth for anything regarding code. However, bug reports and certain other workflows use Redmine. Issue tracking is done via Redmine and due to this "dual-forge" system, some pieces of information have to be duplicated on both forges. For example, software versioning: the actual version of the software is determined by a version tag which is pushed to git. Then the version tag has to be duplicated in Redmine in order to enable developers to mark an issue as being fixed in a certain version or as being present in certain versions.
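The git half of that version duplication is just an annotated tag; the Redmine half has to be re-created separately. A minimal sketch follows - the version number, project name, and server URL are invented for illustration.

```shell
# The git side of the "dual-forge" version duplication: an annotated
# tag pushed to the repo determines the software version.
# (v16.2.1 is an invented example version.)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email "dev@example.org"
git config user.name "Dev"
git commit -q --allow-empty -m "release prep"
git tag -a v16.2.1 -m "Release v16.2.1"
git describe --tags

# The same version then has to be duplicated in Redmine by hand or
# via its REST API - illustrative only, needs a real server and key:
#   curl -H "X-Redmine-API-Key: $KEY" \
#        -H "Content-Type: application/json" \
#        -d '{"version": {"name": "v16.2.1"}}' \
#        https://tracker.example.org/projects/myproj/versions.json
```

Nothing keeps the two sides in sync automatically, which is exactly the duplication problem described above.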
|
|
|
|
|
|
Q: How does that duplication happen?
|
|
|
A: As far as I know it's a manual process: the versions are created manually, whether in GitHub (by creating and pushing tags) or in Redmine (using the Redmine web GUI or REST API).
|
|
|
|
|
|
Q: Are there other aspects of the project that involve that kind of interaction between GitHub and Redmine?
|
|
|
A: Some documentation is maintained in Git and other documentation is maintained in Redmine. It's not always clear where the documentation that one is looking for resides.
|
|
|
|
|
|
Q: What about backports? Backports and the interactions between GitHub and Redmine?
|
|
|
A: They do involve such interactions, yes. The backporting process is a bit schizophrenic because of these two forges. The actual backport itself is opened as a GitHub pull request but everything involved in tracking the backport (flagging bug fixes for backport, tracking which PR backported a particular fix, tracking which releases a fix has been backported to, etc.) takes place on Redmine. We have to cross-link the pull request on GitHub with the actual issue in Redmine which describes the bug. The backport pull requests also have to link back to the respective Redmine issues. Keeping this in order is complicated and, frankly, a bit noisy.
|
|
|
|
|
|
Q: Is it complicated because Redmine and GitHub are not integrated for that work?
|
|
|
A: Exactly. They are not integrated at all.
|
|
|
|
|
|
Q: So how does it happen? Is it manual?
|
|
|
A: We have some scripts to help developers automate repetitive tasks such as opening a backport issue in Redmine and opening a backport pull request in GitHub. The scripts also automate the process of cross-linking those two together. However, use of these scripts is optional and entirely at the discretion of the person doing the backport. In many cases, the scripts are not used and in those cases the linking becomes a manual process. Also, newcomers to the project typically do not have sufficient permissions in Redmine to do this cross-linking at all.
|
|
|
|
|
|
Q: Say you have a piece of software and it depends on other modules. Ceph, for example, uses rocksdb and other dependencies that are outside of Ceph. Do you have in mind a project where the software is on one forge and one of the dependencies - or maybe a lot of the dependencies - are on other forges?
|
|
|
A: I guess you're referring to external dependencies that get pulled into the release and bundled into the packaged software, as opposed to dependencies that are packaged separately. Yes, I think many projects have such dependencies these days. JavaScript applications are particularly well known for having large numbers of external dependencies that get pulled in from the Internet using a tool called "npm", and then rolled into a larger "source code archive" file (i.e. what we call a "tarball") together with the project's own source code. I don't have any particular project in mind that does this on a regular basis - other than Ceph, that is - but I'm sure there must be quite a few.
|
|
|
|
|
|
Q: I suppose one of the projects you work on or you worked on has dependencies that are either on the same forge or on another forge. What happens when you find a bug in such a dependency? Do you have an example in mind?
|
|
|
A: As I just explained, there are two types of dependencies: those that get pulled into the software's source code tarball and those that are packaged separately and merely referenced in the project's packaging. In an ideal world, software developers would not be involved in packaging and they would leave the work of packaging to system integrators, or "package maintainers" as they are called in the Linux world. But the free software projects I work on tend to get involved in packaging, because there is what I would call a "vested interest" in having the software available on different Linux system platforms (both community platforms such as Debian, and ones backed by for-profit companies). Such projects even get involved in writing regression tests for various platforms and automating the process of running those regression tests on the code: i.e., testing the builds on various Linux distributions to see if they actually work in a real-life scenario. If we find that a certain build doesn't work in a certain environment then it could be indeed - it often is - a problem with a dependency.
|
|
|
|
|
|
Q: What happens when the dependency has a bug that prevents the software from running properly?
|
|
|
A: To use the RocksDB example, bugs in that dependency get fixed by applying a patch to the Ceph project's fork of the RocksDB git repo, and then triggering a new build. But since that means the fork has diverged from the upstream RocksDB project's source code, the patch has to be submitted to the RocksDB project, where it might end up being accepted in a modified form, or even rejected.
|
|
|
|
|
|
Q: How do you track that problem?
|
|
|
A: It is tracked in multiple places. First, in the Ceph issue tracking system (Redmine), and then later in the upstream RocksDB issue-tracking system once we start trying to get the fix accepted there. Many bugs are also tracked separately by for-profit system integrators, each of which have their own issue tracking systems. Full disclosure: I work for one such company: SUSE.
|
|
|
|
|
|
Q: How does SUSE track the relationship between their issue and the Ceph issue?
|
|
|
A: By posting a link (URL) to the Ceph Redmine issue tracker as a comment in their issue.
|
|
|
|
|
|
Q: Are there specific fields or facilities to do that in Bugzilla or Redmine?
|
|
|
A: Not really. SUSE uses Bugzilla as its issue tracker. Neither Ceph's Redmine instance nor SUSE's Bugzilla instance has any special field for linking an issue to other bug trackers.
|
|
|
|
|
|
Q: How do you get notified about the progress of one issue or the other? Do you have to subscribe to both? How would someone commenting on the issue be able to figure out that the two issues are linked?
|
|
|
A: The trackers themselves do not have any built-in awareness of other trackers. If a person were able to figure out that two issues were linked, they would do so by reading the comments, but this assumes that someone actually took the trouble to fully cross-link the two issues by posting the URL of the "other tracker" in a comment, and it also assumes that the person has the ability to open the URL. Not all bugs are public: some require special permissions to access.
|
|
|
|
|
|
Q: There is one additional place where you mentioned that changes can happen: within the package itself. So for a Ceph package that doesn't behave as it should or doesn't build, maybe you could add a patch in the package itself? The same kind of question: how do you track the relationship between this patch and this package? What is the life cycle of this patch?
|
|
|
A: In my job I work with RPM spec files. A spec file is a kind of glorified script which is used to build packages. And, indeed, the RPM spec file can refer to one or more patches and apply them to the source code before starting the build. The RPM package is the responsibility of its maintainer, and that extends to any patches that are applied by the RPM spec file. That means: each patch that is added to the spec file increases the workload of maintaining the package. Since that is *my* workload, I try to avoid having any patches like that. But occasionally I do have to deal with such "downstream" patches. The downstream patches might actually come from the upstream repository, which has plenty of patches that have not yet been integrated into the stable branch for various reasons. It's difficult to find a generalized answer to your question regarding how these patches are tracked, other than to say that the patches are listed in the spec file or in the downstream git history, and that it is the package maintainer's responsibility to know about them - i.e. "track" them.
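As a sketch, a spec file carrying such a downstream patch might look like the fragment below. The package name, version, and patch file name are all invented for illustration.

```spec
# Illustrative RPM spec fragment (names invented).
Name:           example-tool
Version:        1.4.0
Release:        0
Summary:        Example package carrying a downstream patch
License:        GPL-2.0-or-later
Source0:        %{name}-%{version}.tar.gz
# Already merged upstream but not yet in the 1.4.x stable branch;
# the maintainer must remember to drop it on the next version update.
Patch0:         0001-fix-startup-crash.patch

%prep
%autosetup -p1
```

The `Patch0` line and the comment above it are the only record that the package diverges from the upstream tarball, which is why each such patch adds to the maintainer's tracking burden.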
|
|
|
|
|
|
Q: When you build this package with this specific patch which is actually a slightly different version of the software you're packaging, where do you push these changes?
|
|
|
A: Either I push the downstream patch directly to the downstream git repo, or I add it to the spec file. In the former case, that means I push them to GitHub, and in the latter case to the openSUSE Build Service.
|
|
|
|
|
|
Q: When you do that, how do you keep track of the fact that this particular patch was integrated upstream? You mentioned that you may carry one because it's not yet in the stable branch, but how do you keep track of the fact that now it is?
|
|
|
A: I personally rely on Git as much as possible. When I use Git to prepare a new branch for an update, and one or more downstream patches have made it into the upstream repo since the last update, I do it in such a way that Git automatically drops those patches. Git is smart: it sees that applying the patch would not actually change the source code at all (because the patch got accepted upstream and is now present in the base).
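This behavior can be demonstrated in a few commands: when a downstream commit introduces the same change that has since landed in the base branch, git drops it during a rebase because applying it again would change nothing. The repository contents and commit messages below are invented.

```shell
# Demonstration of git dropping an already-upstreamed patch on rebase.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b upstream
git config user.email "dev@example.org"
git config user.name "Dev"
echo "v1" > app.c
git add app.c
git commit -q -m "upstream: initial code"

# Downstream branch carries a local bug fix on top of upstream.
git switch -q -c downstream
echo "the fix" >> app.c
git add app.c
git commit -q -m "downstream: fix bug"

# Meanwhile the identical fix gets accepted upstream.
git switch -q upstream
echo "the fix" >> app.c
git add app.c
git commit -q -m "upstream: fix bug (accepted)"

# On rebase, git sees that the downstream patch is already present in
# the base and drops it (it warns about the skipped commit).
git switch -q downstream
git rebase -q upstream
git log --oneline
```

After the rebase, the downstream branch points at the same commit as upstream: the redundant patch is gone without any manual bookkeeping.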
|
|
|
|
|
|
Q: Essentially you are relying on Git to remove duplicates and therefore make sure that you don't carry something for longer than necessary.
|
|
|
A: Yes. But when the patch is stored in the build service and applied by RPM, then Git is not involved and the maintainer of the package is responsible for keeping track of all the patches and manually removing those that have already been upstreamed. RPM does not automatically drop redundant patches.
|
|
|
|
|
|
Q: You mentioned at the beginning of the interview migrating a project from SourceForge to GitHub. Do you remember how it went? What are the steps that you took?
|
|
|
A: It didn't go well. In fact, it wasn't really a "migration" at all. I just took a snapshot of the latest state and pushed that to GitHub in a commit. In other words, none of the git history that was stored in SourceForge survived the migration to GitHub. Later, I found out that it is possible to migrate files from one git repo to another while preserving previous revisions of those files from the git history.
|
|
|
|
|
|
Q: Were there issues associated with the software, or a mailing list, or any sort of tooling around the project that stayed on SourceForge, or did you migrate them, or parts of them?
|
|
|
A: No, the easiest thing to do was to start from a clean state. I didn't try to migrate anything, and whatever was on SourceForge just stayed there. It was a one-man project, so I was the only one who got disrupted by this change.
|
|
|
|
|
|
Q: After you did that, did you keep the old project open in read-only mode or something like that or did you close it?
|
|
|
A: I kept it open. I think it's still there; I think SourceForge still exists. But at some point it became very annoying to use. It suddenly became commercialized. It felt like it was no longer there to support me as a developer of free software, but instead it was there to make money. At that time I had become familiar with GitHub because I was asked to use it in my job. In those days, GitHub was very focused on supporting free software and it had lots of features, like pull requests for example, that were attractive. I started perceiving the advantages of GitHub as outweighing the pain of switching. So I did it. Later, GitHub was acquired by Microsoft, and now it's no longer so clear. Although Microsoft is clearly trying very hard not to alienate free software developers, you can kind of feel the "corporate greed" in the background, which was not there before. That's the feeling I have: it's just a feeling, though. I can't really point to anything in particular. Oh, wait. GitHub Actions, for example: I don't like that feature. It feels like a trap.
|
|
|
|
|
|
Q: Is there a forge that you like better than the others?
|
|
|
A: Yeah, the GitHub experience is the smoothest, although it has deteriorated since Microsoft took it over. GitHub is easy to use. That's my preferred forge. I am using GitLab as well and gradually getting used to it, but in some ways it seems unnecessarily complicated compared to GitHub. But I like the fact that GitLab is free software that anyone can install, whereas GitHub is more like a service that you can use without paying money for it.
|
|
|
|
|
|
Q: Is there a project that you participate in without being a member of the forge on which it is hosted? Is there a forge that hosts a project where you tried to participate but did not use the forge at all?
|
|
|
A: No.
|
|
|
|
|
|
Whenever I have pushed fixes to software projects, it has always involved jumping through hoops to become a member of the forge where the project's source code is maintained. For example, once I fixed a minor bug in a piece of software that is an external dependency of Ceph. The project's source code is hosted on GitHub, but patches have to be submitted for review through a completely different service, called Gerrit. In order to get my patch into the project's source code on GitHub, I found that I had to first become a registered user of Gerrit in order to present it there for review. This was annoying, because all I wanted to do was submit a small code change, not type my personal data into a form, click a button saying I agree to "terms and conditions", and so on.
|
|
|
|
|
|
If I wanted to submit a patch to the Linux kernel, the experience would be very different. Contributing to the Linux kernel doesn't involve a forge at all: I would just send my patch to their mailing list.
|
|
|
|
|
|
Q: Do you know people who contribute to a project without using the forge that the project is hosted on?
|
|
|
A: The Ceph project used to have an alternative (non-GitHub) way of contributing: patches could be submitted to the developer mailing list, similar to how the Linux kernel contribution process works. But a year or so ago this option was eliminated. So nowadays the only people I know who fall into this category are Linux kernel developers (since to my knowledge the Linux kernel does not accept patches via forges at all).
|
|
|
|
|
|
Well, actually, I do occasionally proxy for colleagues who come to me with patches for the Ceph spec file. Since the spec file is maintained upstream, these patches have to be submitted by someone with a GitHub account.
|
|
|
|
|
|
Q: Do you know of any forge that excludes people based on where they live?
|
|
|
A: I can't think of anything. Does GitHub exclude people based on their location?
|
|
|
|
|
|
Q: Do you need to trust the forge for the security of the software that is hosted on the forge?
|
|
|
A: Yeah, that's pretty clear. By using the forge you are resigning yourself to a situation where a large portion of the history of the project is stored on the forge service. You have the Git repo and the Git history locally, but everything else - all of the pull requests, all of the review correspondence, etc. - is stored elsewhere and there is no obvious way of even downloading it.
|
|
|
|
|
|
Q: How do you ensure that your commits are not rewritten? Is there a way for you to prevent that from happening?
|
|
|
A: I remember when I was first exposed to GitHub, I was surprised that any developers of free software were using it at all. It seemed like an obvious "vendor lock-in" type of situation. But, over time, I ended up "drinking the Kool-Aid" myself, using it and falling into the same trap. It's like a drug. Could GitHub change the code in the hosted Git repos? It's possible. If there is a way to prevent that, I am not really sure what it is. I think I am completely vulnerable to that.
|
|
|
|
|
|
Q: Are you aware of the SHA1 collision attack that would allow someone to change one commit in the history without anyone noticing?
|
|
|
A: I learned about it only by reading the questions for this interview. So no, I wasn't aware. I read the article, and I can envision how that might work. It was hard for me to imagine that being possible, and I guess the answer is to use more bits to make the computation more expensive.
|
|
|
|
|
|
Q: In the context of using multiple forges if you had one wish to create something that would help you, what would you wish for? It doesn't need to be realistic or anything.
|
|
|
A: It's kind of a wistful nostalgic wish: I wish we could go back to the days before these forges, go back in time when you had a git repo and a mailing list. I think the Linux kernel is doing it the right way: they're using Git the way it was designed to be used: as a decentralized repository of code revisions. I think it's a mistake to centralize Git.
|
|
|
|
|
|
Since forges are useful, if there were a way to back up the data that is on a forge related to my project (the history of all discussions, the issues, the pull requests) and have it offline, that would be a magic-wand kind of wish. Taking the idea further, if there were a way to upload that data to a different forge, then you could even get into mirroring a project over multiple forges.
|
|
|
|
|
|
Q: Is there anything in using multiple forges that actually works for you? That you're happy about?
|
|
|
A: [Unanswered]