Any tips to help a scientist become a better programmer?

Hey there!

I’m a chemical physicist who has been using python (as well as matlab and R) for a lot of different tasks over the last ~10 years, mostly for data analysis but also to automate certain tasks. I am almost completely self-taught, and though I have gotten help and tips from professors throughout the completion of my degrees, I have never really been educated in best practices when it comes to coding.

I have some friends who work as developers but have a similar academic background as I do, and through them I have become painfully aware of how bad my code is. When I write code, it simply needs to do the thing, conventions be damned. I do try to read up on the “right” way to do things, but the holes in my knowledge become pretty apparent pretty quickly.

For example, I have never written a class and I wouldn’t know why or where to start (something to do with the init method, right?). I mostly just write functions and scripts that perform the tasks that I need, plus some work with jupyter notebooks from time to time. I only recently got started with git and uploading my projects to github, just as a way to try to teach myself the workflow.

So, I would like to learn to be better. Can anyone recommend good resources for learning programming, but perhaps that are aimed at people who already know a language? It’d be nice to find a guide that assumes you already know more than a beginner. Any help would be appreciated.

MajorHavoc,

The O’Reilly “In a Nutshell” and “Pocket Guide to” books are great for folks who can already code, and want to pick up a related tool or a new language.

The Pocket Guide to Git is an obvious choice in your situation, if you don’t already have it.

As others have mentioned, you’re allowed to ignore the team stuff. In git this means you have my permission to commit directly to the ‘main’ branch, particularly while you’re learning.

Lessons that I’ve learned the hard way, that apply for someone scripting alone:

  • git will save your ass. Get in the habit of using it for everything ASAP, and it'll be there when you need it (a minimal solo workflow is sketched after this list)
  • find that one friend who waxes poetic about git, and keep them close. Usually listening politely while they wax poetic about git will do the trick. Five minutes of their time can be a real life saver later. As that friend, I know when you're using me for my git-fu, and I don't mind. It's hard for me to make friends, perhaps because I constantly wax poetic about git.
  • every code swan starts as an ugly duck that got the job done.
  • print(f"debug: {what_the_fuck_is_this}") is a valid pattern that seasoned professionals still turn to. If you’re in a code environment that doesn’t support it, then it’s a bad code environment.
  • one peer who reads your code regularly will make you a minimum of 5x more effective. It's awkward as hell to get started, but incredibly worth it. Obviously, you traditionally should return the favor, even though you won't feel qualified. They don't really feel qualified either, so it works out. (Source: I advise real scientists about their code all the time. It's still wild to me that they, as actual scientists, listen to me - even after I see how much benefit it provides.)
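
For a scripting scientist working alone, the whole workflow can be as small as this (a sketch; analysis.py is just a stand-in filename):

    git init                          # once, in the project folder
    git add analysis.py               # stage the file(s) you changed
    git commit -m "Fit peaks with new baseline correction"
    git log --oneline                 # skim your history
    git diff                          # what changed since the last commit
    git checkout -- analysis.py       # throw away uncommitted changes to a file

No branches, no remotes, no team ceremony - and it will still save your ass the day an edit breaks everything.
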
rolaulten,

Along a similar vein to making a git friend, buy your sysadmins/ops people a box of doughnuts once in a while. They (generally) all code and will have some knowledge of what you are working on.

IonicFrog,

print(f"debug: {what_the_fuck_is_this}") is a valid pattern that seasoned professionals still turn to. If you’re in a code environment that doesn’t support it, then it’s a bad code environment.

I’ve been known to print things to the console during development, but it’s like eating junk food. It’s better to get in the habit of using a logging framework. Insufficient logging has been in the OWASP Top 10 for a while, so you should be logging anyway. Why not logger.debug(f"{what_the_fuck_is_this}"), or get fancy with some frameworks' custom levels and logger.log(SUPER_LOW_LVL, f"{really_what_the_fuck_is_this}")?

You also get the bonus of not having to go back and clean up all the print statements afterward. All you have to do is set the running log level to INFO or something to turn all that off. There was a reason you needed to see that stuff in the first place; if you ever need to see it again, just change the log level back to whatever granularity you need.
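
In Python that might look something like this (a minimal sketch; the TRACE level and its value of 5 are my own invention, not a standard):

    import logging

    TRACE = 5  # lower than logging.DEBUG (10), for the really noisy stuff
    logging.addLevelName(TRACE, "TRACE")
    logging.basicConfig(level=TRACE, format="%(levelname)s %(name)s: %(message)s")
    logger = logging.getLogger(__name__)

    what_the_fuck_is_this = 42
    logger.debug(f"{what_the_fuck_is_this=}")       # instead of print()
    logger.log(TRACE, f"{what_the_fuck_is_this=}")  # the even-lower-level variant

    # When you're done debugging, raise the threshold and the noise disappears:
    logging.getLogger().setLevel(logging.INFO)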

MajorHavoc,

Absolutely true.

And you make a great point: print(f"debug: {what_the_fuck_is_this}") should absolutely mature into logger.log(SUPER_LOW_LVL, f"{really_what_the_fuck_is_this}").

Unfortunately, I have found that when print("debug") isn't working, usually logging isn't set up correctly either.

In a solidly built system, a garbage print line will hit the logs and raise several alerts because it’s poorly formatted - making it easy for the developer to find.

Sadly, I often see logging set up so that poorly formatted logs go nowhere, rather than raising alerts until they're fixed. This inevitably leads to both debug logs and critical but slightly misformatted logs being lost.

Your point is particularly valuable when it’s time to get the system fixed, because it’s easier to say “logging needs to work” than “fix my stupid printf”, even though they’re roughly equivalent.

Edit: And getting back to the scripting scientist context, scripting scientists still have my formal official permission to say "just make my print('debug') work".

Turun, (edited)

As a researcher: all the professional software engineers here have no idea about the requirements for code in a research setting.

I recommend you use

  • git. It’s nice to be able to revert changes without worry.
  • descriptive variable names. The meaning of descriptive is highly dependent on your situation. Single letters can have an obvious meaning, but err on the side of longer names if you’re unsure. The goal is to be able to look at a variable and instantly know what it represents.
  • virtual environments and requirements.txt. When you have your code working, have pip (or anaconda or whatever) take a snapshot of your current python installation (the commands are sketched below). Then you can install the exact same requirements when you want to revive your code a few months or years down the line. I didn't do that and it's kinda biting me in the ass right now.
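
For the pip flavour, that snapshot is roughly this (a sketch; substitute conda's equivalents if you live in anaconda land, and swap in whatever packages you actually use for numpy/scipy):

    python -m venv .venv              # create an isolated environment
    source .venv/bin/activate         # on Windows: .venv\Scripts\activate
    pip install numpy scipy           # install what the project needs
    pip freeze > requirements.txt     # snapshot the exact versions

    # months or years later, to revive it:
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
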
QuadriLiteral,

As a researcher: all the professional software engineers here have no idea about the requirements for code in a research setting.

As someone with extensive experience in both: my first requirement would be readability. Single python file? Fine with that. A 1k+ line single python file without functions or other means of structuring the code: please no.

The nice thing about python is that your IDE lets you jump into the code of the libraries you're using; I find that to be a good way to look at how experienced python devs write code.

Turun,

You can jump to definition in any language. In fact, python may be one of the worst ones, because compiled libraries are so common. "Real signature unknown" is all you will get sometimes. E.g. NumPy is implemented in C, not python.

QuadriLiteral,

My point about the jumping into was that you can immediately start reading the sources. Most alternative languages are compiled in some form or other so all you’ll see is an API, not the implementation.

Turun,

My comment was not asking for clarification, I am contradicting your claim.

Granted, my experience is mostly limited to python and rust. But I find that in python you reach the end of "jump to definition" much, much sooner. Fundamental core libraries of Python are written in C, simply because the required performance cannot be reached with python alone. So after jumping two levels you are through the thin wrapper type and your IDE will give you an "I don't know, it's compiled code".
In Rust I have yet to encounter this. Precompiled code is rarely used as a dependency, because compiling whatever is needed is no issue - you're compiling anyway - and building from source actually allows a few more optimizations to be performed.

Edit: since wasm is not yet widespread, JavaScript may be the best language for digging deep into libraries.

QuadriLiteral,

Mostly ML or data processing libraries I would assume, I’ve read tons of REST server and ORM python code for instance, none of that is written in C.

Wrt rust: no experience with that. I do do a lot of C++; there you quickly reach the end, as typically you're consuming quite a few libraries, and the complete sources of those aren't part of what is parsed by the IDE, since keeping all that in memory would be unworkable.

Turun,

REST server and ORM python code

Fair enough, that can be achieved with pure python.

heeplr,

It’s always good to learn new stuff but in terms of productivity: Don’t attempt to be a programmer. Rather attempt to write better research code (clean up code, revision control, better commenting, maybe testing…)

Rather try to improve cooperation with programmers, if necessary. Close cooperation, asking stupid questions instead of making assumptions etc. makes the process easy for both of you.

Also don’t be afraid to consult different programmers since beyond a certain level, experience and expertise in programming is vastly fragmented.

Experienced programmers mostly suck at your field and vice versa, and that's a good thing.

QuadriLiteral,

Odd take imo. OP is a programmer, albeit perhaps not a very good one. I did a PhD (computational astrophysics), then worked as a professional dev for 10 years after that. Imo a good programmer writes code that solves the problem at hand; I don't see that much of a difference between the problem being scientific or a backend service. It doesn't mean "write lots of boilerplate-y factories, interfaces and other layers" to me, neither in research nor outside of it.

That being said, there is so much time lost in research institutes because of shoddy programming by researchers, or simply ignorance, not knowing a debugger exists for instance. OP wanting to level up their game would almost certainly result in getting to research results faster, + they may be able to help their peers become better as well.

heeplr,

25 years in the industry here. As I said there’s nothing against learning something new but I doubt it’s as easy as “leveling up”.

Both fields profit a lot from experience, and it's as much gain for a scientist to become a software dev as for an architect to become a carpenter. It's simply not productive.

there is so much time lost in research institutes because of shoddy programming

Well, that’s the way it is. Scientific code and production code have different requirements. To me that sounds like “that machine prototype is inefficient - just skip the prototype next time and build the real thing right away.”

QuadriLiteral,

To me that sounds like “that machine prototype is inefficient - just skip the prototype next time and build the real thing right away.”

I don’t think you understand my point, which is that developing the prototype takes e.g. 50% more time than it should because of complete lack of understanding of software development.

UFODivebomb,

My advice comes from being a developer, and tech lead, who has brought a lot of code from scientists to production.

The best path for a company is often: do not use the code the scientist wrote and instead have a different team rewrite the system for production. I've seen plenty of projects fail, hard, because some scientist thought their research code was production level. There is a large gap between research code and production. Anybody who claims otherwise is naive.

This is entirely fine! Even better than attempting to build production quality code from the start. Really! Research is solving a decision problem. That answer is important; less so the code.

However, science is science. Being able to reproduce the results the research produced is essential. So there is the standard requirement of documenting the procedure used (which includes the code!) sufficiently to be reproduced. The best part is the reproduction not only confirms the science but produces a production system at the same time! Awws yea. Science!

I’ve seen several projects fail when scientists attempt to be production developers without proper training and skills. This is bad for the team, product, and company.

(Tho typically those "scientists" fail at building reproducible systems. So are they actually scientists? I've encountered plenty of phds in name only.)

So, what are your goals? To build production systems? Then those skills will have to be learned. That likely includes OO. Version control. Structural and behavioral patterns.

Not necessary to learn if that isn’t your goal! Just keep in mind that if a resilient production system is the goal, well, research code is like the first pancake in a batch. Verify, taste, but don’t serve it to customers.

wathek,

There’s a certain amount of fundamentals you need, after that point it’s quite easy to hop languages by just looking over the documentation of that language. If you skip those fundamentals, you end up with a bunch of knowledge but don’t realize you could do things way more effectively.

My recommendation: check out free resources for beginners and skip the stuff you already know thoroughly, focusing only on the stuff you don't know.

source: I’m self-taught and had to go through this process myself.

Aceticon,

Most of the “conventions” (which are normally just “good practices”) are there to make the software easier to maintain, to make teamwork more efficient, to manage complexity in large code-bases, to reduce the chance of mistakes and to give a little boost in productivity.

For example, using descriptive names for variables (e.g. "sampleDataPoints" rather than "x") reduces the chance of mistakes due to confusing variables (especially in long stretches of code) and allows others (and yourself, if you haven't looked at that code for many months) to pick up much faster what's going on in order to change it. Dividing your code into functions, on the other hand, promotes reuse of the same code in many places without the downsides of copy & paste all over the place, such as growing the code base (which makes it costlier to maintain) and, worse, unwittingly copying and pasting bugs, so that you have to fix the same thing in several places (and might even forget one or two) rather than just fixing it in that one function.
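
As a toy illustration of both points (all names and numbers here are made up):

    # Before: what is x? what is 0.985? And this line gets pasted everywhere.
    x = [v / 0.985 for v in d]

    # After: intent is readable, and the logic lives in exactly one place.
    DETECTOR_EFFICIENCY = 0.985

    def correct_for_detector_efficiency(raw_counts, efficiency=DETECTOR_EFFICIENCY):
        """Divide raw counts by detector efficiency to estimate true counts."""
        return [count / efficiency for count in raw_counts]

    raw_counts = [120, 98, 143]
    sample_data_points = correct_for_detector_efficiency(raw_counts)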

Stuff at a higher, software design level, such as classes, is meant to help structure the code into self-contained blocks with clear, well-controlled ways of interacting, thus reducing overall complexity (everything potentially connecting to everything else is the most complex web of connections you could have), increasing productivity (less stuff to consider at any one point while writing code, since it can't access everything), reducing bugs (less room for mistakes when certain things can only be changed by one certain part of the code) and making it easier for others to use your stuff (they don't need to know how your classes work, only how to talk to them, like a mini library). That said, it's perfectly feasible to achieve a similar result without classes, using scope alone, though more advanced features of classes such as inheritance won't be easy to emulate like that.
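
Since OP mentioned never having written a class, here is a minimal sketch of the kind of self-contained block meant above (the domain and names are invented):

    class Spectrum:
        """A measured spectrum, bundled with the operations that belong to it."""

        def __init__(self, wavelengths, intensities):
            # __init__ runs when you create the object: Spectrum(w, i)
            self.wavelengths = wavelengths
            self.intensities = intensities

        def normalize(self):
            """Scale intensities so the maximum is 1.0."""
            peak = max(self.intensities)
            self.intensities = [i / peak for i in self.intensities]

        def peak_wavelength(self):
            """Return the wavelength at which intensity is largest."""
            index = self.intensities.index(max(self.intensities))
            return self.wavelengths[index]

    s = Spectrum([400, 500, 600], [0.2, 0.9, 0.4])
    s.normalize()
    print(s.peak_wavelength())  # -> 500

Outside code only talks to a Spectrum through its methods; it never needs to care how the lists are stored internally.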

That said, if your programs are small, pretty much single-use (i.e. you don't have to keep using them for years) and you're not working on the code as a team, you can get away with not using most "conventions" (certainly the design-level stuff), with only the downside of some loss in productivity (you lose code clarity and simplicity, which increases the likelihood of bugs and makes it slower to traverse and spot things in the code when you have to go back and forth to change them).

I've worked with people who weren't programmers but did code (namely with Quants in Finance), and they're simply not very good at what is, for them, a secondary job (Quants mainly do mathematical modelling). That's absolutely normal: unlike actual Developers, doing code well and efficiently is not where their focus has been for years.

Fal,

Use an IDE if you aren't already. JetBrains stuff is great. Having autocomplete is invaluable.

Asudox,

I'd say go with Go or Rust. Go is like Python (garbage collected) but compiled. Rust is kind of like C++, but not exactly: it has neither garbage collection nor manual memory management, but something called "ownership and borrowing". It's as fast as C++ or even faster, and has a modern syntax, though Rust is harder than Go since it is, under the hood, a systems programming language. If you want something faster than Python, Go is good. I specifically chose Rust over Go since I wanted performance and just wanted to try how it was. I'm still a beginner in Rust, but I've written a few projects of reasonable scale for my level. And also, Rust's error messages are extremely nice. It really lives up to the memes.

To learn Rust: www.rust-lang.org/learn

To learn Go: go.dev/learn/

BatmanAoD,

Why not just stick with Python until there’s a need to learn something else?

Turun,

No.

I have written rust for my research (one does not simply calculate 4 million data points in python), but just no.

My main code is still python, because it’s just so much nicer to write and iterate on.

boeman,

The thing to think about is reusability. Are you copying and pasting code into multiple places? That’s a great candidate to become a class. If you have long lived projects (i.e. something you will use multiple times over a lot of years) maintainability is important. Huge functions and monolithic applications are very hard to maintain over time.

Break your functionality out into small chunks (methods and classes). Keep it simple. It may take a while to get used to this, but your time for adding additional functionality will be greatly improved in the long run.

A lot of great programmers were terrible at one time. Don’t let your current lack of knowledge of principles stop you from learning. One of the biggest breakthroughs I had as a programmer is changing how I looked at architecting applications. Following SOLID principles will assist a lot in that. Don’t try to understand and use these principles all at once, take your time. Programming isn’t what you make your living with, it’s a tool to help you be more efficient in your current role.

Realize that becoming a more effective programmer is different for everyone. Like you, I was self-taught. I was a systems and network engineer who decided to move into software development. I've since moved into a role in SRE that takes advantage of all the skills I've learned through the years. Like you, a lot of what I write now is about automation and analysis.

Fal,

Careful with this. Not everything needs to be reusable, and copy/paste isn’t inherently bad.

sandimetz.com/blog/2016/…/the-wrong-abstraction

boeman,

You aren’t wrong… But everything with extended use needs to be maintainable. Making a change in 5 places sucks.

Plus, that's what the open-closed principle is all about: instead of modifying current working code to add functionality, you extend it.

Fal,

Making a change in 5 places sucks, making it in 2 could be reasonable. If 2 pieces of code are similar but different enough, I’ve seen way too often people try to force them into a common abstraction. That’s more what the article is about.

REdOG,

"How to Think Like a Computer Scientist" may help.

www.openbookproject.net/thinkcs/…/english3e/

catacomb,

If you don’t already, use version control (git or otherwise) and try to write useful messages for yourself. 99% of the time, you won’t need them, but you’ll be thankful that 1% of the time. I’ve seen database engineers hack something together without version control and, honestly, they’d have looked far more professional if we could see recent changes when something goes wrong. It’s also great to be able to revert back to a known good state.

Also, consider writing unit tests to prove your code does what you think it does. This is sometimes more useful for code you’ll use over and over, but you might find it helpful in complicated sections where your understanding isn’t great. Does the function output what it should or not? Start from some trivial cases and go from there.
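
A trivial sketch with Python's built-in unittest (pytest is a popular alternative); the function under test is invented for the example:

    import unittest

    def wavelength_to_energy_ev(wavelength_nm):
        """Photon energy in eV from wavelength in nm (E = hc / lambda)."""
        return 1239.842 / wavelength_nm

    class TestWavelengthToEnergy(unittest.TestCase):
        def test_known_value(self):
            # 620 nm red light is very nearly 2 eV
            self.assertAlmostEqual(wavelength_to_energy_ev(620), 2.0, places=2)

        def test_zero_wavelength_raises(self):
            with self.assertRaises(ZeroDivisionError):
                wavelength_to_energy_ev(0)

    if __name__ == "__main__":
        unittest.main()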

Lastly, what’s the nature of the code? As a developer, I have to live with my decisions for years (unless I switch jobs.) I need it to be maintainable and reusable. I also need to demonstrate this consideration to colleagues. That makes classes and modules extremely useful. If you’re frequently writing throwaway code for one-off analyses, those concepts might not be useful for you at all. I’d then focus more on correctness (tests) and efficiency. You might find your analyses can be performed far quicker if you have good knowledge about data structures and algorithms and apply them well. I’ve personally reworked code written by coworkers to be 10x more efficient with clever usage of data structures. It might be a better use of your time than learning abstractions we use for large, long-term applications.
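
As one concrete (and hypothetical) instance of that kind of rework: membership tests against a list scan every element, while a set lookup is effectively constant time.

    import random
    import time

    flagged_list = list(range(0, 500_000, 7))   # ~71k flagged IDs in a list
    flagged_set = set(flagged_list)             # the same IDs in a set
    measurements = [random.randrange(500_000) for _ in range(2_000)]

    t0 = time.perf_counter()
    slow = [m for m in measurements if m in flagged_list]  # each 'in' scans the list
    t1 = time.perf_counter()
    fast = [m for m in measurements if m in flagged_set]   # each 'in' is a hash lookup
    t2 = time.perf_counter()

    assert slow == fast
    print(f"list: {t1 - t0:.3f}s   set: {t2 - t1:.5f}s")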

RandomUser,

All the other comments are great advice. As an ex chemist who does quite a bit of code I’ll add:

Do you want code that works, or code that works?! It's reasonably easy to knock out ugly code that only works once, and that can be just what you need. It takes a little more effort, however, to make it robust. Think about how it can fail, and trap the failures. If you're sharing code with others this is even more important, as people do 'interesting' things.

There's a lot of temporary code that's had a very long life in production, and that carries technical debt... Is it documented? Is it stable? Is it secure? Ideally it should be.

Code examples on the first page of Google tend to work ok, but are not generally secure, e.g. building SQL queries from strings instead of using prepared statements. It doesn't take much extra effort to do it properly, and it gives you peace of mind. We create SBOMs for our code now so we can easily check if a component has gained a vulnerability. That doesn't mean our code is good, but it helps. You don't really want to be the person whose code helped let an attacker in.
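
In Python's built-in sqlite3, for instance, the safe version is barely more typing (the table, column, and input here are invented for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO samples VALUES (1, 'caffeine')")

    user_input = "caffeine' OR '1'='1"  # a classic injection attempt

    # Dangerous: pastes user input straight into the SQL string
    # conn.execute(f"SELECT * FROM samples WHERE name = '{user_input}'")

    # Safe: a prepared statement passes the value separately from the query text
    rows = conn.execute(
        "SELECT * FROM samples WHERE name = ?", (user_input,)
    ).fetchall()
    print(rows)  # [] - the injection attempt matches nothing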

Any code you write, especially stuff you share will give you a support and maintenance task long term. Pirate for it!

Code sometimes just stops working - at least in my experience. Sacrifice something to the gods and all will be fine.

Finally, you probably know more than you think. You’ve plenty of experience. Most of the time I can do what I need without e.g. classes, but sometimes I’ll intentionally use a technique in a project just to learn it. I can’t learn stuff if I don’t have a use for it.

I’m still learning, so if I’ve got any part of the above wrong, please help me out.

ericjmorey,

“Pirate for it” was probably the wrong phrase. “Plan for it” was probably what you were thinking when your fingers did something else.

xilliah,

I’ve got two tips to add to the pile you’ve already read.

I recommend you read the manuals related to what you are using. Have you read the python manual? And the ones for the libraries you use? If you do you’ll definitely find something very useful that you didn’t know about.

That, and reread your code. Over and over, until it makes total sense, and only run it then. It might seem slow, and it'll require patience at first. Running and testing will always be slower, and is generally only useful for trying out the concept you had in mind. But as long as you're doing your conceptual work right, that shouldn't happen often, and so most work will be spent tracking down bugs in the implementation of the concept in the code. Trust me: when you read your code rigorously, you'll immediately find issues. In some cases, use temporary prints. Oh, and avoid the debugger.

agent_flounder,

As the other commenter said, you want to learn about programming principles. Like, low coupling or don’t repeat yourself.

How long is your longest program? What would you say is a typical length?

You say your code is “bad” – in what ways? For example:

  • Readability (e.g. going back to it months later, do you go "oh, I remember" or "wtf does this do?!")
  • Maintainability (go back to update and you have to totally rework a bunch of stuff for a change that seems like it should be simple)
  • Reliability (mistakes, haphazard “testing”, can’t trust output)
  • Maybe something else?
cosmicrose,

Learning new programming languages is an awesome way to expand your programming brain. If you want to stay in the same scientific computation niche, you can check out Julia or Mathematica. If you’re just looking to broaden your horizons, the world is your oyster. For me, learning Clojure really cooked my noodle but made me a much better programmer since it taught me functional programming.

Also, just read other people's code! You can learn the conventions that way. Though for you, it would be best to find projects within your niche, because I'm not sure if general web dev code would be super helpful.

There are techniques that are broader than any single language’s conventions, and I think learning those are how you can improve. That’s hard to teach, though, and it comes from experience with a few different languages, in my opinion.

And honestly, I can totally respect the “conventions be damned” attitude, because at the end of the day, you’re trying to make something that works, and if nobody else is reading that code, you’ve made the right trade-off.
