CodenameTim,
@CodenameTim@fosstodon.org avatar

Coming back at update_or_create for round 2!

If you call update_or_create where the instance already exists and the defaults passed in are already the values on the instance, do you ever want it to actually re-save the instance?

I'm running into the case where I'd prefer it to not update the instance if it doesn't need to. I'm guessing I'm overlooking some race condition problem though.

#Django
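
For readers following along, the behaviour asked for above could be sketched roughly like this. It is a minimal sketch, not Django's implementation: `apply_if_changed` and `update_or_create_if_changed` are hypothetical names, and `model` is assumed to be a Django model class (only `objects.get_or_create` and `save(update_fields=...)` are used). The row locking that the real `update_or_create` does with `select_for_update()` inside a transaction is left out, so the race condition worry raised in the post still applies.

```python
def apply_if_changed(instance, defaults):
    """Copy values from ``defaults`` onto ``instance``.

    Returns the list of attribute names whose value actually differed.
    """
    changed = []
    for name, value in defaults.items():
        if getattr(instance, name) != value:
            setattr(instance, name, value)
            changed.append(name)
    return changed


def update_or_create_if_changed(model, defaults=None, **lookup):
    """Like update_or_create, but skip save() when nothing differs.

    Hypothetical helper: ``model`` is assumed to be a Django model class.
    No row lock is taken here, unlike Django's own update_or_create.
    """
    defaults = defaults or {}
    obj, created = model.objects.get_or_create(defaults=defaults, **lookup)
    if created:
        return obj, True, []
    changed = apply_if_changed(obj, defaults)
    if changed:
        # Only write the columns that differ.
        obj.save(update_fields=changed)
    return obj, False, changed
```

One thing to keep in mind with this sketch: because `save()` is called with `update_fields`, an `auto_now` column is only written if it is included in that list.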

webology,
@webology@mastodon.social avatar

@CodenameTim If you haven't looked at django-dirtyfields yet, there might be a few corner cases that aren't quite as obvious until you see how it's evolved.

https://github.com/romgar/django-dirtyfields/

webology,
@webology@mastodon.social avatar

@CodenameTim I guess I either want update_or_create or I don't want it and get_or_create is fine. Otherwise, I want to handle it all by hand.

I'm getting quite a bit of mileage out of bulk_update and bulk_create these days when we have millions of records to shuffle around.

webology,
@webology@mastodon.social avatar

@CodenameTim In general, I wish Django had an optional dirty field that we could natively work with.

I understand why we didn't when Django was working within the RAM constraints of ~18 years ago. Having moved from 128 MB of RAM to 1 GB to 2 GB per process, it feels like there is more than enough now to optionally support it.

CodenameTim,
@CodenameTim@fosstodon.org avatar

@webology I'm realizing that this problem is only a major problem when using django-simple-history where it creates a historical record on every save. These empty saves add up if they occur on a regular basis.

ryanhiebert,
@ryanhiebert@fosstodon.org avatar

@CodenameTim @webology That is basically the use case I am dealing with at the moment. We don’t use update_or_create, but we don’t want save() to write if nothing has changed materially.

CodenameTim,
@CodenameTim@fosstodon.org avatar

@ryanhiebert @webology Welp, maybe I should create that issue on the repo for a library specific utility function.

ryanhiebert,
@ryanhiebert@fosstodon.org avatar

@CodenameTim @webology perhaps, although I’m not using that package. We have our own audit model. Having some way in Django to do this, especially if it can be at the database layer doing an upsert, sounds amazing, and seems like it could be a useful addition to Django. I’d wish it was the default, but assume that’s unlikely, and I’m not sure what the API should look like, but it would be nice.

CodenameTim,
@CodenameTim@fosstodon.org avatar

@ryanhiebert @webology Pretty sure bulk_create with update_conflicts is an upsert for postgres now.
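
As a rough illustration of that, `bulk_create` accepts `update_conflicts`, `unique_fields` and `update_fields` (Django 4.1 and later). The wrapper below is a hypothetical helper, with `model` assumed to be a Django model class that has a unique constraint covering `unique_fields`. Note that an upsert like this still writes the listed columns on every conflict; it does not by itself skip rows whose values are unchanged.

```python
def upsert_rows(model, rows, unique_fields, update_fields):
    """Insert ``rows`` (dicts of field values), updating on conflict.

    Hypothetical wrapper around QuerySet.bulk_create; ``model`` is assumed
    to be a Django model class.
    """
    objs = [model(**row) for row in rows]
    return model.objects.bulk_create(
        objs,
        update_conflicts=True,        # turn conflicts into updates
        unique_fields=unique_fields,  # columns that define a conflict
        update_fields=update_fields,  # columns to overwrite on conflict
    )
```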

ryanhiebert,
@ryanhiebert@fosstodon.org avatar

@CodenameTim @webology oh that’s cool, I’ll have to look into that.

webology,
@webology@mastodon.social avatar

@ryanhiebert @CodenameTim I'm a few clients and over a year removed, but I have struggled with that problem with django-simple-history.

I think we worked around it by checking self.changed_fields in save() before we'd commit the write.

cc'ing @treyhunner in case he's bored and wants to weigh in at some point.

If changed_fields didn't work, then we would have added our own copy / state in the model and rolled the same logic ourselves.
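
The "own copy / state in the model" approach might look something like this. It is a framework-free sketch with hypothetical names (`DirtyTrackingMixin`, `tracked_fields`, `snapshot`, `Account`); in a Django model one would presumably take the snapshot after loading and after each save, which is not shown here.

```python
class DirtyTrackingMixin:
    """Remember field values at a point in time and report what changed."""

    tracked_fields = ()  # names of attributes to watch (hypothetical)

    def snapshot(self):
        # Store the current values to compare against later.
        self._original = {n: getattr(self, n) for n in self.tracked_fields}

    def changed_fields(self):
        # Map each changed name to its (old, new) pair.
        return {
            n: (old, getattr(self, n))
            for n, old in self._original.items()
            if getattr(self, n) != old
        }

    def is_dirty(self):
        return bool(self.changed_fields())


class Account(DirtyTrackingMixin):
    tracked_fields = ("email", "plan")

    def __init__(self, email, plan):
        self.email = email
        self.plan = plan
        self.snapshot()


account = Account("a@example.com", "free")
account.plan = "pro"
print(account.changed_fields())  # {'plan': ('free', 'pro')}
```

The snapshot holds references rather than deep copies, so a mutable value changed in place would not be detected by this sketch.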

treyhunner,
@treyhunner@mastodon.social avatar

@webology @ryanhiebert @CodenameTim I thought some folks added functionality to simple history to control whether/when a history snapshot happens.

CodenameTim,
@CodenameTim@fosstodon.org avatar

@treyhunner @webology @ryanhiebert There's the option to disable it before calling a save operation, but that's generally manual. Otherwise you'd have to roll your own comparison to determine whether to set skip_history_when_saving.
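
Rolling that comparison by hand could look roughly like the sketch below. `skip_history_when_saving` is the django-simple-history attribute named in the post; `save_with_history_only_if_changed` is a hypothetical helper, and `instance` is assumed to be a model instance tracked by django-simple-history.

```python
def save_with_history_only_if_changed(instance, new_values):
    """Apply ``new_values`` and save, skipping the historical record
    when none of the values differ from what is already on the instance."""
    changed = False
    for name, value in new_values.items():
        if getattr(instance, name) != value:
            setattr(instance, name, value)
            changed = True
    if not changed:
        # Tell django-simple-history not to create a record for this save.
        instance.skip_history_when_saving = True
    try:
        instance.save()
    finally:
        if not changed:
            # Remove the flag so later saves record history again.
            del instance.skip_history_when_saving
    return changed
```

The save itself still happens in this version, so any timestamp column is still written; only the history row is skipped.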

treyhunner,
@treyhunner@mastodon.social avatar

@CodenameTim @webology @ryanhiebert
It was skip_history_when_saving that I was thinking of. I had thought of using that with django-lifecycle in the past. My data duplication issue didn't end up being too big of a problem though, so I never got around to it.

The biggest upside of django-simple-history is also its biggest downside: it's automatic. 😬

ryanhiebert,
@ryanhiebert@fosstodon.org avatar

@treyhunner @CodenameTim @webology I've not personally used django-model-utils, but another developer previously showed me that they have a FieldTracker. It doesn't solve for the pure-db-checking upsert as I'd like, but could be useful to dirty check models before saving: https://django-model-utils.readthedocs.io/en/latest/utilities.html#field-tracker

treyhunner,
@treyhunner@mastodon.social avatar

@ryanhiebert @CodenameTim @webology I actually made FieldTracker because I was inspired by django-simple-history and thought django-model-utils might be a good home for that. 😜

Honestly I use django-lifecycle's dirty-checking more often now since it's good enough for my use cases.

fallenhitokiri,
@fallenhitokiri@social.screamingatmyscreen.com avatar

@CodenameTim I would expect DateTimeFields with auto_now=True to update, for example?

CodenameTim,
@CodenameTim@fosstodon.org avatar

@fallenhitokiri That would be a tough one. I'd argue that since nothing else is being updated, you wouldn't want that timestamp being touched.

ryanhiebert,
@ryanhiebert@fosstodon.org avatar

@CodenameTim @fallenhitokiri I think auto_now is a good example. It’s not the behavior I usually want, but it is at least a reasonable use case. I can’t think of a time I want it to save but change nothing, but that’s the way save works in Django, and it would feel like a weird difference for save and update_or_create to behave differently.

I tend to think Django saves too much by default, but I’m not sure a change in the paradigm of saving is likely to be reasonable.

fallenhitokiri,
@fallenhitokiri@social.screamingatmyscreen.com avatar

@CodenameTim I’m thinking of persisting a value that’s constantly being updated and you want to know when the last write was, even if nothing changed. New values (keys) might come in at any time.

(I had something similar for an industrial control system I worked on. The updated field was one of the heartbeats)

CodenameTim,
@CodenameTim@fosstodon.org avatar

@fallenhitokiri And that's probably the nail in the coffin for the argument to change it.

BUT.

I'd argue the savings of not issuing an update > the heartbeat example. Especially considering you could replace it with update_or_create(id=id, defaults={'updated': timezone.now()}), whereas you have to re-implement update_or_create to get that performance optimization.

fallenhitokiri,
@fallenhitokiri@social.screamingatmyscreen.com avatar

@CodenameTim It’s a fair argument, but it feels inconsistent with how updates and creates outside of update_or_create behave.

You could make the argument that for consistency it might make sense to drop auto_now for all writes - you can simply pass the value explicitly, the same way you’d have to in your example.

CodenameTim,
@CodenameTim@fosstodon.org avatar

@fallenhitokiri I see two reasons for the updated timestamp column. One is for a heartbeat like you described. The other is for a last modified. In that latter case, if nothing has been modified, it's more confusing to have that timestamp changed.
