louis,
@louis@emacs.ch avatar

For the last few days I've been experimenting with replacing CRUD API code with Stored Procedures which directly produce each endpoint's JSON as a single-row scalar value. The API is then just a thin wrapper that authenticates, validates input and streams the DB's JSON directly to the client.

  • No ORMs, no SQL generators etc.
  • All SQL lives where it belongs: in the database
  • The API only issues single „CALL myfunc(…)“ DB calls
  • A simple centralised error handler can accurately report errors from the database
  • No weird mixed row/json columns scanning into structs and re-marshalling everything to JSON
  • The codebase is collapsing to 20% of its former size (by LOC)
  • Stored Procedures can use wonderfully declarative SQL code
  • Response times in the microseconds, even for multiple queries, since everything happens inside the DB
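To sketch the pattern (the table `users` and the procedure name are hypothetical, not from the original project): a stored procedure aggregates the endpoint's rows into a single JSON scalar, and the API streams that value verbatim.

```sql
-- Hypothetical example: the whole endpoint payload as one JSON scalar.
-- Assumed schema: users(id, name, email).
DELIMITER //
CREATE PROCEDURE api_list_users(IN p_limit INT)
BEGIN
  SELECT JSON_OBJECT(
           'users',
           (SELECT JSON_ARRAYAGG(JSON_OBJECT('id', u.id, 'name', u.name, 'email', u.email))
            FROM (SELECT id, name, email FROM users ORDER BY id LIMIT p_limit) AS u)
         ) AS payload;
END //
DELIMITER ;

-- The API layer then only runs: CALL api_list_users(50);
-- and writes the single `payload` column straight to the HTTP response.
```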

More side effects:

  • The data model can change and evolve without touching the API at all
  • Zero deploys mean zero downtime
  • The API application is so tiny I could easily switch it to any programming language I want (yes, even Common Lisp) without worrying about available database libraries, type mapping, or rewriting tens of thousands of lines of intermixed language/SQL code.

The dev industry is heading in the opposite direction: more ORMs, more layers, more database abstraction. More weird proprietary cloud databases, each with its own limited capabilities and query language.

So you tell me: Is it crazy? Is it wrong? Why do I have doubts despite everything working out beautifully?

louis, (edited)
@louis@emacs.ch avatar

Intermediate report on my "Stored Procedure" project (long post).

I've now written 35 stored procedures and functions (in MySQL), replacing a massive amount of application code.

MySQL's stored procedure syntax is very capable; I only had to write a few functions to add some JSON conversion comfort. I can use multi-result sets, and I had to modify the MySQL driver for Go to support OUT parameters, which was not a big issue.
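For illustration, these are the two features in question (procedure and table names are invented for this sketch): an OUT parameter plus an ordinary result set from the same CALL.

```sql
-- Hypothetical sketch of OUT parameters and result sets,
-- the features that needed driver support.
DELIMITER //
CREATE PROCEDURE get_orders(IN p_user_id BIGINT, OUT p_total INT)
BEGIN
  SELECT COUNT(*) INTO p_total FROM orders WHERE user_id = p_user_id;
  -- a regular result set, returned alongside the OUT value
  SELECT id, created_at, amount FROM orders WHERE user_id = p_user_id;
END //
DELIMITER ;

-- Calling convention from any client:
-- CALL get_orders(42, @total);
-- SELECT @total;
```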

What I can say is that a really good SQL editor that "understands" your code and provides some intellisense while writing helps a lot. So far I'm sticking with DataGrip, which also supports syncing the DB schema with Git, so every change can be properly tracked. It has a price tag, though.

I've tried MySQL Workbench, DBeaver, TablePlus and Querious. None of them helps with writing stored procedures the way DataGrip does.

dbForge Studio for MySQL could be another candidate, but it is Windows-only and painfully slow when run on macOS.

I've replaced really ugly, dynamically generated SQL code with beautifully pure SQL queries, discovering a lot of bugs in the process. Now, looking at the code, I can see what it does instead of just hoping it will work out. Reminder: I have to work with a complex data model spanning many tables, so dynamically adding JOINs and ORDER BY clauses is not easy and amounts to potentially hundreds of test cases. Not so with Stored Procedures.

If you are really interested in a big-boy query involving spatial queries (distance from point, bounding box etc.), LATERAL CROSS JOINs and parameterised ordering, check this out:

https://pastebin.com/DKBzR19i

What surprised me most is that MySQL is fast, and I mean really fast: in most cases involving complex JOINs over many tables it is much faster than PostgreSQL (which is everybody's darling, I understand). With this performance at hand I could replace all hourly-updated MATERIALIZED VIEWs with real-time queries.

Also, MySQL has excellent spatial functions and a decent FULLTEXT search; I don't miss anything from PostgreSQL right now. Its native event scheduler (a cron for SQL) makes table maintenance a breeze. No more separate SQL scripts running from a crontab.
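As a minimal sketch of the event scheduler (the `sessions` table and event name are made up): a nightly cleanup job that would otherwise live in a crontab.

```sql
-- Hypothetical example: prune expired sessions once a day,
-- entirely inside the database instead of an external cron job.
CREATE EVENT IF NOT EXISTS ev_prune_sessions
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_TIMESTAMP + INTERVAL 1 DAY
DO
  DELETE FROM sessions WHERE expires_at < NOW();

-- The scheduler itself must be enabled once:
-- SET GLOBAL event_scheduler = ON;
```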

PostgreSQL's PL/pgSQL syntax is bulky, ugly, and hard to read (with::all::those::automatically added::type::casts). MySQL doesn't need this.

What MySQL is not good at: error messages are super crappy and unhelpful most of the time ("you have a syntax error, check that your code is correct ..."). So a high level of SQL confidence helps, but the documentation is very accessible and well structured.

There is still a long way to go. I think I've migrated about 50% of the application so far; there are still some complex use cases to solve, but I feel confident about this approach.

If you still think about MySQL's capabilities in terms of version 5 or earlier, you will be surprised how far it has come with the most recent version 8.

I know Oracle is universally hated in the Fediverse, but don't forget that most of PostgreSQL's recent "let's add hundreds of features every 6 months" party is also mostly driven (funded, developed) by commercial parties looking to lure away Oracle and SQL Server clients.

I can't wait to finish this project and will keep you all posted.


#PostgreSQL (let's not make half of the Fediverse angry... 🙂 )

spudlyo,

@louis There are a couple of LSP servers out there that support MySQL, but I haven't tried any of them. One of the great things about implementing things as server side stored procedures is the round trip query latency is measured in microseconds. That's an excellent thing when you're in a transaction that's holding a lock.

baron42bba,

@spudlyo @louis good formatting. But do yourself a favor and write comments about your sub-selects. If you encounter bugs in the future it makes it easier. I migrated a complex payoff program from Java into SQL a few years ago. Everything in the database. Just connecting to the DB (no stored procedures at the time) and using temporary tables. Already had lots of comments in the beginning and added more since then. Some of those queries were written over more than one day. Efficient but complex.

baron42bba,

@spudlyo @louis of course I used Emacs sqli and Orgmode for development. Bonus points for extending babel to support my database. 😀

louis,
@louis@emacs.ch avatar

@baron42bba I'm slowly warming to the idea of moving dev to sqli mode. Let's see how it works out; it would be great to have a good auto-complete feature that suggests not only table/column names but also local variables.

baron42bba,

@louis I tinkered with sqls and tried to add support for my database before it got archived.
Currently I am quite happy with hippie-expand:
(setq hippie-expand-try-functions-list
      '(try-expand-dabbrev
        try-expand-dabbrev-all-buffers
        try-complete-file-name-partially
        try-complete-file-name
        try-expand-all-abbrevs
        try-expand-list
        try-expand-line
        try-expand-dabbrev-from-kill
        try-complete-lisp-symbol-partially
        try-complete-lisp-symbol))
It's not perfect but fast.

crmsnbleyd,
@crmsnbleyd@emacs.ch avatar

@louis one of the main gripes with stored procedures is that they can't be version controlled. Guess that just isn't true! Does tie you to a specific implementation, but then all abstractions are leaky anyway

louis,
@louis@emacs.ch avatar

@crmsnbleyd You can version control your schema by either (automatically) dumping the DDL and putting it in a Git repo, or by having tooling do this for you. I chose to put the DDL into my application repo, so related changes land in a single commit.

Do you mean by „tie you to a specific implementation“ the choice of the database itself? Well, the SQL standard is only a standard on paper and only simple SQL is portable, so your DDL/DML is usually bound to a specific database anyway. But it is (perhaps) more easily ported to another DB than application code is ported between programming languages (or even between ORMs). There is literally no difference between calling a stored procedure from Go, Common Lisp, JavaScript or any other PL. Heck, I could even use C now (I won’t - or will I? :-))

Stored Procedures are self-contained and easily cloned, so when I want to try a specific optimisation I just copy the SP, modify it and call it. You can also easily log what happens in an SP to a database table and analyse its runtime behaviour. A/B testing of SPs is as simple as it can be. I have a complex SP that constructs a filter out of 20 parameters (coming in via HTTP request params) and now log every call to see which filters are actually still in use. I don’t have to expensively implement „OpenTelemetry“ or other APM tooling to do this.
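A sketch of that in-database call logging (table, procedure and parameter names are hypothetical): one log table plus one INSERT inside the SP is all it takes.

```sql
-- Hypothetical call-log table for analysing SP usage.
CREATE TABLE IF NOT EXISTS sp_call_log (
  id        BIGINT AUTO_INCREMENT PRIMARY KEY,
  sp_name   VARCHAR(64) NOT NULL,
  params    JSON,
  called_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Inside the filter-building procedure, a single extra statement
-- records which parameters were actually used on this call:
-- INSERT INTO sp_call_log (sp_name, params)
-- VALUES ('search_listings', JSON_OBJECT('q', p_q, 'max_km', p_max_km));

-- Later, which filters are still in use?
-- SELECT JSON_KEYS(params), COUNT(*) FROM sp_call_log GROUP BY 1;
```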

My application code is collapsing by (let’s just say) 100 lines per hour. Since the application just calls an SP, I can easily modify the database schema and simply alter the dependent SPs. I no longer have to fear that schema changes will break application code full of dynamically created SQL that is unpredictable and unreadable. This way a schema can evolve instead of being stuck for years.

Also, I no longer have to painfully optimise the number of queries an HTTP request causes in order to avoid latency. Interestingly, the SPs I wrote issue far fewer queries than my application code did, because I have the full power of SQL available.

That all applies, of course, only to my specific use case. I wish I had had the courage to make this move earlier instead of seeing the database as just a storage bucket.

louis,
@louis@emacs.ch avatar

Intermediate report 2 on my "Stored Procedure" project (long post).

I think it's time to talk about some of the downsides of Stored Procedures.

I think I've now accumulated enough knowledge to do so. The following applies mostly to MySQL.

  1. Drivers

Support for Stored Procedures in Go is terrible. It is clear that the driver maintainers do not care to implement full functionality for Stored Procedures, just because "nobody uses them".

One notable exception is the Go driver for SQL Server, which is now maintained by Microsoft itself and is excellent.

I patched the MySQL driver for Go to support OUT parameters, after studying the MySQL Client/Server protocol and writing a small prototype driver myself. The MySQL driver[1] project is plagued by lack of interest and rudeness from the maintainers ("you are wasting my time" is a common response). What makes the situation even more complex is that there is also MariaDB; the two behave slightly differently and are diverging more and more.

I think it would be wise for Oracle to step in and produce an official Go driver exclusively for MySQL.

However, after some hours I figured out the issue and was able to implement full support for OUT parameters and multi-resultsets.

  2. Tooling

There are many UI tools for macOS, but the only one capable of serious database development is DataGrip from JetBrains. Other UI tools either have no support for Stored Procedure development at all, or are too rudimentary in that they provide no language support. I've tried them all.

  3. Dev experience

I love writing SQL for Stored Procedures but there are some unique downsides I want to highlight.

– Passing table data between Stored Procedures is only possible either by creating temporary tables and using them "by convention" in the other SP, or by creating JSON arrays. Since tools do not know about these temporary tables, they will flag their usage as a potential error in your code.

TEMPORARY TABLEs are also the only way to collect multi-row SELECT results of a Stored Procedure.
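The "by convention" handoff looks roughly like this (all names are invented for the sketch): the first SP materialises a temporary table, the second one assumes it exists.

```sql
-- Hypothetical sketch of passing table data between SPs
-- via a temporary table agreed on by convention.
DELIMITER //
CREATE PROCEDURE collect_candidates(IN p_region_id INT)
BEGIN
  DROP TEMPORARY TABLE IF EXISTS tmp_candidates;
  CREATE TEMPORARY TABLE tmp_candidates AS
    SELECT id, score FROM listings WHERE region_id = p_region_id;
END //

CREATE PROCEDURE rank_candidates()
BEGIN
  -- Relies on the caller having created tmp_candidates first;
  -- editors flag this table as unknown, which is the downside noted above.
  SELECT id, score FROM tmp_candidates ORDER BY score DESC;
END //
DELIMITER ;
```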

– MySQL does not support the INSERT INTO/UPDATE ... RETURNING clause (unlike PostgreSQL), meaning that if you do multiple inserts/updates and want to know the auto-created IDs of these rows, the only way is a CURSOR with a LOOP, accumulating the results in a TEMPORARY TABLE.

– CURSORs are ugly beasts. You have to DECLARE them (which is only allowed at a specific position in your code), OPEN them, FETCH row by row into user variables (which you also have to declare) and CLOSE them.
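Both points combined look roughly like this sketch (tables and names are hypothetical): a cursor loop that inserts rows one by one and collects each LAST_INSERT_ID() in a temporary table, emulating RETURNING.

```sql
-- Hypothetical RETURNING workaround: copy an order's items and
-- return the auto-created IDs of the new rows as a result set.
DELIMITER //
CREATE PROCEDURE copy_items(IN p_src_order BIGINT, IN p_dst_order BIGINT)
BEGIN
  -- declarations must come first: variables, then cursors, then handlers
  DECLARE v_name VARCHAR(255);
  DECLARE v_done INT DEFAULT 0;
  DECLARE cur CURSOR FOR SELECT name FROM items WHERE order_id = p_src_order;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = 1;

  DROP TEMPORARY TABLE IF EXISTS tmp_new_ids;
  CREATE TEMPORARY TABLE tmp_new_ids (id BIGINT);

  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO v_name;
    IF v_done THEN LEAVE read_loop; END IF;
    INSERT INTO items (order_id, name) VALUES (p_dst_order, v_name);
    INSERT INTO tmp_new_ids VALUES (LAST_INSERT_ID());
  END LOOP;
  CLOSE cur;

  SELECT id FROM tmp_new_ids;  -- the poor man's RETURNING
END //
DELIMITER ;
```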

– Exception handling in Stored Procedures is somewhat convoluted. There is no try/catch construct. You can SIGNAL errors or warnings and can optionally write HANDLERs for specific exceptions, but these do not transfer execution: you need to set variables in the handler and check them in your own code, since the SP just continues to run.
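A minimal sketch of that pattern (table and names are hypothetical): the handler only sets a flag, and the procedure must check it explicitly afterwards.

```sql
-- Hypothetical example: no try/catch, just a flag set by a HANDLER.
DELIMITER //
CREATE PROCEDURE safe_subscribe(IN p_email VARCHAR(255))
BEGIN
  DECLARE v_dup INT DEFAULT 0;
  -- 1062 = duplicate key; execution continues after the handler fires
  DECLARE CONTINUE HANDLER FOR 1062 SET v_dup = 1;

  INSERT INTO subscribers (email) VALUES (p_email);

  IF v_dup THEN
    -- the SP kept running, so we re-signal with a clearer message
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'email already subscribed';
  END IF;
END //
DELIMITER ;
```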

– MySQL gladly accepts SPs with buggy code. It is imperative to write "Test-SPs" to make sure they run as intended.

– Since SPs live in the same namespace (database) as your tables, views, functions etc., it is important to find a clear naming scheme, otherwise things get very messy. There are no "packages" or separate "schemas" in MySQL.

– MySQL does not have user-defined types. If you declare variables for a specific column over and over, you have to repeat their data types. PostgreSQL has a way to say "this variable has the type of table.column" (%TYPE). Not so in MySQL.

Having said all that, writing pure SQL within all these limitations provides a sense of clarity and focus on the problem at hand. Instead of thinking about "how" to solve it, you focus on "what" to solve. Because you can solve anything with SQL [2] :-). Also, your app will be damn fast with Stored Procedures.

I hope I can cross the finish line next week and have actual results to share.

[1] https://github.com/go-sql-driver/mysql
[2] https://emacs.ch/@louis/111533663401601630

dekkzz76,
@dekkzz76@emacs.ch avatar

@louis

more stuff to sell

spudlyo,

@louis One common argument against this approach is that the development / debugging tools for working with stored procedures are primitive. If you can make it work smoothly with git, and CI, I say go for it. It's not like most programmers use anything more sophisticated than printf debugging anyway.

suqdiq,

@louis
not sure if you have read the source code of crt.sh, but perhaps you should :) it's essentially written in postgresql :D

louis,
@louis@emacs.ch avatar

@suqdiq I've browsed through the source code (fnc/), that's crazy 🙂 I've never seen something like this but I find it awesome.

Thanks for sharing the link, I'll certainly draw some inspiration from this for my project.

graywolf,
@graywolf@emacs.ch avatar

@louis We considered this approach as well, and the limiting factor seems to be tooling. How do you properly version the code? How do you deploy it? Roll it back? How do you make sure you have the same version in all instances of your database? How do you do monitoring? Logging? Testing? For all these questions there is an answer in the form of tooling for "traditional" container-based applications; for deploying, reviewing and developing stored procedures, however, there does not seem to be a widely used option (assuming one even exists).

There is another approach: you could still have a traditional application and just ditch the ORM and write SQL by hand. For Go, there is also sqlc, which seems to be a fairly reasonable generator (basically just making Go functions from your SQL code, ensuring type safety).
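For reference, sqlc works from annotated SQL files like the following sketch (the queries and tables are hypothetical); each `-- name:` comment becomes a generated, type-safe Go method:

```sql
-- sqlc reads these annotations and emits Go functions such as
-- GetUser(ctx, id) (User, error); names here are invented.

-- name: GetUser :one
SELECT id, name, email FROM users WHERE id = ? LIMIT 1;

-- name: ListRecentOrders :many
SELECT id, amount, created_at FROM orders
WHERE user_id = ? ORDER BY created_at DESC LIMIT ?;
```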

Just to be clear, I am not saying "do not do it", do what you want :) I am just putting out there what did work for us.

Also, if you want to go full crazy, there is also a PostgREST: https://postgrest.org/en/stable/ :)

encthenet,
@encthenet@flyovercountry.social avatar

@louis

No, you're not crazy. You're using a domain specific language to do what it's good at, and you're moving compute closer to the data which always improves performance.

It just means you need people experienced with DB/SQL, which isn't as common, which is why people aren't doing it as much.

hszakher,
@hszakher@mastodon.world avatar

@encthenet @louis it also means that you're stuck with your database choice for the foreseeable future. I am no fan of ORM, but one of their selling points is the database engine choice flexibility.

louis,
@louis@emacs.ch avatar

@hszakher @encthenet I’ve worked on several database migration projects and I’ve never encountered a situation where a database engine could be „easily switched“, no matter whether ORMs were used or not. To a degree you are always locked in, and I think it also makes sense to use the available engine to its fullest extent. Arrays, for example, are only available in PostgreSQL.

Having all SQL inside the database makes it even easier to migrate because you can just rewrite your schema/procedures, and don’t even need to touch your application code.

louis,
@louis@emacs.ch avatar

@encthenet I figured. Writing SQL is not as exciting, and is even considered arcane by young devs. It also forces you to think in terms of sets rather than imperatively, which is hard at first but very rewarding in my opinion. 👌

hanshuebner,
@hanshuebner@mastodon.social avatar

@louis That is the way I build my web apps nowadays, too.

Ardubal,
@Ardubal@mastodon.xyz avatar

@hanshuebner @louis

Interesting. May I ask: how do you manage the stored procedure code? Is it in source control? What does bootstrapping and deployment look like?

hanshuebner,
@hanshuebner@mastodon.social avatar

@Ardubal @louis I use a schema creation script that is in git. I have been looking into automatic migration upgrade tools like migra and pg-schema-diff, but for my single developer side projects, manual migrations are mostly sufficient.

louis,
@louis@emacs.ch avatar

@hanshuebner @Ardubal Same here. I’ve a Makefile that dumps the DB structure into my codebase, so when I commit I’ve the full schema available.

ghard,
@ghard@mastodon.social avatar

@louis Ha! You’ve discovered my secret sauce 😀 Data should always be handled where it resides. Stored procedures are essential, and the SP API can be further secured by granting individual procedures only to clients that should have access. Though my DB engine choice - for historical reasons - is different. Common Lisp + plain ODBC makes a great front-end.

louis,
@louis@emacs.ch avatar

@ghard We currently run PostgreSQL, but I’ve done a lot of work with MySQL 8 and SQL Server too. PostgreSQL is powerful, but when things get more complex it is very hard to manage without dedicated DBA resources. Re: dev experience, I find PL/pgSQL difficult and at times annoying. Overloaded functions, for example, should not exist. As @galdor noted, forced schema binding can be a blessing, but for ongoing development it’s a curse: having to drop and re-create many dependent objects leads to very complex migrations even for simple changes.

SQL Server provides the best dev experience and the most powerful SP semantics (T-SQL 🥰). For obvious reasons (cost/licensing) it is not an option, so MySQL 8 seems to be the right mix between PG and MSSQL for me, and it has absolutely great performance. Index-clustered tables, a memory table engine, hot row updates, changeable column order, events, auto-indexed FK columns, collations, super-fast table counts, simplified role management and a great full-text search make MySQL a joy.

What DB do you use for historical reasons?

galdor,
@galdor@emacs.ch avatar

@louis

It is not a bad idea as long as your application is mostly a thin layer above stored data. If you start having to compose objects from a mix of data from the database and data computed from other sources (e.g. network calls), you'll end up having to build final objects in your code anyway.

I would also be concerned by how annoying it is to iterate on PostgreSQL during development. In a traditional application, you can just modify the code, recompile and execute. With PostgreSQL functions, you need to drop/update the right functions/types all the time, and sometimes it gets really annoying (e.g. removing enum values). But I imagine one could build tooling to help with that.

louis,
@louis@emacs.ch avatar

@galdor Mostly a layer above data, and the parts that deal with external sources can still benefit from using SPs that return classic rowsets instead. Re: dev experience, you are correct, PG can be very annoying with dependencies because of schema binding (and other things). Which is why I’m running my evaluation with MySQL 8 (I know I’ll be stoned for that now), but the dev experience is much better: there are no TYPEs/DOMAINs (enums are just defined on a column) and no schema bindings between SPs and tables. Dependencies on tables/columns are only checked once, when an SP is stored. MySQL 8 has a JSON data type and excellent JSON utility functions. It’s much simpler than PG, but that makes it very attractive for my use case.

The current API (Go) has grown since 2016 and is now a monster of ~50 different endpoints: a mixture of dynamically created literal SQL (I’m talking about queries of 150 lines with complex filtering/ordering that span 10-12 tables), SQLBoiler-generated Go code based on the table/view schema, and recently sqlc.dev, which generates Go functions/types from SQL queries written in .sql files.

Complexity and monitoring became unmanageable and DB schema changes virtually impossible, which is why I started to evaluate a new, unified approach to get this thing moving again.

Thanks for your thoughts!
