This post is part of the #PGSQLPhriday blogging series. The original post for this discussion is located here. Thank you, Ryan Lambert, for such a great topic and for hosting this month!
Ryan posted a challenge on how you are using Postgres. For this post, instead of focusing on one question, I'm simply going to answer all of them. I really liked every question, so why pick just one?
What non-relational data do you store in Postgres and how do you use it?
At my current company, the Postgres servers I work with have a great number of JSONB columns housing non-relational data. I'm newer to the company, so I didn't have much say in this particular aspect of the design. Currently, those JSONB columns serve primarily as a logging/settings store. It was a pattern created with good intent but misused, turning into the data junk drawer. Unfortunately, the data is needed in many cases, so there are procedures and functions out there querying values back out of these fields for all sorts of purposes. This is not an ideal use for JSONB.
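To give a feel for the pattern, here is a minimal sketch of a junk-drawer column. The table and key names are hypothetical, and I'm using SQLite's `json_extract` as a stand-in for the Postgres JSONB `->>` operator so the example is self-contained:

```python
import json
import sqlite3

# Hypothetical "junk drawer" table: one JSON blob holds settings, log
# entries, and anything else. SQLite's JSON functions stand in here for
# Postgres JSONB (Postgres would use payload->'settings'->>'theme').
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_data (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute(
    "INSERT INTO app_data (payload) VALUES (?)",
    (json.dumps({"settings": {"theme": "dark"}, "log": ["started", "ran job"]}),),
)

# Procedures end up digging individual values back out of the blob --
# workable, but every consumer has to know the JSON layout, and the
# database can't enforce any structure on it.
theme = conn.execute(
    "SELECT json_extract(payload, '$.settings.theme') FROM app_data"
).fetchone()[0]
print(theme)
```

Every query like this bakes knowledge of the blob's internal layout into a procedure, which is exactly how the drawer gets harder and harder to clean out.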
One good example of non-relational data used well: I did some contract work for a document scanning company that used OCR to read most of the data/metadata about the documents, put it into table structures, and then stored a link to the actual document in a file store. The link was added to the row so the application could retrieve the document, but the document itself was never stored in the DB. The performance issues they called me about were still related to the data they were scanning, but the root cause was mostly bad indexing. Once we corrected the indexes, everything worked much better, and they had a good system that would scale reasonably well.
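A rough sketch of that pattern, with a hypothetical schema (the column names and the `s3://` link are illustrative, and SQLite stands in for Postgres so the example runs on its own):

```python
import sqlite3

# Hypothetical schema for the scanned-document pattern: OCR'd metadata
# lives in regular, indexable columns, while the document itself stays
# in a file store and only its location is kept in the row.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        title TEXT,
        scanned_on TEXT,
        file_url TEXT  -- link into the file store, not the bytes themselves
    )
    """
)
conn.execute(
    "INSERT INTO documents (title, scanned_on, file_url) VALUES (?, ?, ?)",
    ("Invoice 1042", "2022-11-04", "s3://doc-store/invoice-1042.pdf"),
)
# An index on the metadata is what keeps lookups fast as the table grows.
conn.execute("CREATE INDEX idx_documents_title ON documents (title)")

url = conn.execute(
    "SELECT file_url FROM documents WHERE title = ?", ("Invoice 1042",)
).fetchone()[0]
print(url)
```

The application resolves `file_url` against the file store to fetch the actual document, so the database only ever handles small, well-indexed rows.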
Have you attempted non-relational uses of Postgres that did not work well? What was the problem?
Yes. Besides the "junk drawer" system I mentioned above, I also ran into a company that stored the logos/graphics/images for their website in the DB. Retrieval from the DB wasn't really horrible, since they were pulling exactly one row/column; the real issue was caching. Websites can't take two minutes to pull up images and documents, and they depend on the cache to run fast. The web servers of the time couldn't cache the images, since they came from DB calls rather than static files, which meant every page load had to re-pull the data. Once the system started scaling and getting lots of requests, the web servers couldn't keep up reloading the images over and over. In the end, they had to move the images out of the DB and into the web server cache where they belong. I strongly urge all of my customers never to store images/graphics in the DB at all if they can avoid it.
What are the biggest challenges with your data, whatever its structure?
Right now the biggest challenge for my current project/company is DB sprawl. We have 10+ years of the database being designed around a monolithic application that added every new table into the same DB. Only a few of the functions and parts of the application's design were broken apart into separate pieces or separate schemas, or really separated at all. This is not uncommon in the industry, since many companies that have been building software for 10-15 years started out with monolithic applications, and rarely do we go back and focus on the technical debt side of things. Moving a system like this to a service-oriented architecture, or even just breaking it up into different pieces, takes time and a lot of cultural change for a company. It's always a long-term project, and in a larger company it will usually take years to complete. It's a fun challenge to have, and you can find small wins by breaking off smaller pieces and starting new products and features in a new system.
How do you define non-relational data?
I like to define data based on how it's used. I see a lot of varied data across companies, so I keep an open mind and base my definitions on usage. An example of this is search. We all know PG and many RDBMSs can use indexes to search very quickly through millions of rows of data. I've found in past experience that moving heavy search workloads off to something like Elasticsearch has been a great way to speed up search even more. RDBMSs are really not great at textual-style searches compared to an engine like Elasticsearch. It's an excellent example of using the right tool for how the data is being used.
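The reason engines like Elasticsearch win at text search is the inverted index: instead of scanning every row for a substring (what a `LIKE '%term%'` query forces on an RDBMS, since a btree index can't help there), each word maps directly to the documents containing it. A toy sketch of the idea, with made-up sample documents:

```python
from collections import defaultdict

# Toy inverted index -- the core data structure behind text search
# engines. Sample documents are illustrative.
docs = {
    1: "postgres stores relational data",
    2: "elasticsearch excels at full text search",
    3: "search the postgres docs",
}

# Build the index: each word points at the set of doc ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

# A lookup is now a dictionary hit, not a scan of every document.
print(sorted(index["search"]))
print(sorted(index["postgres"]))
```

Real engines layer tokenization, stemming, and relevance ranking on top (Postgres's own `tsvector`/GIN full-text search works on the same principle), but the core trade is the same: pay at write time to make reads cheap.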
I really enjoyed this #PGSQLPhriday, and I hope more of the community will join in the fun for a future #PGSQLPhriday. Thank you again, Ryan Lambert, for the great topic!