Search - "grafana"
-
Go home grafana, you're drunk....
Maybe I shouldn't run 50 containers on a system with 2 cores and 6 GB RAM.
-
Pro tip:
Make sure you can RECOVER from your backups.
It's all well and good backing this and that up, but make sure that when the shit really hits the fan you can recover.
I'm now 4 days into recovering a Raspberry Pi that ran:
Pi-hole
Snort
DHCP
VSFTP
Logwatch
Splunk forwarder
Grafana
And several other things... I've learnt my lesson.
-
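One way to act on the tip above is to check automatically that a backup can be read back and still matches the live data. A minimal Python sketch, assuming the backup is a tar archive whose member names are paths relative to the backed-up directory; `verify_backup` and its arguments are hypothetical names, not something from the post:

```python
import hashlib
import tarfile
from pathlib import Path


def _sha256(fileobj):
    """Hash a binary file object in 1 MiB chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(1 << 20), b""):
        digest.update(chunk)
    return digest.hexdigest()


def _clean(name):
    """Drop the leading './' that `tar -C dir .` style archives use."""
    return name[2:] if name.startswith("./") else name


def verify_backup(archive_path, source_dir):
    """Return a list of problems; an empty list means every file in
    source_dir could be read back from the archive with identical content."""
    source_dir = Path(source_dir)
    problems = []
    with tarfile.open(archive_path) as tar:
        archived = {_clean(m.name): m for m in tar.getmembers() if m.isfile()}
        for path in sorted(p for p in source_dir.rglob("*") if p.is_file()):
            name = path.relative_to(source_dir).as_posix()
            member = archived.get(name)
            if member is None:
                problems.append(f"missing from backup: {name}")
                continue
            with path.open("rb") as original, tar.extractfile(member) as restored:
                if _sha256(original) != _sha256(restored):
                    problems.append(f"content differs: {name}")
    return problems
```

This only proves the archive is readable and complete at the file level. It does not replace an actual restore onto spare hardware, which is the part the post is really about.
-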
Things you don't want to see at night
Ripped out of Netflix-Mode by a Warning notification and currently monitoring further development
The green line is temperature, the blue one humidity. Temperature is rising at ~1°C per 10 min, but seems to be flattening just now. ~0.6°C to go and I'll have to head out. I'm thinking one of the ACs failed, but the states are fine. Never trust a single information source for critical infrastructure, guys.
-
Just released version 1 of my first API! For this project I did everything the way I wanted to, no shortcuts! I documented the shit out of every endpoint and parameter. Everything is thoroughly tested and it’s dockerized. I also have metrics for each endpoint (with Grafana in the frontend, which I love) as well as alerts in case it goes down for some reason.
I prepared all of this before deploying it out into the wild and damn, it feels so good. Probably no one will use it but I don’t care. It’s one of those projects where you have to force yourself to go to bed at 2 AM.
Just some thoughts. Don’t really have any techie friends, so I figured maybe someone here recognizes that feeling. Also, I wrote it in Python, such a pleasant language.
-
This begs for a rant... [too bad I can't post actual screenshots :/ ]
Me: Hey k8s team! We're having trouble with our k8s cluster. After scaling up and running h/c and sanity tests, the environment was confirmed as healthy and stable. But once we started our load tests, the k8s cluster went out for a walk: most of the replicas got stopped and restarted, and I cannot find in the event log WHY that happened. Could you please have a look?
k8s team [india]: Hello, thank you for reaching out to k8s support. We will check and let you know.
Me: Oh, you're welcome! I'll be just sitting here quietly and eagerly waiting for your reply. TIA! :slightly_smiling_face:
<5 minutes later>
k8s team India: Hi. Could you give me a list of replicas that were failing?
Me: I gave you a Grafana link with a timeframe filter. Look there -- almost all apps show instability at k8s layer. For instance APP_1 and APP_2 were OK. But APP_3, APP_4 and APP_5 were crashing all over the place
k8s team India: ok I will check.
<My shift has ended. k8s team works in different timezone. I've opened up Slack this morning>
k8s team India: HI. APP_1 and APP_2 are fine. I don't even see any errors from logs, no restarts. All response codes are 200.
Me: 🤦‍♂️ .... Man, isn't that what I said? ... 🤦‍♂️
-
All these super expensive and fancy enterprise tools. CloudWatch, AppDynamics, Grafana, Splunk and whatnot. Spent a month trying to figure out why the fuck the app does not perform well.
Took 1 day with tcpdump, awk and gnu utils to figure out why.
Should anyone need a tcpdump analyzer -- try my awk script. Shows response times of each network call w/o impacting app performance :)
https://gist.github.com/netikras/...
-
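For readers who want the general idea behind the tcpdump approach above without awk: the sketch below is NOT the linked script (that gist URL is truncated here), just a rough Python illustration of measuring response times from a capture. It assumes text output in the style of `tcpdump -tt -n` for IPv4 TCP, treats whichever endpoint sends payload first on a connection as the client, and measures from the first request packet to the first response packet. `response_times` is a made-up name.

```python
import re

# Matches lines such as:
# 1617181920.100000 IP 10.0.0.1.54321 > 10.0.0.2.80: Flags [P.], seq 1:101, ack 1, win 502, length 100
PACKET = re.compile(r"^(\d+\.\d+) IP (\S+) > (\S+): .*?length (\d+)")


def response_times(lines):
    """Return (client, server, seconds) for each request/response pair."""
    roles = {}    # connection -> endpoint that sent payload first (assumed client)
    pending = {}  # connection -> timestamp of the first unanswered request packet
    results = []
    for line in lines:
        match = PACKET.match(line)
        if not match:
            continue  # not an IPv4 packet line in the assumed format
        ts = float(match.group(1))
        src, dst = match.group(2), match.group(3)
        if int(match.group(4)) == 0:
            continue  # no payload: bare SYN/ACK/FIN, not part of a request or reply
        conn = frozenset((src, dst))
        client = roles.setdefault(conn, src)
        if src == client:
            # Keep the timestamp of the first packet of a multi-packet request.
            pending.setdefault(conn, ts)
        elif conn in pending:
            # First payload coming back from the server closes the measurement.
            results.append((client, src, round(ts - pending.pop(conn), 6)))
    return results
```

Because this works on a capture rather than inside the process, it keeps the property the post cares about: the app itself is not slowed down by the measurement.
-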
online coding exams.
Ask me how to do a REST API, ask me how to do a certain visual in the website, ask me how to set up a docker service running grafana, please just ask me something about the actual job.
Don't ask me to create some mind game that was ambiguously phrased in a timed HackerRank question that expects me to write runnable solutions that pass all test cases.
I have way too much work to play around with HackerRank for weeks just to prepare for your useless test.
-
Attention guys and gals! If you are using grafana in your home setup, update it asap to 4.6.4 or 5.2.3. Versions before those two are affected by an authentication bypass vulnerability, CVE-2018-15727.
In the meantime, my nginx config is blocking everything but the LAN IPs :)
-
From now on I am administering multiple servers in our company, and monitoring is one thing our infrastructure lacks... almost completely. At least, useful monitoring.
Installing netdata or Grafana and integrating it with chat is definitely a solution, but what happens if the whole server just shuts down (a very stupid scenario, I know)? Well, it's easy: there will be no alert about the failure.
So I was wondering if there is a tool, or even better a plugin for netdata or Grafana, that enables remote monitoring from another server. I can surely write a simple script to check server availability, but having the whole monitoring tool on a single server instead of 5+ would also be easier to maintain and set up.
-
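The "simple script to check the server availability" mentioned above could look roughly like this. It is a minimal Python sketch, meant to run from a machine other than the ones being watched (for example from cron). The function names and the idea of probing a TCP port such as SSH are assumptions for illustration, not something the post specifies:

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def unreachable(targets, timeout=3.0):
    """Return the (host, port) pairs from targets that did not answer."""
    return [(host, port) for host, port in targets
            if not is_reachable(host, port, timeout)]
```

Calling `unreachable([("server-a.example", 22), ("server-b.example", 443)])` (hypothetical hosts) returns the pairs to alert on. A dead server simply shows up in that list, which is exactly the case an agent running on the server itself can never report.
-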
When IT is like: hey, our new Grafana is at this place: "some URL"
I submit a bug ticket: "I can't see metrics about this server that has been running for a while"
Their comment on the ticket: the URL to the old Grafana
-
Time to get going properly with ansible, consul and docker swarm.
Idea is first to convert tinc to a container, which automatically sets itself up based on previous consul announced tinc nodes.
Consul to keep track of all the nodes with prometheus too and hopefully auto attach to grafana.
Ansible to set up new nodes right with the DO API, announce to consul, pull docker images and join the docker swarm master.
-
*Frustrated user noises* Whyyyy, Grafana, why don't you implement any actual query forgery checks?!
So long as a user has access to the Grafana frontend, they can happily forge the requests going off to the backend, and modify them to return *whatever* data they want from the datasource.
No matter that they're a read-only user. That only stops them from modifying the dashboard definitions on the frontend, but doesn't enforce any sort of immutability on the BE...
If anyone has any tips on how to further secure it, I'm curious...
-
Using grafana together with tinc+prometheus has been a blast.
Initially I wanted to get into ELK with Kibana and all that, but that required 8G of RAM, the instructions to get it running in the open source "mode" were nearly non-existent, and all the ready docker compose stacks out there simply didn't work or had broken images.
I'm sure I could've managed around most of those issues, but the fact it is as hungry as gitlab, made it a literal no-go for the usual server resources my clients host or my own scaled down server recently.
Thankfully I remembered that there's grafana and me having experimented some time ago with tinc, so I can have very lightweight beat'esque prometheus agents deployed listening on tinc local net only, with the typical nginx auth and some whitelists to all of the servers I host and all those of my clients.
The dashboard creation was especially great in grafana (tbf prometheus actually does most of it), literally what I always wanted out of those "complicated" solutions that do it all, but have no proper query language, complex documentation, heavy collectors with no properly named data points, expensive resource runtimes, ..
With grafana I can just easily put dashboards into folders, create users to look only at certain stats or even dashboards (opened up some interesting contracts actually, because now I can also offer proper monitoring for all things delivered), easily drag and drop stuff around to fit more information (most others fix you to a small 3x2 grid, a too big grid for a TV or simply non-resizable tiles, making that one counter take up an entire row) and resize to my heart's desire.
tinc of course allows me to easily create private networks that are resistant to failure across any region and the routing is done for me, so I don't have to run around it all that much either
P.S.: a damn tiny fly went into one of my now 4 monitors and died right in the middle. I thought it was just some dirt and pressed it in while trying to wipe it off, so that monitor now serves as the topmost one on a VESA mount.
-
Storytime.
The Prometheus tales
Part IV - A new FUBAR.
A new and very fascinating problem emerged a few days after feeding some node definitions to the new titan instance.
It's a storage fuck-up. A major one.
If I'm informed correctly, the latest prometheus should have the same (or even better) log compression algorithms for metrics, as the old one - because these fuckers are so damn good at what they are doing: compress some fucking logs.
The new instance is aggregating metrics as planned. Grafana works like a fucking charm.
Nevertheless, for very fascinating but unknown reasons, the new instance creates 50GB of metrics in under 4 fucking hours.
Am I missing something here? Some magic parameter that has to be passed to the titan, that enables the hardcore compress-them-fuckers-feature?
Debugging session is tomorrow.
To be continued.
-
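To put the 50GB-in-4-hours figure above into perspective before the debugging session, here is a back-of-the-envelope calculation in Python. The 2 bytes per sample is an assumption taken from the upper end of the rough "1 to 2 bytes per sample" average quoted in the Prometheus storage documentation; the function names are made up:

```python
def growth_per_day_gb(gb_written, hours):
    """Extrapolate observed disk growth to a 24 hour day."""
    return gb_written / hours * 24


def implied_samples_per_second(gb_written, hours, bytes_per_sample=2.0):
    """Ingestion rate that would explain the growth if compression
    were working at the assumed bytes-per-sample figure."""
    return gb_written * 1e9 / (hours * 3600) / bytes_per_sample


print(growth_per_day_gb(50, 4))                  # 300.0
print(round(implied_samples_per_second(50, 4)))  # 1736111
```

So either the instance really is ingesting on the order of 1.7 million samples per second, or each sample is costing far more than a couple of bytes on disk. Comparing that number with the actual scrape load should show which of the two it is.
-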
Thoughts on the idea of including links/query starters for debugging, or notes on where the fucking logs are in AWS, grafana etc., in repository READMEs?
-
I had a discussion - no, it was more a lobotomy - with one of our "experts"
I was kinda confused, as he had several grafana tabs open and a query editor...
He explained to me that he debugs and optimizes his query based on the grafana data....
Elasticsearch cluster with several hundred, different indices, > 20 TB data
I explained to him the scrape interval of 5secs, that he cannot distinguish his query from other queries, that there is far too much of an interference... Let alone that a 5 sec scrape interval is a very loooong time.....
Nope. It makes perfect sense to him and he'll continue to work like this.
-
What do you use for performance monitoring on your infrastructure?
My company uses zabbix, OpenNMS and Nagios to monitor different parts of our infrastructure (from shared web hosting to OCCAS to IPTV to FutureVoice to Atlassian servers) but has no real-time performance checks.
I’ve set up a netdata master with prometheus backlog and grafana dashboards to monitor different metrics, however I am not sure whether there is a better approach. Any suggestions?
-
Grafana managed alerts are so fucking overcomplicated it surprises me that it is a professional product.
And if you need help, the grafana community forum is a fucking ghost town.
And the docs suck too.
-
After the conversation, the real good way was already provided:
Prometheus exporter: https://github.com/prometheus/... (https://blog.opstree.com/2018/12/... for more details)
Overview: https://devconnected.com/complete-m...
-
Honest question. When do you consider yourself a "Big data engineer"?
Today I managed to create a system that collects historical metrics from monitoring tools every 5 minutes and does all sorts of crazy transformations to make them ingestible by Grafana Mimir over OTLP. Doing 600 GB a day, millions of active time series, ... And I still feel it's "small".
Thoughts?
-
Just discovered wizzy ... Wow, freaking sweet!
https://github.com/utkarshcmu/wizzy
I like it for many reasons. I just started playing with it, so the #1 reason so far is saving dashboards and having them in git version control, yay!!!
Also, if you're not familiar with Grafana, let me blow your mind: http://grafana.org
-
If anyone is looking for a great tutorial on getting started with a docker cluster check out https://dockerswarm.rocks/
I had a 4 node cluster up on Digital Ocean with Traefik + Let's Encrypt, Prometheus, Portainer, Grafana, all that good stuff, in under 2 hours. Not much longer to test a basic WP and Nextcloud container with full SSL. Neat stuff. Just burning through $100 credit for testing, but it's been fun.
-
I swear, there will come a day when I stop confusing Grafana and Kibana. The two things sound too similar for their own good.
-
We've got a new TV for monitoring. Which auto-rotating meme page do you like? Cats, dogs, dank (SFW), dev, testing. Gimme yours!!! :)
-
Help is needed on observability tools to use.
I’m in the trenches trying to sort out tools for observability.
Did a bit of Googling and ran into Metoro and Groundcover. Both seem pretty slick, but I’m not sure which one to roll with.
Do any of you have experience with these? How do they hold up in real-world scenarios? Would love to hear any war stories or insights.
I've been looking at Grafana as well, but it doesn't fit my budget at all.
-
I'm tired and stressed and it's friday
all my work that is required for monday is done. I should do testing and code cleanup, but I'm burned out, so instead I'm gonna play with grafana and see what I can do with it. seems cool and something more interesting to do than code cleanup and wanting to cry
-
Some really motivated guy.
He apparently wants to monitor his open source application in his spare time.
His application is likely to have no users though.
But well, that guy looks kinda motivated.
For professional purposes, the guy already did monitoring with newrelic.
Seems like he was not satisfied and switched to datadog 3 years ago.
But liking digging dirt, he migrated to self hosted telegraf/influx/grafana (which he likes to about)
Today that guy is not at his company but on his potato machine in the cloud. So he wants to be minimalistic, datadog should do.
Now you got it, random ff*** is me, on a weekend, a shiny Saturday for that matter.
Actually now it is night.
Now let's start the fight.
I have datadog scripts!
But datadog be sneaky as well. datadog upgraded to v6 8=)
-> scripts ain't working. outdated.
I check the logs. Too bad!
-> datadog removed dogstatsD.log in v6!
Well, I have nothing to do in my life, it is too cold outside as they say. I read the (sluggy) datadoc and try some shell command (given in the doc) to upload some events to dogstatsd (via UDP).
-> Nothing happens, neither in local nor in remote.
ok maybe command not up to date, so let me try some official library. datadog from python. Feels like a nice try!
-> only available for python >= 3.5. 3.4 on my good ol' jessie. Upgrading os for datadog not acceptable.
Maybe dogstatsD is not started... the doc says it is by default, but well, not the first time a doc is wrong... I set the datadog log to verbose. Guess what: as per standard, a shitload of errors.
Digging... kubexx, docker and whatsoever apparently preventing the collector from doing its normal stuff
np, I am gonna check that on github! Good, people have the same errors. They seem to fix it by trying some settings, with or without luck
-> I am not that warrior to check every stuff
Ok, let's stop the datadog events, it works. It does not anymore. You know that sentence. We all know it.
Still not enough!
How about testing that uber super nice feature of v6. The logs. After all I want to make events out of my applicative logs.
How about reading the log again. Configure the yaml log as they say. Done. Make some pattern. Read the best practice. Done. Configure the yaml. Done. Now testing.
-> remote datadog interface be like: no logs for you dude you need to pay
ff***f*f*f
Fuck datadog, fuck that v6 version, good old tail -Fxx | someaggreate.js|sendmail will do...