PropaGandalf,

cool, now find another distro

ininewcrow,
@ininewcrow@lemmy.ca

Sometimes … usually I just hit a wall because I don’t know enough but I know enough to get myself in trouble … so I just stop, reformat, reinstall and start all over.

About the biggest lesson I’ve learned from Linux is not to mess with too many things unless you want to learn about them and have lots of time on your hands.

Otherwise, if you find a good distro for your needs, stick with it, don’t change it, and update and back up regularly.

GravitySpoiled,

Grub.

Seriously. That was hard as shit because I didn’t know what I was doing.

nul9o9,

I broke my bootloader fucking with uefi settings. I was in a panic for a few hours because I hadn’t bothered to learn how that shit worked until then.

It sure was a relief when I got back into my system.

passepartout,

Bricked my PC twice because of the bootloader and couldn’t repair it. From now on I just nuke my system if something is fucky and have a shell script do the installing of packages etc.

eugenia,

Getting a Palm Pilot a live connection to the internet through infrared (Red Hat Linux). That was circa 2004, and I spent 10 hours on it, all night.

ChojinDSL,
@ChojinDSL@discuss.tchncs.de

Around 2003-2004. I was still a bit of a Linux noob, just getting to grips with Gentoo.

Had two no-name WiFi adapters that weren’t directly supported under Linux. Found some obscure forum thread that mentioned them, along with which lines in which driver’s source code to change to make these adapters work.

mojo_raisin,

Wow nice one! I don’t think anyone outside of Gentoo or LFS would even go there.

CrabAndBroom,

I have two, one is actually complicated and one was so obtuse that I never would have figured it out in a million years:

Actually complicated: I still don’t know how it happened, but somehow an update on Arch filled the boot partition with junk files, which then caused the kernel update to fail for lack of disk space, which then kind of tanked the whole system. It took ages, but with a boot disk and chroot-ing back into the system I eventually managed to untangle it all. I was determined to see it through and not reinstall.
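The boot-disk rescue, roughly, for anyone in the same spot (a sketch only; device names and the junk-file pattern here are made up, adjust for your layout):

    # from an Arch live USB
    mount /dev/sda2 /mnt          # the installed root filesystem
    mount /dev/sda1 /mnt/boot     # the overfull boot partition
    arch-chroot /mnt              # step into the installed system
    rm /boot/junkfile-*           # clear the junk to free up space
    pacman -S linux               # reinstall the kernel, regenerating the initramfs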

Ridiculous: One day when using Ubuntu, the entire system went upside-down. As in, everything was working perfectly fine, but the screen was literally upside-down. After much Googling I had no luck figuring it out; then I accidentally found the solution. I’d plugged a PS4 controller into the USB on the laptop to charge it, and for some reason Ubuntu interpreted the gyroscope on the controller as “rotate the screen display”, so when I moved it, the screen spun round. I only figured it out by accident when I plugged it back in and the screen spun back to normal lol.

scytale,

Ridiculous

I had a similar one. I had a USB-powered cooling fan pad that my laptop was sitting on. My laptop would randomly go into boot loops when I turned it on. I thought it was a GRUB issue, so I always had my USB stick ready to re-install GRUB. Did some dusting one day and forgot to plug the cooling fan back in, and the boot loop never happened again. Turns out it was the fan plugged into the USB port that was causing it.

foggy,

I think this is likely related to using USB cables as power cables, and to USB ports/voltages.

I have seen a lamp completely fry a MacBook. I wouldn’t be surprised to see something similar cause a boot loop.

curiousPJ,

Semi-related note… DisplayPort cables can cause a no-boot condition too. I think it was the existence of Pin #1. I had to duct-tape over that one pin and my computer finally booted up.

evidences,

A couple years ago on Reddit I saw a story where a dude working IT support had to drive to a remote office to replace a workstation that wouldn’t boot. When he got there, the lady whose desk it was had some shitty USB fan or maybe an LED Christmas tree plugged into one of the USB ports. He unplugged that and the PC booted fine.

Hadriscus, (edited )

This is up there with the [redacted] (just looked it up: it’s called the 500-mile email).

Corr,

This is a phenomenal read. Thank you for sharing lol

CrabAndBroom,

Ah I remember that one! Classic. I also remember a story about someone who lost an entire PC in their apartment. It was running and connected to the network, they could ping it, but couldn’t physically find it lol.

Hadriscus,

😂 Please ping me if you find it (the story)…

mojo_raisin,

This deserves some sort of funniest Linux problem award.

bruhbeans,

The controller thing is goddam hilarious

0110010001100010,
@0110010001100010@lemmy.world

Ridiculous: One day when using Ubuntu, the entire system went upside-down. […]

LMAO what the fuck?

AeroLemming,

Not quite the same, but amusing peripheral issues can happen on Windows, too.

poopsmith, (edited )
@poopsmith@lemmy.world

Learned how drivers work and fixed a driver for a USB-to-I2C chip. It’s still buggy, but at least it sorta works now.

Some more details: I was using a CH347 (USB to UART/SPI/I2C), and there was an open source driver written for a previous version of the chip. The original dev had hardcoded the bulk IO endpoint indices. The only change I had to make was to iterate over the endpoints and search for the correct ones. But at first, I didn’t understand anything about how the USB subsystem worked or how drivers were loaded. All I could tell was that the USB device was correctly detected but the I2C driver wasn’t being loaded, despite proper udev rules, correct vendor/product IDs, etc.
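The shape of that fix, for the curious (a sketch, not the actual patch; the probe function and its flow here are illustrative, but usb_find_common_endpoints() is the real kernel helper for exactly this):

    #include <linux/usb.h>

    static int ch347_probe(struct usb_interface *intf,
                           const struct usb_device_id *id)
    {
        struct usb_endpoint_descriptor *bulk_in, *bulk_out;
        int ret;

        /* Walk the current altsetting's endpoint descriptors and pick the
         * first bulk-in/bulk-out pair, instead of hardcoding the indices. */
        ret = usb_find_common_endpoints(intf->cur_altsetting,
                                        &bulk_in, &bulk_out, NULL, NULL);
        if (ret)
            return ret; /* no usable endpoints, refuse to bind */

        /* stash usb_endpoint_num(bulk_in) / usb_endpoint_num(bulk_out) in
         * the driver's private data, then continue normal I2C adapter setup */
        return 0;
    }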

pixelscience,

I’ve generally had good luck with hardware, and things have just worked under Linux. But one day I upgraded a few machines on my network to 2.5G Ethernet. Several already had the ports, but my little NUC NAS box didn’t, so I installed a 2.5G USB Ethernet dongle. No matter what I did, I couldn’t get it to work. It would show up, and NM would act like it was up, with no errors or anything, but it just wouldn’t actually function.

Eventually, I found out that it has a built-in USB data partition that contains the drivers for Windows. When the hardware was assigned, the card was coming up as a USB disk first, not the network card it should have been.

I had to blacklist the USB storage modules first, which I had done before, but I also had to write a udev rule to automatically bind the network driver on boot. It wasn’t that difficult to actually do, but I had just never had to do anything with udev rules before. Took me a good three days of troubleshooting to finally get everything working correctly on boot.

ACTION=="add", ATTRS{idVendor}=="20f4", ATTRS{idProduct}=="e02c", RUN+="/sbin/modprobe r8152", RUN+="/bin/sh -c 'echo 20f4 e02c > /sys/bus/usb/drivers/r8152/new_id'"

Diplomjodler,

Fixed a typo in my /etc/fstab that prevented the NAS from mounting. I am a bear of little brain. But I’m also proof that you don’t have to be some master hacker to successfully run Linux.

NaoPb,

This is something I’ve had to do a few times.

Saved me from reinstalling. Made me realise that there really should be an alternative to typing into fstab by hand, since we humans will make mistakes. Either that, or make the boot not fail completely on an fstab error but just skip the bad entry.
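For what it’s worth, there are a couple of guard rails in util-linux (the fstab entry below is a made-up example):

    findmnt --verify     # sanity-check /etc/fstab without rebooting
    sudo mount -a        # try mounting everything in fstab right now

    # and a 'nofail' option keeps a missing disk from blocking the boot:
    # UUID=xxxx-xxxx  /mnt/nas  ext4  defaults,nofail  0  2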

Jayjader,

I have no idea how widespread it is among other distros, but Arch Linux’s bootable install disk/ISO comes with a genfstab command that snapshots your current mount points and outputs them as an fstab.

You still need to figure out where and how to mount everything yourself, but at least it saves you from most typos that could otherwise end up in the fstab file.
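Typical usage from the install environment looks like this (-U makes it write UUIDs instead of device names):

    genfstab -U /mnt >> /mnt/etc/fstab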

NaoPb,

That’s nice.

I know that the disk utility in Ubuntu gives you the option to automatically mount a (secondary) disk at boot. It adds it to fstab for you.

mariusafa,

Upgrading the system, I removed glibc (Debian). apt wasn’t working, etc. I had to manually fix the dependencies and everything. It’s currently my working OS, so all fixed.

bruhduh,
@bruhduh@lemmy.world

Installed Fedora on btrfs and upgraded from 38 to 39 a week after installation. Everything broke so badly that even the SSD it was on locked up, not just the filesystem. The SSD was new, btw.

Samsy,

My first home server would get lost on the network every week, at different times and without any apparent reason. I performed hard resets by unplugging and plugging it back in.

After several months, I decided to connect a screen to it. I initially thought it had hung, but it hadn’t. After some investigation, I discovered that every time my router obtained a new dynamic IP address, the server lost its network connection and required a reset. So I wrote a script that checks the network connection every minute and resets it if the connection is lost again.
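A minimal sketch of that kind of watchdog, assuming a systemd distro (the gateway address and service name are placeholders):

    #!/bin/sh
    # run from cron every minute; bounce networking if the router is unreachable
    GATEWAY=192.168.1.1

    if ! ping -c 3 -W 2 "$GATEWAY" >/dev/null 2>&1; then
        systemctl restart NetworkManager   # or systemd-networkd, dhcpcd, ...
    fi

with a crontab line like * * * * * /usr/local/bin/net-watchdog.sh.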

folkrav,

Some of the crap I had to do back in the late 00s to get wifi, sleep and power management even barely working on some machines felt like the hardest thing at the time. I wonder how I’d fare with those issues today, 17 years later, knowing quite a bit more about the underlying OS and working with the OS daily… I don’t know that I’d qualify that as difficult more than it was extremely tedious and a bunch of trial and error of configuration options I didn’t know anything about.

If we’re talking about modern day… not so much honestly. btrfs snapshots saved my ass a couple of times, the rare issue I encounter I just rollback and wait for an upstream fix, and the rest I typically ignore or use something else. Everything tends to run quite smooth for me as a general rule, though.

Nilz,

For me it was migrating my Arch install from EXT4 to ZFS. GRUB had to be configured in particular ways to get it to work with ZFS and I didn’t do it properly so it wouldn’t/couldn’t boot.

Then I updated ZFS to a version that wasn’t supported by GRUB yet, so I chrooted into my installation to switch to systemd-boot with Unified Kernel Images. Now I still can’t figure out how to add a boot entry for Windows. I think I followed the proper steps, but selecting the Windows entry just reloads systemd-boot.
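For the record: systemd-boot only auto-detects Windows when bootmgfw.efi sits on the same ESP it boots from. A manual entry (loader/entries/windows.conf) would look something like this, assuming the Windows boot files are on that same partition:

    title   Windows
    efi     /EFI/Microsoft/Boot/bootmgfw.efi

If Windows lives on a separate ESP, an entry like this can just dump you back into the menu, which sounds like the symptom described.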

Limonene,

A couple months ago, I made a Palworld server box out of a spare motherboard assembly (mobo, processor, ram) from a computer I had recently upgraded.

I didn’t have any spare drives lying around, so I plugged in 7 USB flash drives and made them into a RAID array. Not a true RAID array, but a BTRFS filesystem with volumes spread onto each flash drive, with the data redundancy set to raid1, and the metadata redundancy set to raid1c3.

It worked… in the sense that I never lost any data. It certainly didn’t work in the sense of having good uptime.

The first problem was getting it to boot right. The boot line in GRUB had “root=UUID=…” instead of naming a specific drive. That is normal. However, in BTRFS multi-volume filesystems, all the volumes have the same UUID. So the initrd was only waiting for a single drive matching that UUID, then trying to mount it as the root filesystem. This failed, because the kernel had not yet set up the other 6 USB drives, and this BTRFS filesystem needs all 7 volumes present. Maybe 6, if you used the “degraded” mount option.

The workaround was to wait for this boot process to fail, at which point you get dropped into an initrd shell. Then, you look at all the drives and make sure they’re all there. And then… I don’t exactly remember what happened next. I think it was some black magic that erases your mind in the process. I somehow got it booted from the initrd shell.
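(If anyone else lands in that initrd shell with a multi-device btrfs root, the usual incantation is something like the following, assuming the initrd ships btrfs-progs; the device name and mount point vary by setup and initramfs:)

    btrfs device scan                # register all member devices with the kernel
    mount /dev/sdg1 /new_root        # any member device works once all are scanned
    exit                             # continue the normal boot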

Installing Steam and the Palworld server worked ok, and it even ran for a few hours before crashing overnight.

The next morning, I tried rebooting it. Unfortunately, the USB drives weren’t all appearing. Turns out the motherboard had some bad USB ports, some sometimes-bad USB ports, and a maybe-bad PCIe bus, because the PCIe USB expansion card I plugged in had weird problems it had never had before.

I found the most reliable ports and plugged the drives in there. But you can’t just replug them in the initrd; it doesn’t have USB hotplug support. So each time it tried to boot without all the drives present, I restarted it again, until one time I finally had all the drives.

I changed the GRUB boot line to “root=/dev/sdg1” . This made it wait for all the drives to load, in any order, and whichever one was last would be mounted as the root filesystem (but the kernel would automatically include all the others too, since they were successfully initialized).

The bad USB ports kept bringing down the server every day or two. I bought a cheap NVMe drive and added it to the BTRFS filesystem, and then removed all the USB drives except the largest. That fixed the reliability. It’s been like that since.

Now, to boot the server, all I have to do is change the GRUB boot line to “root=/dev/sdb1”. Since the NVMe drive is much faster than the USB drive, it always initializes first. If the initrd waits for sdb1, then it will always have both drives initialized when it tries to mount the root filesystem.

I could add that to the grub.cfg, or come up with some other more permanent solution, but I’m not planning on rebooting this server ever again. My friends fell off Palworld, and I gave a shutdown date that’s about a week away. And the electricity is pretty reliable here.

MentalEdge, (edited )
@MentalEdge@sopuli.xyz

I manage a machine that runs both media transcodes and some video game servers.

The video game servers have to run in real-time, or very close to it. Otherwise players using them suffer noticeable lag.

Achieving this at the same time that an ffmpeg process was running was completely impossible, no matter what I did to limit ffmpeg’s use of CPU time. Even when running it at the lowest priority, it impacted the game server processes running at top priority. Even if I limited it to one thread, it was affecting things.

I couldn’t understand the problem. There was enough CPU time to go around to do both things, and the transcode wasn’t even time sensitive, while the game server was, so why couldn’t the Linux kernel just figure it out and schedule things in a way that made sense?

So, for the first time I read up on how computers actually handle processes, multi-tasking and CPU scheduling.

As FFMPEG is an application that uses ALL available CPU time until a task is done, I came to the conclusion that, due to how context switching works (CPU cores can only do one thing at a time; they just switch between tasks really fast, and that switching itself takes time), the system was falling behind on the video game processes whenever it operated with zero processing headroom. The scheduler wasn’t smart enough to maintain a real-time process in the face of FFMPEG, which would occupy ALL available cycles.

I learned the solution was core pinning. Manually setting processes to run on certain cores of the CPU. I set FFMPEG to use only one core, since it doesn’t matter how fast it completes. And I set the game processes to use all but that one core, so they don’t accidentally end up queueing for CPU time on a core that doesn’t have the headroom to allow the task to run within a reasonable time range.

This has completely solved the problem, as the game processes and FFMPEG no longer wait for CPU cycles in the same queue.
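In practice that kind of pinning is a couple of commands or unit-file lines; the core numbers below are just an example for a 16-core box, and “game-server” is a placeholder process name:

    taskset -c 0 ffmpeg -i in.mkv out.mkv    # launch ffmpeg pinned to core 0
    taskset -cp 1-15 $(pidof game-server)    # move a running server off core 0

    # or persistently, in the game server's systemd unit:
    # [Service]
    # CPUAffinity=1-15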

Waffelson,

This reminded me of how I used Process Lasso to limit which cores a program could run on.

flambonkscious,

Well that’s interesting… I’d have thought, possibly naively, that as long as a thread had work to do it would essentially behave like ffmpeg does?

Perhaps there’s something about the type of work though, that it’s very CPU-bound or something?

MentalEdge, (edited )
@MentalEdge@sopuli.xyz

I think the difference is simply that most processes only have a certain amount that needs accomplishing in a given unit of time. As long as they can get enough CPU time, and do so soon enough after getting in line for it, they can maintain real-time execution.

Very few workloads have that much to do for that long. But I would expect other similar workloads to present the same problem.

There is a useful stat which Linux tracks in addition to a simple CPU usage percentage. The “load average” represents the average number of processes that are either running or queued up waiting for CPU time.

As long as the number is lower than the available number of cores, this essentially means that whenever one process is done running a task, the next in line can get right on with theirs.

If the load average is less than the number of cores available, that means the cores have idle time where they are essentially just waiting for a process to need them for something. Good for time-sensitive processes.

If the load average is above the number of cores, that means some processes are having to wait for several cycles of other processes having their turn, before they can execute their tasks. Interestingly, the load average can go beyond this threshold way before the CPU hits 100% usage.
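You can watch it directly; example output (the first three fields are the 1-, 5- and 15-minute averages):

    $ cat /proc/loadavg
    3.52 2.91 2.70 1/823 12345

On an 8-core machine, those first numbers sitting consistently above 8 would mean tasks are queueing for CPU time.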

I found that I can allow my system to get up to a load average of about 1.5 times the number of cores available, before you start noticing it when playing on one of the servers I run.

And whenever ffmpeg was running, the load average would spike to 10-20 times the number of cores. Not good.

flambonkscious,

That makes complete sense - if you’ve got something ‘needy’, as soon as it’s queuing up, I imagine it snowballs, too…

10-20 times the core count is crazy, but I guess it’s had a lot of development effort put into parallelizing its execution, which of course goes against your use case :)

MentalEdge, (edited )
@MentalEdge@sopuli.xyz

Theoretically a load average could be as high as it likes, it’s essentially just the length of the task queue, after all.

Processes having to queue to get executed is no problem at all for lots of workloads. If you’re not running anything latency-sensitive, a huge load average isn’t a problem.

Also it’s not really a matter of parallelization. Like I mentioned, ffmpeg impacted other processes even when restricted to running in a single thread.

That’s because most other processes will do work in small chunks that complete within nanoseconds. Send a network request, parse some data, decode an image, poll HID device, etc.

A transcode meanwhile can easily have a CPU running full tilt for well over a second, working on just that one thing. Most processes will show up and go “I need X amount of CPU time” while ffmpeg will show up and go “give me all available CPU time” which is something the scheduler can’t actually quantify.

It’s like if someone showed up at a buffet and asked for all the food that no-one else is going to eat. How do you determine exactly how much that is, and thereby how much it is safe to give this person without giving away food someone else might’ve needed?

You don’t. Without CPU headroom it becomes very difficult for the task scheduler to maintain low system latency. It’ll do a pretty good job, but inevitably some CPU time that should have gone to other stuff will go to the process asking for as much as it can get.
