@azonenberg@ioc.exchange
@azonenberg@ioc.exchange avatar

azonenberg

@azonenberg@ioc.exchange

Security and open source at the hardware/software interface. Embedded sec @ IOActive. Lead dev of ngscopeclient/libscopehal. GHz probe designer. Open source networking hardware. "So others may live"

Toots searchable on tootfinder.

This profile is from a federated server and may be incomplete. Browse more on the original instance.

azonenberg, to random
@azonenberg@ioc.exchange avatar

Not sure if this is a gcc bug or some weird corner of UB or what...

But I have a packed struct containing a uint32 as the first field. I'm running on ARMv7-M so 32-bit unaligned loads are allowed (but not 64-bit).

This struct is being read directly via casting from a network RX buffer that is likely not aligned to any particular byte boundary. It's a) packed and b) has 32-bit fields in it.

So silly me assumed that gcc would generate either bytewise reads (assuming no alignment at all) or a ldr instruction (accepting that 32-bit unaligned loads are OK).

But for some reason at -O3 it generates a 64-bit read with ldrd, which promptly hard faults. I have no idea why it's doing that given that I was just __builtin_bswap32'ing a single 32-bit field.

Was able to work around the issue with memcpy, but seriously WTF? If I'm using a packed struct I'm explicitly telling the compiler not to make any assumptions about alignment because I'm directly serializing the data from somewhere. Where did it magically get the idea that my packed 32-bit field had 64-bit alignment?

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark The "fixed" (not crashy) version is

char* tmp = GetPathStart() + GetPathLength() + sizeof(uint32_t);
uint32_t n;
memcpy(&n, tmp, sizeof(n));
return __builtin_bswap32(n);

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark (note that the overall SFTPOpenPacket object may have any alignment since it's coming out of a network rx buffer, not allocated on the stack or anything).

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark (this is using Debian bookworm packaged arm-none-eabi-g++ (15:12.2.rel1-1) 12.2.1 20221205)

azonenberg,
@azonenberg@ioc.exchange avatar

@pkhuong @whitequark I mean, this struct was created from a larger buffer (ultimately it's an EthernetFrame which is a fixed MTU-sized static buffer, then I have a struct at the beginning for the fixed-sized fields then offset off the end of that).

Not sure how else you'd handle this?

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark No explicit -march but -mcpu=cortex-m7

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark Full CFLAGS (other than macro defines and include search paths) are -g -O3 --std=c++17 -Wall -Wextra -fno-rtti -fno-exceptions --specs nano.specs -mcpu=cortex-m7

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark Interesting. There's probably some additional context required to trigger the bug? I don't have time to really troubleshoot this at the moment since I have a workaround.

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark This was the actual offending code as generated

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark I didn't fully trace out what register had what values, this was optimized code so a pain to debug (a lot of the values I wanted to dump to troubleshoot were optimized out).

Got a workaround and went on with my development, now chasing a different bug that looks to be an actual application logic error of some sort.

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark If it helps the actual SFTPOpenPacket comes from this

auto payload = reinterpret_cast<SFTPOpenPacket*>(pack->Payload());
payload->ByteSwap();
OnRxOpen(id, state, socket, payload);

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark where "pack" is a SFTPPacket* which itself comes from

while(IsPacketReady(state))
{
auto pack = reinterpret_cast<SFTPPacket*>(state->m_rxBuffer.Rewind());
pack->ByteSwap();
OnRxPacket(id, state, socket, pack);
state->m_rxBuffer.Pop(pack->m_length + sizeof(uint32_t));
}

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark There should be no expectation of 64-bit alignment on any of this. Ultimately "pack" in this outer code should be on at least a 32-bit boundary since state->m_rxBuffer was rewound (this is a circular FIFO but "rewind" rotates the buffer so the contents are contiguous and start at address 0 of the buffer) and that buffer is a large array so I would expect it to be aligned to some extent.

But once I start parsing out variable-length fields from within it, all bets are off.

azonenberg,
@azonenberg@ioc.exchange avatar

@pkhuong @whitequark How else can I make a member function that does this?

I can't store a pointer to the outer buffer in the object because I can't have any variables that aren't part of the packet itself.

azonenberg,
@azonenberg@ioc.exchange avatar

@steve @whitequark So, on ARMv7-M it's legal to load a uint32_t (ldr) from any alignment so even if it is unaligned, it shouldn't matter.

What's not legal is to load an unaligned uint64_t (ldrd).

The bug is that gcc took a type that should only have guaranteed 32-bit alignment and emitted an instruction that required 64-bit alignment.

azonenberg,
@azonenberg@ioc.exchange avatar

@steve @whitequark Doesn't matter. The bug isn't that it took an odd-aligned uint32 and treated it as 32-bit aligned.

The bug is that it took a 32-bit aligned value and treated it as 64-bit aligned. Nothing in the code gave gcc permission to assume this uint32_t* was on a 64-bit boundary rather than an odd 32-bit index.

azonenberg,
@azonenberg@ioc.exchange avatar

@steve @whitequark More specifically, given uint32_t* foo = 0xdeadbee4, ldr will work and ldrd will segfault.

There was no reason to issue a 64-bit load in the first place, this is a 32-bit variable.

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark @steve ... wait, what? Ok now I'm confused.

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark @steve So if I __builtin_assume_aligned(1) on the pointer would that be a proper way to tell gcc it's not allowed to assume 4 byte alignment?

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark @steve Ultimately what I want is a way to reliably generate a ldr instruction without worrying about tiny code or toolchain changes turning it into a ldrd.

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark @steve Preferably without intrinsics but if I have no choice, I guess I will...

azonenberg,
@azonenberg@ioc.exchange avatar

@pmdj @pkhuong @whitequark @steve Yep, just confirmed that is in fact the case in gcc output.

So now I just need to rewrite all of my deserialization code to do this, and write some inline wrappers around this to make it less ugly...

azonenberg,
@azonenberg@ioc.exchange avatar
azonenberg, to random
@azonenberg@ioc.exchange avatar

New thread on my big ongoing embedded project since the other one was getting too big.

To recap, this is a pilot project for a bunch of my future open hardware T&M and networking projects, validating a common platform that a lot of the future stuff is going to run on.

The primary problem it's trying to address is that I have a lot of instrumentation with trigger in/out ports, sometimes at different voltage levels, and I don't always have the same instrument sourcing the trigger every time.

So rather than moving around cables all the time and adding splitters, attenuators, amplifiers, etc. to the trigger signals I decided to make a dedicated device using an old XC7K70T-2FBG484 I had lying around.

Of course, as with any project, there was feature creep.

I'm standardizing on +48V DC for powering all of my future projects as it's high enough to move a lot of power but low enough to be mostly safe to work around live. So I needed to design and validate an intermediate bus converter to bring the 48 down to something like 12 for the rest of the system to use.

The FPGA has four 10G transceiver pairs on it. I used one for 10GbE (not that I need the bandwidth, but I was low on RJ45 ports on this bench and had some free SFP drops) and the rest are hooked up to front panel SMA ports (awaiting cables to go from PCB to panel) to generate PRBSes for instrument deskew.

Since I'm pinning out the transceivers and am planning to build a BERT eventually, I added BERT functionality to the firmware as well (still need to finish a few things but it's mostly usable now).

And since I have transceivers and access to all of the scope triggers, it would be dumb not to build a CDR trigger mode as well. That's in progress.

azonenberg,
@azonenberg@ioc.exchange avatar

I'm now thoroughly confused.

I can now OTA flash the bitstream, but the image is subtly corrupted.

I tried three times with different SPI clock frequencies, and each time the exact same, byte for byte, incorrect bitstream is burned to the flash. So it seems pretty clear this is an application logic bug and not SI (which I would expect to have a more nondeterministic effect).

The bug manifests itself as two corrupted bytes, repeating on 1024 byte boundaries for a while, then something else.

For example, starting at 0x10b60, 10f60, 0x11360, etc. I have 0x60 03 overwriting the intended bitstream values. This persists up to 0x17f60.

Then at 18b44, 18f44, etc. I have 0x44 03 overwriting the intended values.

And so on. Weird 2-byte corruptions repeating at 1 kB boundaries. The flash write blocks are 512 bytes and the SFTP packets are mostly 1024 bytes, so it might have something to do with that... This is gonna be fun to debug.

azonenberg,
@azonenberg@ioc.exchange avatar

Hmm, seems like corruption in the RX circular FIFO. Making the buffer larger fixed it, but my error detection should have logged an overflow. Great, now I have something else to poke at...

azonenberg,
@azonenberg@ioc.exchange avatar

Anyway, SFTP-based OTA flashing of the FPGA now works.

There's no multiboot support yet, so if things go sideways mid-flash you need to repeat the flash attempt without losing power or the board will soft-brick.

At some point I'll work on full multiboot with A/B images, but I think I've been shaving infrastructure yaks for far too long and now is a good time to shift gears to feature development again.

Serial console log showing SFTP login and flash being erased/blank checked
Serial console showing bitstream headers being written to flash

  • All
  • Subscribed
  • Moderated
  • Favorites
  • megavids
  • modclub
  • DreamBathrooms
  • mdbf
  • magazineikmin
  • InstantRegret
  • Durango
  • Youngstown
  • slotface
  • thenastyranch
  • love
  • kavyap
  • GTA5RPClips
  • normalnudes
  • tester
  • khanakhh
  • ngwrru68w68
  • everett
  • osvaldo12
  • rosin
  • ethstaker
  • Leos
  • anitta
  • cubers
  • tacticalgear
  • cisconetworking
  • provamag3
  • JUstTest
  • All magazines