[ECOS] ixp425 pci accesses crash in redboot

Wed Sep 30 23:54:00 GMT 2009

>I am using an IXP425 and a custom PCI target device.
>
>The device enumeration and configuration all go well in redboot, but
>sometimes (one in 200 reboots or so) I get a data abort exception that
>ends up in the GDB stubs when I try to read from the PCI device using
>PCI Memory Window transactions (reading/writing to/from
>(0x49000000+offset)). If I get past this stage, the unit will stay up
>forever happily serving PCI access. Only the reboots are dangerous.
>
>When this happens, I have used gdb to read the pci configuration
>parameters from the PCI target controller (0xC0000000 ->), and all is
>as it should be (except that the PCI fatal error flag is raised).
>What I wonder is what can cause this behaviour, and how I should
>handle it. Can this be caused by my PCI device doing an abort, by some
>MMU/cache setting in the IXP425 or similar?

I don't have experience with the IXP425 in particular - only with other
IO processors of the same family.

Likely candidates are:
- Your device doing a target abort.
- Your code accessing a nonexistent device, causing a master abort.
- Your code accessing your device before it is ready, which also
results in a master abort.
- If your device can also act as a bus master: some nasty things
might happen if it starts mastering before your central chip is set
up for bus mastering.

The data abort exception itself might also tell you a little more:
- If it's a precise abort the first 3 candidates are more likely.
- If it's an imprecise abort the last candidate might be more likely.

The PCI error registers might contain more detailed info.
The ultimate way to find out:  hook up a PCI Bus (or Logic) Analyzer and
see what causes the fatal PCI error.
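
As a rough illustration of what the error registers can tell you, here
is a minimal C sketch that classifies the error bits of the standard PCI
Status register (configuration space offset 0x06). The bit positions are
the ones defined by the PCI specification; the macro, enum and function
names are made up for this example, and the IXP425's own controller
registers may hold further detail that is not covered here.

```c
#include <stdint.h>

/* Error bits of the standard PCI Status register (config offset 0x06). */
#define PCI_STATUS_MASTER_DATA_PARITY  (1u << 8)
#define PCI_STATUS_SIG_TARGET_ABORT    (1u << 11)
#define PCI_STATUS_REC_TARGET_ABORT    (1u << 12)
#define PCI_STATUS_REC_MASTER_ABORT    (1u << 13)
#define PCI_STATUS_SIG_SYSTEM_ERROR    (1u << 14)
#define PCI_STATUS_DETECTED_PARITY     (1u << 15)

#define PCI_STATUS_ERROR_MASK \
    (PCI_STATUS_MASTER_DATA_PARITY | PCI_STATUS_SIG_TARGET_ABORT | \
     PCI_STATUS_REC_TARGET_ABORT   | PCI_STATUS_REC_MASTER_ABORT | \
     PCI_STATUS_SIG_SYSTEM_ERROR   | PCI_STATUS_DETECTED_PARITY)

/* Hypothetical classification, as seen from the initiator's status register. */
typedef enum {
    PCI_ERR_NONE,          /* no error bit set                           */
    PCI_ERR_MASTER_ABORT,  /* nobody claimed the transaction             */
    PCI_ERR_TARGET_ABORT,  /* the target terminated with a target abort  */
    PCI_ERR_OTHER          /* parity, SERR# or signaled target abort     */
} pci_err_kind;

/* Map a raw 16-bit status value to the most telling error kind. */
pci_err_kind pci_classify_status(uint16_t status)
{
    if (status & PCI_STATUS_REC_MASTER_ABORT)
        return PCI_ERR_MASTER_ABORT;
    if (status & PCI_STATUS_REC_TARGET_ABORT)
        return PCI_ERR_TARGET_ABORT;
    if (status & PCI_STATUS_ERROR_MASK)
        return PCI_ERR_OTHER;
    return PCI_ERR_NONE;
}

/* The error bits are write-1-to-clear: this is the value to write back. */
uint16_t pci_status_clear_value(uint16_t status)
{
    return (uint16_t)(status & PCI_STATUS_ERROR_MASK);
}
```

Reading the status value itself is left out, since that depends on how
your platform does configuration cycles. Clearing the bits after each
check makes it easier to tell which access raised them.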

If you have to deal with those fatal PCI errors:  set up your own
exception handler that tells your PCI initialization code such a beast
has occurred.  (That's what I did:  I have plug'n'play code.  And
candidate 2 shows up whenever a device is not present.)
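
The handler idea can be sketched roughly as follows. This is not my
actual code and not a RedBoot or eCos API: all names are hypothetical,
and the two fake_read_* functions only stand in for a real bus access so
the logic can be tried on a host. The assumption is that your platform's
data abort handler calls pci_abort_hook() and falls through to the
default handling (the GDB stubs) only when the hook returns 0.

```c
#include <stdint.h>

/* Hypothetical state shared between the probe code and the abort handler. */
static volatile int pci_probe_active;  /* set while a guarded access runs  */
static volatile int pci_abort_seen;    /* set by the hook when it fires    */

/*
 * To be called from the data abort handler.
 * Returns 1 if the abort was expected and has been recorded,
 * 0 if it should be passed on to the default handling.
 */
int pci_abort_hook(void)
{
    if (!pci_probe_active)
        return 0;
    pci_abort_seen = 1;
    return 1;
}

/* Raw 32-bit read; on the target this would dereference the PCI window. */
typedef uint32_t (*pci_read_fn)(uintptr_t addr);

/*
 * Guarded read: returns 0 and stores the value on success,
 * -1 (leaving *out untouched) if an abort was flagged during the access.
 */
int pci_guarded_read32(pci_read_fn rd, uintptr_t addr, uint32_t *out)
{
    uint32_t value;

    pci_abort_seen = 0;
    pci_probe_active = 1;
    value = rd(addr);
    pci_probe_active = 0;

    if (pci_abort_seen)
        return -1;
    *out = value;
    return 0;
}

/* Host-side stand-in: a device that answers normally. */
uint32_t fake_read_ok(uintptr_t addr)
{
    (void)addr;
    return 0x12345678u;
}

/* Host-side stand-in: the access faults, so the abort hook runs. */
uint32_t fake_read_abort(uintptr_t addr)
{
    (void)addr;
    (void)pci_abort_hook();
    return 0xFFFFFFFFu;
}
```

On real hardware the abort handler also has to resume execution after
the faulting load, which is platform-specific and not shown. With an
imprecise abort the flag may be set only after the read has returned, so
a real version might need some form of synchronization before checking
it.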

MMU / cache settings I consider less likely.  If those are wrong I'd
expect you to see more trouble.

  Kurt
