[ECOS] bugs in AT91 Ethernet driver (EMAC)

Jürgen Lambrecht J.Lambrecht@televic.com
Wed Jun 4 15:05:00 GMT 2008


Hello guys,

I still have problems with the TX part: sometimes the SW pointer 
(tbd_idx) and the HW counter/pointer get out of sync (see 
http://ecos.sourceware.org/ml/ecos-discuss/2008-06/msg00016.html).
The datasheet is not clear about this.
When there is a TXERR IRQ, then de pointers are/have to be reset.
But any IRQ (also TX and RX OK) calls deliver(). So I have to read the 
TSR register. The problem is that there is no TXERR bit in that 
register... the BEX bit is the most alike.
I don't know how to solve this in a good way - except for trial-and-error.
I will also send an email to my Atmel supplier..

What is your opinion (having read at least the EMAC datasheet) ?

Kind regards,
Jürgen

Lambrecht Jürgen wrote:
> Major bug found in TX part - see below.
>
> Andrew Lunn wrote:
>   
>>> During debugging, I see that 'priv->tx_busy' stays 'true', because the  
>>> TX-Complete bit (COMP of TSR reg) is never set - I also see that the  
>>> TX-Used Bit Read (UBR of TSR reg) is set, which is normal I think  
>>> because the TX buffers have not yet been released (because COMP is not  
>>> set..). But the TX-GO bit is 0, meaning that transmit is not busy.
>>>
>>> I added code to release the TX buffers and to clear 'priv->tx_busy' when  
>>> TX-GO is read 0. Then at each ARP request, there is an ARP reply, but it  
>>> does not get out.. (monitoring with Wireshark)
>>> What's happening - it is like the EMAC TX is blocked, but why?
>>>       
>> It might be worth printing out the value of isr in at91_eth_isr. As
>> the comment says, error conditions are not handled. Maybe you are
>> getting a buffer under run?
>>     
> Indeed Andrew, looking at the value of the ISR lead me to the cause of error,
> and hence the solution. 
>   The TX status of the ISR can be read in AT91_EMAC_TSR. The problem is bit 0
> of this register: UBR = "Used Bit Read". 
>   The datasheet (DS) is very unclear on this point (both the ATSAM9260 DS and
> ATSAM7X DS have completely the same section about their EMAC). But after a full
> day debugging, I'm quite sure I'm right.
>   So, that Used Bit indicates if the concerning packet buffer (with sg_list[i]
> data) has already been used by the EMAC. This Used Bit must be set to 0 by the
> user SW, and is set to 1 by the EMAC. In case of the TX EMAC, the Used Bit is
> set to 1 by the EMAC after transmit of the concerning buffer.
>   The TX EMAC holds a private counter/pointer (read-only in TBQP) to the
> transmit buffer descriptor list - the start location is the write-only
> TBQP. (Only the RX part is explained in the DS). Normally this counter/pointer
> cycles trough the circular transmit buffer descriptor list.
>   Now, after transmit of the last buffer (that last buffer has to be marked by
> the SW with a EOF flag) the TX EMAC increases its TX counter/pointer to already
> point the next TX buffer. But that next TX buffer - that is free of course - is
> marked by the SW as "Used" because the Used Bit is set. And therefore the UBR
> error is flagged. And therefore the TX EMAC *resets its counter/pointer* !!
> Because the SW does not reset its tbd_idx pointer, both pointers are out of
> sync so all further transmits are blocked. 
>   The major bug is that the if_at91.c SW rigorously sets the Used Bit (in the
> buffer descriptor) of all free buffers to 1, meaning "Used". A the end of the
> EMAC DS (36.4.1.3 for SAM9260 vG and 37.4.1.3 for SAM7 vG point 2): "Mark all
> entries in this list as owned by EMAC, i.e. bit 31 of word 1 set to 0."
> Therefore I changed at91_tb_init() and at91_reset_tbd() accordingly.
>   Solving the "Used Bit" issue is not enough. All the TX errors UBR, RLE, BEX
> and UND cause the counter/pointer to be reset. Therefore I changed the
> deliver() function and added an extra argument to at91_reset_tbd() to reset the
> SW tbd_idx pointer. I also added those TX errors to the IRQ IER and ISR.
>
> In attachment a diff - it also contains my previous changes. I also added some
> comments, and added '#ifdef CYGINT_IO_ETH_INT_SUPPORT_REQUIRED' to the start
> and stop functions. And I added more debug printing - even in the ISR..
> I tested this fix during 5 minutes - long enough to see that a ping finally
> works... 
>   There are of course still some problems - the ping reply is not synchronized
> with the ping request: it runs 1 ping iteration behind - more of it in my next
> mail. 
>
> Kind regards,
> Juergen
>
> P.S.: The current code can only work if the driver is initialized after each
> TX, or because each tx uses all buffers so that the pointer is wrapped.
> I guess redboot must work this way??
> Or am I still missing something?
>
>   



-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss



More information about the Ecos-discuss mailing list