[PATCH v5 3/7] start/stop btrace with coresight etm and parse etm buffer. nat independant

Fri May 7 01:52:41 GMT 2021

hi Markus,

thanks for your support. the patch was reworked with your imput. below 
is the status

On 30.04.21 10:33, Metzger, Markus T wrote:
> Hello Zied,
>
>> Would this guarantee that we resume from the same PC where we stopped
>> previously if the reason is TRACE_ON_NORMAL?
>> [Zied] this is not guaranteed, we may land at a different PC. The landing address will be controlled by the filtering and triggering mechanism in ETM block. the user can trigger stopping the traces at a certain address and resuming it on another address.
> We should add a gap whenever the trace is not contiguous.
>
>
>> +      DEBUG ("ocsd_dt_create_decoder failed");
>> +      return false;
>>
>> Is there some error code/message that would help the user understand why we
>> failed to create a decoder?
>> [Zied] The OpenCsd library is not offering such a helper function, so the error to string conversion is done in btrace.c . the issue is reported to opencsd developer.
> We just say that things don't work.  We don't say what the actual
> error was.  Even if there is no string representation, having an error
> number would already help, e.g. in web searches.  Or you could look
> at the decoder sources.
[Zied] error code is now reported in the message. e.g warning 
(_("ocsd_dt_create_decoder failed with error: %d"), errcode);
>> +  for (int i = 0; i < cpu_count; i++)
>> +    {
>> +      ret = cs_etm_create_decoder (&(t_params->at (i)), decoder);
>> +      if (!ret)
>> +	{
>> +	  DEBUG ("cs_etm_create_decoder failed");
>> +	  cs_etm_free_decoder (decoder);
>>
>> We put all the error details in debug messages.  This would be relevant for the
>> user, too.  And if we could get some error code/message from the library that
>> would help, as well.
>> [Zied] cs_etm_create_decoder is reporting all encountered errors through debug messages
> My point was that those errors should not be debug messages but instead
> make it to the user with sufficient detail, even if it is just an integer error code.
[Zied] idem
>> +      warning (_("Error 0x%" PRIx32 " in decoding ARM CoreSight ETM Traces"
>> +      		" at offset %zu."), errcode, consumed);
>> +      if (errcode == OCSD_RESP_FATAL_INVALID_DATA)
>> +	ftrace_new_gap (btinfo, errcode, decoder->gaps);
>>
>> A gap is used to indicate that the trace it not contiguous.  How about the other
>> error codes?
>> [Zied] OpenCsd offers an interface to register a call back for reporting error logs during data processing. I added it.
>> There is a shortage in the api in reporting decoding errors programmatically, to handle it properly and possibly recovering from it. I reported this to the developer of opencsd.
>> this function was reworked to attempt to recover in case of failure.
> We insert a gap for OCSD_RESP_FATAL_INVALID_DATA.  Should we
> insert gaps for other error codes, too?  Or maybe check the PC and
> insert a gap when the trace is not contiguous?

[Zied] this is the list of error codes of data path

typedef enum _ocsd_datapath_resp_t {
     OCSD_RESP_CONT,                /**< Continue processing */
     OCSD_RESP_WARN_CONT,           /**< Continue processing  : a 
component logged a warning. */
     OCSD_RESP_ERR_CONT,            /**< Continue processing  : a 
component logged an error.*/
     OCSD_RESP_WAIT,                /**< Pause processing */
     OCSD_RESP_WARN_WAIT,           /**< Pause processing : a component 
logged a warning. */
     OCSD_RESP_ERR_WAIT,            /**< Pause processing : a component 
logged an error. */
     OCSD_RESP_FATAL_NOT_INIT,      /**< Processing Fatal Error :  
component unintialised. */
     OCSD_RESP_FATAL_INVALID_OP,    /**< Processing Fatal Error :  
invalid data path operation. */
     OCSD_RESP_FATAL_INVALID_PARAM, /**< Processing Fatal Error :  
invalid parameter in datapath call. */
     OCSD_RESP_FATAL_INVALID_DATA,  /**< Processing Fatal Error :  
invalid trace data */
     OCSD_RESP_FATAL_SYS_ERR,       /**< Processing Fatal Error :  
internal system error. */
} ocsd_datapath_resp_t;

OCSD_RESP_FATAL_INVALID_DATA is the one related to invalid/corrupted 
data, I will check if  OCSD_RESP_FATAL_SYS_ERR can also point to a gap

>> -      /* We can only provide the PC register.  */
>> -      if (regno >= 0 && regno != pcreg)
>> +      /* We can only provide the PC or CPSR registers here.  */
>> +      if (regno >= 0 && !(regno == pcreg || regno == ARM_PS_REGNUM))
>> 	return;
>>
>> This is generic code.  ARM_PS_REGNUM may mean something different
>> for other architectures.
>>
>> I see how we handle it below but it is still confusing at this point.  Do we
>> even need these two levels of checks?  We're repeating the check again
>> below since we need to do different things.
>>
>> It would be cleaner to first check the architecture but I see how this is
>> more efficient.  Let's keep it that way until we need to handle more
>> registers or more architectures.
>> [Zied] this piece of code is called often, so I optimized it to return as soon as possible. This will save checking the architecture each time (and doing a string comparison).
>> do you know a better way to check the target (intel or arm or aarch64) without using the string name (an enumeration for example)?
> I see the point in first checking the register number for performance
> reasons.  I was wondering why we want two levels of checks.
>
> How about something like this?
>
> If ((regno < 0) || (regno == pcreg))
>    {
>      <handle pcreg>
>    }
> If ((regno < 0) || (regno == other_reg))
>    {
>      <handle other_reg>
>    }
> ...
[Zied] Changed to use this pattern.
>> +	  /*  Provide CPSR register in the case of an armv7 target.  */
>> +	  const struct target_desc *tdesc = gdbarch_target_desc (gdbarch);
>> +
>> +	  const char *tdesc_name = tdesc_architecture_name (tdesc);
>> +	  if (strcmp (tdesc_name, "arm") == 0)
>>
>> Is that sufficient?
>> [Zied] : tdesc_name can be arm, aarch64, etc ...at this point it should be enough. Do you have other limitations in mind?
> That's what I meant.  Shouldn't we check for aarch64, too?
> Or is this really only relevant for 'arm'?
[Zied]: it is only relevant for arm. the debug IP was improved in armv8 
(aarch64)
>> +/* Parameters needed for decoding ETMv3 traces.
>> +   See open source coresight trace decoder library (opencsd)
>> +   for further details.  */
>> +struct cs_etmv3_trace_params
>>
>> Aren't those structures declared in the decoder's header file?
>>
>> We shouldn't repeat those declarations here.
>> [Zied] This is an intermediate form close to the parameters as collected for the sys file system or the remote protocol fields. They are converted later on to the parameters as required by opencsd library.
> I don't think we should declare things in GDB that are
> declared already somewhere else and maintained in that
> other location.  It's too easy to get out of sync.
>
> If we convert them later, can we already use the final
> structures, here?
[Zied] those structure has more members than the one required by 
opencsd, designed for usage with an RTOS instead of linux.
>> +  /* The size of DATA in bytes.  */
>> +  size_t size;
>> +
>> +  /* Trace id for this thread.  Set to 0xFF to ignore it during parsing.  */
>> +  int trace_id;
>>
>> Does that imply that trace_id's are in the range 0..0xff-1?  In that case, uint8_t
>> would be more appropriate.
>>
>> Is that a realistic limit?  How would we handle more threads than we have id's for?
>> [Zied] on a linux system, trace_id is assigned per cpu. and the kernel copies the traces of each thread in a dedicated ring buffer. by this, the traces belonging to different threads are de-multiplexed. On an RTOS system, especially when routing the traces outside of the SoC, the OS has no other mean for de-multiplexing the traces than the trace_id. The hardware (ETM IP) reserves 7 bits for the trace_id. The system limit, in regards to tracing, is then 111 running threads at the same time (see DDI0314H paragraph 9.6 for the limitations),  but this is not a big issue, since on those small systems, usually few threads are running.
>> on linux system, where we would like to ignore the per trace id demux, we set the trace_id to a not valid value namely 0xFF.
> This should be documented, here.  It's a bit odd why a value
> in the middle of the range is reserved as invalid.  But it's really
> that only 128 values are actually valid.
>
> We may also want to change the type.  Or, if we anticipate this
> limit to grow, leave a bigger type and document allowed values.
>
> If we're free to choose the 'invalid' value, I'd rather pick one at
> the edge of the range.
[Zied] type changed to uint8_t instead of int.
> Regards,
> Markus.
> Intel Deutschland GmbH
> Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
> Tel: +49 89 99 8853-0,www.intel.de  <http://www.intel.de>
> Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva
> Chairperson of the Supervisory Board: Nicole Lau
> Registered Office: Munich
> Commercial Register: Amtsgericht Muenchen HRB 186928
-- 

*Zied Guermazi*
founder

Trande UG
Leuschnerstraße 2
69469 Weinheim/Germany

Mobile: +491722645127
mailto:zied.guermazi@trande.de <mailto:zied.guermazi@trande.de>

*Trande UG*
Leuschnerstraße 2, D-69469 Weinheim; Telefon: +491722645127
Sitz der Gesellschaft: Weinheim- Registergericht: AG Mannheim HRB 736209 
- Geschäftsführung: Zied Guermazi

*Confidentiality Note*
This message is intended only for the use of the named recipient(s) and 
may contain confidential and/or privileged information. If you are not 
the intended recipient, please contact the sender and delete the 
message. Any unauthorized use of the information contained in this 
message is prohibited.