-
Notifications
You must be signed in to change notification settings - Fork 324
Initial version of transactions tracking implementation for DSM #9899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
BenchmarksStartupParameters
See matching parameters
SummaryFound 1 performance improvements and 0 performance regressions! Performance is the same for 55 metrics, 9 unstable metrics.
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.56.0-SNAPSHOT~39426c4fe6, baseline=1.59.0-SNAPSHOT~3d62379c90
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.086 s) : 0, 1086409
Total [baseline] (8.758 s) : 0, 8757586
Agent [candidate] (1.095 s) : 0, 1095454
Total [candidate] (8.758 s) : 0, 8758101
section iast
Agent [baseline] (1.225 s) : 0, 1224792
Total [baseline] (9.351 s) : 0, 9350999
Agent [candidate] (1.228 s) : 0, 1228401
Total [candidate] (9.333 s) : 0, 9333357
gantt
title insecure-bank - break down per module: candidate=1.56.0-SNAPSHOT~39426c4fe6, baseline=1.59.0-SNAPSHOT~3d62379c90
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.193 ms) : 0, 1193
crashtracking [candidate] (1.189 ms) : 0, 1189
BytebuddyAgent [baseline] (652.993 ms) : 0, 652993
BytebuddyAgent [candidate] (656.568 ms) : 0, 656568
GlobalTracer [baseline] (283.763 ms) : 0, 283763
GlobalTracer [candidate] (288.003 ms) : 0, 288003
AppSec [baseline] (32.425 ms) : 0, 32425
AppSec [candidate] (33.224 ms) : 0, 33224
Debugger [baseline] (66.543 ms) : 0, 66543
Debugger [candidate] (66.754 ms) : 0, 66754
Remote Config [baseline] (609.364 µs) : 0, 609
Remote Config [candidate] (622.017 µs) : 0, 622
Telemetry [baseline] (8.806 ms) : 0, 8806
Telemetry [candidate] (8.861 ms) : 0, 8861
Flare Poller [baseline] (4.603 ms) : 0, 4603
Flare Poller [candidate] (4.573 ms) : 0, 4573
section iast
crashtracking [baseline] (1.179 ms) : 0, 1179
crashtracking [candidate] (1.182 ms) : 0, 1182
BytebuddyAgent [baseline] (793.265 ms) : 0, 793265
BytebuddyAgent [candidate] (793.283 ms) : 0, 793283
GlobalTracer [baseline] (256.602 ms) : 0, 256602
GlobalTracer [candidate] (259.639 ms) : 0, 259639
AppSec [baseline] (32.273 ms) : 0, 32273
AppSec [candidate] (32.724 ms) : 0, 32724
Debugger [baseline] (66.638 ms) : 0, 66638
Debugger [candidate] (66.787 ms) : 0, 66787
Remote Config [baseline] (570.633 µs) : 0, 571
Remote Config [candidate] (525.737 µs) : 0, 526
Telemetry [baseline] (8.357 ms) : 0, 8357
Telemetry [candidate] (8.474 ms) : 0, 8474
Flare Poller [baseline] (3.569 ms) : 0, 3569
Flare Poller [candidate] (3.481 ms) : 0, 3481
IAST [baseline] (27.007 ms) : 0, 27007
IAST [candidate] (27.012 ms) : 0, 27012
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.56.0-SNAPSHOT~39426c4fe6, baseline=1.59.0-SNAPSHOT~3d62379c90
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.101 s) : 0, 1100693
Total [baseline] (10.811 s) : 0, 10810708
Agent [candidate] (1.096 s) : 0, 1096093
Total [candidate] (10.819 s) : 0, 10819424
section appsec
Agent [baseline] (1.265 s) : 0, 1264669
Total [baseline] (11.06 s) : 0, 11059974
Agent [candidate] (1.27 s) : 0, 1269637
Total [candidate] (10.966 s) : 0, 10965848
section iast
Agent [baseline] (1.227 s) : 0, 1227296
Total [baseline] (11.203 s) : 0, 11202672
Agent [candidate] (1.233 s) : 0, 1232604
Total [candidate] (11.291 s) : 0, 11291381
section profiling
Agent [baseline] (1.208 s) : 0, 1207754
Total [baseline] (10.896 s) : 0, 10895673
Agent [candidate] (1.206 s) : 0, 1206433
Total [candidate] (10.868 s) : 0, 10867841
gantt
title petclinic - break down per module: candidate=1.56.0-SNAPSHOT~39426c4fe6, baseline=1.59.0-SNAPSHOT~3d62379c90
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.196 ms) : 0, 1196
crashtracking [candidate] (1.184 ms) : 0, 1184
BytebuddyAgent [baseline] (660.795 ms) : 0, 660795
BytebuddyAgent [candidate] (656.763 ms) : 0, 656763
GlobalTracer [baseline] (287.025 ms) : 0, 287025
GlobalTracer [candidate] (287.876 ms) : 0, 287876
AppSec [baseline] (33.214 ms) : 0, 33214
AppSec [candidate] (33.054 ms) : 0, 33054
Debugger [baseline] (67.698 ms) : 0, 67698
Debugger [candidate] (67.425 ms) : 0, 67425
Remote Config [baseline] (624.295 µs) : 0, 624
Remote Config [candidate] (606.174 µs) : 0, 606
Telemetry [baseline] (8.928 ms) : 0, 8928
Telemetry [candidate] (8.996 ms) : 0, 8996
Flare Poller [baseline] (5.373 ms) : 0, 5373
Flare Poller [candidate] (4.499 ms) : 0, 4499
section appsec
crashtracking [baseline] (1.173 ms) : 0, 1173
crashtracking [candidate] (1.166 ms) : 0, 1166
BytebuddyAgent [baseline] (690.5 ms) : 0, 690500
BytebuddyAgent [candidate] (692.086 ms) : 0, 692086
GlobalTracer [baseline] (259.075 ms) : 0, 259075
GlobalTracer [candidate] (262.564 ms) : 0, 262564
IAST [baseline] (24.43 ms) : 0, 24430
IAST [candidate] (24.753 ms) : 0, 24753
AppSec [baseline] (173.65 ms) : 0, 173650
AppSec [candidate] (173.461 ms) : 0, 173461
Debugger [baseline] (66.74 ms) : 0, 66740
Debugger [candidate] (66.685 ms) : 0, 66685
Remote Config [baseline] (762.979 µs) : 0, 763
Remote Config [candidate] (678.48 µs) : 0, 678
Telemetry [baseline] (9.417 ms) : 0, 9417
Telemetry [candidate] (9.265 ms) : 0, 9265
Flare Poller [baseline] (3.652 ms) : 0, 3652
Flare Poller [candidate] (3.58 ms) : 0, 3580
section iast
crashtracking [baseline] (1.176 ms) : 0, 1176
crashtracking [candidate] (1.187 ms) : 0, 1187
BytebuddyAgent [baseline] (794.124 ms) : 0, 794124
BytebuddyAgent [candidate] (796.034 ms) : 0, 796034
GlobalTracer [baseline] (257.371 ms) : 0, 257371
GlobalTracer [candidate] (259.794 ms) : 0, 259794
IAST [baseline] (27.057 ms) : 0, 27057
IAST [candidate] (27.015 ms) : 0, 27015
AppSec [baseline] (34.055 ms) : 0, 34055
AppSec [candidate] (35.146 ms) : 0, 35146
Debugger [baseline] (65.647 ms) : 0, 65647
Debugger [candidate] (65.382 ms) : 0, 65382
Remote Config [baseline] (576.918 µs) : 0, 577
Remote Config [candidate] (541.774 µs) : 0, 542
Telemetry [baseline] (8.434 ms) : 0, 8434
Telemetry [candidate] (8.511 ms) : 0, 8511
Flare Poller [baseline] (3.572 ms) : 0, 3572
Flare Poller [candidate] (3.456 ms) : 0, 3456
section profiling
crashtracking [baseline] (1.214 ms) : 0, 1214
crashtracking [candidate] (1.218 ms) : 0, 1218
BytebuddyAgent [baseline] (703.468 ms) : 0, 703468
BytebuddyAgent [candidate] (701.212 ms) : 0, 701212
GlobalTracer [baseline] (222.316 ms) : 0, 222316
GlobalTracer [candidate] (224.397 ms) : 0, 224397
AppSec [baseline] (32.473 ms) : 0, 32473
AppSec [candidate] (32.298 ms) : 0, 32298
Debugger [baseline] (68.205 ms) : 0, 68205
Debugger [candidate] (67.858 ms) : 0, 67858
Remote Config [baseline] (683.133 µs) : 0, 683
Remote Config [candidate] (601.76 µs) : 0, 602
Telemetry [baseline] (8.888 ms) : 0, 8888
Telemetry [candidate] (8.802 ms) : 0, 8802
Flare Poller [baseline] (3.77 ms) : 0, 3770
Flare Poller [candidate] (3.651 ms) : 0, 3651
ProfilingAgent [baseline] (96.789 ms) : 0, 96789
ProfilingAgent [candidate] (96.662 ms) : 0, 96662
Profiling [baseline] (97.364 ms) : 0, 97364
Profiling [candidate] (97.233 ms) : 0, 97233
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 1 performance regressions! Performance is the same for 19 metrics, 16 unstable metrics.
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.56.0-SNAPSHOT~39426c4fe6, baseline=1.59.0-SNAPSHOT~3d62379c90
dateFormat X
axisFormat %s
section baseline
no_agent (1.17 ms) : 1158, 1181
. : milestone, 1170,
iast (3.127 ms) : 3086, 3167
. : milestone, 3127,
iast_FULL (5.764 ms) : 5708, 5821
. : milestone, 5764,
iast_GLOBAL (3.52 ms) : 3469, 3570
. : milestone, 3520,
profiling (2.051 ms) : 2033, 2070
. : milestone, 2051,
tracing (1.796 ms) : 1781, 1812
. : milestone, 1796,
section candidate
no_agent (1.192 ms) : 1179, 1204
. : milestone, 1192,
iast (3.124 ms) : 3086, 3162
. : milestone, 3124,
iast_FULL (5.985 ms) : 5924, 6045
. : milestone, 5985,
iast_GLOBAL (3.575 ms) : 3520, 3630
. : milestone, 3575,
profiling (1.962 ms) : 1945, 1979
. : milestone, 1962,
tracing (1.824 ms) : 1809, 1839
. : milestone, 1824,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.56.0-SNAPSHOT~39426c4fe6, baseline=1.59.0-SNAPSHOT~3d62379c90
dateFormat X
axisFormat %s
section baseline
no_agent (18.06 ms) : 17876, 18244
. : milestone, 18060,
appsec (18.082 ms) : 17896, 18268
. : milestone, 18082,
code_origins (17.586 ms) : 17412, 17759
. : milestone, 17586,
iast (17.532 ms) : 17355, 17709
. : milestone, 17532,
profiling (18.996 ms) : 18811, 19182
. : milestone, 18996,
tracing (17.666 ms) : 17487, 17845
. : milestone, 17666,
section candidate
no_agent (19.179 ms) : 18979, 19378
. : milestone, 19179,
appsec (18.255 ms) : 18074, 18437
. : milestone, 18255,
code_origins (17.886 ms) : 17710, 18062
. : milestone, 17886,
iast (17.333 ms) : 17160, 17506
. : milestone, 17333,
profiling (19.735 ms) : 19535, 19935
. : milestone, 19735,
tracing (17.623 ms) : 17447, 17799
. : milestone, 17623,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.56.0-SNAPSHOT~39426c4fe6, baseline=1.59.0-SNAPSHOT~3d62379c90
dateFormat X
axisFormat %s
section baseline
no_agent (1.467 ms) : 1456, 1479
. : milestone, 1467,
appsec (3.647 ms) : 3432, 3861
. : milestone, 3647,
iast (2.199 ms) : 2134, 2264
. : milestone, 2199,
iast_GLOBAL (2.25 ms) : 2185, 2315
. : milestone, 2250,
profiling (2.071 ms) : 2017, 2125
. : milestone, 2071,
tracing (2.034 ms) : 1983, 2085
. : milestone, 2034,
section candidate
no_agent (1.467 ms) : 1456, 1479
. : milestone, 1467,
appsec (3.642 ms) : 3428, 3857
. : milestone, 3642,
iast (2.195 ms) : 2130, 2260
. : milestone, 2195,
iast_GLOBAL (2.245 ms) : 2179, 2310
. : milestone, 2245,
profiling (2.069 ms) : 2015, 2123
. : milestone, 2069,
tracing (2.039 ms) : 1988, 2090
. : milestone, 2039,
Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.56.0-SNAPSHOT~39426c4fe6, baseline=1.59.0-SNAPSHOT~3d62379c90
dateFormat X
axisFormat %s
section baseline
no_agent (14.923 s) : 14923000, 14923000
. : milestone, 14923000,
appsec (14.98 s) : 14980000, 14980000
. : milestone, 14980000,
iast (18.275 s) : 18275000, 18275000
. : milestone, 18275000,
iast_GLOBAL (17.659 s) : 17659000, 17659000
. : milestone, 17659000,
profiling (15.003 s) : 15003000, 15003000
. : milestone, 15003000,
tracing (14.643 s) : 14643000, 14643000
. : milestone, 14643000,
section candidate
no_agent (15.529 s) : 15529000, 15529000
. : milestone, 15529000,
appsec (14.758 s) : 14758000, 14758000
. : milestone, 14758000,
iast (18.116 s) : 18116000, 18116000
. : milestone, 18116000,
iast_GLOBAL (18.091 s) : 18091000, 18091000
. : milestone, 18091000,
profiling (14.988 s) : 14988000, 14988000
. : milestone, 14988000,
tracing (14.623 s) : 14623000, 14623000
. : milestone, 14623000,
|
Kafka / producer-benchmarkParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics. See unchanged results
|
Kafka / consumer-benchmarkParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics. See unchanged results
|
| } | ||
|
|
||
| // add a new value to cache | ||
| synchronized (CACHE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a ConcurrentHashMap and use computeIfAbsent will be enough without requiring a full synchronized block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I've updated the code.
| private final ConcurrentHashMap<String, SchemaSampler> schemaSamplers; | ||
| private static final ThreadLocal<String> serviceNameOverride = new ThreadLocal<>(); | ||
|
|
||
| // contains a list of active extractors by type. It is not thread safe, but it's populated only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is still going to leave us open to reordering issues.
I'm wondering if it would be better to just make the variable volatile and set it from a Map produced in the background thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the PR.
| return id; | ||
| } | ||
|
|
||
| public Long getTimestamp() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we boxing the primitives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
| return timestamp; | ||
| } | ||
|
|
||
| public Integer getCheckpointId() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why box the int?
| } | ||
|
|
||
| public static byte[] getCheckpointIdCacheBytes() { | ||
| return CACHE_BYTES.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that there is no need to clone this array since when it's changed, a new reference is assigned to it so the references that are returned are not directly modified
|
|
||
| protected abstract int status(RESPONSE response); | ||
|
|
||
| protected String getRequestHeader(REQUEST request, String key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do tend to agree. In general, I tend to strongly prefer methods be either abstract or final, so that implementors have to make a conscious choice. That said - since it just impacts DSM, I'll let DSM make the final decision.
| ] | ||
| }, | ||
| { | ||
| "name" : "datadog.trace.agent.core.datastreams.DataStreamsTransactionExtractors$DataStreamsTransactionExtractorAdapter", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, it would be nice if we generated this using annotation processing, but that's more a task for platform or SDK.
|
|
||
| @Override | ||
| protected String getRequestHeader(final HttpExchange exchange, String key) { | ||
| return exchange.getRequestHeaders().getFirst(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree, that most of the time a header is set only once. I've also seen tickets where making that assumption about HTTP inputs has caused issues, so I do share the reservations about adding this as a general mechanism.
|
|
||
| // contains a list of active extractors by type. Thread-safe via volatile with immutable | ||
| // snapshots. | ||
| private volatile Map<DataStreamsTransactionExtractor.Type, List<DataStreamsTransactionExtractor>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually suggesting to not set the map until the values are known.
The thread safety from volatile comes from introducing store_release and load_acquire semantics, but that only works when the map is fully populated locally and then stored into the volatile. Otherwise, you still exposed to write / write reordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the code. Now the map is only set when there are values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do share the reservations about adding this as a general mechanism.
I will prepare a separate / followup PR to address this.
|
/merge |
|
View all feedbacks in Devflow UI.
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings. Use
Added to the queue but the mergequeue is not enabled for now. Use ⏳ Processing |
|
PR Blocked - Invalid Label The pull request introduced unexpected labels:
This PR is blocked until:
Note: Simply removing labels from the pull request is not enough - a maintainer must remove the label and delete this comment to remove the block. |
johannbotha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing the thread safety issue with extractorsByType - the volatile + immutable snapshot approach looks good.
Overall LGTM! Just left a couple of minor comments on the pending items but nothing blocking.
| // update cache bytes | ||
| byte[] checkpointBytes = checkpoint.getBytes(); | ||
| byte[] bytesToAdd = new byte[checkpointBytes.length + 2]; | ||
| bytesToAdd[0] = (byte) id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This truncates to 0-255, that will potentially cause a collision if someone has more than 256 unique checkpoint names. Should we prevent the confusion in the future by making that potential overflow more obvious?
| } | ||
|
|
||
| public void clear() { | ||
| size = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you also mean to reset the buffer size here as well?
It gets reset after 10 seconds, but it seems like the intention might be to free up the memory here.
|
/merge |
|
View all feedbacks in Devflow UI.
PR already in the queue with status queued |
What Does This Do
Initial transaction tracking implementation.
Motivation
Multiple customers mentioned the need to track individual message across multiple service and environments and log messages which haven't reached the end of the pre-defined pipeline.
See this document for details.
Additional Notes
Transaction tracking is currently in closed beta, so the configuration flags will be documented after public availability.
Technical details
trackTransactionmethod can be used to add transaction information to the DSM payload.Transactions are accumulated in the in-mem container until they reach 512kb in size or until the DSM flush loop executes.
All http requests / kafka produce and consume operations now check for DSM extractors and apply them if any were registered. The overhead for customers without DSM / Extractors should not be noticeable, as we only execute 1 additional EnumMap lookup.