Since you closed this without further comments, is there anything else to add here? You have a lot of io-wq contention, which is most likely due to either the larger write sizes or simply the storage and file system in use. You will probably see better performance if you have the threads share the io-wq backend, potentially using
I tried running it with 20 threads, each with its own ring connected to a single ring using
I ran the benchmark with different numbers of threads and queue depths to see how many worker threads it creates (using
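One way to count io-wq workers (not necessarily the method used in this measurement) is to read the thread names of the benchmark process under /proc/<pid>/task. On recent kernels, io-wq workers run as threads of the submitting process with names starting with "iou-wrk-", while SQPOLL threads use "iou-sqp-". The sketch below assumes Linux procfs; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper: counts io-wq worker threads of a process by reading
// /proc/<pid>/task/<tid>/comm (Linux only).
final class IoWqThreads {
    // io-wq workers are kernel-created threads of the submitting process whose
    // name (comm) starts with this prefix; SQPOLL threads use "iou-sqp-" instead.
    static final String IO_WQ_PREFIX = "iou-wrk-";

    private IoWqThreads() {}

    /** Counts the thread names that belong to io-wq workers. */
    static long countIoWqWorkers(Collection<String> threadNames) {
        return threadNames.stream().filter(n -> n.startsWith(IO_WQ_PREFIX)).count();
    }

    /** Returns the comm of every thread currently in the given process. */
    static List<String> threadNames(long pid) throws IOException {
        Path taskDir = Path.of("/proc", Long.toString(pid), "task");
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> tasks = Files.newDirectoryStream(taskDir)) {
            for (Path task : tasks) {
                try {
                    names.add(Files.readString(task.resolve("comm")).strip());
                } catch (IOException e) {
                    // Most likely the thread exited between listing and reading;
                    // idle io-wq workers do exit, so this race is expected. Skip it.
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Pass the benchmark JVM's pid as the first argument, or default to this process.
        long pid = args.length > 0 ? Long.parseLong(args[0]) : ProcessHandle.current().pid();
        System.out.println("io-wq workers: " + countIoWqWorkers(threadNames(pid)));
    }
}
```

Running it repeatedly (for example with `java IoWqThreads.java <pid>`) while the benchmark is active gives a rough worker count per thread count and queue depth; because workers come and go, a single sample is only a snapshot.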
I'm using liburing via Java's Foreign Function Interface to do random writes across a set of 2200 files. I am seeing unexpectedly high CPU usage compared to fio and Java's FileChannel, and I was hoping someone could give me some pointers on where to look to find out why this happens.
Setup:
Ring setup flags: IORING_SETUP_SINGLE_ISSUER, IORING_SETUP_COOP_TASKRUN, IORING_SETUP_DEFER_TASKRUN
Kernel: 6.14.0-27-generic
Scaling behavior:
My bindings scale well up to 4 threads, but each additional thread adds less throughput than the previous one, and beyond 8 threads performance starts to get worse.
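To check whether the same knee appears without io_uring, a plain FileChannel baseline with the same shape of workload can be timed at each thread count. This is a minimal sketch, not the benchmark from this thread: the class name, the parameters, and the choice of uniformly random files and block-aligned offsets are all assumptions. The writes are buffered; nothing here calls force().

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical FileChannel baseline: `threads` workers issue buffered positional
// writes of `blockSize` bytes at random block-aligned offsets, all sharing the same files.
final class FileChannelBaseline {
    private FileChannelBaseline() {}

    /** Runs the workload and returns the total number of bytes written. */
    static long run(List<Path> files, long fileSize, int blockSize, int threads, long writesPerThread)
            throws IOException, InterruptedException, ExecutionException {
        List<FileChannel> channels = new ArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (Path p : files) {
                channels.add(FileChannel.open(p, StandardOpenOption.CREATE, StandardOpenOption.WRITE));
            }
            long blocksPerFile = fileSize / blockSize;
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    ByteBuffer buf = ByteBuffer.allocateDirect(blockSize); // one buffer per thread
                    ThreadLocalRandom rnd = ThreadLocalRandom.current();
                    long bytes = 0;
                    for (long i = 0; i < writesPerThread; i++) {
                        FileChannel ch = channels.get(rnd.nextInt(channels.size()));
                        long offset = rnd.nextLong(blocksPerFile) * blockSize;
                        buf.clear();
                        // Positional writes leave the channel's position untouched,
                        // so several threads can write to the same channel concurrently.
                        while (buf.hasRemaining()) {
                            ch.write(buf, offset + buf.position());
                        }
                        bytes += blockSize;
                    }
                    return bytes;
                });
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get(); // rethrows any worker IOException wrapped in ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
            for (FileChannel ch : channels) {
                ch.close();
            }
        }
    }
}
```

Timing run() over the same file set with 1, 2, 4, 8 and more threads gives a scaling curve to set against the io_uring numbers; if FileChannel shows the same knee, the limit is more likely in the buffered-write path than in the bindings.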
Using
perf record -e 'lock:*' -g --call-graph=dwarf -F 997 -p 23523
I am seeing some contention, which partly explains the iostat output I am seeing.
iostat for my bindings:
vs. fio:
Fio job:
I understand Java bindings won't match fio's raw speed, but I expected the profile to be I/O-bound like fio's, not CPU-bound. In essence, each benchmark thread in Java does the following:
I tried variations of that example: submitting less or more often, peeking for CQEs, waiting for n CQEs, and different queue sizes, but nothing makes it perform more like fio. I also wrote a random-read version, and it does not suffer from this scaling issue.
I’m looking for guidance on how to structure multithreaded buffered I/O with shared files in a way that avoids the CPU bottlenecks I’m seeing.
Any insight into how best to mitigate this would be appreciated.
Thanks
UPDATE:
Some perf top results:
Single-threaded:
6 threads: