Hello, the feature you are requesting has already been analyzed by our team, with the following results.
There are two ways to achieve this: either by video/image analysis (this includes ARCore and other AR frameworks) or by reading the phone's sensors (accelerometer).
The first option is precise, but it had side effects: fast battery drain and degraded video quality (low bitrate) because the CPU was overloaded. This is becoming less of an issue with better phones and better encoding support in browsers, so we will probably re-open this option in the future.
The second option suffered from cumulative error: small errors in the sensor readings add up, so the calculated position became less and less precise over time.
That is why we decided to implement the snapshot, where you can “freeze” the image and add annotations easily and precisely, without the side effects. You can also enlarge the video in the lower right corner, so you can still see what the client is doing.
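
For anyone curious about the cumulative error with the sensor-based option, here is a minimal Python sketch of the effect. It is an illustration only, not our actual implementation: the function names, the 0.05 m/s^2 bias and the 100 Hz sampling rate are all assumed values. It simulates a phone that is lying still while the accelerometer reports a small constant offset, and tracks the position by integrating the readings twice.

```python
def integrate_position(accel_samples, dt):
    """Double-integrate acceleration samples (m/s^2) into a position (m)."""
    velocity = 0.0
    position = 0.0
    for a in accel_samples:
        velocity += a * dt         # first integration: acceleration -> velocity
        position += velocity * dt  # second integration: velocity -> position
    return position


# Assumed values for illustration: the phone is at rest, but the sensor
# reports a constant 0.05 m/s^2 bias, sampled at 100 Hz.
BIAS = 0.05
DT = 0.01


def drift_after(seconds):
    """Position error (m) accumulated after `seconds` of a biased reading."""
    n = int(round(seconds / DT))
    return integrate_position([BIAS] * n, DT)


for seconds in (1, 10, 60):
    print(f"{seconds:>3} s: {drift_after(seconds):.2f} m of drift")
```

Because the reading is integrated twice, the error grows roughly with the square of the elapsed time, so even a tiny bias makes an annotation wander away from its target within seconds unless it is corrected by another source such as the camera image.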