drm/amdgpu: Forward soft recovery errors to userspace

commit 829798c789 upstream.

As we discussed before[1], soft recovery should be
forwarded to userspace, or we can get into a really
bad state where apps will keep submitting hanging
command buffers cascading us to a hard reset.

1: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 434967aadb)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit is contained in:
Joshua Ashton
2024-03-07 19:04:31 +00:00
committed by Greg Kroah-Hartman
parent 70275bb960
commit c28d207edf
+1 -2
View File
@@ -262,9 +262,8 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
struct dma_fence *fence = NULL;
int r;
/* Ignore soft recovered fences here */
r = drm_sched_entity_error(s_entity);
if (r && r != -ENODATA)
if (r)
goto error;
if (!fence && job->gang_submit)